The production vCenter is version 2.5 U5 and the two non-production vCenter is version 4.0.0 U1. In all cases we are using Microsoft SQL Server 2005 Enterprise Edition x64. The prodution databases are on clustered bare-metal Windows servers and the non-production databases are on non-MSCS virtual machines. In all cases we are using named instances of SQL Server (http://msdn.microsoft.com/en-us/library/ms143531(SQL.90).aspx), rather than depending on the default instance. All instances are patched with SP2.
First the database growth problem. Based on the internal database size calculator in vCenter 2.5 & 4.0 (Administration menu > vCenter Settings ... > Statistics) the the factors that determine size are: number of virtual machines, detail, sample rate, retention and number of hosts. Of these the number of virtual machines, detail and sample rate seem to have the biggest impact. Additionally there is the record of all events & tasks which impacted more by changes to the environment (creating, changing, deleting virtual machines and hosts, etc) which the calculator obviously can not predict. If you have a relatively static environment then the growth of the events & tasks talbles will be smaller than for an environment with constant changes.
But it seems that there is more to the size of the database than just the number of records stored in each of its tables. Our default configuration for databases is to not configure a hard size limit, but let it grow indefinately. More on that later. Obviously the more records that are stored then the bigger the size of the database and more of the file system is consumed. What was not obvious to me was what happens when record(s) are deleted: the database does not automatically shrink.
If you look at the calculator window you will see that statistical data passes through four intervals. The level of detail in the statistics data can be selected so long as the following intervals have the same or lower level of detail. There is also the real-time statistical data collected and displayed in the Performance tab for each host and virtual machine. Using deductive reasoning, some meandering through the database and some information from VMware support the life cycle of the statistical data is like this:
1) Statistical data is gathered by the host for both ESX/ESXi and all virtual machines running on that host. This is what you see in the Performance tab for each host or virtual machine. These data points are 20 seconds apart based on the resolution of the charts in vCenter and vSphere Client. I will call this interval 0. I'm not sure where this is stored.
2) The interval 0 data points are summarized into data at a definable number of minutes (1, 2, 3 or 5) and definable level of detail and definable retention period (1 to 5 days). I will call this interval 1. This appears to be stored in table VPX_HIST_STAT1.
3) The interval 1 data points (1, 2, 3 or 5 minutes) are summarized into data points that are 30 minutes apart with a definable level of detail. I will call this interval 2. This appears to be stored in table VPX_HIST_STAT2.
4) The interval 2 data points (30 minutes) are summarized into data points that are 2 hours apart with a definable level of detail. I will call this interval 3. This appears to be stored in table VPX_HIST_STAT3.
5) The interval 3 data points (2 hours) are summarized into data points that are 24 hours apart with a definable level of detail and definable retention (1 to 5 years). This appears to be stored in table VPX_HIST_STAT4.
This begs the question: what is doing all of the data summarization at each interval? It is not vCenter, its the database. The rest of this database explanation will be focused on SQL Server 2005 as we do not use Oracle for vCenter. All of our databases have "SQL Server Management Studio" installed, or "SQL Server Management Studio Express". In SSMS, navigate to the "SQL Server Agent" and then "Jobs" there you *should* see a number of jobs: "Past Day stats rollup
These jobs will read data from one interval table, summarize the data, write it to the next interval table and delete from the original interval table. This constant creating and deleting of records fragments the database occupies more space in the file system than there is data. But that's not all, in addition to the statistical data there is historical data in the form of events and tasks which are stored in tables VPX_EVENT and VPX_EVENT_ARG. In the previous paragraph I mentioned that vCenter 4.0 has the additional job "Event Task Cleanup
What I learned from all of this is that the database size in the file system can be no where near what the vCenter calculator would tell you. Consequently you can encounter problems with the database filling up its file system (drive letter) or it becomes sluggish.
So, how to fix this? Here is a list of KB articles that VMware support sent me and that I forwarded to my DBA who found them very helpful.
How to increase the performance of the VirtualCenter Database
http://kb.vmware.com/kb/1002825
Investigating the health of a VirtualCenter database server
http://kb.vmware.com/kb/1003979
Troubleshooting transaction logs on a Microsoft SQL database server
http://kb.vmware.com/kb/1003980
Defragmenting VirtualCenter performance data indexes on a Microsoft SQL database
http://kb.vmware.com/kb/1003990
Reducing the size of the VirtualCenter database when the rollup scripts take a long time to run
http://kb.vmware.com/kb/1007453
Purging old data from the database used by vCenter Server
http://kb.vmware.com/kb/1000125
My next post will talk about the size and growth of the database logs. Once we have modified our SQL Server maintenance plans for the production and non-production vCenter databases I will post details.