Setting Up & Configuring Alarms in vCenter 5 Part 2

In the previous post, Setting Up & Configuring Alarms in vCenter 5 Part 1, we looked at the initial configuration. We are now going to run through some of the default alarms, with some suggested thresholds.

Cannot Connect To Storage: why would we want to configure this? Essentially, this is a per-host setting. If the host loses its connection to storage, then the VMs will be restarted using HA. "Big deal," you say, "I can see that in vCenter." Well, it also covers ‘lost storage path redundancy’ and ‘degraded storage path redundancy’, so if your ESXi host has multiple connections to its storage, you will be notified if one of them is lost.

Datastore Usage On Disk: quite an important one. It tracks how much of each Datastore’s provisioned space is in use. I recommend always asking for slightly more than you need, e.g. if you need 1TB for a Datastore, ask for an extra 25%. Then, when the Datastore is provisioned, only use 1TB, so you have room to expand quickly and easily if needed. With this in mind, I set Warning to 90% and Critical to 95%, so I have some room to move VMs around, either by Storage vMotion or cold migration.
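To make the sizing arithmetic concrete, here is a small Python sketch. It assumes 1TB = 1024GB and that the Warning and Critical percentages apply to the 1TB Datastore rather than to the larger LUN; the function and key names are purely illustrative and not part of any VMware tool.

```python
def datastore_plan(needed_gb: float, headroom_pct: float = 25.0,
                   warning_pct: float = 90.0, critical_pct: float = 95.0) -> dict:
    """Illustrative sizing: request a LUN with extra headroom, provision only
    what is needed as the Datastore, and work out where the alarms fire."""
    # Ask the storage team for the needed space plus headroom (25% by default).
    lun_gb = needed_gb * (1 + headroom_pct / 100)
    return {
        "lun_requested_gb": lun_gb,
        # Only the needed space is provisioned as the Datastore...
        "datastore_gb": needed_gb,
        # ...leaving this much on the LUN to grow the Datastore into later.
        "spare_lun_gb": lun_gb - needed_gb,
        # Space used (GB) at which the Warning and Critical alarms would trigger.
        "warning_at_gb": needed_gb * warning_pct / 100,
        "critical_at_gb": needed_gb * critical_pct / 100,
    }


plan = datastore_plan(1024)  # 1TB Datastore, assuming 1TB = 1024GB
for key, value in plan.items():
    print(f"{key}: {value:.1f}")
```

With a 1TB Datastore carved from a 1.25TB LUN, Warning fires at roughly 922GB used and Critical at roughly 973GB, with around 256GB of unallocated LUN space still available to extend into.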

Host CPU Usage: with this alarm, I generally set Warning at 75% for 15 mins and Critical for 10 mins. The rationale behind this is that I would want to investigate the VMs’ CPU utilisation to see whether a one-off event is causing the high usage or whether we need to look at introducing more processing power into the cluster.
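The point of "75% for 15 mins" is that a short spike should not raise the alarm; the condition has to hold for the whole period. The Python sketch below models that idea under stated assumptions: usage is sampled every 5 minutes (a hypothetical interval chosen for readability), and `Trigger` and `sustained_breach` are illustrative names, not vCenter’s API or its actual evaluation logic.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Trigger:
    """Hypothetical metric trigger: usage must stay above threshold_pct
    for at least duration_min minutes before the alarm state changes."""
    threshold_pct: float
    duration_min: int


def sustained_breach(samples: Sequence[float], trigger: Trigger,
                     interval_min: int) -> bool:
    """Return True if the most recent samples span the trigger's duration
    and every one of them is above the threshold."""
    # Consecutive samples needed to cover the duration (ceiling division).
    needed = -(-trigger.duration_min // interval_min)
    if len(samples) < needed:
        return False  # Not enough history yet to cover the whole period.
    return all(s > trigger.threshold_pct for s in samples[-needed:])


warning = Trigger(threshold_pct=75.0, duration_min=15)
print(sustained_breach([60, 80, 82, 78], warning, interval_min=5))  # True
print(sustained_breach([80, 95, 70], warning, interval_min=5))      # False
```

The first series stays above 75% for the last 15 minutes, so it would trip Warning; the second spikes to 95% but drops back, so it would not. Changing the `Trigger` values gives the same check for the Host Memory Usage (90% for 15 mins) and Virtual Machine CPU Usage alarms.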

Host Error: perhaps the most important one, as this is what vCenter relies on to monitor host alarms!

Host Memory Usage: similar to Host CPU Usage, I generally set Warning to 90% for 15 mins and Critical for 10 mins. Again, I would want to investigate the host memory usage to ensure that we have sufficient resources to cope with a host failure.
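"Sufficient resources for a host failure" can be sanity-checked with simple N+1 arithmetic: in a cluster of N identical hosts, the load has to fit on N-1 of them, so average utilisation should stay below (N-1)/N. The Python sketch below assumes identical hosts and ignores virtualisation overhead, reservations and the HA admission control policy, so treat it as a rough check rather than how vSphere HA actually calculates failover capacity; the function names are hypothetical.

```python
from typing import Sequence


def max_avg_utilisation(hosts: int) -> float:
    """Highest average utilisation (%) at which N identical hosts could
    lose one host and still fit the load on the rest."""
    if hosts < 2:
        raise ValueError("need at least two hosts to tolerate a failure")
    return (hosts - 1) / hosts * 100


def survives_host_failure(used_gb: Sequence[float], capacity_gb: float) -> bool:
    """True if total memory in use fits on the cluster minus one host."""
    if len(used_gb) < 2:
        return False
    return sum(used_gb) <= capacity_gb * (len(used_gb) - 1)


print(max_avg_utilisation(4))                            # 75.0
print(survives_host_failure([180, 190, 200, 170], 256))  # True: 740GB <= 768GB
print(survives_host_failure([200, 210, 190, 200], 256))  # False: 800GB > 768GB
```

In a four-host cluster, for example, average memory usage much above 75% means a single host failure could leave the remaining hosts short of memory, which is worth keeping in mind when choosing the Warning threshold.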

Host Memory Status: not to be confused with ‘Host Memory Usage’, this monitors the physical DIMMs.

Host Processor Status: again, not to be confused with ‘Host CPU Usage’, this monitors the physical processor hardware.

License Capacity Monitor: I like this alarm; it’s great for products such as Site Recovery Manager or Operations Manager. It lets you know if you are trying to protect or manage more VMs than you are licensed for.

Virtual Machine CPU Usage: I use the same alarm settings as for ‘Host CPU Usage’, so if a VM is using more than 75% of its CPU capacity for over 15 minutes, I would want to identify whether this is a one-off or whether extra resources are required.

vSphere HA Failover In Progress: this falls into the ‘nice to have’ category. If for some reason none of your other alarms work, then at least you know that a VM has been restarted by HA.

vSphere HA Virtual Machine Monitoring Error: this alarm works in conjunction with Virtual Machine Monitoring. I tend to leave VM Monitoring set to ‘VM Monitoring Only’ with sensitivity at Medium, and then change individual VMs’ monitoring to High if required. If you have this set to High for all servers, it can cause alarms when backup software rolls back snapshots, depending on how big the VM is.

Hopefully the following alarms need no explanation, as they should ALWAYS be enabled:

Host Battery Status
Host Connection And Power State
Host Connection Failure
Host Hardware Fan Status
Host Hardware Power Status
Host Hardware System Board Status
Host Hardware Temperature Status
Insufficient vSphere HA Failover Resources
Network Connectivity Lost
Network Uplink Redundancy Degraded
Network Uplink Redundancy Lost

Naturally, this isn’t a complete list of alarms; however, these are the default alarms that I would configure in most, if not all, environments. Every environment is different, and you may use more or fewer alarms than I have mentioned.

Don’t forget that, depending on which vSphere licenses you have, you might see extra default alarms for features such as FT. Also, when you install additional components, e.g. SRM, you will get even more alarms to have a play around with.

7 thoughts on “Setting Up & Configuring Alarms in vCenter 5 Part 2”

  1. How do ‘Host Error’ and ‘Virtual Machine Error’ work? Are they catch-all definitions? It would be nice to set up a general alarm that just forwards anything significant to my email address with the fewest clicks possible.

  2. The network connectivity loss alerts seem to require advanced conditions to be configured. Yes, I understand these advanced conditions can be used as filters, but I am only interested in being alerted if any host has lost connectivity, not some subset or condition of that. So what isn’t clear to me is whether any actions are actually required within the advanced conditions to get the network connectivity loss alert working. Can I leave it at the default and the alarm will still work?

  3. Is there an alert when a task is completed? Say I want to be notified when a “Cloning” job is done. I don’t see any options for that. Do you have any ideas? Thanks –S
