Availability Sets are used within Microsoft Azure to ensure that virtual machines are deployed into different Update Domains and different Fault Domains. This allows Microsoft Azure to provide an SLA of 99.95% for the service provided by the virtual machines within the availability set.
- Updates are planned events. For example, when the underlying Azure fabric is patched, your guest VMs may need to be restarted. A group of VMs that can be updated and restarted at the same time is defined as an Update Domain (UD).
- Failures are unexpected events, such as a physical or logical hardware failure, that affect the availability of your guest VM. A group of VMs that share a common point of failure is defined as a Fault Domain (FD).
At a minimum, two virtual machines must be in an Availability Set for Microsoft to provide the 99.95% SLA.
- An Availability Set can be created with a single virtual machine; however, Microsoft will not provide an uptime SLA
- Virtual machines within an Availability Set must reside in the same Cloud Service
- A maximum of 100 virtual machines can reside in an Availability Set
- Five update domains are available per Availability Set
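To make the effect of these limits more concrete, here is a minimal sketch of how VMs might be spread across domains. It assumes a simple round-robin placement and two fault domains; both are assumptions for illustration only (the text above only states five update domains), and the real placement is decided by the Azure fabric. The names `place_vms` and `vms_per_update_domain` are hypothetical.

```python
from collections import defaultdict

UPDATE_DOMAINS = 5  # from the list above: five update domains per Availability Set
FAULT_DOMAINS = 2   # assumption for illustration; not stated in the text


def place_vms(vm_names, update_domains=UPDATE_DOMAINS, fault_domains=FAULT_DOMAINS):
    """Assign each VM an (update domain, fault domain) pair in round-robin order.

    Round-robin is an assumed policy, used here only to show the idea.
    """
    return {
        name: (index % update_domains, index % fault_domains)
        for index, name in enumerate(vm_names)
    }


def vms_per_update_domain(placement):
    """Group VM names by the update domain they landed in."""
    groups = defaultdict(list)
    for name, (ud, _fd) in placement.items():
        groups[ud].append(name)
    return dict(groups)


placement = place_vms([f"web{i}" for i in range(6)])
by_ud = vms_per_update_domain(placement)

# During a planned update, one update domain is restarted at a time, so the
# largest group is the most VMs we would lose at once under this model.
worst_case_offline = max(len(names) for names in by_ud.values())
print(by_ud)
print("Worst case offline during an update:", worst_case_offline)
```

With six VMs and five update domains, the sixth VM wraps around and shares an update domain with the first, so under this model two VMs could be restarted together.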
That’s all very well and good, but it means that our application needs to be able to cope with being restarted when a failure or update occurs. For me, this point needs careful consideration when you are trying to meet your business SLAs within the framework that Microsoft provides in Azure.
We also need to consider how traffic is directed to each of the guest VMs within an Availability Set. This is where a load balancer comes in, directing traffic to the most appropriate guest VM within the Availability Set.
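As a rough illustration of what "most appropriate" can mean, the sketch below rotates requests across VMs and skips any that are marked as down, for example while their update domain is being restarted. This is a generic round-robin model, not a description of how the Azure load balancer actually distributes traffic, and the class and method names are hypothetical.

```python
class RoundRobinBalancer:
    """Toy load balancer: rotate through backends, skipping unhealthy ones."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._cursor = 0

    def mark_down(self, name):
        """Record that a VM is unavailable (e.g. restarting for an update)."""
        self.healthy.discard(name)

    def mark_up(self, name):
        """Record that a known VM is available again."""
        if name in self.backends:
            self.healthy.add(name)

    def next_backend(self):
        """Return the next healthy VM, or raise if none are available."""
        for _ in range(len(self.backends)):
            candidate = self.backends[self._cursor]
            self._cursor = (self._cursor + 1) % len(self.backends)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy backends available")


lb = RoundRobinBalancer(["vm1", "vm2", "vm3"])
lb.mark_down("vm2")  # pretend vm2's update domain is being patched
picks = [lb.next_backend() for _ in range(4)]
print(picks)
```

While `vm2` is marked down, traffic alternates between `vm1` and `vm3`; once it is marked up again it rejoins the rotation.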
You might think that you could put a single VM into an Availability Set to meet SLA requirements. However, this is where Microsoft have a get-out-of-jail-free card: a single VM does not receive the 99.95% SLA. I guess this is because Microsoft don’t know how long it will take that guest VM to become available in another update or fault domain.
Availability Sets also require careful design of the application layer, to ensure that not only the guest VMs are available but also the shared data services they depend on. It’s also worth noting that application or service failures are not included within the 99.95% SLA.
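To put the 99.95% figure in perspective, it helps to convert it into a downtime budget. The sketch below is plain arithmetic and assumes a 30-day month; how Microsoft actually measures uptime for SLA purposes is defined in their SLA terms, not here. The function name is hypothetical.

```python
def allowed_downtime_minutes(sla_percent, days=30):
    """Minutes of downtime permitted over `days` at the given SLA percentage.

    The 30-day default is an assumption made for illustration.
    """
    total_minutes = days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)


budget = allowed_downtime_minutes(99.95)
print(f"Downtime budget at 99.95%: about {budget:.1f} minutes per 30 days")
```

That works out to roughly 21.6 minutes over 30 days, and it only covers the VMs themselves. Any outage caused by your own application or service comes on top of that.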