vSphere Replication has been embedded in the ESXi kernel for quite some time now. When a virtual machine performs a storage ‘write’, it is mirrored by the vSCSI filter at the ESXi host level before it is committed to disk. The vSCSI filter sends the mirrored ‘write’ to the vSphere Replication Appliance, which is responsible for transmitting the ‘writes’ to its target, normally in a DR site.
The process is shown at a high level in the diagram below.
I’m often asked by customers if they should consider using it, given the benefits it provides, which include:
- Simplified management using hypervisor based replication
- Multi-point-in-time retention policies to store more than one instance of a protected virtual machine
- Application consistency for Microsoft Windows operating systems with VMware Tools installed
- VMs can be replicated from and to any storage
- An initial seed can be performed
As an impartial adviser, I also have to cover the areas in which vSphere Replication isn’t as strong. These are the points I suggest are considered as part of any design:
- vSphere Replication relies on the vSphere Replication Appliance (vRA). If this is offline or unavailable, replication stops for all virtual machines
- vSphere Replication requires the virtual machine to be powered on for replication to occur
- vSphere Replication is not usually as efficient as array based replication, which often has compression and intelligence built into the replication process. If you have limited bandwidth you may violate recovery point objectives (RPOs)
- vSphere Replication will reduce the bandwidth available to other services/functions if you are using logically separated networks over 10GbE
- Note that Network IO Control can be used to prioritise access to bandwidth in times of contention, but it requires Enterprise Plus licenses
- vSphere Replication requires manual routing to send traffic across a replication VLAN which increases the complexity of the environment
- vSphere Replication is limited to 200 virtual machines per Replication Appliance and 2000 virtual machines overall as detailed in VMware KB2102453
- After an unplanned failover and reprotect, vSphere Replication uses an algorithm to perform a checksum. This can result in a full sync, depending on the length of separation and the amount of data that has changed
- An HA event on an ESXi host at the production site will trigger a full synchronisation of the virtual machines that resided on the failed host. See the vSphere Replication FAQs
The last point is, for me, a deal breaker. Let’s consider it again: if we have an ESXi host that has a PSOD, then all of the VMs that were running on it will require a full synchronisation.
What’s The Impact?
If we have an inter-site link of 100Mbps which has an overhead of 10%, this gives us an effective throughput of 90Mbps.
Say we have an average sized VMware environment with a couple of VMs which hold 2TB of data each. If both need a full synchronisation across that 100Mbps inter-site link, you are looking at over 4 days to complete it.
We also need to consider the impact on the rest of your VMs, which will have their recovery point objective violated while the bandwidth is being consumed by the 2 x 2TB VMs. Not exactly where you want to be!
The Maths Per 2TB VM
8 Mb equals 1 MB
2 TB = 2,097,152 MB = 16,777,216 Mb
16,777,216 Mb / 90 Mbps = 186,414 seconds
186,414 seconds / 60 = 3,107 minutes
3,107 minutes / 60 = 51 hours 47 minutes
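If you want to rerun the sums above for your own link speed and VM sizes, here is a minimal Python sketch. It assumes the same figures used in this post (binary units, so 1 TB = 1024 x 1024 MB, and a flat 10% link overhead), and the function names are my own, purely for illustration. It ignores compression, changed-block savings and competing traffic, so treat the result as a rough worst case rather than a prediction.

```python
def full_sync_seconds(data_tb, link_mbps, overhead=0.10):
    """Seconds needed to send data_tb terabytes over a link of link_mbps.

    Assumes binary units (1 TB = 1024 * 1024 MB) and a flat overhead
    fraction taken off the link speed, as in the worked example.
    """
    megabits = data_tb * 1024 * 1024 * 8          # TB -> MB -> Mb
    effective_mbps = link_mbps * (1 - overhead)   # e.g. 100 Mbps -> 90 Mbps
    return megabits / effective_mbps


def as_hours_minutes(seconds):
    """Round to the nearest minute and return (hours, minutes)."""
    total_minutes = round(seconds / 60)
    return divmod(total_minutes, 60)


per_vm = full_sync_seconds(2, 100)
print(as_hours_minutes(per_vm))      # (51, 47) for one 2TB VM
print(as_hours_minutes(2 * per_vm))  # (103, 34) for both VMs, a little over 4 days
```

Changing the second argument lets you see how quickly the picture improves with a faster link, which is a useful sanity check before committing to a design.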