VMware on AWS: My Thoughts

As VMworld 2017 has just finished, I have been giving VMware on AWS some thought.  Lots of questions have been running through my head, so I thought I would try and transcribe some here.

What Is It?

It’s a minimum of 4 x vSphere Hosts running VMware’s SDDC stack (ESXi, NSX and vSAN), dedicated to a single customer.  VMware manages the availability, patching and maintenance whilst the customer consumes the resources.

Each ESXi Host provides 36 x CPU Cores, 512GB RAM and 8 x NVMe drives.  Some of this capacity is consumed by management items such as the vCenter and NSX VMs, so overall usable resources will be less.

Why Would I Use It?

This is a question I have been pondering.  My initial thoughts are:

  • A customer’s infrastructure lifecycle is at the point of refresh and they are moving to an ‘opex model’
  • A customer needs to exit a datacentre quickly and this could be one of a number of options
  • A customer is deploying a remote office and doesn’t want to invest in on-premises infrastructure for their VM estate
  • Target for disaster recovery to reduce on-premises secondary datacentre footprint (not sure if SRM is supported yet)

Even though I’m not convinced by this one, a potential candidate for a use case is to extend your on-premises operational model to AWS.

Another one which I’m not convinced by is reducing your on-premises operational costs by having someone else handle maintenance and patching of your storage, ESXi Hosts and vCenter.  Are companies really going to make Dave redundant? Nope, they are just going to get Dave doing something different for that one day a month (or Dave gets to chill out).

Would I Recommend It?

The concise answer is potentially.  The customers that I work with are reviewing their application estate and looking to either keep, kill, consolidate or transform them.

  • The keep category often falls into the ‘that’s too difficult to tackle’ basket, or we have only just invested in a new application or release
  • Kill generally means that the application will be ‘withered on the vine’
  • Consolidate generally means a number of applications will be collapsed into a single master
  • Transform usually means moving from on-premises to a SaaS type offering, for example Exchange On-Premises to Office 365 Exchange Online

Out of these, which are the use cases for VMware on AWS?  The answer is simple: anything heritage, AKA a Virtual Machine, as PaaS and SaaS will go somewhere else.

Infrastructure Applications such as Active Directory Domain Services, Certificate Services, File, Print and SQL are either highly available natively or can be designed and deployed on IaaS in a highly available fashion and as such aren’t great candidates for VMware on AWS.

What’s The Cost?

The monthly cost of a one year reserved ESXi Host (30% discount) is $4,332.00.  We need four of these, which makes the monthly cost $17,328.00.  That is circa £13,500 per month or £162,000 per year for compute and storage.  Note that network charges and Operating System licenses are not included.
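The cost arithmetic above can be reproduced with a short Python sketch.  The host price and host count come from the paragraph above; the exchange rate is an assumption, back-derived from the "circa £13,500" figure rather than taken from any official source.

```python
# Figures from the post: one year reserved price per host, minimum host count.
HOST_MONTHLY_USD = 4332.00
MIN_HOSTS = 4

# Assumption: approximate USD to GBP rate implied by the post's figures.
USD_TO_GBP = 0.78


def cluster_cost(hosts=MIN_HOSTS, host_monthly_usd=HOST_MONTHLY_USD,
                 usd_to_gbp=USD_TO_GBP):
    """Return monthly and yearly cost for a cluster (compute and storage only)."""
    monthly_usd = hosts * host_monthly_usd
    monthly_gbp = monthly_usd * usd_to_gbp
    return {
        "monthly_usd": monthly_usd,
        "monthly_gbp": monthly_gbp,
        "yearly_gbp": monthly_gbp * 12,
    }


if __name__ == "__main__":
    for name, value in cluster_cost().items():
        print(f"{name}: {value:,.2f}")
```

Changing `hosts` shows how quickly the bill grows once you scale beyond the four host minimum.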

Using the same 30% discount level on Microsoft Azure you could run:

  • 268 x A2 v2 VM continuously for 12 months
  • 143 x D2 v2 VM continuously for 12 months

Taking into account that a single ESXi Host is reserved to tolerate failures, we have 1,536GB of RAM (3 x 512GB).  Minus circa 10% for the management cluster and general overhead, this gives circa 1,382GB of usable RAM.

Using the same RAM metrics as the above Azure VMs you could run the equivalent of:

  • 346 x A2 VMs using VMware on AWS
  • 197 x D2 VMs using VMware on AWS
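As a rough check of the RAM based comparison above, here is a minimal Python sketch.  The host count, RAM per host, spare host and 10% overhead come from the post; the per VM RAM sizes (4GB for A2 v2, 7GB for D2 v2) are my assumption about the Azure sizes used, and CPU and storage are ignored entirely.

```python
HOSTS = 4
RAM_PER_HOST_GB = 512
SPARE_HOSTS = 1    # one host's worth of capacity held back to tolerate a failure
OVERHEAD = 0.10    # circa 10% for management cluster and general overhead

# Assumption: RAM per Azure VM size, not stated in the post.
VM_RAM_GB = {"A2 v2": 4, "D2 v2": 7}


def usable_ram_gb(hosts=HOSTS, ram_per_host=RAM_PER_HOST_GB,
                  spare_hosts=SPARE_HOSTS, overhead=OVERHEAD):
    """Usable RAM after removing the spare host and the overhead percentage."""
    raw = (hosts - spare_hosts) * ram_per_host
    return raw * (1 - overhead)


def vm_equivalents(usable_gb, vm_ram_gb):
    """How many VMs of a given RAM size fit, sized on RAM alone."""
    return usable_gb / vm_ram_gb


if __name__ == "__main__":
    usable = usable_ram_gb()
    print(f"Usable RAM: {usable:,.0f}GB")
    for size, ram in VM_RAM_GB.items():
        print(f"{size}: {round(vm_equivalents(usable, ram))} VMs")
```

The counts are rounded to the nearest whole VM to match the figures above; rounding down would be the stricter choice for real capacity planning.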

Final Thought

Generally I’m seeing customers moving to a PaaS or SaaS offering for low hanging fruit and then dealing with the more complex applications on a case by case basis with a view to transforming these into a PaaS or SaaS model.

If customers are migrating 100 plus heritage VMs to a cloud platform, and these cannot be re-architected to be natively highly available or have an SLA that simple backup and restore routines will not cater for, then VMware on AWS is a viable option.

I do see that VMware on AWS has a place in the market, however the place is for heritage systems and I wonder how long it will be until the earnings from VMware on AWS start to dwindle?

vSphere Replication – Consider These Points Before Using It

vSphere Replication has been embedded in the ESXi kernel for quite some time now.  When a virtual machine performs a storage ‘write’, this is mirrored by the vSCSI filter at the ESXi Host level before it is committed to disk.  The vSCSI filter sends its mirrored ‘write’ to the vSphere Replication Appliance, which is responsible for transmitting the ‘writes’ to its target, normally in a DR site.

The process is shown at a high level in the diagram below.

vSphere Replication v0.1

I’m often asked by customers if they should consider using it given the benefits which it provides, which include:

  • Simplified management using hypervisor based replication
  • Multi-point in time retention policies to store more than one instance of a protected virtual machine
  • Application consistency for Microsoft Windows Operating Systems with VMware Tools installed
  • VMs can be replicated from and to any storage
  • An initial seed can be performed

As an impartial adviser, I have to point out the areas in which vSphere Replication isn’t as strong.  These are the points I suggest are considered as part of any design:

  • vSphere Replication relies on the vSphere Replication Appliance (vRA); if this is offline or unavailable then replication stops for all virtual machines
  • vSphere Replication requires the virtual machine to be powered on for replication to occur
  • vSphere Replication is not usually as efficient as array based replication, which often has compression and intelligence built into the replication process.  If you have limited bandwidth you may violate recovery point objectives (RPO)
  • vSphere Replication will reduce the bandwidth available to other services/functions if you are using logically separated networks over 10GbE
    • Note that Network IO Control can be used to prioritise access to bandwidth in times of contention, but requires Enterprise Plus licenses
  • vSphere Replication requires manual routing to send traffic across a replication VLAN which increases the complexity of the environment
  • vSphere Replication is limited to 200 virtual machines per Replication Appliance and 2000 virtual machines overall as detailed in VMware KB2102453
  • After an unplanned failover and reprotect, vSphere Replication uses an algorithm to perform a checksum; this can result in a full sync depending on the length of separation and the amount of data that has changed
  • vSphere Replication only provides replication for powered on virtual machines
  • A HA event on an ESXi Host at the Production site will trigger a full synchronisation of the virtual machines that resided on the failed host. See vSphere Replication FAQs
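The scale limits quoted in the list above (200 virtual machines per Replication Appliance, 2000 overall) lend themselves to a quick sanity check at design time.  The following is a minimal Python sketch using only those two figures; the function name is hypothetical and it does not query any VMware API.

```python
import math

# Limits as quoted in the list above (attributed there to VMware KB2102453).
MAX_VMS_PER_APPLIANCE = 200
MAX_VMS_OVERALL = 2000


def appliances_needed(protected_vms):
    """Return how many Replication Appliances a given VM count would need."""
    if protected_vms < 0:
        raise ValueError("VM count cannot be negative")
    if protected_vms > MAX_VMS_OVERALL:
        raise ValueError(
            f"{protected_vms} VMs exceeds the overall limit of {MAX_VMS_OVERALL}"
        )
    return math.ceil(protected_vms / MAX_VMS_PER_APPLIANCE)


if __name__ == "__main__":
    print(appliances_needed(450))
```

Remember that every appliance is also a dependency: if one is offline, replication stops for the VMs it handles.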

The last point is, for me, a deal breaker.  Let’s consider it again: if we have an ESXi Host that has a PSOD, then all of the VMs that were running on it will require a full synchronisation.

What’s The Impact?

If we have an inter-site link of 100Mbps which has an overhead of 10%, this gives us an effective throughput of 90Mbps.

Say we have an average sized VMware environment with a couple of VMs which hold 2TB of data each.  Replicating these across the 100Mbps inter-site link, you are looking at over 4 days to perform a full synchronisation of both.

We also need to consider the impact on the rest of your VMs, which will have their recovery point objective violated as the bandwidth is being consumed by the 2 x 2TB VMs.  Not exactly where you want to be!

The Maths Per 2TB VM

8Mb equals 1MB

2TB = 2,097,152 MB = 16,777,216 Mb

16,777,216 Mb / 90 Mbps = 186,414 seconds

186,414 seconds / 60 = 3,107 minutes

3,107 minutes / 60 = 51 hours 47 minutes

Two such VMs therefore need roughly 103 hours 34 minutes, or about 4.3 days.
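If you want to plug in your own link speed and VM sizes, the arithmetic above can be wrapped in a small Python sketch.  It assumes the same simplifications as the post: binary units (1TB = 1,024 x 1,024 MB), a flat 10% link overhead, no compression, and the whole link available to replication.

```python
def full_sync_seconds(data_tb, link_mbps, overhead=0.10):
    """Seconds needed to push data_tb across a link at its effective throughput."""
    megabits = data_tb * 1024 * 1024 * 8      # TB -> MB -> Mb
    effective_mbps = link_mbps * (1 - overhead)
    return megabits / effective_mbps


def as_hours_minutes(seconds):
    """Round to the nearest minute and split into (hours, minutes)."""
    total_minutes = round(seconds / 60)
    return divmod(total_minutes, 60)


if __name__ == "__main__":
    # One 2TB VM, then two 2TB VMs sharing the same 100Mbps link.
    for data_tb in (2, 4):
        hours, minutes = as_hours_minutes(full_sync_seconds(data_tb, 100))
        print(f"{data_tb}TB: {hours} hours {minutes} minutes")
```

In practice the link is shared with other replication traffic, so treat these numbers as a best case.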