Microsoft Azure – Auto Scaling

The ability to dynamically scale to a public cloud was one of the mantras I used to hear a couple of years ago.  When reality struck and customers realised that their monolithic applications wouldn't be suitable for this construct, they realised they would need to re-architect.

Wind forward a couple of years and Microsoft Azure Auto Scaling has become a reality, so with this in mind I thought it would be a good idea to share a blog post on the subject.

What Is Auto Scaling?

Auto Scaling is the process of increasing either the number of instances (scale out/in) or the compute power (scale up/down) when a level of demand is reached.

Scale Up/Down

Scale Up or Down is targeted at increasing or decreasing the compute assigned to a VM.  Microsoft have a number of ways in which you can 'scale up' on the Azure platform.  To vertically scale you can use any of the following:

  • Manual Process – Simply keep your VHD and deploy a new VM with greater resources.
  • Azure Automation – For VMs which are not identical, you can use Azure Automation with Webhooks to monitor conditions, e.g. CPU above 'x' % for 'y' minutes, and then scale up the VM within the same VM series.
  • Scale Sets – For VMs which are identical, you can use Scale Sets, a PaaS offering which ensures that fault domains, update domains and load balancing are built in.

Note that using a Manual Process, Azure Automation or Scale Sets to resize a VM will require a VM restart.
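To make the Azure Automation approach above more concrete, the sketch below models the kind of condition a runbook might evaluate before resizing a VM within its series. This is purely illustrative: the VM size names and thresholds are made-up examples, not real Automation or Azure SDK calls.

```python
# Illustrative sketch (not the Azure Automation API): decide whether to
# scale a VM up to the next size in its series when average CPU over a
# monitoring window exceeds a threshold.

# Hypothetical sizes within one series, smallest to largest.
DS_SERIES = ["Standard_DS1_v2", "Standard_DS2_v2", "Standard_DS3_v2"]

def next_size_up(current_size, series=DS_SERIES):
    """Return the next larger size in the same series, or None if at the top."""
    idx = series.index(current_size)
    return series[idx + 1] if idx + 1 < len(series) else None

def should_scale_up(cpu_samples, threshold_pct=80.0):
    """True when the average CPU over the sample window exceeds the threshold."""
    return sum(cpu_samples) / len(cpu_samples) > threshold_pct

# Example: sustained high CPU triggers a move from DS1 to DS2.
samples = [85, 90, 88, 92]          # percent CPU over the monitoring window
if should_scale_up(samples):
    new_size = next_size_up("Standard_DS1_v2")
    # A real runbook would call the Azure API here, which restarts the VM.
```

Note the scale-up stops when the VM is already the largest in its series, which is exactly why vertical scaling alone has a ceiling and scale out becomes attractive.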

The diagram below provides a logical overview of a Scale Set.

Scale Set v0.1

Scale Out/In

Scale Out or In is targeted at increasing or decreasing the number of instances, which could be made up of VMs, Service Fabric, App Service or Cloud Services.

A common approach is to use VMs for applications which support Scale Out/In: typically a piece of middleware that performs number crunching but holds no data, or perhaps a worker role used to transport data from point A to point B.

For websites it is more common to use App Service 'Web Apps', which in a nutshell provides a PaaS service; the hosting tier chosen (Standard, Premium or Isolated) dictates the maximum number of instances and Auto Scale support.

Considerations

Auto Scaling requires time to scale up or out; it doesn't respond to a single spike in CPU usage, but looks at averages over a 45 minute period.  Therefore, if you know when a peak workload is likely, it can be more efficient to deploy Auto Scaling using a schedule.
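A minimal sketch of why this matters: a rule engine that acts on the average of a metric over a look-back window barely notices a single spike, so no scale-out action fires. The numbers below are invented for illustration.

```python
# Averaging over a window (as Auto Scaling does) smooths out single spikes.

def window_average(samples, window):
    """Average of the most recent `window` samples."""
    recent = samples[-window:]
    return sum(recent) / len(recent)

# One 100% spike among otherwise idle readings barely moves the average,
# so a rule like "average CPU > 80%" never fires.
cpu = [10, 12, 11, 100, 9, 10, 12, 11, 10]
avg = window_average(cpu, window=9)   # roughly 20%, well under an 80% rule
fires = avg > 80                      # False: the spike is absorbed
```

This is the trade-off behind the schedule suggestion: a known daily peak will always beat a reactive rule that has to wait for the average to climb.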

To ensure that a runaway process doesn't cause costs to spiral out of control, use tags, a separate Azure subscription, email alerting, or perhaps even a limit on the maximum number of instances on Auto Scale.

Azure Migrate – Initial Thoughts

When Microsoft announced Azure Migrate at Ignite, I was enticed and signed up for the limited preview.  Having been accepted onto the program, I thought I would share my initial thoughts.

What Is Azure Migrate?

It is a set of tools provided by Microsoft to give you a high-level overview of your on-premises virtual machines and a possible migration approach to Microsoft Azure.

Its components are as follows:

  • An OVA file (Windows Server 2012 R2) that runs a Collector which connects to vCenter to extract information
  • Unique credentials entered into the Collector to securely report information back to the Migrate PaaS within your Azure subscription
  • An assessment that enables you to group virtual machines by readiness for Azure along with expected monthly costs
  • An Azure readiness assessment per virtual machine with data customisation
  • Data export to Microsoft Excel to enable further manipulation of the information
  • Integration with the OMS Service Map solution pack to provide application dependency mapping, communication paths, performance data and update requirements


On-Premises Support

In the limited preview, the support for on-premises systems is limited to vCenter 5.5 and 6.0.  However, I ran the Collector against a vCenter Server Appliance 6.5 without any issues.

Guest operating system support extends to those supported by Microsoft Azure, which makes sense.

Known Issues

As this is a limited preview, I’m sure that these issues will be resolved in due course.

  • Windows Server 2016 showing as an ‘Unsupported OS’ in Azure Readiness report
  • SQL not providing a link to the Azure Database Migration Service
  • For 182 VMs the suggested tool is always ‘Requires Deep Discovery’

Final Thought

Azure Migrate will be a good starting point (when it is generally available) to provide a high-level overview of readiness for Azure.  It will require human intervention to overlay application considerations and ensure applications are natively highly available to meet customer SLAs.

Storage Spaces Direct Overview

Storage Spaces Direct is an area which I have been meaning to look into, but for one reason or another it has slipped through the gaps until now.

What Is Storage Spaces Direct?

Storage Spaces Direct is shared-nothing, software-defined storage which is part of the Windows Server 2016 operating system.  It creates a pool of storage using local hard drives from a collection of two or more individual servers.

The storage pool is used to create volumes which have in built resilience, so if a server or hard drive fails, data remains online and accessible.

What Is The Secret Sauce?

The secret sauce is the 'storage bus', which is essentially the transport layer that provides the interaction between the physical disks across the network using SMB3.  It allows each host to see all disks as if they were its own local disks, using Cluster Ports and a Cluster Block Filter.

The Cluster Port is like an initiator in iSCSI terms and the Cluster Block Filter is the target; this allows each disk to be presented to each host as if it were its own.

Storage Bus v0.1

For a Microsoft supported platform you will need a 10GbE network with RDMA-capable network adapters, using either iWARP or RoCE, for the Storage Bus.

Disks

When it comes to Storage Spaces Direct, not all disks are equal and there are a number of disk configurations which can be used.  Drive choices are as follows:

  • All Flash NVMe
  • All Flash SSD
  • NVMe for Cache and SSD for Capacity (Writes are cached and Reads are not Cached)
  • NVMe for Cache and HDD for Capacity
  • SSD for Cache and HDD for Capacity (could look at using more expensive SSD for cache and cheaper SSD for capacity)
  • NVMe for Cache and SSD and HDD for Capacity

In an SSD and HDD configuration, the Storage Bus Layer Cache binds SSD to HDD to create a read/write cache.

Using NVMe based drives will provide circa 3x the performance at typically 50% lower CPU cycles versus SSD, but at a far greater cost point.

It should be noted that a minimum of 2 x SSD and 4 x HDD are needed for a supported Microsoft configuration.

Hardware

In relation to the hardware, it must be on the Windows Server Catalog and certified for Windows Server 2016.  Both the HPE DL380 Gen10 and Gen9 are supported, along with the HPE DL360 Gen10 and Gen9.  When deploying Storage Spaces Direct you need to ensure that cluster creation passes all validation tests to be supported by Microsoft.

  • All servers need to be the same make and model
  • Minimum of an Intel Nehalem processor
  • 4GB of RAM per TB of cache drive capacity on each server to store metadata, e.g. 2 x 1TB SSD per server requires 8GB of RAM dedicated to Storage Spaces Direct
  • 2 x NICs that are RDMA capable, with either iWARP or RoCE, dedicated to the Storage Bus
  • All servers must have the same drive configuration (type, size and firmware)
  • SSDs must have power-loss protection (enterprise grade)
  • A simple pass-through SAS HBA for SAS and SATA drives
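The RAM sizing rule above is simple multiplication; as a quick sketch (the helper name is mine, not a Microsoft tool):

```python
# Worked example of the "4GB of RAM per TB of cache drive capacity" rule.

def s2d_metadata_ram_gb(cache_drives, drive_tb, gb_per_tb=4):
    """RAM (GB) to reserve per server for Storage Spaces Direct metadata."""
    return cache_drives * drive_tb * gb_per_tb

# 2 x 1TB cache SSDs per server -> 8GB of RAM dedicated to S2D metadata.
ram = s2d_metadata_ram_gb(cache_drives=2, drive_tb=1)
```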

Things to Note

  • The cache layer is completely consumed by the Cluster Shared Volume and is not available to store data on
  • Microsoft's recommendation is to make the number of cache drives a multiplier of the capacity drives, e.g. with 2 x SSD per server use either 4 x HDD or 6 x HDD per server
  • Microsoft recommends a single Storage Pool per cluster, e.g. all the disks across a 4 x Hyper-V Host cluster contribute to a single Storage Pool
  • For a 2 x server deployment the only resilience choice is a two-way mirror.  Essentially data is written to two different HDDs in two different servers, meaning your capacity layer is reduced by 50%
  • For a 3+ server deployment Microsoft recommends a three-way mirror.  Essentially three copies of data across 3 x HDDs on 3 x servers, reducing capacity to 33%.  You can use single parity (à la RAID 5) but Microsoft do not recommend this
  • Typically a 10% cache-to-capacity ratio is recommended, e.g. with 4 x 4TB capacity drives (16TB), 2 x 800GB cache drives should be used
  • When the Storage Pool is configured, Microsoft recommend leaving 1 x HDD's worth of capacity free for immediate in-place rebuilds of failed drives.  So with 4 x 4TB you would leave 4TB unallocated in reserve
  • The recommendation is to limit storage capacity per server to 100TB, to reduce resync of data after downtime, reboots or updates
  • Microsoft recommends using ReFS for Storage Spaces Direct for performance accelerations and built-in protection against data corruption; however, it does not support de-duplication yet.  See more details here https://docs.microsoft.com/en-us/windows-server/storage/refs/refs-overview
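The mirror and reserve recommendations above boil down to simple arithmetic. The sketch below is a simplification (a real S2D pool spreads copies across servers and also carves out metadata and slack), but it shows how quickly raw capacity shrinks:

```python
# Capacity arithmetic for the mirror and reserve recommendations
# (a simplified sketch, not an exact S2D capacity calculator).

def usable_capacity_tb(drives, drive_tb, mirror_copies, reserve_drives=1):
    """Usable TB after N-way mirroring, leaving whole drives' worth of
    capacity in reserve for immediate in-place rebuilds."""
    raw = drives * drive_tb
    reserved = reserve_drives * drive_tb
    return (raw - reserved) / mirror_copies

# 4 x 4TB per server, three-way mirror, one drive's worth reserved:
# (16TB raw - 4TB reserve) / 3 copies = 4TB usable.
usable = usable_capacity_tb(drives=4, drive_tb=4, mirror_copies=3)
```

Running the same numbers with a two-way mirror gives 6TB usable, which is why the 50% vs 33% figures in the list matter so much when pricing a deployment.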

Azure Announcements September 2017

We are only two days into Microsoft Ignite and I thought I would share the announcements which I believe will become ‘heavy hitters’  in the near future.

Planned Maintenance (Preview)

One of the biggest deal breakers when migrating to public cloud is the sheer number of single-instance VMs in a customer estate which rely upon infrastructure availability to meet business SLAs.  The cost to translate these into cloud-native applications and place them into an Availability Set, to receive an SLA from Microsoft and minimise the impact of planned maintenance, is often too burdensome, so they are left to wither on the vine on-premises.

Microsoft have recognised this issue and announced 'Planned Maintenance', which means you will be notified when maintenance is going to occur and will have up to four weeks to schedule a reboot of your virtual machine.

This is a game changer for customers, and I would encourage you to read more here.

Azure Migrate (Preview)

To start the journey to public cloud services, you need to understand your application estate.  This is a process which should not be underestimated: many customer environments are poorly documented, application owners have left the business, and operations and IT don't really understand how an application is coupled together, so anything but the low-hanging fruit often gets placed into the 'too hard to deal with' bucket.

To counteract this, Microsoft have announced Azure Migrate, which uses an application-based approach for the following:

  • Discovery and assessment for on-premises virtual machines
  • Inbuilt dependency mapping for high-confidence discovery of multi-tier applications
  • Intelligent rightsizing to Azure virtual machines
  • Compatibility reporting with guidelines for remediating potential issues
  • Integration with Azure Database Management Service for database discovery and migration

I am wondering if this will be a PaaS offering of the Microsoft Assessment and Planning Toolkit?  Anyhow, read more here.

Azure File Sync (Preview)

You would have thought that with the advent of SharePoint and OneDrive for Business the traditional file server would be on the way out; however, file storage continues to be an issue for many companies.  Microsoft have announced Azure File Sync, which enables you to replicate file data across the globe and tier data from on-premises to Microsoft Azure without a StorSimple device.

When more details are announced, I will be interested to understand how Microsoft deal with file locking, and whether this will be handled using Optimistic Concurrency, Pessimistic Concurrency or Last Writer Wins.  Backup of the data needs to be addressed as well.
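To illustrate the difference between two of those concurrency models, here is a toy sketch of a shared file record. It is purely illustrative of the general techniques, not how Azure File Sync actually behaves:

```python
# Toy comparison of Last Writer Wins vs Optimistic Concurrency
# for a shared file record (illustrative only).

class Conflict(Exception):
    pass

class FileRecord:
    def __init__(self, content=""):
        self.content = content
        self.version = 0

    def write_last_writer_wins(self, content):
        """Whoever writes last silently overwrites earlier changes."""
        self.content = content
        self.version += 1

    def write_optimistic(self, content, expected_version):
        """Optimistic concurrency: fail if the file changed since it was read."""
        if self.version != expected_version:
            raise Conflict("file modified since read")
        self.content = content
        self.version += 1

f = FileRecord("v0")
v = f.version                        # two writers both read version 0
f.write_optimistic("writer A", v)    # succeeds, version becomes 1
# f.write_optimistic("writer B", v) would now raise Conflict
f.write_last_writer_wins("writer B") # silently overwrites writer A
```

Pessimistic concurrency would instead take a lock before the read, blocking writer B entirely until writer A finishes, which is why it is the hardest model to make work across geo-replicated file shares.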

For more information see here.

Azure DDoS Protection Service (Preview)

Security is always a hot topic when discussing public cloud services; figuring out how to protect the 'crown jewels' is difficult, and it can be hard to get Information Security Risk officers to agree on your approach.

To counteract this, Microsoft have announced the Azure DDoS Protection Service, which in a nutshell protects a virtual network and everything behind it.  The service learns your normal application traffic profiles using machine learning and detects malicious traffic attacks.  Azure DDoS Protection can also be combined with a Web Application Firewall to provide request rate-limiting and protection from:

  • HTTP protocol violations
  • HTTP protocol anomalies
  • SQL injection
  • Cross-site scripting

For more information see here.

Microsoft Azure Enterprise Cost Management

Microsoft have announced the preview of Enterprise Cost Management for Azure, which is great news for Enterprise Agreement customers.

Until now, gaining visibility of spend on an Azure Enterprise Agreement has been difficult, even when combined with Tags and Resource Groups.

It should also be noted that an Enterprise Agreement doesn’t provide spending limits (see offer details), quotas or even billing alerts (see prevent unexpected costs) so customers are often wary of migrating services to Microsoft Azure and/or providing access to their Azure Portals due to fear of being stung by large bills.

It is understandable that Microsoft do not want to 'turn off' customers' workloads; however, there could be a case for this in a development environment where a person leaves a 'monster VM' up and running for a month by mistake.
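As a sketch of the kind of guard rail being asked for here, the snippet below accrues a VM's cost from an hourly rate and flags it against a budget. The rates, budget and thresholds are entirely made up for illustration:

```python
# Hypothetical budget guard rail for a dev VM (illustrative only;
# not an Azure billing API).

def accrued_cost(hourly_rate, hours_running):
    """Cost accrued so far for a VM billed at a flat hourly rate."""
    return hourly_rate * hours_running

def check_budget(hourly_rate, hours_running, budget):
    """Return 'ok', 'alert' (past 80% of budget) or 'over'."""
    cost = accrued_cost(hourly_rate, hours_running)
    if cost >= budget:
        return "over"
    if cost >= 0.8 * budget:
        return "alert"
    return "ok"

# A 'monster VM' at £5/hour left running for a month (circa 730 hours)
# blows straight through a £1,000 budget.
status = check_budget(hourly_rate=5.0, hours_running=730, budget=1000.0)
```

An 'alert' tier firing at 80% is the piece that is missing today: the owner would hear about the forgotten VM weeks before the bill arrived, rather than after.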

This is a step in the right direction; hopefully we will see billing alerts added in the not too distant future.