Storage Spaces Direct Overview

Storage Spaces Direct is an area I have been meaning to look into, but for one reason or another it has slipped through the cracks until now.

What Is Storage Spaces Direct?

Storage Spaces Direct is a shared-nothing, software-defined storage solution which is part of the Windows Server 2016 operating system. It creates a pool of storage by using the local hard drives from a collection of (two or more) individual servers.

The storage pool is used to create volumes which have built-in resilience, so if a server or hard drive fails, data remains online and accessible.

What Is The Secret Sauce?

The secret sauce is the ‘storage bus’, which is essentially the transport layer that provides the interaction between the physical disks across the network using SMB3. It allows each host to see all disks as if they were its own local disks, using Cluster Ports and the Cluster Block Filter.

In iSCSI terms, Cluster Ports acts like the initiator and the Cluster Block Filter acts like the target; this allows each disk to be presented to each host as if it were its own.

Storage Bus v0.1

For a Microsoft-supported platform you will need a 10GbE network with RDMA-capable adapters, using either iWARP or RoCE, for the Storage Bus.

Disks

When it comes to Storage Spaces Direct, not all disks are equal, and there are a number of disk configurations which can be used. Drive choices are as follows:

  • All Flash NVMe
  • All Flash SSD
  • NVMe for Cache and SSD for Capacity (Writes are cached and Reads are not Cached)
  • NVMe for Cache and HDD for Capacity
  • SSD for Cache and HDD for Capacity (could look at using more expensive SSD for cache and cheaper SSD for capacity)
  • NVMe for Cache and SSD and HDD for Capacity

In an SSD and HDD configuration, the Storage Bus Layer Cache binds SSDs to HDDs to create a read/write cache.

Using NVMe-based drives will provide circa 3x the performance at typically 50% fewer CPU cycles versus SSD, but they come at a far greater cost.

It should be noted that, as a minimum, 2 x SSD and 4 x HDD are needed for a supported Microsoft configuration.

Hardware

In relation to the hardware, it must be on the Windows Server Catalog and certified for Windows Server 2016. Both the HPE DL380 Gen10 and Gen9 are supported, along with the HPE DL360 Gen10 and Gen9. When deploying Storage Spaces Direct you need to ensure that the cluster creation passes all validation tests to be supported by Microsoft.

  • All servers need to be the same make and model
  • Minimum of an Intel Nehalem processor
  • 4GB of RAM per TB of cache drive capacity on each server to store metadata, e.g. with 2 x 1TB SSDs per server, 8GB of RAM is dedicated to Storage Spaces Direct
  • 2 x NICs that are RDMA capable, with either iWARP or RoCE, dedicated to the Storage Bus
  • All servers must have the same drive configuration (type, size and firmware)
  • SSDs must have power loss protection (enterprise grade)
  • Simple pass through SAS HBA for SAS and SATA drives
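
The RAM sizing rule in the list above can be expressed as a quick calculation (the function name is my own; the 4GB-per-TB-of-cache figure comes from the guidance above):

```python
def s2d_ram_for_cache(cache_drives: int, drive_tb: float, gb_per_tb: float = 4.0) -> float:
    """RAM (GB) to dedicate to Storage Spaces Direct metadata,
    per the 4GB-of-RAM-per-TB-of-cache rule of thumb."""
    return cache_drives * drive_tb * gb_per_tb

# The example from the text: 2 x 1TB cache SSDs per server -> 8GB of RAM
print(s2d_ram_for_cache(2, 1.0))  # 8.0
```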

Things to Note

  • The cache layer is completely consumed by Cluster Shared Volume and is not available to store data on
  • Microsoft’s recommendation is to make the number of capacity drives a multiple of the number of cache drives, e.g. with 2 x SSDs per server, use either 4 x HDD or 6 x HDD per server
  • Microsoft recommends a single storage pool per cluster, e.g. all the disks across a 4 x Hyper-V host cluster contribute to a single storage pool
  • For a 2-server deployment the only resilience choice is a two-way mirror. Essentially data is written to two different HDDs in two different servers, meaning your usable capacity is reduced by 50%.
  • For a 3+ server deployment Microsoft recommends a three-way mirror. Essentially three copies of data are kept across 3 x HDDs on 3 x servers, reducing usable capacity to 33%. You can use single parity (à la RAID 5) but Microsoft does not recommend this.
  • Typically a 10% cache-to-capacity ratio is recommended, e.g. with 4 x 4TB drives (16TB of capacity), 2 x 800GB SSDs should be used for cache.
  • When the storage pool is configured, Microsoft recommends leaving 1 x HDD’s worth of capacity unallocated for immediate in-place rebuilds of failed drives. So with 4 x 4TB you would leave 4TB unallocated in reserve
  • Recommendation is to limit storage capacity per server to 100TB, to reduce resync of data after downtime, reboots or updates
  • Microsoft recommends using ReFS for Storage Spaces Direct for its performance accelerations and built-in protection against data corruption; however, it does not support de-duplication yet. See more details here: https://docs.microsoft.com/en-us/windows-server/storage/refs/refs-overview
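
Taken together, the mirror resiliency and reserve-capacity guidance above boils down to simple arithmetic. A rough planning sketch (function and parameter names are my own, and this ignores real-world overheads such as pool metadata):

```python
def usable_capacity_tb(servers: int, drives_per_server: int, drive_tb: float,
                       mirror_copies: int, reserve_drives_per_server: int = 1) -> float:
    """Approximate usable capacity of a mirrored Storage Spaces Direct pool:
    raw capacity minus one drive's worth per server (held back for in-place
    rebuilds), divided by the number of mirror copies."""
    raw = servers * drives_per_server * drive_tb
    reserve = servers * reserve_drives_per_server * drive_tb
    return (raw - reserve) / mirror_copies

# e.g. 4 servers each with 4 x 4TB capacity drives, three-way mirror:
print(usable_capacity_tb(4, 4, 4.0, 3))  # 16.0
```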

Azure Announcements September 2017

We are only two days into Microsoft Ignite, and I thought I would share the announcements which I believe will become ‘heavy hitters’ in the near future.

Planned Maintenance (Preview)

One of the biggest deal breakers when migrating to public cloud is the sheer number of single-instance VMs in a customer estate which rely upon infrastructure availability to meet business SLAs. The cost to translate these into cloud-native applications, or to place them into an Availability Set to receive an SLA from Microsoft and minimise the impact of planned maintenance, is often too burdensome, so they are left to wither on the vine on-premises.

Microsoft have recognised this issue and announced ‘Planned Maintenance’, which means that you will be notified when maintenance is going to occur and will have up to four weeks to schedule a reboot of your virtual machine.

This is a game changer for customers, and I would encourage you to read more here.

Azure Migrate (Preview)

To start the journey to public cloud services, you need to understand your application estate. This is a process which should not be underestimated: many customer environments are poorly documented, application owners have left the business, and operations and IT don’t really understand how an application is coupled together, so anything but the low-hanging fruit often gets placed into the ‘too hard to deal with’ bucket.

To counteract this, Microsoft have announced Azure Migrate, which uses an application-based approach for the following:

  • Discovery and assessment for on-premises virtual machines
  • Inbuilt dependency mapping for high-confidence discovery of multi-tier applications
  • Intelligent rightsizing to Azure virtual machines
  • Compatibility reporting with guidelines for remediating potential issues
  • Integration with Azure Database Management Service for database discovery and migration

I wonder if this will be a PaaS offering of the Microsoft Assessment and Planning Toolkit? Anyhow, read more here.

Azure File Sync (Preview)

You would have thought that with the advent of SharePoint and OneDrive for Business the traditional file server would be on the way out; however, file storage continues to be an issue for many companies. Microsoft have announced Azure File Sync, which enables you to replicate file data across the globe and tier data from on-premises to Microsoft Azure without a StorSimple device.

When more details are announced, I will be interested to understand how Microsoft deal with file locking, and whether this will be handled using Optimistic Concurrency, Pessimistic Concurrency or Last Writer Wins. Backup of data also needs to be addressed.

For more information see here.

Azure DDoS Protection Service (Preview)

Security is always a hot topic when discussing public cloud services; figuring out how you protect the ‘crown jewels’ is difficult, and it can be hard to get Information Security Risk officers to agree on your approach.

To counteract this, Microsoft have announced the Azure DDoS Protection Service, which in a nutshell protects a virtual network and everything behind it. The service learns your normal application traffic profile using machine learning and detects malicious traffic attacks. Azure DDoS Protection can also be combined with Web Application Firewalls to provide request rate-limiting and protection from:

  • HTTP protocol violations
  • HTTP protocol anomalies
  • SQL injection
  • Cross-site scripting

For more information see here.


Microsoft Azure Enterprise Cost Management

Microsoft have announced the preview of Enterprise Cost Management for Azure, which is great news for Enterprise Agreement customers.

Until now gaining visibility of spend on an Azure Enterprise Agreement has been difficult to manage even when combined with Tags and Resource Groups.

It should also be noted that an Enterprise Agreement doesn’t provide spending limits (see offer details), quotas or even billing alerts (see prevent unexpected costs) so customers are often wary of migrating services to Microsoft Azure and/or providing access to their Azure Portals due to fear of being stung by large bills.

It is understandable that Microsoft do not want to ‘turn off’ customers’ workloads; however, there could be a case for this in a development environment where a person leaves a ‘monster VM’ up and running for a month by mistake.

This is a step in the right direction; hopefully we will see billing alerts added in the not too distant future.


Azure Updates – Enhancement Summary April 2017 to July 2017

Over the past three months, I have been leading a delivery engagement, which has meant that I’m not as up to speed as I perhaps should be on the latest enhancements to Microsoft Azure.

With this in mind, I thought I would share with you the feature enhancements over the past few months that have had the biggest impact on the customers I work with.

Azure Service Health (Preview)

Planned and unplanned maintenance events are always a hot topic when educating customers on the use of cloud for IaaS as it’s a paradigm shift from the on-premises operating model.

Rather than having an email letting you know that West Europe is going to be patched in the future or checking the Azure Status URL, Microsoft have rolled this up into Service Health.

In a nutshell, this lets you know which ongoing issues in Azure services are impacting your resources, and provides you with a PDF summary of the issue for problem management.

Read more here.

Azure VM Configuration Changes (Private Preview)

Let’s face it: a significant proportion of operational outages are caused by people making changes without following the correct internal procedures. To circumvent this, Microsoft have introduced Azure VM Configuration Changes, which can track Windows services, Linux daemons and software by default.

Azure VM Configuration Changes also allows you to view changes in the last 30 minutes, hour, six hours, seven days or 30 days, so you can pinpoint when changes occurred to the VM.

See more here.

Azure Large Disks

One of the challenges around IaaS VMs was trying to fit existing file structures into, or across, multiple 1TB hard drives. This caused a few challenges for customers, who had to rework GPOs or migrate data to enable the use of file services within Azure.

Another significant challenge was using Azure Site Recovery to protect a VM with a hard drive larger than 1TB. To address both of these issues, Microsoft have launched 4TB disks for Azure IaaS VMs.

See more here and here.

Azure Application Gateway

Security is always a hot topic when it comes to cloud, and Microsoft has now fixed the gap it had between DNS-based global site load balancing using Traffic Manager and the Azure Load Balancer, which works at Layer 4 (TCP/UDP).

Azure Application Gateway acts as a Web Application Firewall to protect from common web attacks such as SQL injection, cross-site scripting and session hijacking.

Read more here.

Faster Azure VPN Gateway

When customers embark on their cloud journey, it normally starts with a Site-to-Site VPN whilst ExpressRoute is put in place. A previous limiting factor with Site-to-Site VPNs was the bandwidth limit and SLA.

Microsoft have resolved this by introducing a new series of VPN gateways, appropriately titled VpnGw1, VpnGw2 and VpnGw3, which provide an SLA of 99.95% with up to 1.25Gbps throughput at the same cost as the previous gateways.

Read more here.


Azure Network Watcher

This is a guest blog post by one of my Cisco CCIE colleagues Adam Stuart on his view of Azure Network Watcher.

What is it?

Azure Network Watcher is a feature within Microsoft Azure to make consumption of network data/troubleshooting easier.

How much does it cost?

Free until August 1st.

Then

  • 5GB free network logs p/m, with small overcharge for extra GB
  • 1000 checks p/m, small overcharge per extra 1000 checks

Plus storage costs for log retention.

What Does It Do?

  • Monitoring Topology – shows a very basic network topology diagram; no further drill-down is possible. Not exactly useful, but better than nothing.

Topology v0.1

  • Diagnostics – IP Flow Verify, a simple packet trace function to test whether a source/destination flow is allowed by an NSG policy. The equivalent of a packet trace in Cisco land. Quite useful if you have lots of NSGs. Overall a good sanity check.

IP Flow Verify

  • Diagnostics – Next Hop, a simple utility to verify the next hop as per the effective routing table. This would be useful for a customer using Network Virtual Appliance (NVA) firewalls and complex UDRs. It provides insight into the Azure routing service, which is otherwise tricky to obtain.

Next Hop

  • Diagnostics – provides details of the NSGs specific to the network interface of a VM. Not useful unless you have overlapping NSGs on a NIC and subnet and want to see the resulting aggregate policy
  • Diagnostics – Packet Capture, essentially an easier way to run tcpdump and get pcap files from virtual machines. Note: you need to install a VM extension to get this to work; see here.
  • Logs – NSG Flow Logs, the equivalent of checking access-list logs on a normal firewall. This is the primary function that most customers will be after, to answer the question “is the firewall blocking it?”. Enabled on a per-NSG basis, it logs to a container in blob storage, which you can export in JSON format. This is probably quite powerful, but the default output is not very accessible: the JSON logs require another parser to provide any real value.
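
Since the JSON output needs a parser to be useful, a minimal sketch of one is below. The record layout shown is my recollection of the version-1 flow-log schema (`records` → `properties.flows` → `flowTuples`, each tuple being a comma-separated string of time, source/destination IP and port, protocol, direction and decision); check the actual logs from your storage account, as field names may differ:

```python
import json

# A trimmed, hand-built sample in the shape of an NSG flow-log record;
# real logs are pulled from the blob container described above.
sample = json.dumps({
    "records": [{
        "time": "2017-09-26T10:00:00Z",
        "properties": {"flows": [{
            "rule": "DefaultRule_DenyAllInBound",
            "flows": [{"flowTuples": [
                "1506420000,185.170.185.105,10.2.0.4,35370,23,T,I,D"
            ]}]
        }]}
    }]
})

def denied_flows(raw_json: str):
    """Pull out denied flow tuples. Each tuple reads:
    'unixtime,srcIp,dstIp,srcPort,dstPort,proto(T/U),direction(I/O),decision(A/D)'."""
    out = []
    for record in json.loads(raw_json)["records"]:
        for rule in record["properties"]["flows"]:
            for flow in rule["flows"]:
                for tup in flow["flowTuples"]:
                    ts, src, dst, sport, dport, proto, direction, decision = tup.split(",")
                    if decision == "D":  # denied traffic only
                        out.append({"rule": rule["rule"], "src": src,
                                    "dst": dst, "dport": dport})
    return out

print(denied_flows(sample))
```

This answers the “is the firewall blocking it?” question by listing which rule denied which source/destination pair.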