Storage Spaces Direct is an area which I have been meaning to look into, but for one reason or another it has slipped through the gaps until now.
What Is Storage Spaces Direct?
Storage Spaces Direct is a shared-nothing, software-defined storage feature that is part of the Windows Server 2016 operating system. It creates a pool of storage by using the local hard drives from a collection of two or more individual servers.
The storage pool is used to create volumes which have built-in resilience, so if a server or hard drive fails, data remains online and accessible.
What Is The Secret Sauce?
The secret sauce is the ‘storage bus’, which is essentially the transport layer that lets the servers reach each other's physical disks across the network using SMB3. It allows each of the Hosts to see all disks as if they were its own local disks, using Cluster Ports and Cluster Block Filter.
Cluster Ports is like an initiator in iSCSI terms and Cluster Block Filter is like the target. Together they allow each disk to be presented to each Host as if it were its own.
For a Microsoft supported platform you will need a 10GbE network with RDMA-capable network adapters, using either iWARP or RoCE, for the Storage Bus.
When it comes to Storage Spaces Direct, not all disks are equal and there are a number of disk configurations which can be used. Drive choices are as follows:
- All Flash NVMe
- All Flash SSD
- NVMe for Cache and SSD for Capacity (Writes are cached and Reads are not Cached)
- NVMe for Cache and HDD for Capacity
- SSD for Cache and HDD for Capacity (could look at using more expensive SSD for cache and cheaper SSD for capacity)
- NVMe for Cache and SSD and HDD for Capacity
In an SSD and HDD configuration, the Storage Bus Layer Cache binds SSDs to HDDs to create a read/write cache.
Using NVMe based drives will provide roughly 3x the performance, typically with 50% fewer CPU cycles, versus SSD, but they come at a far higher cost.
It should be noted that a minimum of 2 x SSD and 4 x HDD is needed for a supported Microsoft configuration.
The hardware must be listed in the Windows Server Catalog and certified for Windows Server 2016. Both HPE DL380 Gen10 and Gen9 are supported, along with HPE DL360 Gen10 and Gen9. When deploying Storage Spaces Direct you need to ensure that the cluster creation passes all validation tests to be supported by Microsoft.
- All servers need to be the same make and model
- Minimum of an Intel Nehalem processor
- 4GB of RAM per TB of cache drive capacity on each server to store metadata, e.g. with 2 x 1TB SSD per server, 8GB of RAM is dedicated to Storage Spaces Direct
- 2 x NICs that are RDMA capable, with either iWARP or RoCE, dedicated to the Storage Bus
- All servers must have the same drive configuration (type, size and firmware)
- SSDs must have power loss protection (enterprise grade)
- Simple pass through SAS HBA for SAS and SATA drives
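As a rough illustration of the memory rule in the list above, here is a minimal Python sketch that works out how much RAM to set aside on each server. The function name and its parameters are my own, made up for illustration; they are not part of any Microsoft tooling.

```python
def s2d_metadata_ram_gb(cache_drives_per_server: int,
                        cache_drive_tb: float,
                        gb_per_cache_tb: float = 4.0) -> float:
    """RAM (in GB) to reserve per server for Storage Spaces Direct metadata.

    Applies the rule of 4GB of RAM per TB of cache drive capacity.
    """
    if cache_drives_per_server < 0 or cache_drive_tb < 0:
        raise ValueError("drive count and drive size must be non-negative")
    cache_tb = cache_drives_per_server * cache_drive_tb
    return cache_tb * gb_per_cache_tb


if __name__ == "__main__":
    # 2 x 1TB SSD cache drives per server, as in the example above
    print(s2d_metadata_ram_gb(2, 1.0))  # 8.0
```

This RAM is on top of whatever the operating system and any virtual machines on the host need.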
Things to Note
- The cache layer is completely consumed by Cluster Shared Volume and is not available to store data on
- Microsoft's recommendation is to make the number of capacity drives a multiple of the number of cache drives, e.g. with 2 x SSD per server, use either 4 x HDD or 6 x HDD per server
- Microsoft recommends a single Storage Pool per cluster, e.g. all the disks across 4 x Hyper-V Hosts contribute to a single Storage Pool
- For a 2 x Server deployment the only resilience choice is a two way mirror. Essentially data is written to two different HDDs in two different servers, meaning your usable capacity is reduced by 50%.
- For a 3+ Server deployment Microsoft recommends a three way mirror. Essentially three copies of the data are kept across 3 x HDD on 3 x Servers, reducing usable capacity to 33% of raw. You can use single parity (à la RAID 5) but Microsoft does not recommend this.
- Typically a 10% cache to capacity ratio is recommended, e.g. if 4 x 4TB capacity drives give 16TB of capacity, then 2 x 800GB SSD should be used for cache.
- When the Storage Pool is configured, Microsoft recommends leaving 1 x HDD worth of capacity for immediate in-place rebuilds of failed drives. So with 4 x 4TB you would leave 4TB unallocated in reserve.
- The recommendation is to limit storage capacity per server to 100TB, to reduce the time spent resyncing data after downtime, reboots or updates
- Microsoft recommends using ReFS for Storage Spaces Direct for its performance accelerations and built-in protection against data corruption. However, it does not support de-duplication yet. See more details here https://docs.microsoft.com/en-us/windows-server/storage/refs/refs-overview
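To make the capacity notes above a little more concrete, here is a small Python sketch of the arithmetic. All of the function names are hypothetical and exist only for this example. It also assumes that the rebuild reserve is taken once from the pool as a whole and that the mirror type follows the recommendation above (two way for 2 servers, three way for 3 or more). Treat it as a back-of-the-envelope estimate rather than a sizing tool.

```python
def mirror_copies(servers: int) -> int:
    """Number of data copies: two way mirror for 2 servers, three way for 3+."""
    if servers < 2:
        raise ValueError("Storage Spaces Direct needs at least two servers")
    return 2 if servers == 2 else 3


def usable_capacity_tb(servers: int, capacity_drives_per_server: int,
                       drive_tb: float, reserve_drives: int = 1) -> float:
    """Estimated usable capacity of the pool in TB.

    Assumption: reserve_drives worth of capacity is held back from the whole
    pool for in-place rebuilds before the mirror overhead is applied.
    """
    raw_tb = servers * capacity_drives_per_server * drive_tb
    reserve_tb = reserve_drives * drive_tb
    return (raw_tb - reserve_tb) / mirror_copies(servers)


def cache_to_capacity_ratio(cache_drives: int, cache_drive_tb: float,
                            capacity_drives: int, capacity_drive_tb: float) -> float:
    """Cache size as a fraction of capacity size (around 0.10 is the target)."""
    return (cache_drives * cache_drive_tb) / (capacity_drives * capacity_drive_tb)


def capacity_is_multiple_of_cache(cache_drives: int, capacity_drives: int) -> bool:
    """True when the capacity drive count is a whole multiple of the cache drive count."""
    return cache_drives > 0 and capacity_drives % cache_drives == 0


if __name__ == "__main__":
    # 2 servers, each with 4 x 4TB HDD: (32TB raw - 4TB reserve) / 2 copies
    print(usable_capacity_tb(2, 4, 4.0))   # 14.0
    # 4 servers, each with 4 x 4TB HDD: (64TB raw - 4TB reserve) / 3 copies
    print(usable_capacity_tb(4, 4, 4.0))   # 20.0
    # 2 x 800GB cache against 4 x 4TB capacity, roughly 10%
    print(round(cache_to_capacity_ratio(2, 0.8, 4, 4.0), 2))
    print(capacity_is_multiple_of_cache(2, 6))  # True
```

The jump from 2 to 4 servers doubles the raw capacity but adds far less usable space, because the three way mirror keeps an extra copy of everything.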