Storage Spaces Direct Overview

Storage Spaces Direct is an area I have been meaning to look into, but for one reason or another it has slipped through the cracks until now.

What Is Storage Spaces Direct?

Storage Spaces Direct is shared-nothing, software-defined storage which is part of the Windows Server 2016 operating system.  It creates a pool of storage using the local drives from a collection of two or more individual servers.

The storage pool is used to create volumes which have built-in resilience, so if a server or hard drive fails, data remains online and accessible.

What Is The Secret Sauce?

The secret sauce is the ‘storage bus’, which is essentially the transport layer that provides the interaction between the physical disks across the network using SMB3. It allows each host to see all disks as if they were its own local disks, using Cluster Ports and Cluster Block Filter.

Cluster Ports acts like an initiator in iSCSI terms and Cluster Block Filter acts as the target; this allows each disk to be presented to each host as if it were its own.
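As a quick illustration of this, the PowerShell below is a minimal sketch (run from any node once Storage Spaces Direct is enabled; the wildcard subsystem name is an assumption about the default naming) that lists every physical disk the clustered storage subsystem can see, regardless of which server it physically sits in:

# Sketch: list all drives visible to the clustered storage subsystem from one node
Get-StorageSubSystem -FriendlyName "Clustered*" |
    Get-PhysicalDisk |
    Sort-Object MediaType |
    Format-Table FriendlyName, SerialNumber, MediaType, CanPool, Size -AutoSize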

Storage Bus v0.1

For a Microsoft-supported platform you will need a 10GbE network with RDMA-capable NICs, using either iWARP or RoCE, for the Storage Bus.
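To verify the networking side, the following is a minimal sketch (run on each node) that checks which adapters report RDMA capability and whether the SMB client sees them as RDMA capable:

# Sketch: confirm RDMA-capable adapters and SMB's view of them
Get-NetAdapterRdma | Format-Table Name, Enabled -AutoSize
Get-SmbClientNetworkInterface | Format-Table FriendlyName, RdmaCapable -AutoSize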

Disks

When it comes to Storage Spaces Direct, not all disks are equal and there are a number of disk configurations which can be used.  Drive choices are as follows:

  • All Flash NVMe
  • All Flash SSD
  • NVMe for Cache and SSD for Capacity (writes are cached, reads are not)
  • NVMe for Cache and HDD for Capacity
  • SSD for Cache and HDD for Capacity (could look at using more expensive SSD for cache and cheaper SSD for capacity)
  • NVMe for Cache and SSD and HDD for Capacity

In an SSD and HDD configuration, the Storage Bus Layer Cache binds the SSDs to the HDDs to create a read/write cache.

Using NVMe-based drives will provide roughly 3x the performance at typically 50% lower CPU cycles versus SSD, but they come at a far greater cost.

It should be noted that, as a minimum, 2 x SSD and 4 x HDD are needed for a supported Microsoft configuration.
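A quick way to sanity check the drive mix on each server is to group the poolable disks by media type; the sketch below is one way to do this before building the cluster:

# Sketch: count drives eligible for pooling, grouped by media type
Get-PhysicalDisk -CanPool $true |
    Group-Object MediaType |
    Format-Table Name, Count -AutoSize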

Hardware

In relation to the hardware, it must be on the Windows Server Catalog and certified for Windows Server 2016.  Both the HPE DL380 Gen10 and Gen9 are supported, along with the HPE DL360 Gen10 and Gen9.  When deploying Storage Spaces Direct you need to ensure that cluster creation passes all validation tests to be supported by Microsoft; a PowerShell sketch of the validation and enable steps follows the requirements list below.

  • All servers need to be the same make and model
  • Minimum of an Intel Nehalem processor
  • 4GB of RAM per TB of cache drive capacity on each server to store metadata, e.g. with 2 x 1TB SSDs per server, 8GB of RAM is dedicated to Storage Spaces Direct
  • 2 x RDMA-capable NICs (either iWARP or RoCE) dedicated to the Storage Bus
  • All servers must have the same drive configuration (type, size and firmware)
  • SSDs must have power-loss protection (enterprise grade)
  • A simple pass-through SAS HBA for SAS and SATA drives
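To give an idea of what the validation and enable steps look like in PowerShell, here is a minimal sketch; the server and cluster names are purely illustrative:

# Sketch: validate the nodes (including the Storage Spaces Direct test),
# build the cluster with no default storage, then enable Storage Spaces Direct
Test-Cluster -Node S2D-Node01, S2D-Node02, S2D-Node03, S2D-Node04 `
    -Include "Storage Spaces Direct", "Inventory", "Network", "System Configuration"
New-Cluster -Name S2D-Cluster01 -Node S2D-Node01, S2D-Node02, S2D-Node03, S2D-Node04 -NoStorage
Enable-ClusterStorageSpacesDirect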

Things to Note

  • The cache layer is completely consumed by Cluster Shared Volume caching and is not available to store data on
  • Microsoft's recommendation is to make the number of capacity drives a multiple of the number of cache drives, e.g. with 2 x SSDs per server, use either 4 x HDD or 6 x HDD per server
  • Microsoft recommends a single Storage Pool per cluster, e.g. all the disks across 4 x Hyper-V Hosts contribute to a single Storage Pool
  • For a 2 x server deployment the only resilience choice is a two-way mirror.  Essentially data is written to two different drives in two different servers, meaning your capacity layer is reduced by 50%.
  • For a 3+ server deployment Microsoft recommends a three-way mirror.  Essentially three copies of data are kept across 3 x drives on 3 x servers, reducing usable capacity to 33% (see the volume creation sketch after this list).  You can use single parity (a la RAID 5) but Microsoft does not recommend this.
  • Typically a 10% cache-to-capacity ratio is recommended, e.g. if 4 x 4TB capacity drives give 16TB, then 2 x 800GB cache SSDs should be used.
  • When the Storage Pool is configured, Microsoft recommends leaving one capacity drive's worth of space unallocated for immediate in-place rebuilds of failed drives.  So with 4 x 4TB you would leave 4TB unallocated in reserve
  • The recommendation is to limit storage capacity to 100TB per server, to reduce the resync of data after downtime, reboots or updates
  • Microsoft recommends using ReFS for Storage Spaces Direct for its performance accelerations and built-in protection against data corruption; however, it does not yet support de-duplication.  See more details here: https://docs.microsoft.com/en-us/windows-server/storage/refs/refs-overview
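As an example of the volume guidance above, a three-way mirrored, ReFS-formatted Cluster Shared Volume could be created from the pool with something like the following (the pool wildcard, volume name and size are illustrative):

# Sketch: create a three-way mirrored ReFS CSV from the Storage Spaces Direct pool
New-Volume -StoragePoolFriendlyName "S2D*" `
    -FriendlyName "CSV-Volume01" `
    -FileSystem CSVFS_ReFS `
    -ResiliencySettingName Mirror `
    -PhysicalDiskRedundancy 2 `
    -Size 2TB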

HP 3PAR Streaming Remote Copy Replication

The replication in 3PAR arrays has always been mediocre.  In older versions of the 3PAR InForm OS, if you chose synchronous replication for a single remote copy group, you could not use asynchronous replication for any other remote copy group.

This was addressed in a newer version of the 3PAR InForm OS; however, your lowest RPO using async replication was bottlenecked at 15 minutes regardless of available bandwidth.

With the release of the HP 3PAR 20000 Series comes a new feature: streaming async replication.

What Is Streaming Async Replication?

Essentially, if you have the bandwidth and cache available, the source 3PAR will stream replication across to the target 3PAR, reducing your RPO to below 15 minutes.  I like to think of it as best endeavours.

Replication Modes

When designing a replication infrastructure it's important to know the transport method as well as the thresholds in terms of bandwidth and latency between source and target arrays.  This ensures not only that you are within a supported SLA, but also that the write performance of the source array is not affected.

The table below shows supported thresholds.

Replication Modes

Architecture

The source array uses a local cache to maintain host write transactions in memory, using a concept known as 'delta sets'.

Source 3PAR Array

  • I/Os are transferred from the primary array to the secondary array as part of a delta set
  • I/Os on the primary array that belong to a particular remote copy group are grouped together into delta sets
    • A delta set is made up of sub-sets of I/Os, where each sub-set represents the I/Os owned by a remote copy group on a given node

Target 3PAR Array

  • A delta set is applied on the secondary RC volume group only after:
    • The entire delta set has been received in the secondary array cache
    • And the previous sets that this delta set depends upon have completed.
  • A secondary RC volume group is always in a crash consistent state, before or after the application of a delta set. It is not crash consistent during the application of a delta set.
    • If the delta set fails to apply on the secondary volume then the group stops and a fail back to the last coordinated snapshot is required
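For example, if delta set 12 has been fully received in the secondary array cache but delta set 11 has not yet finished applying, set 12 is held until 11 completes; and if the apply of a set fails, the group stops and the secondary volumes roll back to the last coordinated snapshot (the set numbers here are purely illustrative).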

Remote Copy Architecture

What About Write Bursts?

A write burst is when the array receives a significant number of writes, which could last for a few minutes.  If the inter-site link between the source and target array is sufficient, this has no impact.

It is when the inter-site link cannot cope, or the write cache fills up, that the source 3PAR will choose a random remote copy group to stop, and a snapshot is taken.

Note: you have no control over which remote copy group is stopped.

Once stopped these groups will start again at the next sync period.

Final Thoughts

This is a great feature set being added to the 3PAR 20000 Series.  I'm sure that when the next .1 release update arrives you will be able to select which remote copy groups you want to stop, whether due to a write burst or cache overflow.

As with most 3PAR updates, I expect streaming async replication to find its way into the 7x00 series within a short period of time.

How To: Map HP StoreVirtual Volumes to Datastores

Problem Statement

You have created numerous datastores of the same size on your HP StoreVirtual and presented these to your ESXi Hosts.  However, you have since forgotten how the datastores map back to the volumes.

When you check the Runtime Name of your devices (Storage > Devices) to find out the LUN number, you see that each LUN ID is '0', as per the screenshot below.

LUN 0

This can be confirmed in HP StoreVirtual Centralised Management Console under Servers > Select Server > Volumes & Snapshots

LUN 0 HP SV

Not very helpful at all!

Resolution

Each datastore has a unique iSCSI target string which can be used to identify how it maps back to a volume.

To find out what it is, select the Datastore > Properties > Manage Paths

Device Properties

At the bottom we can see the Target, which tells us the following details:

  • DC02-MG01
    • Denotes the Management Group the volume is in
  • 39 is the hexadecimal representation of 27 which is the VMware NAA (thanks to Jonathan Reid for this information)
    • Denotes the unique target identifier for the volume
  • DC01-DR01SRM
    • Denotes the volume name on the HP StoreVitual

Target Name

So we now know this datastore corresponds to the volume called DC01-DR01SRM in Management Group DC02-MG01.
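If you have a lot of datastores to work through, the same information can be pulled in bulk with PowerCLI. The sketch below assumes an existing Connect-VIServer session and simply pairs each datastore with the iSCSI target string of its backing device, so the StoreVirtual volume name embedded in the target can be read off:

# Sketch: map each datastore to the iSCSI target string of its backing device
Get-Datastore | ForEach-Object {
    $lun = Get-ScsiLun -Datastore $_ -LunType disk
    [pscustomobject]@{
        Datastore = $_.Name
        Target    = ($lun | Get-ScsiLunPath | Select-Object -First 1).SanID
    }
} | Format-Table -AutoSize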

3PAR StoreServ Zoning Best Practice Guide

This is an excellent guide which has been written by Gareth Hogarth who has recently implemented a 3PAR StoreServ and was concerned about the lack of information from HP in relation to zoning.  Being a ‘stand up guy’ Gareth decided to perform a lot of research and has put together the ‘3PAR StoreServ Zoning Best Practice Guide’ below.

This article focuses on zoning best practices for the StoreServ 7400 (4 node array), but can also be applied to all StoreServ models including the StoreServ 10800 8-node monster.

3PAR StoreServ Zoning Best Practice Guide

Having worked on a few of these, I found that a single document on StoreServ zoning BP doesn’t really exist. There also appears to be conflicting arguments on whether to use Single Initiator – Multiple Target zoning or Single Initiator – Single Target zoning. The information herein can be used as a guideline for all 3PAR supported host presentation types (VMware, Windows, HPUX, Oracle Linux, Solaris etc…).

Disclaimer:  Please note that this is based on my investigation, engaging with HP Storage Architects and Implementation Engineers. Several support cases were opened in order to gain a better understanding of what is & isn’t supported. HP recommendations change all the time, therefore it’s always best to speak with HP or your fabric vendor to ensure you are following latest guidelines or if you need further clarification.

Right, let’s start off with Fabric Connectivity

In terms of host connectivity options the StoreServ 7000 (specifically the 7400) provides us with the following:

  • 4x built-in 8 Gb/s Fibre Channel ports per node pair.
  • Optional 8 Gb/s Quad Port Fibre Channel HBA (Host Bus Adapter) per node (we will be focusing on this configuration option).
  • Optional 10 Gb/s Dual Port FCOE (Fibre Channel over Ethernet) converged network adapter per node.

StoreServ target ports are identified in the following manner: Node:Slot:Port.

StoreServ target ports located on the on-board HBAs will always assume the slot identity of 1, while StoreServ target ports located on the optional expansion slot will always assume the identity of slot 2.

StoreServ nodes are grouped in pairs; it's important to pay particular attention to this when zoning host initiators (server HBA ports) to the StoreServ target ports.

StoreServ7000-HostPorts

Recommendations

  • Each HP 3PAR StoreServ node should be connected to two fabric switches.
  • Ports of the same pair of nodes with the same ID (value) should be connected to the same fabric.
  • General rule – odd ports should be connected to fabric 1 and even ports should be connected to fabric 2.

Figure 1a below identifies physical cabling techniques, mitigating against single points of failure using a minimum of two fabric switches, which are separated from each other.

The example below illustrates StoreServ nodes with supplementary quad port HBA’s:

figure 1a_StoreServ_nPcabling

Moving on to Port Persistence

As already covered by Craig in this blog post, a host port would be connected and zoned on the fabric switch via one initiator (host HBA port) to one HP 3PAR StoreServ target port (one-to-one zoning). The pre-designated HP 3PAR StoreServ backup port must be connected to the same fabric as its partner node port.

It is best practice that a given host port sees a single I/O path to HP 3PAR StoreServ. As an option, a backup port can be zoned to the same host port as the primary port, which would result in the host port seeing two I/O paths to the HP 3PAR StoreServ system. This would also result in the configuration where a HP 3PAR StoreServ port can serve as the primary port for a given host port(s) and backup port for host port(s) connected to its partner node port.

Persistent ports leverage SAN fabric NPIV functionality (N_Port ID Virtualization) for transparent migration of a host’s connection, to a predefined partner port on the HP 3PAR StoreServ array during software upgrades or node failure.

One of the ways this is accomplished is by having a predefined host-facing port on the 3PAR StoreServ array, so that in the event of an upgrade (node shutdown) or node-down status the partner port will assume the identity of the failed port. The whole process is transparent to the host. When the node returns to normal, I/O is failed back to the original target port.

Although unconfirmed, I have heard that in future releases of the InForm OS we will get this level of protection at the port level.

Essentially, for this to work, Port Persistence requires that the corresponding 'native' and 'guest' StoreServ ports on a node pair be connected to the same Fibre Channel fabric.

Requirements for 3PAR Port Persistence:

  • The same host ports on the host facing HBA’s in the nodes in a node pair must be connected to the same fabric switch.
  • The host facing ports must be set to target mode.
  • The host facing ports must be configured for point-to-point connections.
  • The Fibre Channel fabric must support NPIV and have NPIV enabled on the switch ports.

Checking and enabling NPIV

Brocade Fabric OS (ensure you have the appropriate license which enables NPIV)

admin> portcfgshow <port#>

If the NPIV capability is enabled, the results of the portcfgshow command will identify this, i.e. NPIV capability ON.

If the NPIV capability is not enabled, you can turn it on with the following command:

admin> portCfgNPIVPort <port#> 1   (1 = on, 0 = off)

Cisco MDS Series Switches

fabSwitch # conf t

fabSwitch(config) # feature npiv (Enables NPIV for all VSANs on the switch)

QLogic SANbox 3800, 5000 and 9000 Switches

These switches don't require a license; NPIV is enabled by default (just ensure you are using firmware version 6.8.0.0.3 or above).

Now let’s cover Switch Zoning (Fibre Channel)

SAN zoning is used to logically group hosts and storage devices together in a physical SAN, so that authorised devices can only communicate with each other if they are in the same SAN zone.

The function of zoning is to:

  • Restrict access so that hosts can only see the data they are authorised to see.
  • Prevent RSCN (Registered State Change Notification) broadcasts.

What are RSCNs? RSCNs are a feature of fabric switches.  It's a service of the fabric that notifies devices of changes in the state of other attached devices, for example if a device is reset, removed or otherwise undergoes a significant change in status.

These broadcasts are made to all members of the configured SAN zone. As hosts and storage targets can be grouped in a zone, it's best practice to reduce the impact of these types of broadcasts (Note: an argument against RSCNs causing issues in zoning tables is that newer HBAs do a good job of limiting the impact of these broadcasts).  Nevertheless, I prefer keeping the number of initiators and targets in a fabric zone to a minimum.

Zoning Types

  • Domain, Port zoning uses switch domain IDs and port numbers to define zones.
  • Port World Wide Name or pWWN zoning uses port World Wide Names to define zones. Every port on an HBA has a unique pWWN. (A host HBA comprises an nWWN and a pWWN; the nWWN refers to the whole device whereas the pWWN refers to the individual port.)

The preferred zoning unit for the 3PAR StoreServ is pWWN. If you are currently using Domain, Port, migrating to pWWN is very easy. Simply create new zones based on the pWWN of the host and the pWWN of the storage target, add these new zones to your fabric switches, and zone out the references to Domain, Port for that respective HBA port. Some fabric vendors support mixing both Domain, Port and pWWN in the same zone; I prefer using one or the other explicitly.

The following command outputs the StoreServ ports and partner ports, which can be used to identify the node pWWNs for zoning.

3PAR01 cli% showport 

HP 3PAR StoreServ supports the following zoning configurations:

  • Single initiator – Single Target per zone (recommended)
  • Single initiator – Multiple Targets per zone

Use Single Initiator – Single Target per zone over Single Initiator – Multiple Targets per zone to reduce RSCNs, as previously discussed.

At the time of writing, HP 3PAR OS implementation documentation references Single Initiator Multiple Targets as the recommended zoning type. However, when I queried this I was directed to use Single Initiator – Single Target Zoning.  HP support pointed me in the direction of this document which identifies Single Initiator – Single Target zoning as best practice: http://www8.hp.com/h20195/v2/GetDocument.aspx?docname=4AA4-4545ENW

HP will support Single Initiator – Multiple Target, but you should not have a single host initiator attached to more than two StoreServ target ports!

Host port WWN’s should be zoned in partner pairs. For example if a host is zoned to node port 0:2:1, then it should be zoned to node port 1:2:1 (I’m speculating here, but I guess this is because controller nodes mirror cache I/O, so that in the event of node failure write operations in cache are not lost – hence we zone in node pairs and not across nodes from different pairs).

After you have zoned the host pWWN to the StoreServ node pWWN, you can use the 3PAR CLI showhost command to ensure that each host initiator is zoned to the correct StoreServ target ports (ensuring initiators go to different targets over different fabrics).

Figure 1b represents a staggered approach where you would have odd-numbered VMware hosts connecting to nodes 0 & 1, and even-numbered hosts connecting to nodes 2 & 3 (Note: currently the StoreServ is designed to tolerate a single node failure only; this includes the 8-node StoreServ 10800 array).

The example depicts Single Initiator – Single Target zoning, so a host with two HBA ports connecting over two fabrics will have a total of four zones (two per fabric). In case you were wondering, the maximum allowed is eight (also known as the fan-in limitation, which is four per fabric).

figure 1b_host_zoning
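To sanity check the zoning from the VMware side, the PowerCLI sketch below (the host name is hypothetical) counts the paths per device on a host; in the Single Initiator – Single Target design described above each LUN should come back with four paths:

# Sketch: count FC paths per device on one host (expecting four per LUN)
Get-VMHost -Name "esx01.lab.local" |
    Get-ScsiLun -LunType disk |
    ForEach-Object {
        [pscustomobject]@{
            Device = $_.CanonicalName
            Paths  = ($_ | Get-ScsiLunPath).Count
        }
    } | Format-Table -AutoSize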

Here are some additional points to be aware of

Fan-in/Fan-out ratios:

  • Fan-in refers to a host server port connected to several HP 3PAR storage ports via Fibre Channel switch.
  • Fan-out refers to the HP 3PAR StoreServ storage port that is connected to more than one host HBA port via Fibre Channel switch.

Note: Fan-in oversubscription represents the flow of data in terms of client initiator to StoreServ target ports. HP/3PAR documentation states that a maximum of four HP 3PAR storage system ports can fan-in to a single host server port. (If you are thinking 'great, I'll connect my VMware host to 8 ports [four per fabric]', think again: using this approach when you have hundreds of hosts can quickly reach the maximum StoreServ port connection limitation, which is 64, and it's just not necessary.)

StoreServ Target Port Maximums (As per 3PAR InForm OS 3.1.1 please observe the following):

  • Maximum of 16 host initiators per 2Gb HP 3PAR StoreServ Storage Port
  • Maximum of 32 host initiators per 4Gb HP 3PAR StoreServ Storage Port
  • Maximum of 32 host initiators per 8Gb HP 3PAR StoreServ Storage Port
  • Maximum total of 1,024 host initiators per HP 3PAR StoreServ Storage System

HP documentation states that these recommendations are guidelines; adding more than the recommended number of hosts should only be attempted when the total expected workload has been calculated and shown not to overrun either the queue depth or the throughput of the StoreServ node port.

Note: StoreServ storage ports, irrespective of speed, will negotiate at the lowest common speed with the supporting fabric switch (keep this in mind when calculating the number of host connections).

The following focuses on changing the target port queue depth on a VMware ESX environment.

The default setting for target port queue depth on the ESX host can be modified to ensure that the total workload of all servers will not overrun the total queue depth of the target HP StoreServ system port. The method endorsed by HP is to limit the queue depth on a per-target basis. This recommendation comes from limiting the number of outstanding commands on a target (HP 3PAR StoreServ system port), per ESX host.

The following values can be set on the HBA running VMware vSphere. These values limit the total number of outstanding commands the operating system routes to one target port:

  • For Emulex HBA target throttle = tgt_queue_depth
  • For Qlogic HBA target throttle = ql2xmaxqdepth
  • For Brocade HBA target throttle = bfa_lun_queue_depth

(Note: for instructions on how to change these values follow VMware KB 1267; these values are also adjustable on Red Hat Linux & Solaris).

The formula used to calculate these values is as follows:

(3PAR port queue depth [see below]) / (total number of ESX servers attached) = recommended value

The I/O queue depth for each HP 3PAR StoreServ storage system HBA model is shown below:

Note: The I/O queues are shared among the connected host server HBA ports on a first come first serve basis.

HP 3PAR StoreServ Storage HBA I/O queue depth values:
  • QLogic 2Gb – 497
  • LSI 2Gb – 510
  • Emulex 4Gb – 959
  • HP 3PAR HBA 4Gb – 1638
  • HP 3PAR HBA 8Gb – 3276
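As a worked example using the table above: an HP 3PAR 8Gb HBA port has a queue depth of 3,276, so if 16 ESX servers share that port the per-target throttle on each host would be set to 3276 / 16 ≈ 204.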

Well, hopefully you found the above information useful. Here is a high level summary of what we have discussed:

  • Identify and enable NPIV on your fabric switches (Fibre Channel only feature – NPIV-Port Persistence is not present in iSCSI environments)
  • Use Single Initiator -> Single Target zoning (HP will support Single Initiator – Multiple Target, but you should not have a single host initiator attached to more than two StoreServ target ports).
  • A maximum of four HP 3PAR Storage System ports can fan-in to a single host server port.
  • Zoning should be done using pWWN. You should not use switch port/Domain ID or nWWN.
  • A host (non-hypervisor) should be zoned with a minimum of two ports from the two nodes of the same pair. In addition, the ports used for a host's zoning should be mirrored across nodes.
  • Hosts need to be zoned to node pairs. For example, zoned to nodes 0 and 1 or to nodes 2 and 3. Hosts should NOT be zoned to non-mirrored nodes such as 0 and 3.
  • When using hypervisors, avoid connecting more than 16 initiators per 4 Gb/s port or more than 32 initiators per 8 Gb/s port.
  • Each HP 3PAR StoreServ system has a maximum number of initiators supported, that depends on the model and configuration.
  • A single HBA zoned with two FC ports will be counted as two initiators. A host with two HBAs, each zoned with two ports, will count as four initiators.
  • In order to keep the number of initiators below the maximum supported value, use the following recommendations:
    • Hypervisors: four paths maximum.
    • Other hosts (non-hypervisors): two paths to two different nodes of the same node pair.
  • Hypervisors can be zoned to four different nodes but the hypervisor HBAs must be zoned to the same Host Port on HBAs in the nodes for each Node Pair.

Reference Documents

HP SAN Design Reference Zoning Recommendations

HP 3PAR InForm® OS 3.1.1 Concepts Guide

The HP 3PAR Architecture

HP UX 3PAR Implementation Guide

HP 3PAR Red Hat Enterprise Linux and Oracle Linux Implementation Guide

HP 3PAR VMware ESX Implementation Guide

HP 3PAR StoreServ Storage and VMware vSphere 5 best practices

HP 3PAR Windows Server 2012, Server 2008 Implementation Guide

HP Brocade Secure Zoning Best Practises

HP 3PAR Peer Persistence Whitepaper

An introduction to HP 3PAR StoreServ for the EVA Administrator

Building SANs with Brocade Fabric Switches by Syngress

Part 3 – Configuring Site Recovery Manager (SRM) With HP StoreVirtual VSA

This is where things start to get exciting! We are going to replicate Volumes between Production and DR and then check to ensure that SRM can see the replicated Volumes.

Replication can occur in two different modes, 'synchronous' and 'asynchronous'. Naturally it only applies to writes and not reads, so what's the difference?

Synchronous – written blocks are sent to the replication SAN; until the block is committed by the replication SAN and confirmation is received from it, no further blocks are allowed to be written by either SAN.  This means that you would potentially have one block of data loss in the event of a SAN failure. This type of replication should only be used in low-latency environments, and it is the basis for Network RAID on the HP StoreVirtual VSA. As a general rule of thumb, the latency normally needs to be less than 2ms to achieve this.

Asynchronous – written blocks are sent to the replication SAN and no confirmation is required.  The originating SAN just keeps sending more blocks on a predefined schedule, e.g. every 30 minutes.  If you have a SAN failure, your potential data loss is everything since the last block that the replication SAN had a chance to commit.  This is the most commonly used replication type and is supported with the HP StoreVirtual VSA and SRM.
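As a rough worked example, with a 30-minute asynchronous schedule a failure occurring just before the next replication cycle could mean losing up to roughly 30 minutes of writes, so the schedule interval is effectively your RPO.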

Replicating Volumes

In my lab, I have created two volumes at the Production site called PR_SATA_TEST01 and PR_SATA_TEST02; these are thinly provisioned and contain the VMDK files for VMF-TEST01 and VMF-TEST02 respectively.

Before we start replicating the volumes, we need to check that we have only assigned the ESXi Hosts at the Production site to the volume.  Look under Assigned Servers to make doubly sure.

Why's this important, Craig, I hear you ask.  Well, SRM is responsible for failing over the replicated volume and also presenting it to the ESXi Hosts in the DR site.  If we assign ESXi Hosts to the volume at both sites, we are manually interfering with the SRM process and we can also potentially expose the replicated volume to read/write conditions.

Right click the Volume we want to replicate, in this case PR_SATA_TEST01, and select 'New Schedule to Remote Snapshot a Volume'

We need to give the schedule a name; mine is going to be PR_SATA_TEST01_RS with a description of Replicated Volume.  We are going to replicate every 30 minutes, which is the fastest period supported by SAN/iQ 9.5, and we are going to retain only 1 snapshot at the Primary site.

For the Remote Snapshot Setup, we are going to use SSDMG01, which is the Management Group at the DR site, and we are going to retain only 1 copy of the snapshot in DR.

TOP TIP: Do NOT tick Include Primary Volumes; if you do, failback will be a manual process.

We are going to create a New Remote Volume at the DR site.  To do this click on New Remote Volume and select Add a Volume to an Existing Cluster

Double check that your Cluster is at the DR site and click Next

Give the Volume a name, in this case we are rolling with DR_SATA_TEST01, and the description is Replication Volume

Click Finish and Close. We should now be back to the Schedule to Remote Snapshot a Volume screen, but OK is greyed out.  That’s because we haven’t chosen a time for replication to start.

To do this click Edit

Then either select a date/time you want it to start or click OK for it to start immediately.  It has been known that I’m pretty impatient, so I’m going to click OK to start now!

Excellent news, we now have the OK button available to Click, so let’s do that.

You should now see a DR_SATA_TEST01 appear in your DR Cluster and little icons showing the Volume is being replicated to the DR site.

You may have noticed that the original Volume PR_SATA_TEST01_RS has (1) at the end, and also that the replication is happening between PR_SATA_TEST01_RS_Pri.1 and PR_SATA_TEST01_RS_Rmt.1

Let's take a moment to explore this, as it's quite an important concept.  Essentially the original Volume PR_SATA_TEST01 has had a snapshot taken of it.  This has been named with Pri.1 at the end, which stands for Primary Volume Snapshot 1.  At the DR site we have the extension Rmt.1, which means Remote Site Snapshot 1.  Make sense?

If we click PR_SATA_TEST01_RS_Pri.1 and select Remote Snapshots we can see the time it’s taken to replicate the volume and the transfer rate as well.

As a side note, did you know that under Remote Snapshot Tasks (at the bottom of the screen) we can even set the bandwidth to be used? Pretty cool, eh?

Back on track, we now need to do the same for PR_SATA_TEST02

Cool, that’s the replication now all set up, let’s jump back into SRM and check out the Array Managers

Array Managers

Back in SRM, click on Array Managers, then onto Production – StoreVirtual, and finally click on Array Pairs and you see... an awesome amount of nothing.  Err, Craig, what's going on? I thought I was meant to see Volumes being replicated?

Never fear, hit the Refresh button and click Yes to the Discover Array Pairs operation

Now we should see the Remote Array, which in this case is SSDMG01.  Click Enable

You might have noticed that when you clicked on Enable, it kicked off a load of tasks.  Essentially, SRM is discovering replicated volumes.  Let's click on Devices and we should now see PR_SATA_TEST01 and PR_SATA_TEST02 being replicated.

Boom, we are cooking on gas now!

TOP TIP: You need to refresh Array Manager devices manually every time you introduce a replicated Volume

Protection Groups

Protection Groups are based on Volumes being replicated.  SRM will automatically look into the Volume and establish which virtual machines are being replicated.  The way I think about it is that all a Protection Group really is, is a replicated Volume.

So we can configure two Protection Groups as we have two replicated Volumes; that should hopefully make sense.

Click on Protection Groups from the left hand menu and then on Create Protection Group

Choose your Protected site, in this case Production (Local) and click Next

Select the Datastore Group which in this case is PR_SATA_TEST01 and you will notice that VMF-TEST01 has automatically been added as a protected VM.

Give the Protection Group a name and description.  Using my creativity I have opted for PG_SATA_TEST01

Click Next and then finally finish.

As always, we now need to repeat the process for PR_SATA_TEST02.  Once done, you will have two Protection Groups like this.

How do we know that what we have done is rock solid? Well if we go onto VMF-ADMIN02 which is our vCenter in DR, we should see VMF-TEST01 and VMF-TEST02 protected by superman, err I mean SRM.

That’s it for this post, in the next one, we are going to get involved with some Recovery Plans!