This is where things start to get exciting! We are going to replicate Volumes between Production and DR and then check to ensure that SRM can see the replicated Volumes.
Replication can occur in two different ways, ‘synchronously’ and ‘asynchronously’. Naturally it only applies to writes and not reads, so what’s the difference?
Synchronous: written blocks are sent to the replication SAN, and until the block is committed by the replication SAN and confirmation is received by the originating SAN, no further blocks are allowed to be written by either SAN. This means that you would potentially have one block of data loss in the event of a SAN failure. This type of replication should only be used in low latency environments, and is the basis for Network RAID on the HP StoreVirtual VSA. As a general rule of thumb, the latency normally needs to be less than 2ms to achieve this.
Asynchronous: written blocks are sent to the replication SAN and no confirmation is required. The originating SAN just keeps sending more and more blocks on a predefined schedule, e.g. every 30 minutes. If you have a SAN failure then your potential data loss is up to the last block that the replication SAN had a chance to commit. This is the most commonly used replication type and is supported with the HP StoreVirtual VSA and SRM.
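To make the trade-off a bit more concrete, here is a rough Python sketch of the two modes. It is a toy model and not anything the StoreVirtual VSA exposes: the class, the function names and the latency figures are all made up for illustration, and the data loss figure for asynchronous replication is only a rough upper bound (the schedule interval plus however long the last copy took).

```python
from dataclasses import dataclass


@dataclass
class ReplicationLink:
    """Hypothetical model of a link between two SANs."""
    mode: str                  # "sync" or "async"
    round_trip_ms: float       # latency between the two SANs
    schedule_minutes: int = 0  # only meaningful for async


def write_ack_ms(local_commit_ms: float, link: ReplicationLink) -> float:
    """Time before the host sees its write acknowledged."""
    if link.mode == "sync":
        # The write waits for the remote SAN to commit and confirm.
        return local_commit_ms + link.round_trip_ms
    # Async: the remote copy happens later, on the schedule.
    return local_commit_ms


def worst_case_loss_minutes(link: ReplicationLink, copy_minutes: float = 0.0) -> float:
    """Rough upper bound on the window of data lost if the source SAN dies."""
    if link.mode == "sync":
        # At most the write that was in flight, so effectively no window.
        return 0.0
    return link.schedule_minutes + copy_minutes


sync_link = ReplicationLink("sync", round_trip_ms=2.0)
async_link = ReplicationLink("async", round_trip_ms=20.0, schedule_minutes=30)

print(write_ack_ms(1.0, sync_link))    # every write pays the round trip
print(write_ack_ms(1.0, async_link))   # writes only pay the local commit
print(worst_case_loss_minutes(async_link, copy_minutes=5))
```

The point of the model is simply that synchronous replication trades write latency for a tiny loss window, while asynchronous replication does the opposite.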
In my lab, I have created two volumes at the Production site called PR_SATA_TEST01 and PR_SATA_TEST02. These are thinly provisioned and contain the VMDK files for VMF-TEST01 and VMF-TEST02 respectively.
Before we start replicating the volumes, we need to check that we have only assigned the ESXi Hosts at the Production site to the volume. Look under Assigned Servers to make doubly sure.
Why’s this important Craig, I hear you ask? Well, SRM is responsible for failing over the replicated volume and also presenting it to the ESXi Hosts in the DR site. If we assign ESXi Hosts to the volume at both sites, we are manually interfering with the SRM process and we can also potentially expose the replicated volume to read/write conditions.
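If you have more than a couple of volumes, eyeballing Assigned Servers gets tedious. The check itself is simple enough to sketch in Python. Note that this works on data you have typed in or exported yourself; it does not talk to the SAN, and the host names below are hypothetical.

```python
def hosts_outside_site(assigned: dict[str, list[str]],
                       site_hosts: set[str]) -> dict[str, list[str]]:
    """Return, per volume, any assigned server that is not in site_hosts."""
    offenders = {}
    for volume, servers in assigned.items():
        extra = [s for s in servers if s not in site_hosts]
        if extra:
            offenders[volume] = extra
    return offenders


# Hypothetical ESXi host names for the Production site.
production_hosts = {"esxi-prod-01", "esxi-prod-02"}

# What Assigned Servers shows for each volume (second entry is deliberately wrong).
assigned_servers = {
    "PR_SATA_TEST01": ["esxi-prod-01", "esxi-prod-02"],
    "PR_SATA_TEST02": ["esxi-prod-01", "esxi-dr-01"],
}

print(hosts_outside_site(assigned_servers, production_hosts))
```

An empty result means only Production hosts are assigned, which is what we want before replicating.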
We want to Right Click the Volume we want to replicate, in this case it’s PR_SATA_TEST01, and select ‘New Schedule to Remote Snapshot a Volume’.
We need to give the schedule a name; mine’s going to be PR_SATA_TEST01_RS with a description of Replicated Volume. We are going to replicate every 30 minutes, which is the fastest period supported by SAN iQ 9.5. We are going to retain only 1 snapshot at the Primary site.
For the Remote Snapshot Setup, we are going to use SSDMG01, which is the Management Group at the DR site, and we are going to retain only 1 copy of the snapshot in DR.
TOP TIP: Do NOT tick Include Primary Volumes, if you do then fail back will be a manual process.
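The settings above boil down to a handful of values, so here is a small Python sketch that writes them down and sanity checks them. It is just a checklist in code form: the class and field names are my own invention and not a SAN iQ API, and the 30 minute minimum is simply the figure quoted above.

```python
from dataclasses import dataclass

MIN_INTERVAL_MINUTES = 30  # fastest period quoted above for SAN iQ 9.5


@dataclass
class RemoteSnapshotSchedule:
    """Hypothetical record of the choices made in the schedule dialog."""
    name: str
    volume: str
    interval_minutes: int
    retain_primary: int
    retain_remote: int
    remote_management_group: str
    include_primary_volumes: bool = False


def check_schedule(s: RemoteSnapshotSchedule) -> list[str]:
    """Return a list of problems; an empty list means the settings look sane."""
    problems = []
    if s.interval_minutes < MIN_INTERVAL_MINUTES:
        problems.append(f"interval is below {MIN_INTERVAL_MINUTES} minutes")
    if s.retain_primary < 1 or s.retain_remote < 1:
        problems.append("at least one snapshot must be retained at each site")
    if s.include_primary_volumes:
        problems.append("Include Primary Volumes is ticked; fail back becomes manual")
    return problems


schedule = RemoteSnapshotSchedule(
    name="PR_SATA_TEST01_RS",
    volume="PR_SATA_TEST01",
    interval_minutes=30,
    retain_primary=1,
    retain_remote=1,
    remote_management_group="SSDMG01",
)
print(check_schedule(schedule))
```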
We are going to create a New Remote Volume at the DR site. To do this click on New Remote Volume and select Add a Volume to an Existing Cluster
Double check that your Cluster is at the DR site and click Next
Give the Volume a name, in this case we are rolling with DR_SATA_TEST01 and the description is Replication Volume
Click Finish and Close. We should now be back to the Schedule to Remote Snapshot a Volume screen, but OK is greyed out. That’s because we haven’t chosen a time for replication to start.
To do this click Edit
Then either select a date/time you want it to start or click OK for it to start immediately. It has been known that I’m pretty impatient, so I’m going to click OK to start now!
Excellent news, we now have the OK button available to Click, so let’s do that.
You should now see a DR_SATA_TEST01 appear in your DR Cluster and little icons showing the Volume is being replicated to the DR site.
You may have noticed that the original Volume PR_SATA_TEST01_RS has (1) at the end and also that the replication is happening between PR_SATA_TEST01_RS_Pri.1 and PR_SATA_TEST01_RS_Rmt.1
Let’s take a moment to explore this, as it’s quite an important concept. Essentially the original Volume PR_SATA_TEST01 has had a snapshot taken of it. This has been named with Pri.1 at the end, which stands for Primary Volume Snapshot 1. At the DR site we have the extension Rmt.1, which means Remote Site Snapshot 1. Make sense?
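If it helps, the naming pattern can be written down in a few lines of Python. This is only my reading of the names shown in the lab (schedule name, then _Pri or _Rmt, then a dot and the snapshot number), not a documented format, and the helper names are made up.

```python
import re

# Pattern inferred from the names seen in the lab, e.g. PR_SATA_TEST01_RS_Pri.1
SNAPSHOT_NAME = re.compile(r"^(?P<schedule>.+)_(?P<side>Pri|Rmt)\.(?P<number>\d+)$")


def snapshot_pair(schedule: str, number: int) -> tuple[str, str]:
    """Build the primary and remote snapshot names for one replication run."""
    return f"{schedule}_Pri.{number}", f"{schedule}_Rmt.{number}"


def describe(name: str):
    """Split a snapshot name into (schedule, site, number), or None if no match."""
    m = SNAPSHOT_NAME.match(name)
    if m is None:
        return None
    side = "primary" if m["side"] == "Pri" else "remote"
    return m["schedule"], side, int(m["number"])


print(snapshot_pair("PR_SATA_TEST01_RS", 1))
print(describe("PR_SATA_TEST01_RS_Rmt.1"))
```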
If we click PR_SATA_TEST01_RS_Pri.1 and select Remote Snapshots we can see the time it’s taken to replicate the volume and the transfer rate as well.
Side note, did you know that under Remote Snapshot Tasks (at the bottom of the screen) we can even set the bandwidth to be used? Pretty cool eh?
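Bandwidth matters because each copy needs to finish before the next one is due. Here is a back-of-the-envelope Python sketch of that arithmetic. The figures are invented, it uses decimal units (1 GB = 8,000 megabits), and it ignores protocol overhead, so treat the result as optimistic.

```python
def transfer_minutes(changed_gb: float, bandwidth_mbps: float) -> float:
    """Minutes needed to copy changed_gb over a link of bandwidth_mbps."""
    megabits = changed_gb * 8 * 1000  # GB -> megabits, decimal units
    return megabits / bandwidth_mbps / 60


def fits_in_window(changed_gb: float, bandwidth_mbps: float,
                   interval_minutes: float = 30) -> bool:
    """True if the copy should finish before the next scheduled run."""
    return transfer_minutes(changed_gb, bandwidth_mbps) <= interval_minutes


# Hypothetical example: 9 GB of changed blocks over a 100 Mbps link.
print(transfer_minutes(9, 100))
print(fits_in_window(9, 100))
print(fits_in_window(45, 100))
```

If the answer is regularly False for your change rate, either the bandwidth limit or the schedule interval needs another look.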
Back on track, we now need to do the same for PR_SATA_TEST02
Cool, that’s the replication now all set up, let’s jump back into SRM and check out the Array Managers
Back in SRM, click on Array Managers, then onto Production – StoreVirtual, and finally click on Array Pairs and you see an awesome amount of nothing. Err Craig, what’s going on? I thought I was meant to see Volumes being replicated?
Never fear, hit the Refresh button and click Yes to the Discover Array Pairs operation
Now we should see the Remote Array, which in this case is SSDMG01. Click Enable
You might have noticed that when you clicked on Enable, it kicked off a load of tasks. Essentially, SRM is discovering replicated volumes. Let’s click on Devices and we should now see PR_SATA_TEST01 and PR_SATA_TEST02 being replicated.
Boom, we are cooking on gas now!
TOP TIP: You need to refresh Array Manager devices manually every time you introduce a replicated Volume
Protection Groups are based on Volumes being replicated. SRM will automatically look into the Volume and establish which virtual machines are being replicated. The way I think about it is that all a Protection Group really is, is a replicated Volume.
So we can configure two Protection Groups as we have two replicated Volumes, that should hopefully make sense.
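Here is that idea as a tiny Python sketch: one Protection Group per replicated Volume, carrying whichever VMs live on it. It is purely a mental model and not the SRM API. The PG_ naming just follows the convention I use in this lab, and the function name is made up.

```python
def protection_groups(replicated: dict[str, list[str]]) -> dict[str, dict]:
    """Build one protection group per replicated volume.

    `replicated` maps a volume (datastore) name to the VMs stored on it.
    """
    groups = {}
    for volume, vms in replicated.items():
        # Lab naming convention (an assumption): PR_SATA_TEST01 -> PG_SATA_TEST01
        name = "PG_" + volume.removeprefix("PR_")
        groups[name] = {"datastore": volume, "vms": list(vms)}
    return groups


replicated_volumes = {
    "PR_SATA_TEST01": ["VMF-TEST01"],
    "PR_SATA_TEST02": ["VMF-TEST02"],
}

for name, group in protection_groups(replicated_volumes).items():
    print(name, group["datastore"], group["vms"])
```

Two replicated Volumes in, two Protection Groups out, each protecting the VM on its datastore.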
Click on Protection Groups from the left hand menu and then on Create Protection Group
Choose your Protected site, in this case Production (Local) and click Next
Select the Datastore Group which in this case is PR_SATA_TEST01 and you will notice that VMF-TEST01 has automatically been added as a protected VM.
Give the Protection Group a name and description. Using my creativity I have opted for PG_SATA_TEST01
Click Next and then finally Finish.
As always, we now need to repeat the process for PR_SATA_TEST02. Once done, you will have two Protection Groups like this.
How do we know that what we have done is rock solid? Well if we go onto VMF-ADMIN02 which is our vCenter in DR, we should see VMF-TEST01 and VMF-TEST02 protected by superman, err I mean SRM.
That’s it for this post, in the next one, we are going to get involved with some Recovery Plans!