Part 5 – Configuring Site Recovery Manager (SRM) With HP StoreVirtual VSA

This is the final post in my blog series, Configuring Site Recovery Manager (SRM) with HP StoreVirtual VSA.

If you have missed any of the previous posts, they are available here:

Part 1 – Configuring Site Recovery Manager (SRM) With HP StoreVirtual VSA

Part 2 – Configuring Site Recovery Manager (SRM) With HP StoreVirtual VSA

Part 3 – Configuring Site Recovery Manager (SRM) With HP StoreVirtual VSA

Part 4 – Configuring Site Recovery Manager (SRM) With HP StoreVirtual VSA

As promised, we are going to failover, reprotect and failback. Is it slightly wrong that I’m excited about this blog post?

Pre Failover

As we are good boy/girl scouts, we wouldn’t just jump straight in and try to failover, would we? No, never. Instead we are going to check everything is ‘tickety boo’ with our environment. This means going over the following checklist:

  • Check CMC to ensure no degraded volumes
  • Check CMC to ensure that remote copy is working correctly
  • Check vCenter to ensure that you have connectivity between sites
  • Check SRM Array Managers and refresh your Devices
  • Check Protection Groups
  • Check Recovery Plan

Once you have gone over the above list, the last thing to do is test and clean up.
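
If you would rather not rely on memory, the checklist above can be captured as a small script. This is only a minimal sketch in Python: the check functions are hypothetical placeholders (nothing here actually talks to CMC, vCenter or SRM), so you would swap in whatever your own tooling exposes.

```python
from typing import Callable, List, Tuple

# Each entry pairs a checklist item with a callable that returns True when
# the item is healthy. The callables below are hypothetical placeholders.
Check = Tuple[str, Callable[[], bool]]


def run_checklist(checks: List[Check]) -> List[str]:
    """Run every check in order and return the names of the ones that failed."""
    failed = []
    for name, check in checks:
        try:
            ok = bool(check())
        except Exception:
            # A check that raises is treated as a failure, not skipped.
            ok = False
        print(f"[{'OK' if ok else 'FAIL'}] {name}")
        if not ok:
            failed.append(name)
    return failed


def always_ok() -> bool:
    """Stand-in for a real check; replace with your own logic."""
    return True


PRE_FAILOVER_CHECKS: List[Check] = [
    ("CMC: no degraded volumes", always_ok),
    ("CMC: remote copy working correctly", always_ok),
    ("vCenter: connectivity between sites", always_ok),
    ("SRM: Array Managers checked, Devices refreshed", always_ok),
    ("SRM: Protection Groups checked", always_ok),
    ("SRM: Recovery Plan checked, test and clean up done", always_ok),
]

failures = run_checklist(PRE_FAILOVER_CHECKS)
print("Ready to failover" if not failures else f"Fix these first: {failures}")
```

The point of returning the failed items rather than stopping at the first one is that you get the full picture in a single pass before deciding whether to go ahead.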

Looks like we are cooking on gas.

Failover

We have two types of failover, planned and unplanned.

Planned Failover is when you know of impending works which will make your Production site inoperable for a period of time. This could be planned maintenance work or a site relocation. Imagine you are building a new Head Office: you configure all of your network, storage and vSphere infrastructure and then just use SRM to failover over a weekend.

Unplanned Failover is when you earn your ‘bacon’ as a vSphere Administrator, as you have a man down situation and no Production site left.

In this instance we are going to do a planned failover. As you can see, VMF-TEST01 is running in our Production site.

VMF-TEST01 is in a good place, as it’s being replicated to our DR site

Let’s get it on. Go into SRM, click on Recovery Plans, then onto Recovery Steps (so that we can see what’s going on) and then click on Recovery!

The Red Stop Sign cracks me up, it’s SRM’s way of saying are you really sure you want to do this? We are sure, so we want to put a tick in the ‘I understand that this process will permanently alter the virtual machines and infrastructure of both the protected and recovery datacenters.’ box.

We are going to perform a ‘Planned Migration’ and then click Next

We are now at the point of no return, click Start

OK, what’s going on? Well, let’s have a closer look.

Step 1 SRM takes a snapshot of the replicated volume PR_SATA_TEST01 before it tries to failover, this is for safety.

Step 2 SRM shuts down the VMs at the Protected Site, in this case VMF-TEST01, to avoid any data loss

Step 3 SRM restores any hosts from standby at the DR Site

Step 4 SRM takes another snapshot and synchronizes the storage

Step 5 Epic Fail!

OK, what happened? Well, we have the error message ‘Error: Failed to promote replica devices. Failed to promote replica device ‘1266d2456f’’. This means that for some reason SRM wasn’t able to promote the DR volume DR_SATA_TEST01 from Read to Read/Write. To be perfectly honest, I have tried many times to get this to work and for some reason it always fails on this step. Strange really, as when we perform a test it takes a snapshot of the volume DR_SATA_TEST01 and promotes this to Read/Write without any issues. So in this situation we are going to need to give SRM a hand.

Go into the CMC and expand your Management Groups and Clusters until you get this view.

We are going to Right Click DR_SATA_TEST01 and Select Failover/Failback Volume

Click Next and then select ‘to fail over the primary volume, PR_SATA_TEST01, to this remote volume, DR_SATA_TEST01’ and click Next

The good news is that we haven’t got any iSCSI sessions in place, so we can click Next

Double check your provisioning is correct, and then click Finish

Awesome, we should now have the volume DR_SATA_TEST01 acting as a Primary Read/Write Volume. You can tell this as it should be shown in dark blue.

I think we should try the Recovery again now, let’s hop back into SRM and click on Recovery.

Select the ‘I understand that this process will permanently alter the virtual machines and infrastructure of both the protected and recovery datacenters.’ tick box again and click Next and Start.

Hopefully you should see that SRM jumps straight to Step 8, Change Recovery Site Storage to Writeable and this time it has been a Success!
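
The behaviour above, where a re-run picks up after the steps that have already succeeded, can be pictured with a toy model. This sketch is purely illustrative and says nothing about how SRM is implemented internally: the step names, the `storage` flag and the `run_plan` function are all made up for the example.

```python
from typing import Callable, Dict, List, Set


def run_plan(steps: List[str],
             actions: Dict[str, Callable[[], None]],
             completed: Set[str]) -> Set[str]:
    """Run steps in order, skipping completed ones; stop at the first failure.

    Returns a new set containing every step that has completed so far.
    """
    done = set(completed)  # copy, so the caller's set is left untouched
    for step in steps:
        if step in done:
            print(f"skip  {step}")
            continue
        try:
            actions[step]()
        except RuntimeError as err:
            print(f"FAIL  {step}: {err}")
            break
        done.add(step)
        print(f"done  {step}")
    return done


# Hypothetical stand-in for the state of the DR volume.
storage = {"promoted": False}


def make_storage_writeable() -> None:
    if not storage["promoted"]:
        raise RuntimeError("Failed to promote replica devices")


steps = ["shut down VMs", "synchronize storage",
         "make recovery storage writeable", "power on VMs"]
actions: Dict[str, Callable[[], None]] = {name: (lambda: None) for name in steps}
actions["make recovery storage writeable"] = make_storage_writeable

first_run = run_plan(steps, actions, set())       # stops at the storage step
storage["promoted"] = True                        # the manual failover in CMC
second_run = run_plan(steps, actions, first_run)  # skips what already succeeded
```

On the second run the first two steps are skipped and the plan carries on from the step that failed, which is the same shape as what we saw in the Recovery Steps view.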

Time for a quick brew, whilst SRM finishes off bringing VMF-TEST01 up at our DR site.

Boom, the man from Delmonte he say yes!

So let’s see what’s going on, shall we? First of all, at our Production site, you can see SRM now knows that VMF-TEST01 is not live.

At DR, VMF-TEST01 is up and running and its IP Address has been successfully changed.

The question is, can we ping it by DNS name? The record should have been updated.

Boom, all working as expected.
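
If you want to repeat the DNS check for more than one VM, it is easy enough to script. Below is a minimal sketch in Python using the standard library resolver. The hostname and IP address in the usage note are placeholders I have invented, not values from this lab, and the `resolve` parameter only exists so the function can be exercised without a real DNS server.

```python
import socket
from typing import Callable


def dns_matches(hostname: str, expected_ip: str,
                resolve: Callable[[str], str] = socket.gethostbyname) -> bool:
    """Return True if hostname resolves (IPv4) to expected_ip."""
    try:
        return resolve(hostname) == expected_ip
    except OSError:
        # socket.gaierror is a subclass of OSError: the name did not resolve.
        return False
```

Usage would look something like `dns_matches("vmf-test01.example.local", "192.0.2.50")`, with your own VM name and its DR address. Bear in mind that a stale answer cached on the machine running the check could still return the old address for a while.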

Last of all, let’s check CMC to see what’s going on with our HP StoreVirtual VSA.

Now you may be thinking it’s not really the best situation to be in, as we have two Primary Volumes, PR_SATA_TEST01 and DR_SATA_TEST01. But don’t fear, SRM has changed PR_SATA_TEST01 to ‘read’ only access for ESXi02.

Also, if we check the Datastores on ESXi02, we see that PR_SATA_TEST01 has disappeared.

Cool, I think we are now in a position to Reprotect.

Reprotect

Reprotection reverses the process, so that the DR site becomes the protected site and Production becomes the DR site, simples.
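
To make the role reversal concrete, here is a tiny model of it in Python. It is a sketch of the idea only, not of anything SRM or the StoreVirtual SRA exposes: the `ReplicationPair` class and `reprotect` function are invented for illustration, and the site names are just labels.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ReplicationPair:
    protected_site: str
    recovery_site: str
    primary_volume: str
    remote_volume: str


def reprotect(pair: ReplicationPair) -> ReplicationPair:
    """Swap the site roles and the direction of replication."""
    return replace(
        pair,
        protected_site=pair.recovery_site,
        recovery_site=pair.protected_site,
        primary_volume=pair.remote_volume,
        remote_volume=pair.primary_volume,
    )


original = ReplicationPair("Production", "DR", "PR_SATA_TEST01", "DR_SATA_TEST01")
reversed_pair = reprotect(original)
print(reversed_pair.protected_site, reversed_pair.primary_volume)
```

Running `reprotect` a second time gets you back to the original arrangement, which is exactly what we will rely on after the failback later in this post.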

So let’s jump back into SRM and click Reprotect

Select ‘I understand that this operation cannot be undone.’ and click Next

Let’s click Start and watch the process in action.

OK, what’s going on then Craig?

Step 1 SRM realises it can’t have two Primary Volumes and demotes PR_SATA_TEST01 to a Remote Volume and then deletes it

Step 2 SRM takes a snapshot of DR_SATA_TEST01 as a safety measure before it starts the reverse protection

Step 3 SRM takes a further snapshot and invokes the replication schedule

Step 4 SRM cleans up the storage to make sure everything is ‘tickety boo’

If everything was a success you should see that your Recovery Plan has gone back to normal.

From the HP StoreVirtual VSA perspective everything looks good: DR is the Primary Volume and Production is the Remote Volume.

Right then, I think we should think about failing back. Before we do so, we need to run over that checklist again.

  • Check CMC to ensure no degraded volumes
  • Check CMC to ensure that remote copy is working correctly
  • Check vCenter to ensure that you have connectivity between sites
  • Check SRM Array Managers and refresh your Devices
  • Check Protection Groups
  • Check Recovery Plan

Once you have gone over the above list, the last thing to do is test and clean up.

Good times, everything was a success, I think we are ready to failback.

Failback

Failback is actually just a Recovery as far as SRM is concerned, so I won’t bother waffling on about it again. Let’s hit Recovery.

I wanted to show you that this time round, SRM was able to promote the Remote Volume to Primary Read/Write without any issues.

Nice one, we have another success and VMF-TEST01 is running back at Production.

Let’s do the obligatory ping test via DNS, again success.

Quick look at our DR site and you can see SRM now sees VMF-TEST01 as being protected

Lastly, a look at CMC to check on our HP StoreVirtual VSA. As you can see, we still have two Primary copies, but again DR_SATA_TEST01 is now read only.

A couple of final thoughts for you.

  1. It’s quite normal to see a ‘ghost’ datastore at either your Production or DR site after you have failed over or back. Just perform a ‘Rescan’ and it will disappear.
  2. Check your path policies for the Datastore, as these don’t always go back to your preferred choice.

Thanks for reading what probably feels like War and Peace to you on SRM. I hope you agree it’s an amazing product that makes our life as IT administrators that much easier!
