How To: Perform a SRM Unplanned Failover & Maintain ‘Business As Usual’ Operations

Purpose

The purpose of this blog post is to provide the steps required to perform a Site Recovery Manager unplanned failover while maintaining business as usual operations.  I performed these steps twice on a client’s live production environment, with users accessing production virtual machines at the ‘source’ site.  The users noticed no impact on their daily work activities.

Pre-Requisites

The pre-requisites listed below had been discussed with the client and change control invoked for the following items:

  • vCenter and Site Recovery Manager would not be accessible during the unplanned failover
  • vSphere Client 5.5 U2 is used to enable editing of virtual machines with hardware level 10
  • Source vCenter and Site Recovery Manager ‘pinned’ to an ESXi Host using DRS Groups Manager ‘should’ rules to enable easy location of virtual machines
  • Replication stopped for the production remote copy virtual volumes for the duration of the test
  • Test virtual volume created and presented to ESXi Hosts using an existing Host Set
  • Test virtual machine created using Mike Brown’s Tiny VM to minimise inter-site link bandwidth consumption.  Note this doesn’t have VMware Tools installed.
  • Remote Copy IP and Management interfaces for the 3PAR StoreServ had been located on the upstream switch

Step One – Isolate Storage

Isolate the 3PAR StoreServ at the ‘source’ site by issuing the ‘shutdown’ command on the Management and Remote Copy IP (RCIP) interfaces on the upstream switch.

If RCIP traffic and Management traffic are on the same subnet, RCIP traffic will traverse the Management interfaces, which is why both sets of interfaces are shut down.
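As a rough illustration of the switch side of this step, the sketch below builds the command lines to paste into the upstream switch. It assumes Cisco IOS-style syntax, as the post only names the ‘shutdown’ and ‘no shutdown’ commands and not the switch vendor, and the port names are hypothetical placeholders.

```python
from typing import Iterable, List


def interface_commands(interfaces: Iterable[str], isolate: bool) -> List[str]:
    """Build CLI lines that shut down (isolate=True) or re-enable switch ports.

    Assumption: Cisco IOS-style syntax. Adjust for your switch vendor.
    """
    action = "shutdown" if isolate else "no shutdown"
    lines = ["configure terminal"]
    for name in interfaces:
        lines.append(f"interface {name}")
        lines.append(action)
    lines.append("end")
    return lines


# Hypothetical port names for the 3PAR Management and RCIP uplinks.
PORTS = ["GigabitEthernet1/0/10", "GigabitEthernet1/0/11"]

if __name__ == "__main__":
    # Commands to isolate the array at the start of the test.
    print("\n".join(interface_commands(PORTS, isolate=True)))
```

Calling the same helper with `isolate=False` produces the matching ‘no shutdown’ lines for restoring the interfaces once the failover is complete.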

Verify that you can no longer ping the RCIP interfaces and that your Remote Copy Groups are in a ‘Stopped’ status.
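The ping check can be scripted if you have several RCIP interfaces to confirm. The following is a minimal sketch that shells out to the operating system’s `ping` command. The IP addresses are placeholders from the documentation range, the function names are my own, and exit-code behaviour of `ping` varies slightly between platforms, so treat the result as a quick sanity check only. It does not check the Remote Copy Group status, which still needs to be confirmed on the array.

```python
import platform
import subprocess
from typing import Callable, Iterable, List


def ping_once(host: str, timeout_s: float = 2.0) -> bool:
    """Return True if a single ping to `host` succeeds within the timeout."""
    # Windows uses -n for the packet count, most other platforms use -c.
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        # No reply in time, or no ping binary available: treat as unreachable.
        return False
    return result.returncode == 0


def still_reachable(
    hosts: Iterable[str], probe: Callable[[str], bool] = ping_once
) -> List[str]:
    """Return the hosts that still answer; an empty list means isolation worked."""
    return [host for host in hosts if probe(host)]


if __name__ == "__main__":
    # Placeholder RCIP addresses; replace with your own.
    rcip_addresses = ["192.0.2.10", "192.0.2.11"]
    leftover = still_reachable(rcip_addresses)
    print("RCIP isolated" if not leftover else f"Still reachable: {leftover}")
```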

Step Two – vCenter & SRM

Connect directly to the ESXi Host that runs the vCenter and Site Recovery Manager virtual machines and manually disconnect their virtual NICs.

Result

Using the above process, we have isolated the 3PAR StoreServ, vCenter and Site Recovery Manager virtual machines.  This simulates an inter-site link failure, but enables users to continue to access virtual machines at the source site.

Perform your unplanned failover on the Test Virtual Volume, then issue the ‘no shutdown’ command against your 3PAR StoreServ Remote Copy and Management interfaces.  Finally, reconnect the virtual NICs on your vCenter and Site Recovery Manager virtual machines.

2 thoughts on “How To: Perform a SRM Unplanned Failover & Maintain ‘Business As Usual’ Operations”

    1. A test scenario is completely different; you aren’t simulating the total loss of your production SAN, vCenter, SRM and test virtual machines in that scenario.

      You are only testing your array’s ability to take a snapshot of your DR virtual volumes, promote these to read/write and power on your virtual machines.

      That is not really an end-to-end scenario. The use cases are as follows:

      1. To provide sign off that SRM meets the original design criteria of test, planned and unplanned failover.
      2. To meet business requirements of an unplanned failover once per annum. I have worked with some companies where a test failover is not deemed to be sufficient.
