Part 3 – Automating HP StoreVirtual VSA Failover

In part two we installed and configured HP StoreVirtual VSA on vSphere 5.1. In this blog post we are going to look at automating failover.

I think a quick recap is in order. If you remember, we received a warning when adding SATAVSA01 and SATAVSA02 to the Management Group SATAMG01, which was:

‘To continue without installing a FOM, select the checkbox below acknowledging that a FOM is required to provide the highest level of data availability for a 2 storage system management group configuration. Then click Next.’

This error message is about quorum, a term that I’m sure a lot of you are familiar with from working with Windows clusters. Each VSA runs what’s known as a ‘manager’, which is really a vote. When we have two VSAs we have two votes, which is a tie. Let’s say that one VSA has an issue and goes down; how does the remaining VSA know that? Well, it doesn’t. It could be that both VSAs are up and they have simply lost the network between them. This then results in a split-brain scenario.

This is where the Failover Manager comes into play. So what exactly is a Failover Manager? Well, it’s a specialized version of the SAN/iQ software which runs under ESXi, VMware Player or the elephant in the room (Hyper-V). Its purpose in life is to be a ‘manager’ and maintain quorum by introducing a third vote, ensuring access to volumes in the event of a StoreVirtual VSA failure. The Failover Manager is downloaded as an OVF, and the good news is we already have a copy which we have extracted.
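
To make the vote counting concrete, here is a toy model of the quorum rule (plain Python, purely illustrative; this is not SAN/iQ code, just the majority arithmetic it follows):

    # Toy model: each running manager contributes one vote, and the
    # management group stays online only with a strict majority of votes.
    def has_quorum(votes_up: int, votes_total: int) -> bool:
        return votes_up > votes_total // 2

    # Two VSAs, no FOM: lose one VSA (or just the link between them) and
    # the survivor holds 1 of 2 votes, which is not a majority.
    print(has_quorum(1, 2))  # False - volumes offline, split-brain risk

    # Add the FOM as a third manager: the surviving VSA plus the FOM hold
    # 2 of 3 votes, so quorum is kept and the volumes stay online.
    print(has_quorum(2, 3))  # True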

A few things to note about the Failover Manager:

  • Do not install the Failover Manager alongside a StoreVirtual VSA you want to protect (i.e. on the same host or storage), as if you have a failure the Failover Manager will lose connection too.
  • Ideally it should be installed at a third physical site.
  • Bandwidth to the Failover Manager should be at least 100 Mb/s.
  • Round-trip time to the Failover Manager should be no more than 50 ms (see the quick check below).
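
That last requirement is worth sanity checking before you commit to a FOM site. A quick-and-dirty sketch, assuming a Linux machine at the proposed site and the eth0 address we give SATAFOM later on (10.37.40.1):

    import re
    import subprocess

    FOM_IP = "10.37.40.1"   # SATAFOM's eth0 address in this lab
    MAX_RTT_MS = 50.0       # HP's round-trip-time guidance for a FOM

    # Send ten ICMP echoes and read back the Linux ping summary line,
    # e.g. "rtt min/avg/max/mdev = 0.421/0.587/0.912/0.120 ms".
    out = subprocess.run(["ping", "-c", "10", FOM_IP],
                         capture_output=True, text=True, check=True).stdout
    avg_rtt = float(re.search(r"= [\d.]+/([\d.]+)/", out).group(1))

    print(f"Average RTT to the FOM: {avg_rtt} ms")
    if avg_rtt > MAX_RTT_MS:
        print("WARNING: over the 50 ms guidance for a FOM site")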

In this environment we will be installing the Failover Manager on the local storage of ESXi02 and placing it into a third logical subnet.  I think a diagram and a reminder of the subnets are in order.

Right then, let’s crack on, shall we?

Installing the Failover Manager

We are going to deploy SATAFOM onto ESXi02’s local hard drive, which is called ESXi02HDD (I should get an award for my naming conventions).

The Failover Manager, or FOM from now on, is an OVF, so we need to deploy it from the vSphere Client. To do this, click File > Deploy OVF Template.

Browse to the location of your extracted HP StoreVirtual VSA files and select the file ending in FOM_OVF_9.5.00.1215FOM.ovf.

Click Next on the OVF Template Details screen, then accept the EULA and click Next. Give the OVF a name, in this case SATAFOM, and click Next. When you get to the storage section you need to select local storage on an ESXi host which is NOT running your StoreVirtual VSA; in this case it is ESXi02HDD.

Click Next, select your Network Mapping and click Finish.

TOP TIP: don’t worry if you cannot select the correct network mapping during deployment. Edit the VM settings and change it manually before powering it on.
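
As an aside, if you would rather script the deployment than click through the wizard, VMware’s ovftool can do the same job. A sketch wrapped in Python; the target host, the FOM port group (covered in the networking chat below) and the OVF path are my lab’s assumptions:

    import subprocess

    # Deploy the FOM OVF to the local datastore on the host that is NOT
    # running a VSA, and drop its NICs straight into the FOM port group
    # (which sidesteps the network-remapping tip above).
    subprocess.run([
        "ovftool",
        "--acceptAllEulas",
        "--name=SATAFOM",         # VM name used throughout this series
        "--datastore=ESXi02HDD",  # ESXi02's local storage
        "--network=FOM",          # port group on VLAN 40
        "FOM_OVF_9.5.00.1215FOM.ovf",
        "vi://root@esxi02/",      # ovftool prompts for the password
    ], check=True)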

If all is going well you should see a ‘Deploying SATAFOM’ pop-up box.

Whilst the FOM is deploying, let’s talk networking for a minute.

On ESXi02, I have a subnet called FOM which is on VLAN 40. We are going to pop the vNICs of SATAFOM into this. The HP v1910 24G is the layer-three default gateway between all the subnets and is configured with VLAN Access Lists to allow the traffic to pass (I will do a VLAN Access List blog in the future!)

Awesome, let’s power the bad boy on.

We need to use the same procedure to set the IP addresses on the FOM as we did on the VSA. Hopefully you should be cool with this, but if you need a helping hand, refer back to How To Install & Configure HP StoreVirtual VSA On vSphere 5.1.

The IP addresses I’m using are:

  • eth0 – 10.37.40.1
  • eth1 – 10.37.40.2

Failover Manager Configuration

Time to fire up the HP Centralized Management Console (CMC) and add the IP addresses into Find Systems.

Log in to view SATAFOM; it should appear as follows.

Let’s right-click SATAFOM and select ‘Add to an Existing Management Group’, choosing SATAMG01.

Crap, Craig, that didn’t work! I got a pop-up about a Virtual Manager; what’s that all about?

Now’s as good a time as any to talk about two other ways to fail over the StoreVirtual VSA.

Virtual Manager – this is automatically added to a Management Group that contains an even number of StoreVirtual VSAs. In the event of a VSA failure you can start the Virtual Manager manually on the VSA which is still working. Does it work? Yes, like a treat, but you will have downtime until the Virtual Manager is started, and you also need to stop it manually when the failed VSA is returned to action. Would I use it? If you know your networking ‘onions’ you should be able to configure the FOM in a third logical site to avoid this scenario.

Primary Site – in a two-manager configuration you can designate one manager (StoreVirtual VSA) as the Primary Site, so if the secondary VSA goes offline you maintain quorum. The question is why would you do this? Honestly, I don’t know, because unless you have some proper ninja skills, how do you know which VSA is going to fail? Also, you need to manually recover quorum, which isn’t for the faint-hearted. My recommendation, simples: avoid.

OK, back on topic. We need to remove the Virtual Manager from SATAMG01, which is straightforward: Right-click > Delete Virtual Manager.

Let’s try adding SATAFOM into Management Group SATAMG01 again. Voila, it works! You might get a ‘registration is required’ notice; we can ignore that, as I’m assuming you have licensed your StoreVirtual VSA.

(I know, I have some emails; they are to do with feature registration and email settings.)

Let’s Try & Break It!

Throughout this configuration we have used the following logic:

  • SATAHDD01 runs SATAVSA01
  • SATAHDD02 runs SATAVSA02
  • SATAVSA01 and SATAVSA02 are in Management Group SATAMG01
  • SATAVSA01 and SATAVSA02 have volumes called SATAVOL01 and SATAVOL02 in Network RAID 10

In my lab I have a VM called VMF-DC01 which, you guessed it, is my Domain Controller; it resides on SATAVOL02.

Power Off SATAVSA01

We are going to power off SATAVSA01, which will mimic it failing completely; no ‘Shut Down Guest’ for us! Fingers crossed, we should still maintain access to VMF-DC01. To put a number on any outage, the sketch below keeps an eye on the VIP while we pull the trigger.
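
This is only a proxy for the iSCSI sessions themselves, but pinging the cluster VIP once a second from a machine on the iSCSI subnet (a Linux box is assumed for the ping flags) is good enough for a lab:

    import subprocess
    import time

    VIP = "10.37.10.1"  # the cluster Virtual IP from part two

    outage_start = None
    while True:
        # One echo with a one-second timeout (Linux ping flags).
        alive = subprocess.run(
            ["ping", "-c", "1", "-W", "1", VIP],
            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
        ).returncode == 0
        if not alive and outage_start is None:
            outage_start = time.time()
            print("VIP stopped responding...")
        elif alive and outage_start is not None:
            print(f"VIP back after {time.time() - outage_start:.1f} s")
            outage_start = None
        time.sleep(1)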

Crap, we lost connection to VMF-DC01 for about 10 seconds and then it returned. Why’s that, Craig, you ask?

Well, if you remember, all the connections go to a Virtual IP Address, in this case 10.37.10.1. This is just a mask; even though the connections hit the VIP, they are directed to one of the StoreVirtual VSAs, in this case SATAVSA01.

So when we powered off SATAVSA01, all the iSCSI connections had to be dropped and then re-presented via the VIP to SATAVSA02.
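
If you want to see exactly which node your host is logged in to before and after the failover, you can dump the iSCSI paths with pyVmomi. A sketch under assumptions: pyVmomi is installed, the hostname and credentials are placeholders, and the transport fields can vary slightly between API versions:

    import ssl
    from pyVim.connect import SmartConnect, Disconnect

    # Connect straight to the ESXi host (hostname/credentials are lab
    # placeholders, not values from this series).
    ctx = ssl._create_unverified_context()
    si = SmartConnect(host="esxi01", user="root", pwd="password",
                      sslContext=ctx)
    try:
        host = (si.content.rootFolder.childEntity[0]
                .hostFolder.childEntity[0].host[0])
        multipath = (host.configManager.storageSystem
                     .storageDeviceInfo.multipathInfo)
        for lun in multipath.lun:
            for path in lun.path:
                # iSCSI paths carry the target's IQN and address list,
                # which shows where the session was actually directed.
                if hasattr(path.transport, "iScsiName"):
                    print(path.name, path.state, path.transport.address)
    finally:
        Disconnect(si)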

Power Off SATAVSA02

To prove this, let’s power SATAVSA01 back on and wait for quorum to be recovered. OK, let’s power off SATAVSA02 this time and see what happens.

I was browsing through folders and received a momentary pause of about one second, which, to be fair, on a home lab environment is pretty fantastic.

So what have we learned? We can have Network RAID 10 on top of Hardware RAID 0 and make our infrastructure fully resilient. To sum up, I refer back to my opening statement: the HP StoreVirtual VSA is sheer awesomeness!

18 thoughts on “Part 3 – Automating HP StoreVirtual VSA Failover”

  1. Dear Craig,

    I am following your topic step by step, but I can’t find out where and how you created SATAVOL02, and how you made a RAID 10 between it and SATAVOL01.

    Thank You

    1. Each hard drive had a dedicated HP StoreVirtual managing it, so in total I have 4 x HP StoreVirtuals.

      When you create a Cluster, you select the HP StoreVirtuals which form the cluster, e.g. SATA 1 and SATA 2. When a volume is created you choose the data protection level, either Network RAID 0 or Network RAID 10.

  2. Craig,
    I am trying to set up a test environment.
    I will ultimately have about 10 sites.
    Can I use the same FOM running on the main office VMware to act as the FOM for every site, or will I need a different FOM for each site?

    Thanks
    LB

    1. Hi Lawrence, thanks for reading.

      The FOM is only required when you have an even number of votes for the StoreVirtual to maintain quorum. For example, if you have 4 x StoreVirtuals in each site, then 3 of them can run a ‘manager’.

      A FOM can only participate in one Management Group and a cluster can only span three sites. Not sure if you have seen the HP P4000 Multi Site HA & DR Solution Guide, but I highly recommend reading it http://h10032.www1.hp.com/ctg/Manual/c03041871.pdf

      Hope that helps and good luck with your design!

  3. Hi Craig,
    Now I must test the VSA solution, and today I found your how-to; it’s a great article 🙂 In my configuration I have:
    2 physical servers with VMware 5.5 and one VSA per server, so do I need a FOM or not?

    1. Hi Albert, thanks for reading.

      If you are running the StoreVirtuals in separate clusters, then no, you don’t need a FOM.

  4. Is there any chance to force a manager to start? I’ve got a split-brain scenario: I lost the FOM completely, one storage system is up with a manager and the 2nd is off for maintenance, and there’s no way to bring it up because there’s no quorum. Have I lost 3 TB of data?

    1. Sorry to hear about the situation. You could try adding a Virtual Manager to one of the nodes; however, I can’t guarantee that it would work or that you wouldn’t have any data loss.

      HP are able to assist in these situations; I would recommend logging a support call.

      Hope it works out

  5. Hi Craig,

    Great article! I’m trying to understand the function of the failover manager. As far as I understand, the failover manager provides automated failover for a 2-node cluster, whereas the virtual manager provides manual failover for a 2-node cluster. Is this correct? If a 3-node cluster is configured, is a failover manager still required? If so, would the failover manager purposefully not be involved, since its manager adds another vote, which would result in 4 total votes? Generally trying to understand how the failover manager works in various failure cases with 2 and 3 node clusters.

    Thanks!

  6. Not sure if you’re still replying to comments on this thread, Craig, but I have a question…

    I have inherited an environment comprising three ESXi hosts, with a volume provisioned from two VSAs on two of those hosts and a FOM on the third.

    Each ESXi host has two VMkernel ports (so two IP addresses) connected to the iSCSI LAN.

    Everything is up and running; however, when I view the Manage Paths dialogue for the datastore in VMware, I am seeing only 2 connections to the VSA volume, both referencing the IP address of the same VSA node. Firstly, should I see the VIP here and not the specific IP address of the active VSA node?

    I would have thought that I would see 4 paths (2 x IP addresses per host x 2 VSAs) in order for full multipathing redundancy to be operational.

    Is this the case? What do you see in your lab environment? If I was to shut down the VSA node that the ESXi hosts are currently connected to, how will ESXi know to connect to the other VSA node without an HBA rescan?

  7. Hi Craig,
    Thanks for your article.

    I have some questions about failover.

    The environment contains four ESXi hosts:
    VSA01 on ESXi1
    VSA02 on ESXi2
    A 2-way mirror volume is formed by these appliances
    FOM1 on ESXi3
    ESXi4 with an iSCSI-connected datastore

    Situation:
    ESXi1 goes off because of a power failure, but ESXi2 still works.
    The datastore on ESXi4 is still available for VMs.
    After 1 hour of work ESXi2 goes off due to a power failure, but after 15 minutes ESXi1 is powered on and VSA1 starts up.

    Question 1: What happens? Is the connection to the datastore restored with out-of-date data? With the FOM up? With the FOM off?

    Question 2: In the previous situation, if the connection to the datastore is restored, what happens if after 1 hour of work ESXi2 is powered on and VSA2 starts up?

    1. Hi Vadim,

      Not sure I completely understand the questions. My best attempt to answer is below:

      Q1. VSA01 is off. Assuming VSA02 can speak to the FOM, it will handle all iSCSI requests. An hour later VSA02 is off. When VSA01 comes back, it only knows about the data as of the point it was originally powered off. Therefore you lose everything written since VSA01 went down.

      Q2. Not 100% sure, but I would imagine that VSA01 would be the primary and VSA02 the secondary.

  8. Craig,

    Great article, I wondered what the sticker “The first TB is on us” meant on my G9 servers.

    I noticed in the article you created SSD VSAs, but never went into detail about what you were using them for.

    Can you provide some additional information?

    Thanks

    Bob

  9. OK, so I have 2 server rooms and a StoreVirtual cluster. In which room should the dedicated server hosting the Failover Manager go? I want to withstand a full server room outage. It seems like that is impossible; neither setup will be able to recover from a random server room outage? When the Failover Manager is inside the crashing room, the other room will not take over. And we have no ninja skills to know which room is going to crash. So is HP StoreVirtual VSA redundancy completely useless?
