Gotcha: vSphere Metro Storage Cluster (VMSC) & HP StoreVirtual

So you have put together an epic vSphere Metro Storage Cluster on your HP StoreVirtual SAN (formerly LeftHand), following these rules:

  • Creating volumes for each site so it accesses its datastore locally rather than going across the inter-site link
  • Creating DRS ‘host should’ rules so that VMs run on the ESXi Hosts local to the volumes and datastores they are accessing.

The gotcha occurs when you have either a StoreVirtual Node failure or a StoreVirtual Node is rebooted for maintenance.  Let me explain why.

In this example we have a Management Group called SSDMG01 which contains:

  • SSDVSA01 which is in Site 1
  • SSDVSA02 which is in Site 2
  • SSDFOM which is in Site 3

We have a single volume called SSDVOL01 which is located at Site 1

StoreVirtual uses a ‘Virtual IP’ Address to ensure fault tolerance for iSCSI access, you can view this under your Cluster then iSCSI within the Centralized Management Console.  In my case it’s 10.37.10.2

Even though iSCSI connections are made via the Virtual IP Address, each Volume goes via a ‘Gateway Connection’ which is essentially just one of the StoreVirtual Nodes.  To check which gateway your ESXi Hosts are using to access the volumes, select your volume and then choose iSCSI Sessions.

In my case the ESXi Hosts are using SSDVSA01 to access the volume SSDVOL01 which is correct as they are at Site 1.

Let’s quickly introduce a second Volume called SSDVOL02, which we want to be in Site 1 as well.  Let’s take a look at the iSCSI sessions for SSDVOL02.

Crap, they are going via SSDVSA02 which is at the other site, causing latency issues.  Can I do anything about this in the CMC? Not that I can find.

HP StoreVirtual is actually very clever: what it has done is load balance the iSCSI connections for the volumes across both nodes in case of a node failure.  In this case SSDVOL01 goes via SSDVSA01 and SSDVOL02 via SSDVSA02.  If you have ever experienced a StoreVirtual node failure, you know that it takes around 5 seconds for the iSCSI sessions to be remapped, leaving your VMs without access to their disks for this time.

What can you do about this? Well, when creating your volumes, create them in the order that gives site affinity to the ESXi Hosts, as we know that HP StoreVirtual just round robins the Gateway Connection across the nodes.
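To make the ordering point concrete, here is a minimal sketch of the round robin behaviour described above. It is only a toy model of what I observed in the CMC, not StoreVirtual’s actual algorithm; the function names are made up for illustration and the node and volume names come from the example Management Group.

```python
from itertools import cycle

# Node -> site mapping taken from the example Management Group.
NODE_SITE = {"SSDVSA01": "Site 1", "SSDVSA02": "Site 2"}


def assign_gateways(volumes_in_creation_order, nodes):
    """Toy model (assumption): hand out gateway nodes in strict rotation,
    following the order in which the volumes were created."""
    node_cycle = cycle(nodes)
    return {volume: next(node_cycle) for volume in volumes_in_creation_order}


def remote_volumes(assignment, wanted_site):
    """Return the volumes whose gateway node is not at the site we want."""
    return [
        volume
        for volume, node in assignment.items()
        if NODE_SITE[node] != wanted_site[volume]
    ]


# Both volumes are meant to be local to Site 1, as in the example.
assignment = assign_gateways(["SSDVOL01", "SSDVOL02"], ["SSDVSA01", "SSDVSA02"])
wanted = {"SSDVOL01": "Site 1", "SSDVOL02": "Site 1"}

print(assignment)
print(remote_volumes(assignment, wanted))
```

Under this model the second volume created always lands on the second node, so a volume intended for Site 2 should be the one created second.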

That’s all very well and good, but what happens when I have a site failure? Let’s go over this now.  I’m going to pull the power from SSDVSA01, which is the Gateway Connection for SSDVOL01.  The volume actually has a number of VMs running on it.

Man down! As you can see, we have a critical event against SSDVSA01 and the volume SSDVOL01 status is ‘data protection degraded’.

Let’s take a quick look at the iSCSI sessions for SSDVOL01, they should be using the Gateway Connection SSDVSA02

Yep, all good, it’s what we expected.  Now let’s power SSDVSA01 back up again and see what happens.  You will notice that HP StoreVirtual resyncs the volume between the Nodes and then it’s shown as Status: Normal.

Here’s the gotcha, the iSCSI sessions will continue to use SSDVSA02 in Site 2 even though SSDVSA01 is back online at Site 1.

After around five minutes StoreVirtual will automatically rebalance the iSCSI Gateway Connections.  Great, you say, ah, but we have a gotcha.  As SSDVOL02 has now been online the longest, StoreVirtual will use SSDVSA01 as its Gateway Connection, which leaves SSDVOL01 on SSDVSA02 and going across the inter-site link.  So to summarise our current situation:

  • SSDVOL01 using Site 2 SSDVSA02 as its Gateway Connection
  • SSDVOL02 using Site 1 SSDVSA01 as its Gateway Connection

Not really the position we want to be in!

We can get down and dirty using CLIQ to manually rebalance SSDVOL01 onto SSDVSA01, perhaps? Let’s give it a whirl, shall we.

Log in to your VIP address using SSH, but on port 16022, and enter your credentials.

Then we need to run the command ‘rebalanceVIP volumeName=SSDVOL01’
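If you want to script this, here is a minimal sketch that just builds the SSH command line for the step above. The user name `admin` is a placeholder, and I am assuming the CLIQ shell accepts a command passed on the SSH command line; in this post I typed it in interactively, so treat that part as untested.

```python
import shlex


def cliq_rebalance_command(vip, user, volume, port=16022):
    """Build the ssh argv that would run rebalanceVIP for one volume.

    Assumption: the CLIQ shell on port 16022 accepts a remote command.
    If it does not, log in interactively and type the command instead.
    """
    remote_command = f"rebalanceVIP volumeName={volume}"
    return ["ssh", "-p", str(port), f"{user}@{vip}", remote_command]


# 10.37.10.2 is the Virtual IP from the example; "admin" is hypothetical.
argv = cliq_rebalance_command("10.37.10.2", "admin", "SSDVOL01")
print(shlex.join(argv))
```

The list can be handed to `subprocess.run(argv)` once you are happy with what it prints.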

If you’re quick and flick over to the CMC, you will see the Gateway Connection status as ‘failed’.  This is correct, don’t panic.

Do we have SSDVOL01 using SSDVSA01? Nah!

The only way to resolve this is to either Storage vMotion your VMs onto a volume with enough capacity at the correct site or reboot your StoreVirtual Node in Site 2.

In summary, even though HP StoreVirtual uses a Virtual IP Address, each volume’s iSCSI sessions are tied to a Gateway Connection on one StoreVirtual Node, and you are unable to change the iSCSI connections manually without rebooting the StoreVirtual Nodes.

Hopefully, HP might fix this with the release of LeftHand OS10.1

Is VMware Site Recovery Manager Really Worth It?

Following on from yesterday’s post ‘10 Questions With Craig Kilborn’, VMware have posted my first article on the Bloggers Bench.

It’s not a ‘true’ technical article, more along the lines of why you should use technology to meet your business objectives.

From the Bloggers Bench: Is VMware Site Recovery Manager Really Worth It?

Let’s start off with a cheery fact: ‘The U.S. Department of Labor estimates over 40% of businesses never reopen following a disaster. Of the remaining companies, at least 25% will close within 2 years. Over 60% of businesses confronted by a major disaster close by two years, according to the Association of Records Managers and Administrators’ (information source).

A question I’m asked a lot is do I really need DR? Well reading the above statement, I hope the answer is yes, but in all reality the actual answer is, it depends.  OK that is probably the most ‘woolly’ thing anyone in IT can say, we like hard and fast, black and white rules as engineers dammit!

For example, you may work for a company that has no on-premises IT: you use a cloud based platform for your accounts, CRM and HR packages, and you use hosted Exchange, SharePoint and Lync as your communication pieces.  Would you need DR? Well, the answer is probably not.

What about if you work for a company with a vSphere environment which can cater for two host failures and has redundancy on every level?  This is then housed in a Tier 5 Datacenter offering 99.999% uptime, with the usual battery backed generators, diverse internet links, fire suppression systems and environmental monitoring.  Connectivity is provided by diverse links to the datacentre.  Would you need DR then? Possibly, as it depends on how the company views risk.  If I were a betting man, I would say in most scenarios DR wouldn’t be necessary.

Read the rest of the article here

vCenter Server Appliance (VSA) 5.1 – Error: Invalid Active Directory Name/Enabling Active Directory Failed

Today, I decided to change my vCenter from being Windows based to the vCenter Server Appliance (VSA) 5.1.0a and when I tried to enter the LDAP details for Active Directory Authentication, I received various error messages:

  • Error: Invalid Active Directory domain
  • Error: Enabling Active Directory failed
  • Error: Invalid SRV Records

The first thing to always check is your DNS settings, to make sure you have forward and reverse lookup records set up correctly.  I checked these and they were all OK.

Next, I did a basic ping test to VMF-VSA01 which is the name of my VSA, again all working.

Ah, I thought, perhaps I have entered in something wrong on the VSA network settings, so I double checked these, again all looked good.

Then I remembered that I should be using FQDNs (Fully Qualified Domain Names) for my VSA, so rather than using VMF-VSA01 I should use VMF-VSA01.vmfocus.local

Another try at authenticating, and it still failed with ‘Error: Invalid Active Directory domain’.

One more try, this time I changed the domain to vmfocus.local and boom, we have success!

So to summarise:

  • Make sure you use an FQDN for your vCenter Server Appliance
  • Make sure you have forward and reverse lookup records for your vCenter Server Appliance
  • Make sure your domain is entered as an FQDN
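If you want a quick sanity check before typing the values into the appliance, the checklist above can be sketched in a few lines. This is a rough, hypothetical helper and not the validation the appliance itself performs: it only checks that the names look fully qualified and that the appliance name sits inside the domain.

```python
def is_fqdn(name):
    """Rough check (assumption): an FQDN has at least two non-empty
    dot-separated labels, e.g. vmfocus.local."""
    labels = name.rstrip(".").split(".")
    return len(labels) >= 2 and all(labels)


def appliance_in_domain(hostname, domain):
    """True when both values look fully qualified and the host name
    ends with the domain name."""
    return (
        is_fqdn(hostname)
        and is_fqdn(domain)
        and hostname.lower().endswith("." + domain.lower())
    )


print(is_fqdn("VMF-VSA01"))                # short name only
print(is_fqdn("VMF-VSA01.vmfocus.local"))  # fully qualified
print(appliance_in_domain("VMF-VSA01.vmfocus.local", "vmfocus.local"))
```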

vCenter 5.1 Upgrade

I have been meaning to perform a vCenter 5.1 upgrade for some time now.  The good news is, I have a few spare minutes to get the vmFocus lab upgraded.

First of all, you need to decide how you are going to upgrade.  Are you going to perform:

  • In Place Upgrade: this is where you install straight over the existing vCenter, this is supported for 64 bit systems on vCenter 4.0 and 5.0
  • New Install: this is where you install a new vCenter 5.1 server and then add your hosts to it.

I’m going to go for a new install, as my existing vCenter 5.0 server has taken some battering, with SRM being added on and taken off numerous times.

vCenter 5.1 has much higher resource requirements, so it might be worth a quick flirt past the Upgrading to vCenter Server 5.1 Best Practices KB to make sure your environment is up to scratch.

One thing that is worth mentioning is your DNS entries, I suggest you make sure these are spot on.  In my environment I have a Windows Server 2008 Active Directory Integrated Forward Lookup and Reverse Lookup Zone for vmfocus.local

I have DNS records for the following entries, both forward and reverse:

  • ESXi01 192.168.37.1
  • ESXi02 192.168.37.2
  • ESXi03 192.168.37.3
  • VMF-APP01 192.168.37.205

You probably guessed that ESXi01, ESXi02 and ESXi03 are all vSphere Hosts and VMF-APP01 is a Windows 2008 R2 Standard Server.  Before this upgrade all of the vSphere Hosts are attached to another vCenter called VMF-ADMIN01.

What I really like about vCenter is you can install another instance and then just attach the hosts to the new vCenter.  You do lose historical performance data, but if you have a baseline already, that’s not such a big issue.

OK then let’s crack on.

TOP TIP: Don’t forget to install Adobe Flash Player

Installation

Fire up the installation media, if you haven’t downloaded it already, it can be obtained from here

Select VMware vCenter Simple Install and then Click Install

The installation will install vCenter Single Sign On first, so click Next to this

I’m not going to insult your intelligence, Hit Next again, and Accept the terms of the license

This is where things start to get interesting, we need to give a password to the account admin@System-Domain which is used to administer the Single Sign On service.

In this instance, I’m going to opt for a Microsoft SQL Server 2008 R2 Express installation

Cool, something new! The vCenter 5.1 installation is going to create two users RSA_DBA and RSA_USER in the SQL database, pop a password in that complies with your policies.

This part is proper important, make sure that you verify the FQDN of your vCenter Server and give it a ping for good measure.

For security reasons, I always specify an account for vCenter services to run under.  You don’t have to do this, but if you want to tick the ‘best practices’ box it’s best to.

We can now change the default install path, I recommend you don’t change this, unless you have a compelling reason to do so.

We can also change the port used for Single Sign On, I’m happy with the defaults on mine.

Not sure why, but it does seem like an age since we began the installation.  Finally, we can click Install.

Probably a good idea to make yourself a tea or coffee as this is going to take a while.

Once Single Sign On has installed, you will see the Inventory Service install, and then finally vCenter itself.

We need to perform a little bit of interaction with the vCenter install, the first question is a License Key, if you don’t have one, click Next and you can use the free trial version.

We now get the choice of using Microsoft SQL Server 2008 Express or another database.  I’m going to roll with SQL Server 2008 Express (partly because I’m cheap)

Again, we have another question on the System Account, I’m going to use my VMware.Service account for this

Ports, we can change the default ports used by all of vCenter’s services.  I’m going to leave mine at default.

Time to select your deployment size, unless you have a super lab, then I’m sure you and I will be OK with Small

Then finally, click on Install.  Don’t be alarmed if after you click Install, the installer package disappears for a few seconds, this is quite normal (yes it did freak me out).

Boom, job done!

Web Client

It’s probably a good idea to install the web client as well, so from our vCenter Installer, select VMware vSphere Web Client and hit Install

Choose your language, (still no United Kingdom version for English)

Don’t be alarmed, everything will disappear for a while.  Once the installer is back, click on Next

Hit Next, and then agree to the terms of the license and Hit Next again.

You can change the default install folder if you like, however as always I recommend leaving it as default unless you have a valid reason not to.

We can now change the vSphere Web Client ports, I’m going to leave mine at the default HTTP 9090 and HTTPS 9443

This is where things start to get interesting, we need to specify the vCenter Single Sign On administrator password which we entered during the Single Sign On installation.

Hopefully, you should now be at the Install screen, hit Install

Happy days, we are all done, well nearly!

Cool, we can now either launch the vSphere Web Client from Start > Program Files or browse to https://vcentername:9443

At the login screen, we want to ‘Download the Client Integration Plug-in’

Run the file VMware-ClientIntegrationPlugin-5.1.0.exe

At this point, you will need to close your web browser otherwise you can’t install the plug in!

Cool, click on Next and let the magic happen

All installed click on Finish

Let’s give it a whirl, browse to https://localhost:9443 and place a tick in ‘Use Windows Session Authentication’

You should get a Client Integration Access Control prompt asking you to confirm you are allowed access, click Allow

Voila we are in! Now time to familiarise myself with the new GUI