vExpert – New Awardees, Free Stuff & How To Obtain The Title

The vExpert 2015 announcement has been made by VMware and I’m pleased to say that I have been nominated for the third year.

A big thank you to @vCommunityGuy for all his hard work and efforts compiling the list and vetting the people who apply.

Why Do It?

Let’s be honest: blogging takes time and effort, especially on a consistent basis and with relevant material that hasn’t already been covered hundreds of times.  So why do we put ourselves through the ‘mill’ to create content?  The answer is to give back.  How many times have you been grateful to that ‘unsung hero’ who created an article that resolved the problem you were having?

Take a moment and say thank you; bloggers like and appreciate comments and feedback.

The main part is giving back, but I also use VMFocus.com as a repository for my general day-to-day work.  I have lost count of how many times I have referred back to my own articles to point me in the right direction on a topic or an installation.

What Did I Do To Obtain vExpert 2015?

When I wasn’t a vExpert I always wanted to know what was required to become one.  I’m sure some people awarded the title do more and some do less.

On top of this I ran the VCDX EMEA Study Group for people who defended in July 2014 and contribute on a regular basis to Twitter about all things VMware.

Free Stuff

Strange as it may sound, you don’t really get anything from VMware apart from the title (which to me is more than enough).  You get access to some vSphere NFRs, but most of us have these via work anyway.  The interesting items come from vendors who recognise the vExpert contribution to the community and want to reward it.

A number of blogs have lists of things you can register for, the ones I normally refer to are:

Newly Minted vExpert

To add yourself to the vExpert Directory, first of all log in to VMTN and make sure that your handle has the vE symbol beside it.

VMTN

Then go back to the vExpert Directory and click on Create new vExpert Entry

VMTN 2

Fill out your details and if you want a vExpert Appreciation Gift, give the box a tick!

SRM: Reprotect Unsupported

When VMware Site Recovery Manager 5.0 was launched back in September 2011, a new feature set was added to give you the ability to perform ‘automated re-protection’ and ‘automated failback’ using array-based replication.

The release notes for Site Recovery Manager 5.0 describe this feature set in more detail.

  • Automated Re-Protection.
    • Re-protection is a new extension to recovery plans for use only with array-based replication. Automated re-protect enables the environment at the recovery site to establish replication and protection of the environment back to the original protected site through a single click.
  • Automated Failback
    • Automated failback returns the entire environment to the originally protected primary site. This can only happen after re-protection has ensured that data replication and synchronization have been established to the original primary site. Failback will run the same workflow that was used to migrate the environment to the protected site, ensuring that the critical systems encapsulated by the recovery plan are returned to their original environment. Automated failback, like re-protection, is only available for use with array-based replication protected virtual machines.

SRM Conceptual Diagram v0.1

Background

Since the release of SRM 5.0 I have performed a number of production installations using ‘array based replication’.  As part of the verification of the platform, clients have requested that the following functional tests be performed with ‘test virtual machines’:

  1. Test Failover
    • Provide documented evidence that, in a planned or unplanned event, the business can recover within defined SLAs.
  2. Planned Failover and Failback
    • Verify that, for an upcoming known event such as office refurbishment or other maintenance work, a planned failover to the disaster recovery site and a planned failback to the original protected site will work within SLAs.
  3. Unplanned Failover and Failback
    • Verify that, after an unknown event such as a power outage or WAN failure, an unplanned failover to the disaster recovery site and a planned failback to the original protected site (once service has been restored) can be achieved within SLAs.

All of these tests have passed, with a number of minor issues which were resolved along the way.  That’s the point of the test, right?

Reprotect Warning

During a recent installation of SRM using HP 3PAR StoreServ 7200 with ‘asynchronous’ protection across two remote copy groups, the first and second tests passed without issue.  It was when we performed the ‘unplanned failover and failback’ that the issue arose.

Unplanned Failover Process

The first step is to sever the intersite link between the protected and recovery sites.  Once complete, you perform a Disaster Recovery Failover in SRM at the Recovery Site.  This leaves the following tasks unresolved, which are shown in the screenshot below.

  • Pre-Synch Storage
    • Replicate recent changes
  • Shutdown VM’s at Protected Site
    • Ensure virtual machine data is consistent
  • Prepare Protected VMs for Migration
    • Create a final snapshot of the volume on which the protected VM’s reside
  • Synchronize Storage
    • Perform a final storage synchronisation to cover all changes

DC02 When DC01 Back Online

When you bring the original protected site back online, a ‘Recovery’ is required, which performs the operations above that could not be completed.  In the screenshot below this has been completed successfully.

DC01 Recovery Performed AKA Planned Migration

This is the point at which a ‘Reprotect’ can be performed so that the original Protected site becomes the Recovery site.  At this moment we started to experience issues, with the following failure notification:

Failed to reverse replication for failed devices.   Cause: A storage operation on unknown consistency group ‘PG01’

A call was logged with HP and VMware, as the SRM logs showed that it was a storage provider fault and that the reverse replication command could not be issued.

2015-01-27T11:01:50.894Z [01664 error 'Recovery' ctxID=69310807 opID=bbdef04] Plan execution (reprotect workflow) failed; plan id: recovery-plan-1234, plan name: RP01, error: (dr.storageProvider.fault.StorageReverseReplicationFailed) {
-->    dynamicType = <unset>,
-->    faultCause = (dr.storage.fault.UnknownDeviceGroup) {
-->       dynamicType = <unset>,
-->       faultCause = (vmodl.MethodFault) null,
-->       id = "RP01",
-->       msg = "",
-->    },
-->    msg = "",
--> }
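When reading an excerpt like the one above, the useful part is the chain of nested fault types.  A minimal Python sketch for pulling that chain out of a log is shown here.  The regular expression and the fault_chain helper are my own illustration, not part of SRM, and the sketch assumes that nested faults always appear as a dotted type name in parentheses followed by an opening brace, as they do in this excerpt.

```python
import re
from typing import List

# Matches a dotted fault type in parentheses that opens a "{" block,
# e.g. "(dr.storage.fault.UnknownDeviceGroup) {".
# Assumption: nested faults in the log always follow this shape.
FAULT_RE = re.compile(r"\(([A-Za-z]\w*(?:\.\w+)+)\)\s*\{")


def fault_chain(log_text: str) -> List[str]:
    """Return fault type names in the order they are nested in the log."""
    return FAULT_RE.findall(log_text)


# Sample built from the log excerpt quoted in this post.
sample = """\
Plan execution (reprotect workflow) failed; plan id: recovery-plan-1234, plan name: RP01, error: (dr.storageProvider.fault.StorageReverseReplicationFailed) {
-->    dynamicType = <unset>,
-->    faultCause = (dr.storage.fault.UnknownDeviceGroup) {
-->       dynamicType = <unset>,
-->       faultCause = (vmodl.MethodFault) null,
-->    },
--> }
"""

# Print the outermost fault first, indenting each nested cause.
for depth, name in enumerate(fault_chain(sample)):
    print("  " * depth + name)
```

Read from the outside in, the chain shows the storage provider reporting a failed reverse replication whose underlying cause is an unknown device group.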

This is when things got interesting and, in my opinion, VMware decided to hide behind some rather ambiguous text.

Ambiguous Text

The text below is taken from the VMware Site Recovery Manager 5.8 Documentation Center

‘If you performed a disaster recovery operation, you must perform a planned migration when both sites are running again. If errors occur during the attempted planned migration, you must resolve the errors and rerun the planned migration until it succeeds’

How do you perform a planned migration if you have performed a disaster recovery operation?  There is no option for this, only ‘Recovery’.  What do they actually mean?  Well, the next paragraph states the following:

Reprotect is not available under certain conditions:

  • Recovery plans cannot finish without errors. For reprotect to be available, all steps of the recovery plan must finish successfully.
  • You cannot restore the original site, for example if a physical catastrophe destroys the original site. To unpair and recreate the pairing of protected and recovery sites, both sites must be available. If you cannot restore the original protected site, you must reinstall Site Recovery Manager on the protected and recovery sites.

So in our case all steps of the ‘Recovery’ operation had finished and we expected to be able to failback, considering that the same documentation under Reprotect Virtual Machines After a Recovery states:

‘After a recovery, the recovery site becomes the new protected site, but it is not protected yet. If the original protected site is operational, you can reverse the direction of protection to use the original protected site as a new recovery site to protect the new protected site.

Manually reestablishing protection in the opposite direction by recreating all protection groups and recovery plans is time consuming and prone to errors. Site Recovery Manager provides the reprotect function, which is an automated way to reverse protection.’

VMware Support Statement

After numerous exchanges back and forth, VMware’s answer was that, following an unplanned failover, a supported reprotect requires you to carry out these steps:

  • Delete your Recovery Plans
  • Delete your Protection Groups
  • Manually reverse replication on your storage
  • Re-create your Protection Groups
  • Re-create your Recovery Plans

Really VMware?

Final Thoughts

SRM is a mature, intelligent product that understands when a Disaster Recovery failover has been performed.

  • Why then do we have the options for ‘Recovery’ and ‘Reprotect’ if these are not supported in this scenario?
  • Why does SRM documentation not clearly state what is and isn’t supported?
  • Why is SRM not able to cope with this scenario?  Surely it should be supported.

This was new to me, and my use cases for SRM have now reduced.  One of the key purposes of the product is to remove manual administration and so mitigate the risk of human error.

The positive is that, with this new-found knowledge, I will be looking at alternative products such as Zerto to meet customer requirements.

vROPs Foundation – The Case of the Missing Edition

It appears that in the newest release of vRealize Operations Manager the Foundation version has been discontinued.

Screenshot taken from VMware United Kingdom vRealize Operations Manager edition comparison.

vROPs

What Does This Mean?

My experience with vRealize Operations Manager’s predecessor, vCenter Operations Manager, was that it required data collection for at least 60 days before you could leverage anything meaningful from it.

This meant you could run the foundation version initially which would collect all the relevant performance data required and then use your free trial key after 60 days to open up the features required.

This approach was great for PoCs or pilots as it didn’t require any initial investment from the business.  My concern is that, being able to leverage the product for only 60 days, customers might not fully believe the information that vROPs is reporting.

Screenshot taken from VMware United States vRealize Operations Manager evaluation center

vROPs 60 Day

I have raised this internally with VMware in the United Kingdom to see if a ‘Foundation’ version of vROPs is in the pipeline.

Credit goes to Neil Gardner, all-round top bloke and one of my colleagues, who brought this to my attention.

How To: Map HP StoreVirtual Volumes to Datastores

Problem Statement

You have created numerous datastores on your HP StoreVirtual of the same size and presented these to your ESXi Hosts.  However, you have since forgotten how the datastores map back to the volumes.

When you check the Runtime Name of your devices (Storage > Devices) to find out the LUN number, you see that each LUN is ‘0’, as per the screenshot below.

LUN 0

This can be confirmed in HP StoreVirtual Centralised Management Console under Servers > Select Server > Volumes & Snapshots

LUN 0 HP SV

Not very helpful at all!

Resolution

Each datastore has a unique iSCSI Target string which can be used to identify how it is mapped to a volume.

To find out what they are, select the Datastore > Properties > Manage Paths

Device Properties

At the bottom we can see the Target, which tells us the following details:

  • DC02-MG01
    • Denotes the Management Group the volume is in
  • 39 is the decimal representation of hexadecimal 27, which is the VMware NAA (thanks to Jonathan Reid for this information)
    • Denotes the unique target identifier for the volume
  • DC01-DR01SRM
    • Denotes the volume name on the HP StoreVirtual

Target Name

So we now know this datastore corresponds to the volume called DC01-DR01SRM in Management Group DC02-MG01.
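If you have many datastores to check, the same breakdown can be scripted.  The Python sketch below is only an illustration under stated assumptions: the example target string is hypothetical, built from the values discussed above, and the parse_target helper simply assumes that the last three colon-separated fields are the management group, the numeric identifier and the volume name.  Check it against the Target shown in your own Manage Paths dialog before relying on it.

```python
from typing import NamedTuple


class StoreVirtualTarget(NamedTuple):
    management_group: str
    volume_id: int
    volume_name: str


def parse_target(target: str) -> StoreVirtualTarget:
    """Split an iSCSI target name into its trailing three fields.

    Assumption: the string ends with
    <management group>:<numeric id>:<volume name>.
    """
    parts = target.split(":")
    if len(parts) < 4:
        raise ValueError(f"unexpected target format: {target!r}")
    group, vol_id, name = parts[-3:]
    return StoreVirtualTarget(group, int(vol_id), name)


# Hypothetical target string; the prefix and lower-casing are assumptions.
example = "iqn.2003-10.com.lefthandnetworks:dc02-mg01:39:dc01-dr01srm"
target = parse_target(example)

print(target.management_group)  # management group the volume is in
print(target.volume_name)       # volume name on the StoreVirtual
# The identifier is decimal in the target; format it as hex to compare
# against the value seen in the NAA.
print(format(target.volume_id, "x"))
```

Formatting the identifier as hexadecimal makes it easier to match a target against the device identifier reported by ESXi.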

VCDX Defense Schedule 2015

A quick post to mention that the VCDX defense schedule has been released for 2015.

Defenses will be held simultaneously in Palo Alto (USA), Frimley (UK) and Singapore (Asia).

For more details on how to register for the defense, or to see if more dates become available, I suggest you bookmark this VMware Community page and follow Karl Childs @karlchilds on Twitter.