HP 3PAR Streaming Remote Copy Replication

Replication in 3PAR arrays has always been mediocre.  In the older versions of 3PAR Inform OS, if you chose synchronous replication for a single remote copy group, you could not use asynchronous replication for any other remote copy group.

This limitation was addressed in a newer version of 3PAR Inform OS; however, your lowest RPO using asynchronous replication was bottlenecked at 15 minutes regardless of available bandwidth.

The release of the HP 3PAR 20000 Series brings a new feature: streaming asynchronous replication.

What Is Streaming Asynchronous Replication?

Essentially, if you have the bandwidth and cache available, the source 3PAR will stream replication across to the target 3PAR, reducing your RPO below 15 minutes.  I like to think of it as a best-endeavours approach.

Replication Modes

When designing a replication infrastructure, it’s important to know the transport method as well as the thresholds in terms of bandwidth and latency between the source and target arrays.  This ensures not only that you stay within a supported SLA, but also that write performance on the source array is not affected.

The table below shows supported thresholds.

Replication Modes

Architecture

The source array uses a local cache to maintain host write transactions in memory.  A concept known as ‘delta sets’ is used.

Source 3PAR Array

  • I/Os are transferred from the primary array to the secondary array as part of delta sets
  • I/Os on the primary array that belong to a particular remote copy group are grouped together into delta sets
    • A delta set is made up of subsets of I/Os, where each subset represents the I/Os owned by a remote copy group on a given node

Target 3PAR Array

  • A delta set is applied on the secondary RC volume group only after:
    • The entire delta set has been received in the secondary array cache
    • And the previous sets that this delta set depends upon have completed.
  • A secondary RC volume group is always in a crash consistent state, before or after the application of a delta set. It is not crash consistent during the application of a delta set.
    • If the delta set fails to apply on the secondary volume then the group stops and a fail back to the last coordinated snapshot is required
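
The two conditions for applying a delta set on the target can be sketched in a few lines of Python.  This is a toy model for illustration only, not 3PAR code: the DeltaSet and SecondaryGroup classes, their fields and the I/O counts are all assumptions invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class DeltaSet:
    """Hypothetical model of a delta set arriving at the secondary array."""
    set_id: int
    expected_ios: int        # number of I/Os the set contains
    depends_on: tuple = ()   # ids of earlier sets that must complete first
    received_ios: int = 0    # I/Os that have landed in the secondary cache

    @property
    def fully_received(self) -> bool:
        return self.received_ios >= self.expected_ios


@dataclass
class SecondaryGroup:
    """Tracks which delta sets have been applied to the secondary volume group."""
    applied: set = field(default_factory=set)

    def can_apply(self, delta: DeltaSet) -> bool:
        # Condition 1: the entire set is in cache.
        # Condition 2: every set it depends upon has already been applied.
        return delta.fully_received and all(
            dep in self.applied for dep in delta.depends_on
        )

    def apply_ready(self, pending: list) -> list:
        """Apply every pending set that is ready; return the ids applied."""
        applied_now = []
        progress = True
        while progress:
            progress = False
            for delta in pending:
                if delta.set_id not in self.applied and self.can_apply(delta):
                    self.applied.add(delta.set_id)
                    applied_now.append(delta.set_id)
                    progress = True
        return applied_now


group = SecondaryGroup()
first = DeltaSet(set_id=1, expected_ios=3, received_ios=3)
second = DeltaSet(set_id=2, expected_ios=2, depends_on=(1,), received_ios=1)

print(group.apply_ready([second, first]))  # [1]: set 2 is still incomplete
second.received_ios = 2
print(group.apply_ready([second, first]))  # [2]: complete, and set 1 is applied
```

Between calls to apply_ready the model is always at a delta set boundary, which mirrors the point that the secondary group is consistent before or after a delta set is applied, but not during.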

Remote Copy Architecture

What About Write Bursts?

A write burst is when the array receives a significant number of writes over a short period, which could last for a few minutes.  If the inter-site link between the source and target arrays is sufficient, this has no impact.

If the inter-site link cannot cope, or the write cache fills up, the source 3PAR chooses a random remote copy group to stop and takes a snapshot.

Note: You have no control over which remote copy group is stopped.

Once stopped, these groups will start again at the next sync period.
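
The behaviour described above can be modelled roughly as follows.  This is a sketch of the described behaviour only, not of how the array implements it: the function names, the group names and the ‘Started’/‘Stopped’ strings are assumptions made for the example.

```python
import random


def stop_group_on_burst(group_states, link_keeping_up, cache_full, rng=random):
    """Stop one randomly chosen running group if the burst cannot be absorbed.

    group_states maps a remote copy group name to 'Started' or 'Stopped'.
    Returns the name of the group that was stopped, or None.
    """
    if link_keeping_up and not cache_full:
        return None  # the link and cache absorb the burst, nothing to do
    running = sorted(n for n, s in group_states.items() if s == "Started")
    if not running:
        return None
    stopped = rng.choice(running)     # the administrator cannot influence this
    group_states[stopped] = "Stopped"  # the array would also take a snapshot here
    return stopped


def resume_at_next_sync(group_states):
    """Stopped groups start again at the next sync period."""
    for name, state in group_states.items():
        if state == "Stopped":
            group_states[name] = "Started"


groups = {"RCG_SQL": "Started", "RCG_FILE": "Started", "RCG_VDI": "Started"}
victim = stop_group_on_burst(groups, link_keeping_up=False, cache_full=False)
print(victim, groups)
resume_at_next_sync(groups)
print(groups)
```

Because the choice is random, a group protecting a critical workload is as likely to be stopped as any other, which is worth bearing in mind when sizing the inter-site link.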

Final Thoughts

This is a great feature set being added to the 3PAR 20000 Series.  I’m sure that when the next .1 release arrives you will be able to select which remote copy groups should be stopped in the event of a write burst or cache overflow.

As with most 3PAR updates, I expect streaming asynchronous replication to find its way into the 7×00 series within a short period of time.

How To: Perform an SRM Unplanned Failover & Maintain ‘Business As Usual’ Operations

Purpose

The purpose of this blog post is to provide the steps required to perform a Site Recovery Manager unplanned failover and maintain business as usual operations.  I performed these steps twice on a client’s live production environment with users accessing production virtual machines at the ‘source’ site.  The users noticed no impact on their daily work activities.

Pre-Requisites

The pre-requisites listed below had been discussed with the client and change control invoked for the following items:

  • vCenter and Site Recovery Manager would not be accessible during the unplanned failover
  • vSphere Client 5.5 U2 is used to enable editing of virtual machines with hardware level 10
  • Source vCenter and Site Recovery Manager ‘pinned’ to an ESXi Host using DRS Groups Manager ‘should’ rules to enable easy location of virtual machines
  • Replication stopped for the production remote copy virtual volumes for the duration of the test
  • Test virtual volume created and presented to ESXi Hosts using an existing Host Set
  • Test virtual machine created using Mike Brown’s Tiny VM to minimise inter-site link bandwidth consumption.  Note that this doesn’t have VMware Tools installed.
  • Remote Copy IP and Management interfaces for the 3PAR StoreServ located on the upstream switch

Step One – Isolate Storage

Isolate the 3PAR StoreServ at the ‘source’ site by issuing the ‘shutdown’ command on the Management and Remote Copy IP (RCIP) interfaces on the upstream switch.

If RCIP traffic and Management traffic are on the same subnet, RCIP traffic will traverse the Management interfaces, so both sets of interfaces need to be shut down.

Verify that you can no longer ping the RCIP interfaces and that your Remote Copy Groups are in a ‘Stopped’ status.

Step Two – vCenter & SRM

Connect to the ESXi Host that runs the vCenter and Site Recovery Manager virtual machines and manually disconnect their virtual NICs.

Result

Using the above process, we have isolated the 3PAR StoreServ, vCenter and Site Recovery Manager virtual machines.  This simulates having an inter site link failure, but enables users to continue to access virtual machines at the source site.

Perform your unplanned failover on the Test Virtual Volume and then issue the ‘no shutdown’ command against your 3PAR StoreServ Remote Copy and Management interfaces.  Finally, reconnect the virtual NICs on your vCenter and Site Recovery Manager virtual machines.

3PAR StoreServ & Site Recovery Manager Expected Behaviour

Purpose

The purpose of this post is to document the expected behaviour of the 3PAR StoreServ 7×00 and VMware Site Recovery Manager in both a ‘planned failover’ and ‘unplanned failover’.

Environment

The tests were performed on two different environments, each containing the same infrastructure.

  • vCenter 5.5 Update 2 (Build 2001466)
  • Site Recovery Manager 5.8.0 (Build 2056894)
  • HP 3PAR SRA 5.5.2.285
  • HP 3PAR Inform OS 3.1.3 (MU1) P03, P07, P09

3PAR Details

Prominent details about the 3PAR configuration are highlighted below.

  • Single common provisioning group used for virtual volumes and remote copy space
  • Auto LUN ID used
  • Auto Recover enabled
  • Asynchronous replication using a 15 minute interval schedule
  • Virtual Volumes are presented to a Host Set at Source Site and are ‘Exported’
  • Virtual Volumes are presented to a Host Set at Target Site and are ‘Un-Exported’

During the tests, I was logged into the source and destination 3PAR StoreServs and issued the following 3PAR CLI commands to observe their state.

  • showrcopy groups SRMTEST01*
    • Shows state of the remote copy group at each location
  • showrcopy links
    • Shows the status of the remote copy links at each location
  • showvv
    • Shows the virtual volume information at each location

Planned Failover

A Planned Failover is when the source and target sites are both up.

The table below shows the observed behaviour on the 3PAR StoreServ at both the source and target sites along with the SRM workflow step.

SRM Workflow Step   Source RG Name   Source Role     Destination RG Name   Destination Role   Sync State
Pre Failover        SRMTEST01        Primary         SRMTEST01.r398979     Secondary          Synced/Syncing
Planned Failover    SRMTEST01        Primary         SRMTEST01.r398979     Primary-Rev        Stopped
Reprotect           SRMTEST01        Secondary-Rev   SRMTEST01.r398979     Primary-Rev        Synced/Syncing
Planned Failback    SRMTEST01        Primary         SRMTEST01.r398979     Secondary          Stopped
Reprotect           SRMTEST01        Primary         SRMTEST01.r398979     Secondary          Synced/Syncing
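
If you want to check your own observations against the planned failover behaviour above, the table can be transcribed into a small lookup and compared step by step.  This is only a sketch: the GroupState tuple and the first_unexpected helper are hypothetical names for the example, and the observed values would have to be gathered by hand from showrcopy groups at each site.

```python
from typing import NamedTuple, Optional


class GroupState(NamedTuple):
    step: str
    source_role: str
    destination_role: str
    sync_state: str


# Transcribed from the planned failover table above.
PLANNED_FAILOVER = [
    GroupState("Pre Failover", "Primary", "Secondary", "Synced/Syncing"),
    GroupState("Planned Failover", "Primary", "Primary-Rev", "Stopped"),
    GroupState("Reprotect", "Secondary-Rev", "Primary-Rev", "Synced/Syncing"),
    GroupState("Planned Failback", "Primary", "Secondary", "Stopped"),
    GroupState("Reprotect", "Primary", "Secondary", "Synced/Syncing"),
]


def matches(seen: GroupState, wanted: GroupState) -> bool:
    """Roles must match exactly; 'Synced/Syncing' accepts either state."""
    return (
        seen.step == wanted.step
        and seen.source_role == wanted.source_role
        and seen.destination_role == wanted.destination_role
        and seen.sync_state in wanted.sync_state.split("/")
    )


def first_unexpected(observed, expected=PLANNED_FAILOVER) -> Optional[int]:
    """Return the index of the first step that deviates, or None if all match."""
    for index, (seen, wanted) in enumerate(zip(observed, expected)):
        if not matches(seen, wanted):
            return index
    return None


observed = [
    GroupState("Pre Failover", "Primary", "Secondary", "Synced"),
    GroupState("Planned Failover", "Primary", "Secondary", "Stopped"),
]
print(first_unexpected(observed))  # 1: destination role did not become Primary-Rev
```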

Unplanned Failover – Source Site Down

An Unplanned Failover is when the source site is down and the target site is up.

Before the Unplanned Failover workflow is instigated, the 3PAR StoreServ, vCenter and SRM virtual machines are isolated in the source site.

Note: This particular test was performed during production hours with users accessing the source virtual machines for business as usual activities.  I will create a further blog post on how to achieve this.

The table below shows the observed behaviour on the 3PAR StoreServ at both the source and target sites, along with the SRM workflow step, when the inter-site link is down.

SRM Workflow Step    Source RG Name   Source Role             Destination RG Name   Destination Role   Sync State
Unplanned Failover   SRMTEST01        Primary (Unconfirmed)   SRMTEST01.r398979     Primary-Rev        Stopped

The details below describe the behaviour observed and any error messages encountered.

  • 60 seconds is the timeout value for 3PAR remote copy to see the inter site link as down
  • showrcopy groups SRMTEST01* command run to verify that the SyncStatus field displays ‘Stopped’

StoreServ-7200 cli% showrcopy groups SRMTEST01*

Name                                   Target                 Status       Role            Mode       Options

SCC_SRMTEST01.r398979   StoreServ-7200   Stopped   Secondary   Periodic   Period 15m, over_per_alert
LocalVV                               ID                        RemoteVV                    ID          SyncStatus    LastSyncTime
SRMTEST01_DR                   14096                 SRMTEST01_PR           16598   Stopped        2015-04-21 14:22:57 BST

  • showrcopy links command run on the target 3PAR StoreServ to verify that the partner link is down

StoreServ-7200 cli% showrcopy links

Remote Copy System Information
Status: Started, Normal

Link Information

Target Node Address Status Options
StoreServ-7200 0:3:1 172.16.1.10 Down
StoreServ-7200 1:3:1 172.16.1.11 Down
receive 0:3:1 receive Up
receive 1:3:1 receive Up

  • Target SRM Server Error Message displayed

SRM Error Message

  • Target SRM logs checked, which show that this is expected behaviour: as part of the SRM workflow, the target SRA tries to contact the source SRA but fails because the site is down.

Message [2015-04-21 14:35:47.272 ‘arrayMgm.GetRCTargetSysInfo’ 3PAR_3031 verbose (Process id=1652) (Thread id=1)] Complete: Info. Call. –> [2015-04-21 14:35:47.272 ‘discoverDevices.Run’ 3PAR_1013 error (Process id=1652) (Thread id=1)] Error. Peer array id <39897> is not a valid entry in the connected HP 3PAR Storage Server.
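
The showrcopy links check above is easy to do by eye, but if you capture the CLI output as text it can also be checked with a short script.  The sketch below assumes the output looks exactly like the excerpt in this post (whitespace separated columns under a ‘Target Node Address Status’ header); the function names are hypothetical and the real output may contain extra columns or rows that this does not handle.

```python
SAMPLE = """\
Remote Copy System Information
Status: Started, Normal

Link Information

Target Node Address Status Options
StoreServ-7200 0:3:1 172.16.1.10 Down
StoreServ-7200 1:3:1 172.16.1.11 Down
receive 0:3:1 receive Up
receive 1:3:1 receive Up
"""


def parse_rcopy_links(output: str) -> list:
    """Parse the rows that follow the 'Target Node ...' header line."""
    links = []
    in_links = False
    for line in output.splitlines():
        fields = line.split()
        if fields[:2] == ["Target", "Node"]:
            in_links = True  # everything after the header is a link row
            continue
        if not in_links or len(fields) < 4:
            continue
        target, node, address, status = fields[:4]
        links.append(
            {"target": target, "node": node, "address": address, "status": status}
        )
    return links


def partner_links_down(links: list) -> bool:
    """True when every link to the partner array (not 'receive') is Down."""
    sending = [link for link in links if link["target"] != "receive"]
    return bool(sending) and all(link["status"] == "Down" for link in sending)


links = parse_rcopy_links(SAMPLE)
print(len(links))                 # 4
print(partner_links_down(links))  # True
```

Note that the ‘receive’ rows still report Up in the excerpt, so the check only looks at the rows that name the partner array.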

Unplanned Failover – Source Site Up

The inter-site link is re-established and source site checks are performed, which entail:

  • Services checked on the source vCenter and SRM Server
    • The SRM service is stopped, which is expected because it could not communicate with vCenter. The SRM service is then started

The next step is CRITICAL in the SRM workflow.   At this point the source and target sites both hold primary read/write copies of data.

SRM at the source site believes that replication is continuing and that nothing has changed!

A device refresh is needed so that SRM can use the HP 3PAR SRA to discover the state of the 3PAR StoreServ arrays.  Once done, the ‘Failover in Progress’ status should be displayed.

Failover In Progress

 

The table below shows the observed behaviour on the 3PAR StoreServ at both the source and target sites, along with the SRM workflow step, when the inter-site link is up.

SRM Workflow Step   Source RG Name   Source Role     Destination RG Name   Destination Role   Sync State
Source Site Up      SRMTEST01        Primary         SRMTEST01.r398979     Primary-Rev        Stopped
Planned Failover    SRMTEST01        Primary         SRMTEST01.r398979     Primary-Rev        Stopped
Reprotect           SRMTEST01        Secondary-Rev   SRMTEST01.r398979     Primary-Rev        Synced/Syncing
Planned Failback    SRMTEST01        Primary         SRMTEST01.r398979     Secondary          Stopped
Reprotect           SRMTEST01        Primary         SRMTEST01.r398979     Secondary          Synced/Syncing

Final Thoughts

Using 3PAR StoreServ with Site Recovery Manager provides easy to use workflow orchestration.  However, it is critical to understand the behaviour of each dependency and to identify and remediate any behaviour which is not expected.

The key step in an unplanned failover is to refresh your devices once the inter-site link is re-established.  If this is not done, you will be asking SRM to perform a workflow which is out of sync with the 3PAR StoreServ, which will result in a rebuild of your SRM environment and a call to HP and VMware support.

New: HP 3PAR StoreServ File Persona

For me, this is one of the best announcements at HP Discover: 3PAR StoreServ is entering the world of ‘file’ level storage natively, removing the requirement for a StoreEasy gateway.

File Storage

 

Features

HP have confirmed that the following key features will work with ‘file’ level storage:

  • Thin Provisioning
  • Zero Detect
  • Adaptive and Dynamic Optimization
  • Adaptive Flash Cache (for reads)
  • Synchronous & Asynchronous replication via Remote Copy
  • Symantec & McAfee Anti Virus integration
  • Data at Rest Encryption*

*Note: this is an optional license.

3PAR Dashboard

Within the 3PAR Dashboard is a section called ‘File Persona’ which will enable the management of file shares, virtual file servers and persona configuration.

File Persona

Support

The following features will be supported at the initial release:

  • SMB 1.0, 2.0 and 3.0
  • NFSv3 and v4
  • Active Directory, LDAP and local user Authentication
  • DFS Namespace including Microsoft MMC support

Licence

To use ‘file’ level storage an extra license is required.  More on this to come when updates are released.

Arrays

To support ‘file persona’ the array needs extra cache, which comes with the ‘C’ type models.  This essentially means that you need to swap out your existing controllers or purchase a new array.

More information on the ‘C’ arrays can be found over at Patrick Terlisten’s blog vCloudnine.de

New: HP 3PAR StoreServ Management Console ‘SSMC’

Those of you who have used the 3PAR Inform Management Console know that it wasn’t exactly the best: screen refreshes took a while, and you could be logged out of StoreServs while the connection still showed as open.

HP have decided to give the 3PAR Inform Management Console a ‘facelift’.  Step forward HP’s new 3PAR StoreServ Management Console, AKA ‘SSMC’.

What’s New

  • New dashboard with the same look and feel as OneView
  • Management of file and block from same interface
  • Inbuilt System Reporter
  • Web based
  • Smart Search across all objects

So what does it look like? Well, below are a couple of screenshots to whet your appetite.

3PAR Dashboard

 3PAR SmartSearch

HP is moving towards a similar management experience for storage, servers and networking, something that in my opinion is long overdue.

Compatibility

SSMC will be compatible with 3PAR Inform OS 3.1.3 or above.

License

No licenses are required; this will be a free download.

Supported Operating Systems

SSMC will be available as a Windows based install on Windows Server 2008 R2, 2012 or 2012 R2.  It will also be available for certain flavours of Linux.

Final Thoughts

Over the long term I expect that the SSMC will be integrated into the VSP for 3PAR as this will give HP the ability to control software updates to the SSMC in a controlled fashion.

The Service Processor is still a separate entity; again, I expect this will be integrated further as the dot releases become available.