Azure Heavy Hitter Updates

Keeping up with Azure can be a full-time task in itself, given the plethora of updates. With this in mind, I thought I would share a couple of updates which, in my opinion, are heavy hitters.

Account Failover for Azure Storage

Many of us use GRS storage as an added safety net, to ensure that data is available in a secondary paired region if the primary region has an outage. The kicker has always been that no SLA exists for this; it’s down to Microsoft to decide when to declare the primary region out and provide access to the replicated data.

Well, that is all about to change with the announcement of ‘Account Failover for Azure Storage’. This means that you are now in control of failing your data over to your secondary region.

A couple of points which are worth noting:

  1. Having data available is only a single layer; think about security, identity and access, networks, virtual machines, PaaS, etc. in your secondary region.
  2. Upon failover, the storage account in the secondary region becomes LRS; you will need to manually change it back to GRS or RA-GRS to replicate back to your original primary region.
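You may prefer to script the failover rather than click through the portal. Here is a minimal sketch using the Azure SDK for Python. It makes three assumptions: the azure-identity and azure-mgmt-storage packages are installed, DefaultAzureCredential can find credentials with rights over the account, and the account is already GRS or RA-GRS. The function name and its parameters are illustrative placeholders rather than an official tool.

```python
# Minimal sketch, assuming azure-identity and azure-mgmt-storage are installed.
# fail_over_storage_account and its parameters are illustrative placeholders.


def fail_over_storage_account(subscription_id: str, resource_group: str,
                              account_name: str,
                              geo_sku: str = "Standard_RAGRS") -> None:
    """Fail a GRS/RA-GRS account over to its secondary region, then re-enable geo-replication."""
    # Imported here so the module can be loaded without the Azure SDK installed.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.storage import StorageManagementClient
    from azure.mgmt.storage.models import Sku, StorageAccountUpdateParameters

    client = StorageManagementClient(DefaultAzureCredential(), subscription_id)

    # Long-running operation: the secondary region becomes the new primary.
    client.storage_accounts.begin_failover(resource_group, account_name).result()

    # The account is now LRS in the new primary region; switching the SKU back to
    # a geo-redundant one starts replication to the new secondary (the old primary).
    client.storage_accounts.update(
        resource_group,
        account_name,
        StorageAccountUpdateParameters(sku=Sku(name=geo_sku)),
    )
```

Calling fail_over_storage_account("<subscription-id>", "<resource-group>", "<account-name>") would block until the failover completes, then request the switch back to RA-GRS. Remember that this only covers the data layer.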

Adaptive Network Hardening in Azure Security Center

I really enjoy updating an Access Control List, said no one ever!

Defining Network Security Group (NSG) rules takes time and effort. Either you engage multiple stakeholders to determine traffic flows, or you spend your time buried deep inside Log Analytics.

Microsoft have announced the public preview of Adaptive Network Hardening in Azure Security Center, which learns traffic flows (using machine learning) and provides NSG rule recommendations for internet-facing virtual machines.

A couple of points which are worth noting:

  1. This should be enabled when virtual machines are deployed, to reduce the risk of rogue traffic.
  2. As it says on the tin, this is for internet-facing VMs only. However, I expect this may be updated in due course.
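To illustrate the idea of learning traffic flows and turning them into recommendations, here is a deliberately simplified sketch. It is not how Azure Security Center works, since that service uses machine learning over real flow data. This sketch only counts observed inbound flows per port. FlowRecord, recommend_rules and the sample addresses are hypothetical.

```python
# Simplified, non-ML illustration of "learn the flows, recommend the rules".
# FlowRecord and recommend_rules are hypothetical; this is not Azure Security Center's logic.
from collections import Counter
from typing import NamedTuple


class FlowRecord(NamedTuple):
    source_ip: str
    dest_port: int


def recommend_rules(flows: list[FlowRecord], open_ports: set[int],
                    min_hits: int = 1) -> dict[str, list]:
    """Suggest allow rules for ports actually used and list open ports that were never used."""
    hits = Counter(flows)
    allowed: dict[int, set[str]] = {}
    for (source_ip, dest_port), count in hits.items():
        # Only flows seen at least min_hits times count as "learned" traffic.
        if count >= min_hits:
            allowed.setdefault(dest_port, set()).add(source_ip)
    return {
        # One allow rule per port, restricted to the sources actually observed.
        "allow": sorted((port, sorted(sources)) for port, sources in allowed.items()),
        # Ports open in the NSG but without learned traffic are candidates to close.
        "close": sorted(open_ports - set(allowed)),
    }
```

For example, if a VM has 22, 443 and 3389 open but only ever sees traffic on 22 and 443, the sketch suggests source-restricted allow rules for those two ports and flags 3389 for closing.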

Thanks for reading, tune in for the next post.

Using Azure Data Factory to Copy Data Between Azure File Shares – Part 3

This blog post is a continuation of Part 1 and Part 2 of Using Azure Data Factory to Copy Data Between Azure File Shares. In this final part, we are going to configure alerts to send an email on a failed pipeline run.

First of all, select your Data Factory and then select Alerts > New Alert Rule.

In the previous configuration, the Azure Data Factory pipeline runs once a day. With this in mind, select ‘Add Condition’ and then Failed Pipeline Runs.

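Scroll down to Alert Logic. Set the Operator to Greater Than, the Aggregation Type to Total and the Threshold to 1. This defines the condition under which an action is performed. Bear in mind that Greater Than with a Threshold of 1 only triggers on two or more failed runs within the evaluation window. To alert on a single failed run, use Greater Than or Equal To with a Threshold of 1.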

Under Evaluated based on, select a Period of 12 Hours and a Frequency of Every Hour. The period is the window of data that is totalled, and the frequency is how often the condition is evaluated. It should look something like this:
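The Period, the Frequency and the Alert Logic work together, and that interplay can be modelled in a few lines. This is a hedged sketch of the evaluation idea only, not Azure Monitor's implementation. It assumes failed runs are a list of timestamps. alert_fires and evaluation_times are hypothetical helpers, and the defaults mirror the 12 hour window and Greater Than 1 settings above.

```python
# Sketch of how a "Total > threshold over a window, checked every hour" rule behaves.
# Hypothetical model only; not Azure Monitor's implementation.
import operator
from datetime import datetime, timedelta

OPERATORS = {
    "GreaterThan": operator.gt,
    "GreaterThanOrEqual": operator.ge,
}


def alert_fires(failed_runs: list[datetime], evaluated_at: datetime,
                window: timedelta = timedelta(hours=12),
                op: str = "GreaterThan", threshold: float = 1) -> bool:
    """Total the failures inside the look-back window and compare against the threshold."""
    total = sum(1 for t in failed_runs if evaluated_at - window < t <= evaluated_at)
    return OPERATORS[op](total, threshold)


def evaluation_times(start: datetime, end: datetime,
                     frequency: timedelta = timedelta(hours=1)) -> list[datetime]:
    """Every point at which the rule is re-evaluated between start and end (inclusive)."""
    times, t = [], start
    while t <= end:
        times.append(t)
        t += frequency
    return times


# Usage with hypothetical timestamps: one failed run at 02:00, checked at 03:00.
day = datetime(2019, 1, 1)
failures = [day + timedelta(hours=2)]
print(alert_fires(failures, day + timedelta(hours=3)))  # False
print(alert_fires(failures, day + timedelta(hours=3), op="GreaterThanOrEqual"))  # True
```

A failure keeps contributing to the total for 12 hours, so with an hourly frequency the rule gets about a dozen chances to notice it.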

Next, we need to create an Action Group so that when the above condition is met, an action is taken. I have called my Action Group VMF-WE-DFAG01, which stands for VMFocus, West Europe, DataFactory, ActionGroup 01.

For the short name, I have used Copy Failure; note that this can be no more than 12 characters long.
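If you create several Action Groups, a tiny helper can keep the naming convention and the short-name limit consistent. build_action_group_name and validate_short_name are hypothetical helpers. The only Azure-specific fact they encode is the 12 character short-name limit.

```python
# Hypothetical helpers for the naming convention above; the 12-character
# short-name limit is the only Azure-specific rule encoded here.
MAX_SHORT_NAME_LENGTH = 12


def build_action_group_name(prefix: str, region: str, service: str,
                            resource_type: str, sequence: int) -> str:
    """e.g. VMF (VMFocus), WE (West Europe), DF (DataFactory), AG (ActionGroup), 01."""
    return f"{prefix}-{region}-{service}{resource_type}{sequence:02d}"


def validate_short_name(short_name: str) -> str:
    """Return the short name unchanged, or raise if it is empty or over the limit."""
    if not 0 < len(short_name) <= MAX_SHORT_NAME_LENGTH:
        raise ValueError(
            f"short name must be 1-{MAX_SHORT_NAME_LENGTH} characters, got {len(short_name)}"
        )
    return short_name


print(build_action_group_name("VMF", "WE", "DF", "AG", 1))  # VMF-WE-DFAG01
print(validate_short_name("Copy Failure"))  # Copy Failure (exactly 12 characters)
```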

Then, I have chosen the ‘Action Type’ of Email/SMS/Push and entered the appropriate contact details. Once done, it should look something like this.

After a short while, you will receive an email from Microsoft Azure to confirm that you have been added to an Action Group.

Finally, we want to give the Alert Rule a Name and a Description, such as the below.

That’s it, your Azure Data Factory is all configured and ready for production use!

How To Configure WOL ESXi5

Distributed Power Management (DPM) is an excellent feature within vSphere 5. It’s been around for a while and essentially consolidates workloads onto fewer hosts, so that the idle physical servers can be placed into standby mode when they aren’t being utilised.

Finance dudes like it as it saves ‘wonga’, and Marketing dudettes like it as it gives ‘green credentials’. Everyone’s a winner!

vCenter utilises IPMI, iLO and WOL to ‘take’ the physical server out of standby mode. vCenter tries to use IPMI first, then iLO and lastly WOL.

I was configuring Distributed Power Management and thought I would see if a ‘how to’ existed. Perhaps my ‘Google magic’ was not working, but I couldn’t find a guide on configuring WOL with ESXi5. So here it is; let’s crack on and get it configured.

Step 1

First things first, we need to check that our BIOS supports WOL and enable it. I use a couple of HP N40L Microservers and the good news is that these bad boys do.

WOL Boot

Step 2

vCenter uses the vMotion network to send the ‘magic’ WOL packet, so obviously you need to check that vMotion is working. For the purposes of this how-to, I’m going to assume you have this nailed.
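For the curious, the ‘magic’ packet itself is simple: six 0xFF bytes followed by the target NIC's MAC address repeated 16 times. The sketch below builds and broadcasts one from Python, which can be handy for testing WOL on a host outside of DPM. It is illustrative only, because vCenter sends its own packet. The MAC address, the broadcast address and UDP port 9 (a common convention) are all assumptions.

```python
# Illustrative Wake-on-LAN magic packet builder/sender; vCenter sends its own packet.
# The MAC, broadcast address and UDP port 9 used here are assumptions/placeholders.
import socket


def build_magic_packet(mac: str) -> bytes:
    """Six 0xFF bytes followed by the 6-byte MAC repeated 16 times (102 bytes total)."""
    # Accept "00:1a:2b:3c:4d:5e" or "00-1a-2b-3c-4d-5e" (case-insensitive).
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(mac_bytes) != 6:
        raise ValueError(f"invalid MAC address: {mac}")
    return b"\xff" * 6 + mac_bytes * 16


def send_magic_packet(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Broadcast the magic packet over UDP on the local segment."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(build_magic_packet(mac), (broadcast, port))


# send_magic_packet("00:1a:2b:3c:4d:5e")  # placeholder MAC of the sleeping host's NIC
```

The packet has to arrive on a NIC with WOL enabled in the BIOS and on the adapter. That is why the physical switch and adapter checks in the following steps matter.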

Step 3

Check your switch config. Eh, don’t you mean my vSwitch config, Craig? Nope, I mean your physical switch config. The ports that your vMotion network plugs into need to be set to ‘Auto’ (auto-negotiate) rather than a fixed speed. With certain manufacturers, the NIC only works its WOL ‘magic’ once it has dropped to a 100Mbps (or slower) link while the host is in standby.

Switch

Step 4

Now that we have checked our physical environment, let’s check our virtual environment. Go to your ‘physical adapters’ to determine if WOL is supported.

This can be found in the vSphere Web Client (which I’m trying to use more) under Standard Networks > Hosts > ESXi02 > Manage > Networking > Physical Adapters.

WOL 1

We can see that every adapter supports WOL except for vmnic1.

Step 5

So we need to check our vMotion network to ensure that vmnic1 isn’t being used.

Hop up to ‘virtual switches’ and check your config. The good news is that I’m using vmnic0 and vmnic2, so we are golden.
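Steps 4 and 5 boil down to one rule: every uplink on the vMotion vSwitch must support WOL. Here is a minimal sketch of that check, using this lab's adapter list typed in by hand. vmnic3 is assumed from the N40L's four NICs. uplinks_missing_wol is a hypothetical helper, and nothing here queries a host.

```python
# Hypothetical check: which vMotion uplinks lack WOL support?
# Values are typed in from this lab (vmnic1 lacks WOL); nothing queries a host.


def uplinks_missing_wol(wol_support: dict[str, bool], vmotion_uplinks: list[str]) -> list[str]:
    """Return vMotion uplinks that don't support WOL (unknown adapters count as unsupported)."""
    return [nic for nic in vmotion_uplinks if not wol_support.get(nic, False)]


lab_wol_support = {"vmnic0": True, "vmnic1": False, "vmnic2": True, "vmnic3": True}

print(uplinks_missing_wol(lab_wol_support, ["vmnic0", "vmnic2"]))  # []
print(uplinks_missing_wol(lab_wol_support, ["vmnic0", "vmnic1"]))  # ['vmnic1']
```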

WOL 2

Step 6

Let’s enable Distributed Power Management. Head over to vCenter > Cluster > Manage > vSphere DRS > Edit, place a tick in Turn ON vSphere DRS and select Power Management. Make sure you set its Automation Level to Manual. We don’t want servers to be powered off if they can’t come back on again!

WOL 3

Step 7

Time to test Distributed Power Management! Select your ESXi Host, choose Actions from the middle menu bar and select All vCenter Actions > Enter Standby Mode.

WOL 4

Ah, a dialogue box appears saying ‘the requested operation may cause the cluster Cluster01 to violate its configured failover level for high availability. Do you want to continue?’

The man from Del Monte, he says ‘yes’, we want to continue! The message appears because my HA Admission Control is set to reserve 50% of cluster resources, so putting a Host into standby violates this setting.

WOL 5

vCenter is rather cautious, and quite rightly so. Now it’s asking if we want to ‘move powered off and suspended virtual machines to other hosts in the cluster’. I’m not going to place a tick in the box, and will just select Yes.

WOL 6

We have a Warning: ‘one or more virtual machines may need to be migrated to another host in the cluster, or powered off, before the requested operation can proceed’. This makes perfect sense: as we are invoking DPM, any running VMs need to be migrated onto another host.

WOL 7

A quick vMotion later, we can now see that ESXi02 is entering Standby Mode.

WOL 8

You might as well go and make a cup of tea, as it takes the vSphere Client an absolute age to figure out that the host is in Standby Mode.

WOL 9

Step 8

Let’s power the host back up again. Right-click your Host and select Power On.

WOL 10

Interestingly, we see the power-on task running in the vSphere Web Client. However, if you jump into the vSphere Client and check the Recent Tasks pane, it says ‘waiting for host to power off before trying to power it on’.

WOL 11

This had me puzzled for a minute, and then I heard my HP N40L Microserver boot and all was good with the world. So ignore this piece of information from vCenter.

Step 9

Boom, our ESXi Host is back from Standby Mode!

WOL 12

Rinse and repeat for your other ESXi Hosts, then set the Distributed Power Management Automation Level to Automatic and you are good to go.

How To Change Default IOP Limit

After my last blog post, I realised I hadn’t actually walked you through how to change the default IOP limit used by Round Robin.

To crack on and do this, we need an SSH client such as PuTTY, with SSH enabled on the ESXi Host.

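Each change only has to be made once per Datastore device (LUN), which makes things a little easier. Bear in mind that esxcli applies the change to the ESXi Host you are connected to, so repeat it on each Host that sees the Datastore.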

SSH to your ESXi Host and enter your credentials. We are going to run a command that gives us the Network Address Authority (NAA) names of our LUNs:

esxcli storage nmp device list | grep naa

NAA 1

A quick look in the vSphere Web Client shows us which Datastores the NAA names belong to.

NAA 2

In my case, I want to change the settings for all of the Datastores. So we will start by checking that the current multipath policy is set to Round Robin, and what the current maximum IOP limit is. Let’s run the following command:

esxcli storage nmp psp roundrobin deviceconfig get -d naa.6000eb3b4bb5b2440000000000000021

A bit like ‘Blue Peter’, here’s one I did earlier! Not very helpful.

NAA 3

Let’s run the same command again but for a different NAA.

NAA 4

Excellent. To change the maximum IOP limit to 1, enter this command:

esxcli storage nmp psp roundrobin deviceconfig set -d naa.6000eb39c167fb82000000000000000c --iops 1 --type iops

To check that everything is ‘tickety boo’, enter:

esxcli storage nmp device list | grep policy

You should see that each Datastore’s maximum IOP limit is set to 1.

NAA 5

Performance Increase? Changing Default IOP Maximum

I was reading Larry Smith Jr.’s blog post on NexentaStor over at El Retardo Land, and I didn’t know that you could change the default maximum number of IOPS that Round Robin sends down each path.

By default, Round Robin in vSphere sends 1000 I/O operations down each path before switching over to the next path.
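To see what the IOP limit actually changes, here is a toy model of Round Robin path selection: send iops_limit I/Os down one path, then move to the next. It ignores everything else the VMW_PSP_RR plugin considers, such as path state and pending I/O. The two path names are hypothetical and stand in for the lab's two iSCSI pNICs.

```python
# Toy model of Round Robin path switching; ignores path state and pending I/O.
# Path names are hypothetical stand-ins for the lab's two iSCSI pNICs.


def path_for_io(io_number: int, paths: list[str], iops_limit: int) -> str:
    """Path used by the io_number-th I/O (0-based) when switching every iops_limit I/Os."""
    return paths[(io_number // iops_limit) % len(paths)]


def path_switches(total_ios: int, num_paths: int, iops_limit: int) -> int:
    """How many times the active path changes while issuing total_ios I/Os."""
    if total_ios <= 0 or num_paths < 2:
        return 0
    return (total_ios - 1) // iops_limit


paths = ["path-A", "path-B"]
print([path_for_io(i, paths, 1000) for i in (0, 999, 1000, 1999, 2000)])
# ['path-A', 'path-A', 'path-B', 'path-B', 'path-A']
print([path_for_io(i, paths, 1) for i in range(4)])
# ['path-A', 'path-B', 'path-A', 'path-B']
```

With the default of 1000, the first 1000 I/Os all go down one path before the second is touched. With a limit of 1, consecutive I/Os alternate between the paths, which is why the change can spread load across both pNICs.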

Now, I wanted to test the default against 1 IOP down each path, to see if I could eke some more performance out of the vmfocus.com lab.

So before we do this, what’s our lab hardware?

ESXi Hosts

2 x HP N40L Microserver with 16GB RAM, Dual Core 1.5GHz CPU, 4 NICs

SAN

1 x HP ML115 G5 with 8GB RAM, Quad Core 2.2GHz CPU, 5 NICs

1 x 120GB OCZ Technology Vertex Plus, 2.5″ SSD, SATA II – 3Gb/s, Read 250MB/s, using the onboard SATA Controller

Switch

1 x HP v1910 24G

And for good measure the software?

ESXi Hosts

2 x ESXi 5.1.0 Build 799733 using 2 x pNIC on Software iSCSI Initiator with iSCSI MPIO

1 x Windows Server 2008 R2 VM with 2GB RAM, 1 vCPU, 1 vNIC

SAN

1 x HP StoreVirtual VSA running SANiQ 9.5 with 4GB RAM, 2 vCPU, 4 vNIC

Switch

1 x HP v1910 24G

Let’s dive straight into the testing shall we.

Test Setup

As I’m using an HP StoreVirtual VSA, we aren’t able to perform any NIC bonding, which in turn means we cannot set up any LACP on the HP v1910 24G switch.

So, you may ask why test this at all, as surely to use all the bandwidth you need the NICs to be in LACP mode. Yep, I agree with you. However, I wanted to see if changing the IOP limit per path to 1 would actually make any difference in terms of performance.

I have created an SSD Volume on the HP StoreVirtual VSA which is ‘thin provisioned’.

Volume Details

From this I created a VMFS5 datastore in vSphere 5.1 called SSDVOL01.

Datastore

And set the MPIO policy to Round Robin.

MPIO

VMF-APP01 is acting as our test server and this has a 40GB ‘thinly provisioned’ HDD.

HDD

We are going to use IOMeter to test our performance using the parameters set out under vmktree.org/iometer/

Test 1

IOP Limit – 1000

SANiQ v9.5

Test 1

Test 2

IOP Limit – 1

SANiQ v9.5

Test 2

Test 1 v 2 Comparison

Test 1 Comparison

We can see that we get extra performance at the cost of higher latency. HP claim SANiQ v10.0, AKA LeftHand OS 10.0, to be more efficient, so let’s upgrade to it and perform the same tests again to see what results we get.

Test 3

IOP Limit – 1000

LeftHand OS 10.0 (SANiQ v10.0)

Test 3

Test 1 v 3 Comparison

Test 1v3 Comparison

HP really have made LeftHand OS 10.0 more efficient; some very impressive results!

Test 4

IOP Limit – 1

LeftHand OS 10.0 (SANiQ v10.0)

Test 4

Test 2 v 4 Comparison

Test 2v4 Comparison

Overall, higher latency for slightly better performance.

Test 1 v 4 Comparison

Test 1v4 Comparison

Compared with our original configuration of a 1000 IOPS limit per path on SANiQ 9.5, it is clear that an upgrade to LeftHand OS 10.0 is a must!

Conclusion

I think the results speak for themselves. I’m going to stick with the 1 IOP limit on LeftHand OS 10.0. Even though the latency is higher, I’m getting a better return on my overall random IOPS.