How To Change Default IOP Limit

After my last blog post, I realised I hadn’t actually walked you through how to change the default IOP limit used by Round Robin.

To crack on and do this we need an SSH client such as PuTTY.

Each change only has to be made once per Datastore, which makes things a little easier.

SSH to your ESXi Host and enter your credentials.  We are going to run the following command to give us the Network Address Authority (NAA) names of our LUNs.

esxcli storage nmp device list | grep naa

NAA 1

A quick look in the vSphere Web Client shows us which Datastores the NAAs belong to.

NAA 2

In my case, I want to change the settings for all of the Datastores.  So we will start by checking that the current multipath policy is set to Round Robin and what the default maximum IOP limit is.  Let’s run the following command:

esxcli storage nmp psp roundrobin deviceconfig get -d naa.6000eb3b4bb5b2440000000000000021

A bit like ‘Blue Peter’ here is one I did earlier! Not very helpful.

NAA 3

Let’s run the same command again but for a different NAA.

NAA 4
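
If you only care about the limit itself, you can trim the output down to one number. This is a minimal sketch rather than anything official: the function name rr_iops_limit is made up for illustration, and the field name "IOOperation Limit" is an assumption based on typical ESXi 5.x output, so check it against what your host actually prints.

```shell
#!/bin/sh
# Read "deviceconfig get" output on stdin and print only the IOPS limit.
# ASSUMPTION: the limit is printed on a line labelled "IOOperation Limit:".
rr_iops_limit() {
    sed -n 's/^ *IOOperation Limit: *//p'
}

# On the ESXi host you would feed it the real command, for example:
#   esxcli storage nmp psp roundrobin deviceconfig get \
#       -d naa.6000eb3b4bb5b2440000000000000021 | rr_iops_limit
```

If nothing is printed, the label on your build is probably different and the sed pattern needs adjusting.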

Excellent. To change the default maximum IOP limit to 1, enter this command:

esxcli storage nmp psp roundrobin deviceconfig set -d naa.6000eb39c167fb82000000000000000c --iops 1 --type iops
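
If you have more than a couple of Datastores, typing that once per NAA gets old quickly. Here is a rough sketch of how you might generate the command for every device instead. The function name build_rr_iops_commands is my own invention, and it assumes every naa device on the host is already using Round Robin (anything that isn’t will just throw an error when the command runs). It only prints the commands, so you can eyeball them before anything changes.

```shell
#!/bin/sh
# Print one "deviceconfig set" command per NAA device found on stdin.
# Input is expected to be the output of: esxcli storage nmp device list
# Nothing is executed here; the commands are only printed for review.
build_rr_iops_commands() {
    iops="$1"
    # Device IDs start in column 0. The indented "Device Display Name"
    # lines also contain "naa." but are skipped by the ^ anchor.
    grep '^naa\.' | while read -r dev; do
        echo "esxcli storage nmp psp roundrobin deviceconfig set -d $dev --iops $iops --type iops"
    done
}

# On the ESXi host, review first, then pipe to sh to apply:
#   esxcli storage nmp device list | build_rr_iops_commands 1
#   esxcli storage nmp device list | build_rr_iops_commands 1 | sh
```

Printing first and piping to sh second is deliberate: local disks can also show up with naa names, and you may not want to touch those.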

To check everything is ‘tickety boo’, enter:

esxcli storage nmp device list | grep policy

You should see that each Datastore’s default maximum IOP limit is set to 1.

NAA 5
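
With a long device list it is easy to miss one that didn’t take. As a sketch only, the function below reads the device list and prints any device whose policy config line doesn’t show a limit of exactly 1. The name devices_not_at_iops_1 is hypothetical, and the "Path Selection Policy Device Config: {...,iops=1,...}" layout is an assumption based on typical ESXi 5.x output.

```shell
#!/bin/sh
# Read "esxcli storage nmp device list" output on stdin and print every
# device whose policy config line does not contain "iops=1,".
# Devices that are not using Round Robin at all will also be listed.
devices_not_at_iops_1() {
    awk '
        /^naa\./ { dev = $1 }                       # remember the current device
        /Path Selection Policy Device Config:/ {
            # "iops=1," matches a limit of exactly 1, not 10 or 1000
            if (index($0, "iops=1,") == 0) print dev
        }
    '
}

# On the ESXi host:
#   esxcli storage nmp device list | devices_not_at_iops_1
```

No output means every device reported a limit of 1.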

Performance Increase? Changing Default IOP Maximum

I was reading Larry Smith Jr’s blog post on NexentaStor over at El Retardo Land, and I didn’t know that you could change the default maximum number of IOPS used by Round Robin.

By default vSphere sends 1000 I/Os down each path before switching over to the next path.
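
To picture what that limit does, here is a toy model in shell. It is a simplification for illustration, not how the Round Robin plugin is actually written: I’m assuming I/O number k (counting from 0) goes down path (k / limit) mod paths, and the function name rr_path_for_io is made up.

```shell
#!/bin/sh
# Toy model of Round Robin path selection.
#   $1 = I/O sequence number, starting at 0
#   $2 = I/Os sent down a path before switching (the IOP limit)
#   $3 = number of paths
# Prints the zero-based index of the path that I/O would use.
rr_path_for_io() {
    io="$1"; limit="$2"; paths="$3"
    echo $(( ($io / $limit) % $paths ))
}

rr_path_for_io 999 1000 2    # still on the first path
rr_path_for_io 1000 1000 2   # limit reached, moved to the second path
rr_path_for_io 3 1 2         # with a limit of 1 the paths alternate
```

With a limit of 1000 a short burst of I/O never leaves the first path, whereas a limit of 1 spreads even a handful of I/Os across both.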

Now, I wanted to test the default against 1 IOP down each path, to see if I could eke some more performance out of the vmfocus.com lab.

So before we do this, what’s our lab hardware?

ESXi Hosts

2 x HP N40L Microserver with 16GB RAM, Dual Core 1.5GHz CPU, 4 NICs

SAN

1 x HP ML115 G5 with 8GB RAM, Quad Core 2.2GHz CPU, 5 NICs

1 x 120GB OCZ Technology Vertex Plus, 2.5″ SSD, SATA II – 3Gb/s, Read 250M using onboard SATA Controller

Switch

1 x HP 1910 24G

And for good measure the software?

ESXi Hosts

2 x ESXi 5.1.0 Build 799733 using 2 x pNIC on Software iSCSI Initiator with iSCSI MPIO

1 x Windows Server 2008 R2 2GB RAM , 1 vCPU, 1 vNIC

SAN

1 x HP StoreVirtual VSA running SANiQ 9.5 with 4GB RAM, 2vCPU, 4 vNIC

Switch

1 x HP v1910 24G

Let’s dive straight into the testing, shall we?

Test Setup

As I’m using a HP StoreVirtual VSA, we aren’t able to perform any NIC bonding, which in turn means we cannot set up any LACP on the HP v1910 24G switch.

So, you may ask, why test this, as surely to use all the bandwidth the NICs need to be in LACP mode?  Yep, I agree with you; however, I wanted to see if changing the IOP limit per path to 1 would actually make any difference in terms of performance.

I have created an SSD Volume on the HP StoreVirtual VSA which is ‘thin provisioned’.

Volume Details

From this I created a VMFS5 datastore in vSphere 5.1 called SSDVOL01.

Datastore

And set the MPIO policy to Round Robin.

MPIO

VMF-APP01 is acting as our test server and this has a 40GB ‘thinly provisioned’ HDD.

HDD

We are going to use IOMeter to test our performance using the parameters set out under vmktree.org/iometer/

Test 1

IOP Limit – 1000

SANiQ v9.5

Test 1

Test 2

IOP Limit – 1

SANiQ v9.5

Test 2

Test 1 v 2 Comparison

Test 1 Comparison

We can see that we get extra performance at the cost of higher latency.  Now let’s upgrade to SANiQ v10.0 (AKA LeftHand OS 10.0) and perform the same tests again to see what results we get, as HP claim it to be more efficient.

Test 3

IOP Limit – 1000

LeftHand OS10.0 (SANiQ v10.0)

Test 3

Test 1 v 3 Comparison

Test 1v3  Comparison

HP really have made LeftHand OS 10.0 more efficient, with some very impressive results!

Test 4

IOP Limit – 1

LeftHand OS10.0 (SANiQ v10.0)

Test 4

Test 2 v 4 Comparison

Test 2v4 Comparison

Overall, higher latency for slightly better performance.

Test 1 v 4 Comparison

Test 1v4 Comparison

Compared with our original configuration of a 1000 IOPS limit per path and SANiQ 9.5, it is clear that an upgrade to LeftHand OS 10.0 is a must!

Conclusion

I think the results speak for themselves.  I’m going to stick with the 1 IOP limit on LeftHand OS 10.0 as, even though the latency is higher, I’m getting a better return on my overall random IOPS.

vSphere Web Client: No vCenter

Following on from the previous blog post, vSphere Web Client: Provided Credentials Are Invalid, we have logged into the vSphere Web Client but we don’t actually have anything we can manage.  I think the words we are looking for are ‘man down’.

It all boils down to permissions.  We need to log out of the vSphere Web Client and fire up our old trusty friend the vSphere Client.

Log in with the user credentials you would use to access the vCenter Server Appliance; the defaults are U: root P: vmware

vCenter 1

Ah ha, now we see our vCenter (I’m sure you weren’t concerned that all your config had gone)

vCenter 2

Right Click the root level and Add Permission

vCenter 4

Select Assigned Roles and change this to Administrator and then Click Add

vCenter 5

Select your Domain, and change the View to Show Groups First and select Domain Admins and then Add.  Naturally you might not want Domain Admins to have access in the ‘real world’ so select the appropriate Security Group.

vCenter 6

You should see that your Domain\Domain Admins appears under ‘Groups’.  Hit OK.

vCenter 7

Then Hit OK again to confirm

vCenter 8

TOP TIP: Make sure Propagate to Child Objects is ticked

Exit the vSphere Client and log in to the vSphere Web Client using https://<IP Address>:9443/vsphere-client/

vCenter 9

Boom, we have a vCenter Server, Hosts and everything!

vSphere Web Client: Provided Credentials Are Invalid

So you have battled your way through installing vSphere 5.1 and you are finally at the point when you are ready to login, but you get the epic fail ‘provided credentials are not valid’.  By now you have probably tried every format under the sun to login.

domain\username

username@domain

username

SSO 1

But nothing is working, what’s going on? The vCenter Server Appliance is showing that Active Directory Authentication is ‘Enabled’

SSO 2

Well, to be honest, the vCenter Server Appliance is telling ‘porky pies’: it hasn’t actually done squat with Active Directory and this is the reason you can’t log in.  So let’s get that sorted.

Log in to the vSphere Web Client using https://<IP Address>:9443/vsphere-client/

Enter the username and password you use to log in to the vCenter Server Appliance; the defaults are U: root P: vmware

SSO 3

Hooray, you are in the vSphere 5.1 Web Client! We need to select Administration from the left hand menu

SSO 4

Select Sign-On and Discovery and then Configuration followed by clicking the + in the top left under Identity Sources

SSO 5

Voila, this is where we need to do the Active Directory Authentication as follows:

Identity Source Type select Active Directory

Name: vmFocus

Primary Server URL: this is your Primary Domain Controller, the format is ldap://vmf-dc01.vmfocus.local

Base DN For Users: this is CN=Users,DC=vmfocus,DC=local

Domain Name: this is vmfocus.local

Domain Alias: this is vmfocus

Base DN For Groups: this is CN=vCenter_Access,OU=SecurityGroups,DC=vmfocus,DC=local

Authentication Type: Password

Username: vmfocus\vmware.service

Password: password

Once you have entered all this in, hit Test Connection

SSO 11

TOP TIP: If you don’t know your base DN, fire up ADSI Edit and it’s easy to see
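
If you don’t have ADSI Edit to hand, the domain part of a base DN is just the DNS domain name with each label turned into a DC= component. As a small sketch (the function name domain_to_dn is made up, and it only works out the DC= part, so you still need to know which container or OU your users and groups live in):

```shell
#!/bin/sh
# Convert a DNS domain name into the DC= portion of an LDAP distinguished name.
# Example: vmfocus.local becomes DC=vmfocus,DC=local
domain_to_dn() {
    echo "$1" | sed -e 's/\./,DC=/g' -e 's/^/DC=/'
}

# Base DN for the default Users container in the example domain
echo "CN=Users,$(domain_to_dn vmfocus.local)"
```

For the lab domain used here this gives the same Base DN For Users value as entered above.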

If all is successful, you should see ‘the connection has been established successfully’.

SSO 7

We now need to tell vSphere 5.1 to use the Active Directory to allow users to login.  Select your domain and click Add to Default Domains

SSO 8

You will get the warning ‘having multiple domains in the Default Domain list might result in locked user accounts during authentication’.  I think we are willing to take the risk, considering we can’t even log in yet.  So hit OK.

SSO 9

Fingers crossed, you should see your domain listed at the bottom under ‘Default Domains’.  Don’t forget to hit the save icon.

SSO 10

Right then, let’s give it a whirl: log out and try logging in with an Active Directory user who is in the group vCenter_Access.

SSO 12

Boom it works! But hold on a minute, I don’t see my vCenter or Hosts.  Hold tight, we will cover this in our next blog post.

Gotcha: vSphere Metro Storage Cluster (VMSC) & HP StoreVirtual

So you have put together an epic vSphere Metro Storage Cluster using your HP StoreVirtual SAN (formerly LeftHand) using the following rules:

  • Creating volumes for each site to access its datastore locally rather than going across the inter site link
  • Creating DRS ‘host should’ rules so that VMs run on the ESXi Hosts local to the volumes and datastores they are accessing.

The gotcha occurs when you have either a StoreVirtual Node failure or a StoreVirtual Node is rebooted for maintenance.  Let me explain why.

In this example we have a Management Group called SSDMG01 which contains:

  • SSDVSA01 which is in Site 1
  • SSDVSA02 which is in Site 2
  • SSDFOM which is in Site 3

We have a single volume called SSDVOL01 which is located at Site 1

StoreVirtual uses a ‘Virtual IP’ Address to ensure fault tolerance for iSCSI access, you can view this under your Cluster then iSCSI within the Centralized Management Console.  In my case it’s 10.37.10.2

Even though iSCSI connections are made via the Virtual IP Address, each Volume goes via a ‘Gateway Connection’ which is essentially just one of the StoreVirtual Nodes.  To check which gateway your ESXi Hosts are using to access the volumes, select your volume and then choose iSCSI Sessions.

In my case the ESXi Hosts are using SSDVSA01 to access the volume SSDVOL01 which is correct as they are at Site 1.

Let’s quickly introduce a second Volume called SSDVOL02; we want this to be in Site 1 as well.  Let’s take a look at the iSCSI sessions for SSDVOL02.

Crap, they are going via SSDVSA02 which is at the other site, causing latency issues.  Can I do anything about this in the CMC? Not that I can find.

HP StoreVirtual is actually very clever: what it has done is load balance the iSCSI connections for the volumes across both nodes in case of a node failure.  In this case SSDVOL01 via SSDVSA01 and SSDVOL02 via SSDVSA02.  If you have ever experienced a StoreVirtual node failure you know that it takes around 5 seconds for the iSCSI sessions to be remapped, leaving your VMs without access to their HDD for this time.

What can you do about this? Well, when creating your volumes, make sure you create them in the right order for site affinity to the ESXi Hosts, as we know that the HP StoreVirtual just round robins the Gateway Connection.
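
To make the ordering point concrete, here is a tiny model of what I’m seeing. It is purely a sketch of the observed behaviour, not HP’s documented algorithm: I’m assuming the Nth volume created simply gets the next storage node in the list as its gateway, and that only the two storage nodes (not the FOM) are in the rotation. The function name predict_gateway is made up.

```shell
#!/bin/sh
# Predict which node becomes the Gateway Connection for a volume.
#   $1       = creation order of the volume, starting at 1
#   $2...    = storage nodes, in the order StoreVirtual hands them out
# ASSUMPTION: gateways are handed out in a simple round robin.
predict_gateway() {
    n="$1"; shift
    idx=$(( ($n - 1) % $# ))   # position in the node list, starting at 0
    shift "$idx"
    echo "$1"
}

predict_gateway 1 SSDVSA01 SSDVSA02   # first volume created
predict_gateway 2 SSDVSA01 SSDVSA02   # second volume created
```

So if you want two volumes with the same site affinity, under this model you would need to create a volume for the other site in between them.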

That’s all very well and good, but what happens when I have a site failure? Let’s go over this now.  I’m going to pull the power from SSDVSA01, which is the Gateway Connection for SSDVOL01.  It actually has a number of VMs running on it.

Man down! As you can see we have a critical event against SSDVSA01 and the volume SSDVOL01 status is ‘data protection degraded’.

Let’s take a quick look at the iSCSI sessions for SSDVOL01, they should be using the Gateway Connection SSDVSA02

Yep all good, it’s what we expected.  Now let’s power SSDVSA01 back up again and see what happens.  You will notice that the HP StoreVirtual re syncs the volume between the Nodes and then it’s shown as Status: Normal.

Here’s the gotcha, the iSCSI sessions will continue to use SSDVSA02 in Site 2 even though SSDVSA01 is back online at Site 1.

After around five minutes StoreVirtual will automatically rebalance the iSCSI Gateway Connections.  Great, you say.  Ah, but we have a gotcha.  As SSDVOL02 has now been online the longest, StoreVirtual will use SSDVSA01 as the gateway connection meaning we are going across the intersite link.  So to summarise our current situation:

  • SSDVOL01 using Site 2 SSDVSA02 as its Gateway Connection
  • SSDVOL02 using Site 1 SSDVSA01 as its Gateway Connection

Not really the position we want to be in!

Rebalance 2

We can get down and dirty using the CLIQ to manually rebalance SSDVOL01 onto SSDVSA01, perhaps? Let’s give it a whirl, shall we?

Log in to your VIP address using SSH, but on port 16022, and enter your credentials.

Then we need to run the command ‘rebalanceVIP volumeName=SSDVOL01’

Rebalance 3
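
If you would rather do this in one hit from a Linux or Mac box, something along these lines might work. Treat it as a sketch: the function name cliq_rebalance_cmd and the admin user are made up for illustration, and I’m assuming the CLIQ SSH service accepts a command passed on the ssh command line. If it doesn’t on your version, log in interactively as above and type the command. The function only prints the command; it doesn’t run it.

```shell
#!/bin/sh
# Build (but do not run) the SSH command that asks CLIQ to rebalance a volume.
#   $1 = user name (hypothetical example below)
#   $2 = StoreVirtual Virtual IP address
#   $3 = volume name
cliq_rebalance_cmd() {
    user="$1"; vip="$2"; volume="$3"
    echo "ssh -p 16022 $user@$vip rebalanceVIP volumeName=$volume"
}

# Print the command for the lab VIP and volume used in this post
cliq_rebalance_cmd admin 10.37.10.2 SSDVOL01
```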

If you’re quick and flick over to the CMC you will see the Gateway Connection status as ‘failed’.  This is correct, so don’t panic.

Rebalance 4

Do we have SSDVOL01 using SSDVSA01? Nah!

Rebalance 2

The only way to resolve this is to either Storage vMotion your VMs onto a volume with enough capacity at the correct site or reboot your StoreVirtual Node in Site 2.

In summary, even though HP StoreVirtual uses a Virtual IP Address, this is tied to a Gateway Connection via a StoreVirtual Node, and you are unable to change the iSCSI connections manually without rebooting the StoreVirtual Nodes.

Hopefully, HP might fix this with the release of LeftHand OS10.1