I was reading Larry Smith Jr’s blog post on NexentaStor over at El Retardo Land and I didn’t know that you could change the default maximum number of IOPS used by Round Robin.
By default vSphere allows 1000 IOPS down each path before switching over to the next path.
Now, I wanted to test the default against 1 IOP down each path, to see if I could eke some more performance out of the vmfocus.com lab.
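If you want to see what a host is currently set to, the Round Robin configuration can be pulled up from the ESXi shell. This is just a quick sketch; the naa.xxxxxxxx device ID below is a placeholder for your own LUN.
# List devices and the path selection policy each one is using
esxcli storage nmp device list
# Show the Round Robin settings for a single device - the default reports
# 'IOOperation Limit: 1000' (naa.xxxxxxxx is a placeholder, use your own ID)
esxcli storage nmp psp roundrobin deviceconfig get --device=naa.xxxxxxxx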
So before we do this, what’s our lab hardware?
ESXi Hosts
2 x HP N40L Microserver with 16GB RAM, Dual Core 1.5GHz CPU, 4 NICs
SAN
1 x HP ML115 G5 with 8GB RAM, Quad Core 2.2GHz CPU, 5 NICs
1 x 120GB OCZ Technology Vertex Plus, 2.5″ SSD, SATA II – 3Gb/s, 250MB/s read, using the onboard SATA controller
Switch
1 x HP 1910 24G
And for good measure the software?
ESXi Hosts
2 x ESXi 5.1.0 Build 799733, each using 2 x pNIC on the Software iSCSI Initiator with iSCSI MPIO (port binding; a quick sketch of the commands follows this list)
1 x Windows Server 2008 R2 with 2GB RAM, 1 vCPU, 1 vNIC
SAN
1 x HP StoreVirtual VSA running SANiQ 9.5 with 4GB RAM, 2vCPU, 4 vNIC
Switch
1 x HP v1910 24G
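As mentioned above, each host has two pNICs bound to the software iSCSI initiator. For anyone building something similar, the binding looks roughly like this; vmhba33, vmk1 and vmk2 are the names from my lab, so substitute your own.
# Bind two VMkernel ports to the software iSCSI adapter for MPIO
esxcli iscsi networkportal add --adapter=vmhba33 --nic=vmk1
esxcli iscsi networkportal add --adapter=vmhba33 --nic=vmk2
# Confirm both bindings are in place
esxcli iscsi networkportal list --adapter=vmhba33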
Let’s dive straight into the testing, shall we?
Test Setup
As I’m using an HP StoreVirtual VSA, we aren’t able to perform any NIC bonding, which in turn means we cannot set up LACP on the HP v1910 24G switch.
So you may ask why test this at all, as surely you need LACP to be able to use all of the bandwidth. Yep, I agree with you; however, I wanted to see if changing the IOPS limit per path to 1 would actually make any difference in terms of performance.
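For anyone who wants to try the same thing, this is roughly the command used to drop the limit to 1 IOPS per path; naa.xxxxxxxx is again a placeholder for your own device ID.
# Switch the Round Robin limit type to IOPS and set it to 1 for a device
esxcli storage nmp psp roundrobin deviceconfig set --device=naa.xxxxxxxx --type=iops --iops=1
# Verify the change has taken effect
esxcli storage nmp psp roundrobin deviceconfig get --device=naa.xxxxxxxx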
I have created an SSD Volume on the HP StoreVirtual VSA which is ‘thin provisioned’.
From this I created a VMFS5 datastore in vSphere 5.1 called SSDVOL01.
And set the MPIO policy to Round Robin.
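If you prefer the command line to the vSphere Client, the policy can also be set per device from the ESXi shell; naa.xxxxxxxx is a placeholder as before.
# Set the path selection policy for the datastore's backing device to Round Robin
esxcli storage nmp device set --device=naa.xxxxxxxx --psp=VMW_PSP_RR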
VMF-APP01 is acting as our test server and this has a 40GB ‘thinly provisioned’ HDD.
We are going to use IOMeter to test our performance, using the parameters set out at vmktree.org/iometer/.
Test 1
IOP Limit – 1000
SANiQ v9.5
Test 2
IOP Limit – 1
SANiQ v9.5
Test 1 v 2 Comparison
We can see that we get extra performance at the cost of higher latency. Now let’s upgrade to SANiQ v10.0, AKA LeftHand OS 10.0, and perform the same tests again to see what results we get, as HP claim it to be more efficient.
Test 3
IOP Limit – 1000
LeftHand OS 10.0 (SANiQ v10.0)
Test 1 v 3 Comparison
HP really have made LeftHand OS 10.0 more efficient; some very impressive results!
Test 4
IOP Limit – 1
LeftHand OS 10.0 (SANiQ v10.0)
Test 2 v 4 Comparison
Overall, higher latency for slightly better performance.
Test 1 v 4 Comparison
From our original configuration of a 1000 IOPS limit per path and SANiQ 9.5, it is clear that an upgrade to LeftHand OS 10.0 is a must!
Conclusion
I think the results speak for themselves. I’m going to stick with the 1 IOP limit on LeftHand OS 10.0, as even though the latency is higher, I’m getting a better return on my overall random IOPS.
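If you decide to do the same, rather than touching each device by hand, a claim rule can hand out Round Robin with iops=1 to everything the VSA presents. This is only a sketch: I’m assuming the VSA reports the LEFTHAND vendor string and is claimed by VMW_SATP_DEFAULT_AA, so check 'esxcli storage nmp device list' on your own hosts before relying on it, and note the rule only applies to devices claimed after it is added.
# Default new LEFTHAND devices to Round Robin with 1 IOPS per path
# (assumes vendor 'LEFTHAND' and the VMW_SATP_DEFAULT_AA SATP - verify first)
esxcli storage nmp satp rule add --satp=VMW_SATP_DEFAULT_AA --vendor=LEFTHAND --psp=VMW_PSP_RR --psp-option="iops=1" --description="LeftHand RR iops=1"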
Assuming “Test 4” is actually “IOPS Limit – 1” (typo) – right? You might want to look at using IOPS=QUEUE_DEPTH as well to resolve (lessen) your latency bump…
Thanks Colin, removed the typo. I will monitor the Software iSCSI Adapter queue depths and see what’s happening.
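For anyone else curious, a quick sketch of how I’ll keep an eye on the queues; naa.xxxxxxxx is a placeholder for the device ID.
# esxtop: press 'u' for the disk device view - DQLEN is the device queue depth,
# ACTV is commands in flight and QUED is anything waiting in the queue
esxtop
# Show device details, including 'Device Max Queue Depth'
esxcli storage core device list --device=naa.xxxxxxxx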
Hello,
in our tests we found out that whether flow control is enabled on the iSCSI switches is a very important thing. (With flow control, the latency is halved, real-life performance roughly doubles and CPU load goes down.)
Thanks for reading the article, great point. With HP StoreVirtual you should enable flow control on your switches (this is the recommended practice: http://h17007.www1.hp.com/docs/justrightit/HP%20StoreVirtual%20P4000%20Networking%20Recommendations.pdf); however, my lab switch doesn’t have this ability. Naturally, always consult your storage provider for recommended practices.