VSAN Observer Windows Server 2012 R2

Problem Statement

When launching VSAN Observer (rvc.bat) on Windows Server 2012 R2 from C:\Program Files\VMware\Infrastructure\VirtualCenter Server\support\rvc, the CMD shell automatically closes after the password is entered.

Troubleshooting Steps Taken

  • Launched rvc.bat using ‘Run As Administrator’
  • Installed nokogiri -v 1.5.5 as described in Andrea Mauro’s blog post VMware Virtual SAN Observer
  • Followed the steps in VMware KB2064240 ‘Enabling or capturing performance statistics using Virtual SAN Observer for VMware Virtual SAN’
  • Tried the following credentials when launching rvc.bat
    • administrator@vmf-vc01.vmfocus.com
    • administrator@localhost
    • administrator@vmf-vc01

Frustratingly, none of these steps worked, so I decided to ask Erik Bussink, who I know has been working with VSAN for a while and had written the excellent blog post ‘Using the VSAN Observer in vCenter 5.5’.

Resolution

Launch rvc.bat and enter the credentials in the format administrator@vsphere.local@FQDN, which for me is administrator@vsphere.local@vmf-vc01.vmfocus.com

VSAN Observer 01

Enter the password for the SSO account administrator@vsphere.local

Enter vsan.observer <vcenter-hostname>/<Datacenter-name>/computers/<Cluster-Name> --run-webserver --force, which for me is vsan.observer vmf-vc01.vmfocus.com/Datacenter01/computers/Cluster01 --run-webserver --force

VSAN Observer 02

This fails with ‘OpenSSL::X509::CertificateError: error getting time’.

VSAN Observer can also run over plain HTTP, so to get around this add the parameter --no-https

vsan.observer vmf-vc01.vmfocus.com/Datacenter01/computers/Cluster01 --run-webserver --force --no-https

VSAN Observer 03
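
Putting the working sequence together, here is a sketch of the commands end to end (the hostname, datacenter and cluster names are from my lab, so substitute your own):

    rem -- From CMD on the vCenter Server (Windows Server 2012 R2) --
    cd "C:\Program Files\VMware\Infrastructure\VirtualCenter Server\support\rvc"
    rvc.bat
    rem Connect as administrator@vsphere.local@vmf-vc01.vmfocus.com and enter the
    rem password for the SSO account administrator@vsphere.local

    rem -- At the RVC prompt, start the observer web server over plain HTTP --
    vsan.observer vmf-vc01.vmfocus.com/Datacenter01/computers/Cluster01 --run-webserver --force --no-https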

Launch http://vcentername:8010 which in my case is http://vmf-vc01:8010

VSAN Observer 04

Notice that I’m using Firefox as the browser; I found that Internet Explorer displayed the message {{profilingTimes}} and incomplete information.

VSAN Observer 05

VSAN Configuration

In the last blog post I covered the VSAN prerequisites; now it’s time to configure VSAN.  For the sake of completeness, I had already configured a vDS with a port group named VSAN_VLAN20, as shown in the screenshot below.

VSAN vDS01

Enabling VSAN

Enabling VSAN is a one-click operation at the Cluster level.  Simply tick Turn On Virtual SAN

VSAN 01

  • Automatic enables VSAN to claim the SSDs and SATA disks and form a disk group on each ESXi Host
  • Manual enables the vSphere administrator to manually assign disks to the disk group on each ESXi Host

For my deployment ‘Automatic’ was the logical choice, as I had already created VMFS volumes on the local datastores on each ESXi Host and therefore VSAN would be unable to claim them.

Under Disk Management I can see the disk group which has been created and the local disks which have been assigned into the disk group.

VSAN 02
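
This isn’t required, but as an optional cross-check the same information can be confirmed from an SSH session to each ESXi Host; a minimal sketch:

    # Confirm the host has joined the VSAN cluster
    esxcli vsan cluster get

    # List the SSD and magnetic disks VSAN has claimed into the disk group
    esxcli vsan storage list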

Storage Policy

When VSAN is enabled it automatically exposes a set of storage capabilities via VASA, from which Storage Policies can be built.  The capabilities available are:

  • Number of Failures to Tolerate
    • VSAN creates a RAID 1 copy of the working data set, with a witness on a third ESXi Host.  If the policy is set to 1 then 2 copies of each data set are created.  If the policy is set to 2 then 3 copies of each data set are created.
  • Number of Disk Stripes Per Object
    • An object is striped across magnetic disks to potentially increase performance.  Two things to bear in mind here: first, if you have multiple magnetic disks in a disk group, VSAN might stripe across those anyway; second, a stripe width greater than one should only be used if you are seeing read cache misses that cannot be served from a single magnetic disk, e.g. a VM that requires 400 IOPS, more than a single magnetic disk can deliver
  • Flash Read Cache Reservation
    • Provides the ability to specify in percentage terms how much of an SSD is used for read cache e.g. 100GB VM with 1% policy would use 1GB on a 250GB SSD
  • Object Space Reservation
    • Provides the ability to reserve all space upfront using Lazy Zeroed Thick

Note: If you do not define a Storage Policy, VSAN automatically defaults to ‘Number of Failures to Tolerate equals 1’
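
As a rough sizing rule of thumb, each increment of Number of Failures to Tolerate adds another full copy of the data, so the raw capacity consumed is roughly the virtual machine size multiplied by FTT + 1 (plus a small witness component).  For example:

    raw capacity ≈ VM size x (FTT + 1)
    100GB VM, FTT = 1  ->  ~200GB of raw VSAN capacity
    100GB VM, FTT = 2  ->  ~300GB of raw VSAN capacity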

I have created a Storage Policy called VSAN Failure To Tolerate 1.  When you click on ‘Rules Based on Vendor Specific Capabilities’ and select ‘VSAN’, the above capabilities are presented and you can select which ones are required.

VSAN 03

Virtual Machines

The last thing to do is migrate virtual machines across to the VSAN Datastore.  This is a straightforward operation which only requires the vSphere administrator to select the correct Storage Policy.

VSAN 04

VSAN Prerequisites

In my last blog post I covered the VSAN Lab for VMFocus.com.  In this post I’m going to cover the prerequisites that need to be met before I will be in a position to install VSAN.

VMware Compatibility Guide

For a production environment your first port of call should be the VMware Compatibility Guide, with ‘What are you looking for’ set to ‘Virtual SAN’, to confirm that your hardware is compatible and, perhaps most importantly, will be supported by VMware.

VMware Compatibility Guide

It’s also worth pointing out that in a production environment, you should cross-reference the recommended drivers against those used within a custom OEM ESXi image from Dell or HP, as pointed out in Cormac Hogan’s blog post entitled VSAN and OEM ESXi ISO images.

My preference would be to use a custom OEM ESXi image and then downgrade the drivers, as you get visibility of all of the manufacturers’ MIBs.

Even though the HP DL380 G6 is on the VMware Compatibility Guide, the Smart Array P410i isn’t, and nor are my hard drives.  So, as this is a lab environment, I will be using the HP ESXi 5.5 U2 Custom ISO on the HP DL380 G6 with the latest drivers.
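
As a side note, a quick way to see which Smart Array driver a custom ISO has installed is from the ESXi shell; a minimal sketch (scsi-hpsa is the VIB name HP images typically use for the Smart Array driver, so treat it as an assumption and check your own image):

    # List installed VIBs and filter for the Smart Array (hpsa) driver
    esxcli software vib list | grep -i hpsa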

Step 1 – Firmware

The first step is to ensure the firmware is up to date on my HP DL380 G6 servers.  The easiest way to do this is to download and install the HP Service Pack for ProLiant ISO, which includes HP Smart Update Manager v7.1.0.

Launch the batch file entitled ‘launch_hpsum.bat’ and you will be redirected to the HP Smart Update Manager 7.1.0 web interface.

An inventory of the software packages on the HP Service Pack for ProLiant ISO will be undertaken so that HP SUM understands what firmware it has access to within the HP SUM ISO repository.

HP SUM 01

 

Once this completes, launch Nodes.  I will need to add two Nodes for each ESXi host: one for the iLO and one for the ESXi Host itself.  The iLO Node will update the iLO firmware and the ESXi Host Node will update the hardware firmware on the HP DL380 G6.

When each Node is added, you need to supply the correct credentials to access the iLO and ESXi Host and also apply the Baseline (in my case HP Service Pack for ProLiant 2014.09.0 at E:/hp/swpackages).

HP SUM 02

Next we perform an Inventory of the Node to see if any firmware needs to be upgraded

HP SUM 03

Once the inventory is performed, it’s simply a case of deploying the updates and restarting the ESXi Host.

HP SUM 04

 

Step 2 – Storage Controller Queue Depth

Storage controller queue depth can cause an issue with VSAN when it is serving normal SCSI requests while also undertaking a rebuild operation after a failure.  The recommended queue depth for VSAN is 256.  We will verify that the Smart Array P410i meets this requirement by jumping into ESXTOP and pressing d, then f, then d again to select QSTATS.
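
In other words, from an SSH session to the ESXi Host (the AQLEN column in the disk adapter view shows the adapter queue depth):

    # Launch esxtop from the ESXi shell
    esxtop
    # Then press: d  - switch to the disk adapter view
    #             f  - choose which fields are displayed
    #             d  - toggle QSTATS (queue statistics)
    # Check the AQLEN value for the vmhba backing the Smart Array P410i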

The Smart Array P410i is on vmhba1 and the queue depth is 1011.

Queue Depth 01

Step 3 – Storage Controller Read Cache

Now that the firmware has been updated, the next step is to disable write caching on the Smart Array P410i.  This needs to be done so that VSAN, rather than the storage controller, makes the decisions about write buffering and de-staging of I/O to magnetic disk.

The Smart Array P410i does not allow direct pass-through, so we need to configure each SSD and SATA drive in RAID 0 and change the storage controller to 100% Read Cache and 0% Write Cache.  To do this we will use hpssacli (which is included in the HP ESXi Custom ISO); it enables us to make changes to the storage controller over SSH.

Note: each command we want to run needs to be prefixed with /opt/hp/hpssacli/bin/hpssacli

The first thing we need to do is identify the slot the Smart Array P410i is in, by typing /opt/hp/hpssacli/bin/hpssacli ctrl all show config

Controller Cache 05

As you can see, mine is in ‘slot 0’, which is embedded.  Next I’m going to run the command /opt/hp/hpssacli/bin/hpssacli ctrl slot=0 show detail to check the cache ratio.

Controller Cache 06

Mine is currently set to 25% Read and 75% Write.  To change this, run the command /opt/hp/hpssacli/bin/hpssacli ctrl slot=0 modify cacheratio=100/0, then another /opt/hp/hpssacli/bin/hpssacli ctrl slot=0 show detail confirms the change has been made.

Controller Cache 07
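
For reference, here is the full sequence of hpssacli commands used in this step, run over SSH on the ESXi Host.  The RAID 0 creation line is a hypothetical example in case your logical drives do not already exist; the drive bay ID will differ on your hardware:

    # Identify which slot the Smart Array P410i is in
    /opt/hp/hpssacli/bin/hpssacli ctrl all show config

    # (Hypothetical) create a RAID 0 logical drive from a single physical disk
    /opt/hp/hpssacli/bin/hpssacli ctrl slot=0 create type=ld drives=1I:1:1 raid=0

    # Check the current cache ratio, set it to 100% read / 0% write, then confirm
    /opt/hp/hpssacli/bin/hpssacli ctrl slot=0 show detail
    /opt/hp/hpssacli/bin/hpssacli ctrl slot=0 modify cacheratio=100/0
    /opt/hp/hpssacli/bin/hpssacli ctrl slot=0 show detail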

Step 4 – Wipe Hard Drives

For VSAN to be able to see and claim my hard drives, they need to be wiped clean of the existing VMFS format.  This can be achieved using partedUtil, which is included with ESXi, or Gparted.

My preference is to keep things simple, so I’m going to download Gparted and boot from the ISO to wipe my hard drives.
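
For completeness, if you would rather stay in the ESXi shell, a minimal partedUtil sketch looks like the following (the naa.xxxx device name is a placeholder, so double-check you have the right disk before deleting anything):

    # List the local disks and identify the one to wipe
    ls /vmfs/devices/disks/

    # Show the existing partition table for the disk
    partedUtil getptbl /vmfs/devices/disks/naa.xxxx

    # Delete partition 1 to remove the old VMFS partition
    partedUtil delete /vmfs/devices/disks/naa.xxxx 1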

Step 5 – Check SSD Enabled

As my Smart Array P410i is presenting the SSD drives as RAID 0 logical drives, we need to verify that they are shown in vSphere as SSD by running the command esxcli storage core device list to obtain the network address authority (NAA) identifier.

SSD 03

As we can see, Is SSD: false.  So I’m going to change this by running the command esxcli storage nmp satp rule add --device naa.600508b1001c3ffd07cece41dbad09b4 --satp VMW_SATP_LOCAL --option enable_ssd

Then we are going to reclaim the SSD by running the command esxcli storage core claiming reclaim --device naa.600508b1001c3ffd07cece41dbad09b4

To verify it’s now displayed as an SSD run the command esxcli storage core device list again

SSD 04
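
For reference, here are the SSD tagging commands from this step in one place, using the long-form flags (the NAA identifier is from my host, so substitute your own):

    # Add a SATP claim rule that marks the device as SSD
    esxcli storage nmp satp rule add --satp VMW_SATP_LOCAL --device naa.600508b1001c3ffd07cece41dbad09b4 --option enable_ssd

    # Reclaim the device so the new rule takes effect
    esxcli storage core claiming reclaim --device naa.600508b1001c3ffd07cece41dbad09b4

    # Verify the device now reports Is SSD: true
    esxcli storage core device list --device naa.600508b1001c3ffd07cece41dbad09b4 | grep "Is SSD"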

Step 6 – IGMP Snooping

VSAN requires Multicast traffic to be enabled for two reasons:

  • To discover participating members in a VSAN cluster and to determine VSAN host states
  • To update the Cluster Monitoring, Membership and Directory Service (CMMDS) for items such as object placement and statistics

IGMP (Internet Group Management Protocol) snooping is a multicast constraining mechanism that runs on layer 2 devices to manage and control multicast groups.

  • Reduces broadcast traffic
  • Increases security
  • Per host interaction

Best to show this with a picture!

VMFocus VSAN IGMP Diagram

VSAN traffic on my HP v1910 is on VLAN 20, so I will enable IGMP snooping for this VLAN only.  The first step is to enable IGMP globally on the HP v1910.  To do this, click Network > IGMP Snooping > Enable

IGMP 03

After enabling IGMP globally, I next need to apply it to VLAN 20 by selecting ‘Operation’

IGMP 01

 

IGMP Snooping ‘Enable’ > Version 3

IGMP 02

 

Verify that IGMP snooping is enabled for VLAN 20.

IGMP 04

 

Step 7 – Virtual Machines

Now that I have completed the tasks above, it’s time to create the following virtual machines on VMF-ESXi03, which has 6 x 72GB 10K SAS drives in RAID 5:

  • VMF-DC01 (Windows Server 2012 R2 Domain Controller)
  • VMF-DC02 (Windows Server 2012 R2 Domain Controller)
  • VMF-CA01 (Windows Server 2012 R2 Certificate Authority)
  • VMF-SQL01 (Windows Server 2012 R2 running SQL Server 2012)
  • VMF-VC01 (Windows Server 2012 R2 running vCenter 5.5 U2)

See you on the next post when I have these all up and running.

VSAN Lab

Hands up, I confess I haven’t played with VSAN yet.  Even though I have three ESXi Hosts, these have been configured for use with HP StoreVirtual and Site Recovery Manager.

It’s time to shake things up a little and introduce VSAN into the VMFocus.com lab.  This will be the first in a series of blog posts installing and configuring VSAN on HP DL380 G6 servers.  So before we go any further, here is a logical diagram of the configuration.

VSAN Logical

Hardware

ESXi Hosts

As mentioned in the VMFocus.com lab post, I have 3 x HP DL380 G6, each with the following specification:

  • 2 x Intel Xeon L5520 Quad Core 2.26GHz, giving a total of 16 logical cores with Hyper-Threading
  • 56GB RAM
  • 8 x 1GbE NICs (2 x dual-port built-in, 1 x quad-port)
  • 1 x P410 Smart Array 6Gb/s
  • 1 x Samsung EVO 250GB 2.5″ SSD 6Gb/s
  • 1 x Hitachi Travelstar 7.2K 1TB 2.5″ SATA 6Gb/s
  • 2 x HP 72GB 15K SAS HDD
    • One Host has an extra 2 x 300GB 10K 2.5″ HDD
  • 2 x PSU
  • 1 x iLO
  • 1 x 8GB SD Card

Networking

  • 1 x HP v1910 24G Layer 2 switch with static routing

Configuration

ESXi Host

  • Boot
    • The plan is to boot ESXi from internal SD card
  • Management Network
    • vDS with a Port Group consisting of two active 1 Gbps interfaces, providing a resilient VMkernel Management network.
  • VSAN Network
    • vDS with a Port Group consisting of two active 1 Gbps interfaces, providing a resilient VSAN network (see the command sketch after this list).
  • vMotion Network
    • vDS with a Port Group consisting of two active 1 Gbps interfaces, providing a Multi-NIC vMotion network.
  • Virtual Machine Network
    • vDS with a Port Group consisting of two active 1 Gbps interfaces, providing an active/active Virtual Machine network.
  • Backups
    • Veeam 8 will be used as this is compatible with VSAN.  The ESXi Host which has the 2 x 300GB configured in RAID 1 will be used as the backup repository
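
The VSAN VMkernel configuration itself isn’t covered in this post, but as a rough sketch of how a host’s VSAN networking can be checked or tagged from the ESXi shell (the interface name vmk2 is an assumption for illustration; use whichever VMkernel port sits on the VSAN port group):

    # List the VMkernel interfaces currently carrying VSAN traffic on this host
    esxcli vsan network list

    # Tag a VMkernel interface for VSAN traffic (vmk2 is illustrative)
    esxcli vsan network ipv4 add --interface-name=vmk2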

VSAN

Initial Deployment

You get a bit of a chicken-and-egg scenario with VSAN, as you need vCenter available before your VSAN Cluster can be created.  My plan is to use the 2 x HP 72GB 15K SAS HDD in RAID 1, which will provide storage capacity and performance for Active Directory, vCenter and SQL before the VSAN cluster is formed.  They can then also be used to hold ISOs in the future.

Disk Group

This is going to be a simple configuration, as I’m limited to one disk group, having only one SSD in each ESXi Host.

Storage Policy

Again, this is going to be straightforward.  As it’s my home lab I want a balance between capacity and performance, so I will use the default setting, which is ‘number of failures to tolerate = 1’.

Network

This could be a blog post in itself! VSAN has a number of requirements which have to be met, these are:

  • VSAN does not support multiple VSAN VMkernel interfaces on the same subnet for load balancing
  • VSAN does support IP Hash Load Balancing, but if VSAN is the only type of traffic on a 1Gb network you are unlikely to see any benefit over Route Based on Originating Port ID with Explicit Failover
  • VSAN does support multiple VSAN VMkernel interfaces on different subnets for load balancing
    • As I haven’t deployed VSAN yet, I’m not sure if that applies to different disk groups e.g. 1 x Disk Group in 192.168.1.x/24 and another 1 x Disk Group in 192.168.2.x/24.  Something which I will have to test.

With the above in mind, and with the constraint of my network being 1 GbE with 8 physical NICs, I decided to go with a simple configuration using Load Based Teaming with resilience at the pNIC level (the switch is a single point of failure), based on the following:

  • 2 x 1 GbE physical NIC’s per traffic type providing a simple configuration for troubleshooting (in case I encounter any issues)
  • A single Port Group with NIOC could be leveraged across the 8 x 1GbE physical NICs.  However, shares would need to be configured and VSAN traffic would only be entitled to 572 Mbps in periods of congestion (think vMotion); see the worked calculation after this list.
    • High – 4 Shares – 572 Mbps for VSAN traffic
    • Normal – 2 Shares – vMotion/Virtual Machine traffic 286 Mbps
    • Low – 1 Share – Management traffic – 143 Mbps
  • NIOC on a 1GbE network is not supported
  • NIC teaming is used for availability not bandwidth aggregation
    • Route based on Port ID would be active / standby with Explicit Failover order
    • Route based on IP Hash would be active / active but unlikely to use extra bandwidth as source and destination will remain constant in the VSAN intra-cluster communication
    • Route based on physical NIC load (Load Based Teaming) provides an active / active configuration
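
For completeness, the share entitlement numbers above come from dividing the 1 Gbps link by the total number of shares:

    total shares = 4 (High) + 2 (Normal) + 1 (Low) = 7
    VSAN (High, 4 shares)       = 4/7 x 1 Gbps ≈ 572 Mbps
    vMotion / VM (Normal, 2)    = 2/7 x 1 Gbps ≈ 286 Mbps
    Management (Low, 1 share)   = 1/7 x 1 Gbps ≈ 143 Mbps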

A picture speaks a thousand words, so the proposed network logical diagram is shown below.

VSAN Network

Stay tuned for the next blog post in which I cover the prerequisites for the VMFocus home lab.