vSphere 5.x Space Reclamation On Thin Provisioned Disks


Space reclamation can be performed either on vSphere after a Storage vMotion has taken place or when files have been deleted from within a guest operating system.

With the release of LeftHand OS 12.0, as covered in my post 'How To: HP StoreVirtual LeftHand OS 12.0 With T10 UNMAP', I thought it would be a good idea to share the process of space reclamation within the guest operating system.

The reason for covering space reclamation within the guest operating system is that I believe it's the more common scenario in business-as-usual operations.  Space reclamation on vSphere and Windows is a two step process.

  • Zero the space in the guest operating system if you are running Windows Server 2008 R2 or below.
    • UNMAP is enabled automatically in Windows Server 2012 or above.
    • If the VMDK is thin provisioned, you may want to shrink it back down afterwards.
  • Reclaim the zeroed space on your VMFS file system.

I'm going to run space reclamation on a Windows Server 2008 R2 virtual machine called DC01-CA01, which has the following storage characteristics:

Original Provisioned Space

  • Windows C: Drive – 24.9GB free space
  • Datastore – 95.47GB free space
  • Volume – 96.93GB consumed space
    • 200GB Fully Provisioned with Adaptive Optimisation enabled

Space Reclamation 05

Next I’m going to drop two files onto the virtual machine which total 2.3GB in space.  This changes the storage characteristics of DC01-CA01 to the following:

Increased Provisioned Space

  • Windows C: Drive – 22.6GB free space
    • 2.3GB increase in space usage
  • Datastore – 93.18GB free space
    • 2.29GB increase in space usage
  • Volume – 99.22GB consumed space
    • 2.29GB increase in space usage

Space Reclamation 06

Sdelete

Next I deleted the files from the C: Drive on DC01-CA01 and emptied the Recycle Bin, followed by running sdelete with the command 'sdelete.exe -z C:'.  This takes a bit of time, so I'm going to make a cup of tea!

Space Reclamation 07

WARNING: Running sdelete will increase the size of the thin provisioned disk to its maximum size.  Make sure you have space to accommodate this on your volume(s).
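Before kicking sdelete off, it's worth sanity-checking the free space on the datastore from the ESXi shell.  A minimal sketch is below; both commands show the capacity and free space of each mounted datastore:

# list all mounted filesystems with their size and free space
esxcli storage filesystem list
# or the shorter busybox equivalent
df -h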

VMKFSTools

Now that sdelete has finished, we need to run vmkfstools against the VMDK to shrink the thin provisioned disk back down to size.  To do this the virtual machine needs to be powered off.

SSH into the ESXi Host and cd into the directory in which your virtual machine resides.  In my case this is cd /vmfs/volumes/DC01-NODR01/DC01-CA01

Next run the command ls -lh *.vmdk, which shows the space being used by the virtual disks; this currently stands at 40GB.

Space Reclamation 13

Next we want to get rid of the zeroed blocks in the VMDK by issuing the command vmkfstools --punchzero DC01-CA01.vmdk
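For reference, the whole ESXi shell sequence for this step looks like the sketch below (the datastore and VMDK names are from my lab, so substitute your own):

# change into the virtual machine's directory
cd /vmfs/volumes/DC01-NODR01/DC01-CA01
# check how much space the virtual disks are currently consuming
ls -lh *.vmdk
# punch out the zeroed blocks (the virtual machine must be powered off)
vmkfstools --punchzero DC01-CA01.vmdk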

Space Reclamation 15

Now that’s done let’s check our provisioned space to see what is happening.

Interim Provisioned Space

  • Windows C: Drive – 24.9GB free space
    • Back to the original size
  • Datastore – 95.82GB free space
    • 0.35GB less space consumed than the original
  • Volume – 121.35GB consumed space
    • 24.42GB increase from the original size!

Space Reclamation 16

So what's going on then?  Well, Windows is aware that blocks have been deleted and has passed this information on to the VMFS file system, which has decreased the VMDK size via the vmkfstools --punchzero command.  However, no one has told my HP StoreVirtual that it can reclaim the space and allocate it back out again.

The final step is to issue the vmkfstools -y 90 command.  More details about this command are covered in Jason Boche's excellent blog post entitled 'Storage: Starting Thin and Staying Thin with VAAI UNMAP'.

Note: the vmkfstools -y method was deprecated and replaced with esxcli storage vmfs unmap -l datastorename in ESXi 5.5.  See VMware KB2057513 for more details.
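To summarise the two reclaim methods side by side, a minimal sketch is shown below (90 is the percentage of free space to reclaim and DC01-NODR01 is the datastore name from my lab; both are examples):

# ESXi 5.0/5.1 - run from within the datastore's directory under /vmfs/volumes
vmkfstools -y 90
# ESXi 5.5 and later - the replacement esxcli method
esxcli storage vmfs unmap -l DC01-NODR01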

WARNING: Running vmkfstools -y 90 will create a balloon file on your VMFS datastore.  Make sure you have space to accommodate this on your datastore, and that no operations that could drastically increase the space used on the datastore will happen whilst the command is running.

Space Reclamation 17

One final check of provisioned space now reveals the following:

Final Provisioned Space

  • Windows C: Drive – 24.9GB free space
    • Back to the original size
  • Datastore – 95.81GB free space
    • 0.34GB less space consumed than the original
  • Volume – 95.04GB consumed space
    • 1.89GB decrease from the original size

Final Thought

Space reclamation has three different levels: the guest operating system, the VMFS file system and the storage system.  Reclamation needs to be performed on each of these layers in turn so that the layer beneath knows it can reclaim the disk space and allocate it out accordingly.
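Mapping each layer to the commands used in this post gives a quick reference (a summary sketch using my lab's names, not a script to run blindly):

# guest operating system layer (Windows Server 2008 R2 and below) - zero the deleted blocks
sdelete.exe -z C:
# VMFS layer - shrink the thin provisioned VMDK (virtual machine powered off)
vmkfstools --punchzero DC01-CA01.vmdk
# storage system layer - hand the free blocks back to the array
vmkfstools -y 90                             # ESXi 5.0/5.1
esxcli storage vmfs unmap -l DC01-NODR01     # ESXi 5.5 and later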

The process of space reclamation isn't straightforward and should be run out of hours, as each step will have an impact on the storage subsystem, especially if it's run concurrently across virtual machines and datastores.

My recommendation is to reclaim valuable disk space out of hours to avoid potential performance or capacity problems.

VSAN Observer Windows Server 2012 R2


Problem Statement

When launching the VSAN Observer (rvc.bat) on Windows Server 2012 R2 from C:\Program Files\VMware\Infrastructure\VirtualCenter Server\support\rvc, the CMD shell automatically closes after entering the password.

Troubleshooting Steps Taken

  • Launched rvc.bat using ‘Run As Administrator’
  • Installed nokogiri -v 1.5.5 as described in Andrea Mauro's blog post VMware Virtual SAN Observer
  • Followed the steps in VMware KB2064240 'Enabling or capturing performance statistics using Virtual SAN Observer for VMware Virtual SAN'
  • Tried the following credentials when launching rvc.bat
    • administrator@vmf-vc01.vmfocus.com
    • administrator@localhost
    • administrator@vmf-vc01

Frustratingly none of these steps worked, so I decided to ask Erik Bussink, who I know has been working with VSAN for a while and had written the excellent blog post 'Using the VSAN Observer in vCenter 5.5'.

Resolution

Launch rvc.bat and enter the credentials in the format administrator@vsphere.local@FQDN, which is administrator@vsphere.local@vmf-vc01.vmfocus.com for me.

VSAN Observer 01

Enter the password for the SSO account administrator@vsphere.local

Enter vsan.observer <vcenter-hostname>/<Datacenter-name>/computers/<Cluster-Name>/ --run-webserver --force, which for me is vsan.observer vmf-vc01.vmfocus.com/Datacenter01/computers/Cluster01 --run-webserver --force

VSAN Observer 02

This fails with ‘OpenSSL::X509::CertificateError: error getting time’.

VSAN Observer can run over plain HTTP, so to get around this add the --no-https parameter.

vsan.observer vmf-vc01.vmfocus.com/Datacenter01/computers/Cluster01 --run-webserver --force --no-https
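Putting the whole session together, the flow looks roughly like this (a sketch; the vCenter hostname, datacenter and cluster names are the ones from my lab):

cd "C:\Program Files\VMware\Infrastructure\VirtualCenter Server\support\rvc"
rvc.bat

(connect as administrator@vsphere.local@vmf-vc01.vmfocus.com, enter the SSO password, then from the RVC prompt run)

vsan.observer vmf-vc01.vmfocus.com/Datacenter01/computers/Cluster01 --run-webserver --force --no-https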

VSAN Observer 03

Launch http://vcentername:8010 which in my case is http://vmf-vc01:8010

VSAN Observer 04

Notice that I'm using Firefox as the browser; I found that Internet Explorer displayed the message {{profilingTimes}} and incomplete information.

VSAN Observer 05

vCloud Air DRaaS – Improvements


Last October, I blogged about 'vCloud Air DRaaS – The Good, Bad & Ugly', in which I covered the following aspects:

  • Service Overview
  • vCloud Connector
  • Test Recovery
  • Failover and Failback

Logical Overview

The main area which was lacking with vCloud Air DRaaS was failback.  Failback could only occur offline whilst the virtual machine is shut down.  If we do the basic maths on a 50GB virtual machine over a 100Mbps dedicated connection, it would take 76 minutes.

Multiply this by 100 virtual machines and the numbers start to get crazy.  It would take 127 hours, or a little over 5 days, to fail back.  Could you imagine saying to your Directors, sorry, we need everyone to take a week off work whilst we fail back?

For the sake of brevity the calculation is shown below.  Overhead would be around 10% on a 100Mbps link, giving 90Mbps of usable throughput.

Calculation

8Mb equals 1MB

50GB equals 51,200MB

51,200MB x 8 = 409,600Mb

409,600Mb / 90Mbps = 4,551 seconds

4,551 seconds / 60 = 76 minutes

100 VMs x 76 minutes = 7,600 minutes

7,600 minutes / 60 = 127 hours

Good News

VMware understands that this kind of service was never going to be taken seriously by customers and could only be used for non-production workloads, so it has announced some new service enhancements in a blog post dated 20th January 2015.  The enhancements are:

  • Native failback support – provides seamless reverse replication from vCloud Air data centers to a customer’s environment, as well as support for offline data transfer via physical disk, to accommodate larger environments.
  • Multiple recovery points – enables multiple point-in-time copies of replicated VM(s), allowing you to roll back to earlier snapshots of your data center environment in the event of corruption or the need to recover to an earlier set of data.

Final Thought

This is an excellent move by VMware, as DRaaS could now become a reality.  What I would have hoped for is an announcement that virtual machine backups could be offered during failover as part of the product offering.

Don't forget DRaaS isn't a panacea to fix application or service access for end users.  The same rules apply to an on-premises solution as they do to a cloud-based solution.

VSAN Configuration


In the last blog post I covered the VSAN Prerequisites, now it’s time to configure VSAN.  For the sake of completeness I had already configured a vDS with a port group named VSAN_VLAN20 as shown in the screenshot below.

VSAN vDS01

Enabling VSAN

Enabling VSAN is a one click operation at the Cluster level.  Simply tick 'Turn On Virtual SAN'.

VSAN 01

  • Automatic enables VSAN to claim the SSD and SATA disks and form a disk group on each ESXi Host
  • Manual enables the vSphere administrator to manually assign disks to the disk group on each ESXi Host

For my deployment, 'Automatic' was the logical choice, as I had already created VMFS volumes on the local datastores on each ESXi Host and therefore VSAN would be unable to claim them.

Under Disk Management I can see the disk group which has been created and the local disks which have been assigned into the disk group.

VSAN 02
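If you prefer to double check the result from the command line, each ESXi Host can be queried over SSH.  A minimal sketch using the esxcli vsan namespace in ESXi 5.5:

# confirm the host has joined the VSAN cluster
esxcli vsan cluster get
# list the SSD and magnetic disks claimed into the disk group
esxcli vsan storage list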

Storage Policy

VSAN automatically creates storage policies which are exposed via VASA when VSAN is enabled.  The storage policies available are:

  • Number of Failures to Tolerate
    • VSAN creates a RAID 1 copy of the working data set, with a witness on a third ESXi Host.  If the policy is set to 1 then 2 copies of each data set are created.  If the policy is set to 2 then 3 copies of each data set are created.
  • Number of Disk Stripes Per Object
    • An object is striped across magnetic disks to potentially increase performance.  Two things to bear in mind here: first, if you have multiple magnetic disks in a disk group, VSAN might stripe across those anyway; second, a stripe width greater than one should only be used if you are getting read cache misses that cannot be served from a single magnetic disk, e.g. a VM that requires 400 IOPS.
  • Flash Read Cache Reservation
    • Provides the ability to specify in percentage terms how much of an SSD is used for read cache e.g. 100GB VM with 1% policy would use 1GB on a 250GB SSD
  • Object Space Reservation
    • Provides the ability to reserve all space upfront using Lazy Zeroed Thick

Note: If you do not define a Storage Policy, VSAN automatically defaults to 'Number of Failures to Tolerate equals 1'.
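If you want to see what that default looks like on the hosts themselves, it can be queried from the ESXi shell (a sketch; the command prints the default policy applied to each VSAN object class):

# show the default VSAN policy per object class (cluster, vdisk, vmnamespace, vmswap)
esxcli vsan policy getdefault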

I have created a Storage Policy called VSAN Failure To Tolerate 1.  When you click on 'Rules Based on Vendor Specific Capabilities' and select 'VSAN', the above Storage Policies are presented and you can select which policy is required.

VSAN 03

Virtual Machines

The last thing to do is migrate virtual machines across to the VSAN Datastore.  This is a straightforward operation which only requires the vSphere administrator to select the correct Storage Policy.

VSAN 04

VSAN Prerequisites


In my last blog post I covered the VSAN Lab for VMFocus.com.  In this post I’m going to cover the prerequisites that need to be met before I will be in a position to install VSAN.

VMware Compatibility Guide

For a production environment your first port of call should be the VMware Compatibility Guide with 'What are you looking for' set to 'Virtual SAN', to confirm that your hardware is compatible and, perhaps most importantly, will be supported by VMware.

VMware Compatibility Guide

It's also worth pointing out that in a production environment, you should cross reference the recommended drivers against those used within a custom OEM ESXi image from Dell or HP, as pointed out in Cormac Hogan's blog post entitled VSAN and OEM ESXi ISO images.

My preference would be to use a custom OEM ESXi image and then downgrade the drivers as required, as you get visibility of all the manufacturer's MIBs.

Even though the HP DL380 G6 is on the VMware Compatibility Guide, the Smart Array P410i isn't and nor are my hard drives.  So as this is a lab environment, I will be using the HP ESXi 5.5 U2 Custom ISO on the HP DL380 G6 with the latest drivers.

Step 1 – Firmware

The first step is to ensure the firmware is up to date on my HP DL380 G6 servers.  The easiest way to do this is to download and install the HP Service Pack for ProLiant ISO, which includes HP Smart Update Manager v7.1.0.

Launch the batch file entitled ‘launch_hpsum.bat’ and you will be redirected to the HP Smart Update Manager 7.1.0 web browser.

An inventory of the software packages on the HP Service Pack for ProLiant ISO will be undertaken so that HP SUM understands what firmware it has access to within the HP SUM ISO repository.

HP SUM 01

 

Once this completes, launch Nodes.  I will need to add two Nodes for each ESXi host: one for the iLO and one for the ESXi Host.  The iLO node will update the iLO firmware and the ESXi Host node will update the hardware firmware on the HP DL380 G6.

When each Node is added, you need to supply the correct credentials to access the iLO and ESXi Host and also apply the Baseline (in my case HP Service Pack for ProLiant 2014.09.0 at E:/hp/swpackages).

HP SUM 02

Next we perform an Inventory of the Node to see if any firmware needs to be upgraded

HP SUM 03

Once the inventory is performed, it's simply a case of deploying the updates and restarting the ESXi Host.

HP SUM 04

 

Step 2 – Storage Controller Queue Depth

Storage Controller queue depth can cause an issue with VSAN when it is serving normal SCSI requests and undertaking a rebuild operation after a failure.  The recommended queue depth for VSAN is 256.  We will verify that the Smart Array P410i meets this requirement by jumping into ESXTOP and pressing d then f and then d again to select QSTATS.
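The key presses in ESXTOP look like this (a sketch of the interactive steps rather than a script; the AQLEN column is the adapter queue depth):

esxtop
# press 'd' to switch to the disk adapter view
# press 'f' to choose fields, then 'd' to toggle QSTATS (queue statistics), then Enter to return
# read the AQLEN value for the relevant vmhba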

The Smart Array P410i is on vmhba1 and the queue depth is 1011.

Queue Depth 01

Step 3 – Storage Controller Read Cache

Now that the firmware has been updated, the next step is to disable write caching on the Smart Array P410i.  This needs to be done to allow VSAN, rather than the storage controller, to make the decisions about write buffering and de-staging of IO to magnetic disk.

The Smart Array P410i does not allow direct pass through, so we need to configure each SSD and SATA drive in RAID 0 and change the storage controller to 100% Read Cache and 0% Write Cache.  To do this we will use hpssacli (which is included in the HP ESXi Custom ISO), which enables us to make changes to the storage controller via SSH.

Note: each command we want to run needs to be prefixed with /opt/hp/hpssacli/bin/hpssacli

The first thing we need to do is identify the slot the Smart Array P410i is in, by typing /opt/hp/hpssacli/bin/hpssacli ctrl all show config

Controller Cache 05

As you can see, mine is in 'slot 0', which is embedded.  Next I'm going to run the command /opt/hp/hpssacli/bin/hpssacli ctrl slot=0 show detail to check the cache ratio.

Controller Cache 06

Mine is currently set to 25% Read and 75% Write.  To change this, run the command /opt/hp/hpssacli/bin/hpssacli ctrl slot=0 modify cacheratio=100/0, then another /opt/hp/hpssacli/bin/hpssacli ctrl slot=0 show detail confirms the change has been made.

Controller Cache 07
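For reference, the full sequence of hpssacli commands used above is shown below (slot 0 is where my embedded P410i sits; adjust the slot number to suit your controller):

# identify the controller and the slot it occupies
/opt/hp/hpssacli/bin/hpssacli ctrl all show config
# check the current cache ratio
/opt/hp/hpssacli/bin/hpssacli ctrl slot=0 show detail
# set the cache to 100% read / 0% write
/opt/hp/hpssacli/bin/hpssacli ctrl slot=0 modify cacheratio=100/0
# confirm the change has taken effect
/opt/hp/hpssacli/bin/hpssacli ctrl slot=0 show detail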

Step 4 – Wipe Hard Drives

For VSAN to be able to see and claim my hard drives, they need to be wiped clean of the existing VMFS format.  This can be achieved using partedUtil, which is included with ESXi, or GParted.

My preference is to keep things simple, so I'm going to download GParted and boot from the ISO to wipe my hard drives.
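If you would rather stay in the ESXi shell than boot GParted, partedUtil can do the same job.  A hedged sketch is below; the device name is the NAA of one of my drives and the delete is destructive, so double-check the target before running it:

# list the partition table on the target disk
partedUtil getptbl /vmfs/devices/disks/naa.600508b1001c3ffd07cece41dbad09b4
# remove partition 1 to clear the existing VMFS partition
partedUtil delete /vmfs/devices/disks/naa.600508b1001c3ffd07cece41dbad09b4 1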

Step 5 – Check SSD Enabled

As my Smart Array P410i is running RAID 0 for my SSD drives, we need to verify that they are shown in vSphere as SSD by running the command esxcli storage core device list to obtain the Network Address Authority (NAA) identifier.

SSD 03

As we can see, Is SSD is set to false.  So I'm going to change this by running the command esxcli storage nmp satp rule add --device naa.600508b1001c3ffd07cece41dbad09b4 --satp VMW_SATP_LOCAL --option enable_ssd

Then we are going to reclaim the SSD by running the command esxcli storage core claiming reclaim --device naa.600508b1001c3ffd07cece41dbad09b4

To verify it's now displayed as an SSD, run the command esxcli storage core device list again.

SSD 04
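The full sequence for tagging the device as SSD is shown below for easy copying (the NAA identifier is from my host; replace it with your own from the device list):

# find the device NAA identifiers and check the 'Is SSD' flag
esxcli storage core device list
# add a SATP claim rule that tags the device as SSD
esxcli storage nmp satp rule add --device naa.600508b1001c3ffd07cece41dbad09b4 --satp VMW_SATP_LOCAL --option enable_ssd
# reclaim the device so the new rule takes effect
esxcli storage core claiming reclaim --device naa.600508b1001c3ffd07cece41dbad09b4
# verify 'Is SSD' now shows true
esxcli storage core device list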

Step 6 – IGMP Snooping

VSAN requires Multicast traffic to be enabled for two reasons:

  • To discover participating members in a VSAN cluster and to determine VSAN host states
  • To update the Cluster Monitoring, Membership and Directory Service (CMMDS) for items such as object placement and statistics

IGMP (Internet Group Management Protocol) snooping is a multicast constraining mechanism that runs on layer 2 devices to manage and control multicast groups.

  • Reduces unnecessary multicast traffic
  • Increases security
  • Per host interaction

Best to show this with a picture!

VMFocus VSAN IGMP Diagram

VSAN traffic on my HP v1910 is on VLAN 20, so I will enable IGMP snooping for this VLAN only.  The first step is to enable IGMP snooping globally on the HP v1910.  To do this, click Network > IGMP Snooping > Enable

IGMP 03

After enabling IGMP snooping globally, I next need to apply it to VLAN 20 by selecting 'Operation'

IGMP 01

 

IGMP Snooping ‘Enable’ > Version 3

IGMP 02

 

Verify that IGMP snooping is enabled for VLAN 20.

IGMP 04

 

Step 7 – Virtual Machines

Now that I have completed the tasks above, it's time to create the following virtual machines on VMF-ESXi03, which has 6 x 72GB 10K SAS drives in RAID 5:

  • VMF-DC01 (Windows Server 2012 R2 Domain Controller)
  • VMF-DC02 (Windows Server 2012 R2 Domain Controller)
  • VMF-CA01 (Windows Server 2012 R2 Certificate Authority)
  • VMF-SQL01 (Windows Server 2012 R2 running SQL Server 2012)
  • VMF-VC01 (Windows Server 2012 R2 running vCenter 5.5 U2)

See you in the next post when I have these all up and running.