Microsoft has introduced deduplication as a standard feature in Windows Server 2012. I'm pretty excited about this, as it brings an enterprise feature set to the SMB market.
We are going to test deduplication, which is part of the File Server role. Before we do, what is deduplication and how will it benefit us?
Well, deduplication is the process of eliminating duplicate copies of data. In most environments you will find large amounts of repeated data; think about the documents you work on and then click File > Save As after a small change.
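To make that concrete, here is a minimal sketch of chunk-level deduplication. It is purely illustrative: Windows Server 2012 actually uses variable-size chunking inside a file-system filter, and the file names and chunk size below are made up for the example.

```python
# A minimal sketch of chunk-level deduplication: split each file into
# fixed-size chunks, hash every chunk, and store each unique chunk only
# once. (Windows Server 2012 actually uses variable-size chunking; the
# file names below are made up.)
import hashlib
import os

CHUNK_SIZE = 64 * 1024  # 64 KB chunks for this sketch

def dedup_savings(files: dict) -> float:
    """Fraction of space saved if each unique chunk is stored once."""
    seen = set()
    logical = stored = 0
    for data in files.values():
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            logical += len(chunk)
            digest = hashlib.sha256(chunk).digest()
            if digest not in seen:
                seen.add(digest)
                stored += len(chunk)
    return 1 - stored / logical

# "File > Save As" after a small change: most chunks stay identical,
# so the second file costs almost nothing to store.
original = os.urandom(1024 * 1024)          # 1 MB stand-in document
edited = original[:-10] + b"small edit."    # same file, tiny change
savings = dedup_savings({"report.docx": original, "report-v2.docx": edited})
print(f"{savings:.0%} saved")  # roughly 47%: the second copy is nearly free
```

Only the final chunk of the edited file differs, so storing both files costs barely more than storing one.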
Windows Server 2012 includes a little tool called DDPEval.exe, located in the Windows\System32 directory, which calculates the expected space savings from enabling deduplication. We are going to use this to estimate our space savings and then see how the estimate compares to the actual savings.
With this in mind, what data are we going to pop onto the Windows Server 2012 File Server?
- General – 2,347 files across 266 folders, equating to 9.29GB of data. This is made up of your everyday Office 2010 documents, some PDFs and JPEGs; the usual files you find on most servers.
- PDF – 562 files across 48 folders, equating to 660MB of data. These are all PDFs.
- Pictures – 360 files equating to 1.21GB of data. These are JPEGs.
These files are my own, and I think I’m pretty good at not duplicating files, so I’m interested in knowing how much space we will save.
To enable deduplication we need to install the File Server role on our server. To do this, launch Server Manager and click Add Roles and Features.
Select Role-based or feature-based installation and click Next.
Select the server to install the role on, in this case VMF-FILE01, and click Next.
Expand File And Storage Services (Installed), then expand File and iSCSI Services and select File Server and Data Deduplication.
Click Next on the Select features page.
Once the installation completes, click Close.
I have a 50GB data drive (D:), which is going to be the test bed for deduplication.
I'm now going to copy the data I mentioned at the start of this blog onto the D: drive.
To make things slightly more interesting, I'm going to make a duplicate of each folder, which will leave us with:
- 2 x General Folders
- 2 x PDF Folders
- 2 x Picture Folders
Screenshot of the Folder Properties
Screenshot of the Folder List
Screenshot of D: Drive Space Used
So now we are ready, let’s run DDPEval.
Jump into the command line, go to C:\Windows\System32 and run DDPEval D: /O:C:\DDPEval
This command runs the DDPEval tool against the D: drive and creates an output file named DDPEval on the C: drive.
The results are in, and I'm impressed: Windows believes it can save me 57%. Let's see what happens when we enable deduplication.
By default deduplication runs in the background as a low-priority process, and it is paused if it starts to impact system performance.
With this in mind, we are now going to configure and enable deduplication. Go back into Server Manager and select File and Storage Services, then Volumes, then Disks.
Select Disk 1, go down to Volumes at the bottom, right-click D: and select Configure Data Deduplication.
Enable data deduplication and change Deduplicate files older than (in days) to 0, then click Set Deduplication Schedule.
Next, tick Enable throughput optimization. The start time that works best for me is 22:00, and the task can last 6 hours. Click OK and then OK again.
As I'm really impatient, I'm going to run a PowerShell command to start it now!
Before we do, a quick check on things from the Volumes view in Server Manager.
As you can see, no deduplication has been run yet and we have 27.6GB disk space free.
Right then, PowerShell time. Let's run Start-DedupJob -Volume D: -Type Optimization, which will kick off the scheduled deduplication job.
If you want to check on progress, run Get-DedupJob.
Time for a cup of tea whilst this finishes off.
The results are in.
As you can see, we have saved a massive 61%! Now you might be thinking that's not really a fair test, as I actually just copied the folders and all the files within General, Pictures and PDF.
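A quick back-of-the-envelope check on that 61%, using the folder sizes listed at the start of this post:

```python
# Sanity check on the 61% figure using the folder sizes from this test (GB).
general, pdf, pictures = 9.29, 0.66, 1.21     # the three original folders
logical = 2 * (general + pdf + pictures)      # each folder duplicated once
print(f"Logical data on disk: {logical:.2f} GB")

# Duplicating every file guarantees at least 50% savings, since the second
# copy of each file deduplicates completely against the first. A reported
# 61% means roughly another 11% came from redundancy (and compression)
# inside the original files themselves.
reported = 0.61
print(f"Savings beyond the pure duplicates: {reported - 0.50:.0%}")
```

So the duplicate folders explain most, but not all, of the result.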
This time let’s remove the copied folders so we are just left with the three original folders.
Let's run the PowerShell cmdlet Start-DedupJob -Volume D: -Type Optimization and wait for the results.
I'm impressed again: we have a 23% space saving. Imagine how much this would save within a business!
When using deduplication there are, as always, a number of gotchas that I wanted to point out:
- Cannot be an operating system volume
- Can only be enabled on a per volume basis.
- Can be on shared storage, however the partition must be formatted as NTFS
- Cannot be removable drives
- Do not use on Exchange or SQL servers
- Overhead is 1 CPU core per deduplication job/schedule
- It can work with DFS-R shares, with targets having either deduplicated or non-deduplicated DFS-R shares (note: I haven't tested this; see http://technet.microsoft.com/en-us/library/cc773238.aspx#BKMK_074).
I performed some further testing with large files: I used the Windows Server 2012 ISO, which I copied twice, meaning it was using 6.88GB of space rather than 3.44GB. I wasn't able to find any issues with large files, and Server 2012 deduplicated the ISO, with savings going up from 2.72GB (on the last test) to 7.13GB on this test. I'm not 100% sure how Windows has done this, as the ISO is only 3.44GB in size yet our space savings have increased by 4.41GB!
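My best guess at the arithmetic, as a sketch: both ISO copies deduplicate down to a single set of chunks, and those chunks then shrink further through compression and redundancy inside the ISO itself, so the saved space can exceed the size of one copy.

```python
# Back-of-the-envelope check on the ISO numbers from this test (GB).
iso = 3.44                  # one Windows Server 2012 ISO
logical = 2 * iso           # two copies on disk before dedup
prev_saved = 2.72           # savings reported on the previous test
new_saved = 7.13            # savings reported after adding the ISOs
extra = new_saved - prev_saved
print(f"Extra savings from the ISOs: {extra:.2f} GB of {logical:.2f} GB logical")

# The second copy dedups completely (3.44 GB); the remainder would come
# from chunk compression and redundancy inside the ISO itself.
print(f"Beyond the duplicate copy: {extra - iso:.2f} GB")
```

On these numbers, roughly 0.97GB of the increase is savings within a single copy of the ISO, which is plausible for a compressible image.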
The deduplication feature within Server 2012 is excellent. It showed space savings of between 23% and 61% on a small amount of data. It can, and should, be enabled on file servers.