Microsoft has introduced deduplication as a standard feature in Windows Server 2012. I'm pretty excited about this, as it brings an enterprise feature set to the SMB market.
We are going to test deduplication, which is part of the File Server role. Before we do, what is deduplication and how will it benefit us?
Well, deduplication is the process of eliminating duplicate copies of data. In most environments you will find large amounts of repeated data; think about the documents you work on and then click File > Save As after a small change.
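To make that concrete, here is a minimal sketch of chunk-level deduplication. It is purely illustrative: Windows Server 2012 actually uses variable-size chunking inside a file-system filter, and the file names and chunk size below are made up for the example.

```python
# A minimal sketch of chunk-level deduplication: split each file into
# fixed-size chunks, hash every chunk, and store each unique chunk only
# once. (Windows Server 2012 actually uses variable-size chunking; the
# file names below are made up.)
import hashlib
import os

CHUNK_SIZE = 64 * 1024  # 64 KB chunks for this sketch

def dedup_savings(files: dict) -> float:
    """Fraction of space saved if each unique chunk is stored once."""
    seen = set()
    logical = stored = 0
    for data in files.values():
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            logical += len(chunk)
            digest = hashlib.sha256(chunk).digest()
            if digest not in seen:
                seen.add(digest)
                stored += len(chunk)
    return 1 - stored / logical

# "File > Save As" after a small change: most chunks stay identical,
# so the second file costs almost nothing to store.
original = os.urandom(1024 * 1024)          # 1 MB stand-in document
edited = original[:-10] + b"small edit."    # same file, tiny change
savings = dedup_savings({"report.docx": original, "report-v2.docx": edited})
print(f"{savings:.0%} saved")  # roughly 47%: the second copy is nearly free
```

Only the final chunk of the edited file differs, so storing both files costs barely more than storing one.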
Windows Server 2012 includes a little tool called DDPEval.exe, located in the Windows\System32 directory, which calculates the expected space savings from enabling deduplication. We are going to use this to estimate our space savings and then see how the estimate compares to the actual savings.
With this in mind, what data are we going to pop onto the Windows Server 2012 File Server?
- General – 2,347 files across 266 folders, equating to 9.29GB of data. This is made up of your everyday Office 2010 documents, some PDFs and JPEGs; the usual files you find on most servers.
- PDF – 562 files across 48 folders, equating to 660MB of data. These are all PDFs.
- Pictures – 360 files equating to 1.21GB of data. These are JPEGs.
These files are my own, and I think I’m pretty good at not duplicating files, so I’m interested in knowing how much space we will save.
To enable deduplication we need to install the File Server role on our server. To do this, launch Server Manager and click Add Roles and Features.
Select Role-based or feature-based installation and click Next.
Select the server to install the role on, in this case VMF-FILE01, and click Next.
Expand File And Storage Services (Installed), then expand File and iSCSI Services and select File Server and Data Deduplication.
Click Next on the Select features page.
Once the installation completes, click Close.
I have a 50GB data drive (D:), which is going to be the test bed for deduplication.
I'm now going to copy the data I mentioned at the start of this blog onto the D: drive.
To make things slightly more interesting, I'm going to make a duplicate of each folder, which will leave us with:
- 2 x General Folders
- 2 x PDF Folders
- 2 x Picture Folders
Screenshot of the Folder Properties
Screenshot of the Folder List
Screenshot of D: Drive Space Used
So now we are ready, let’s run DDPEval.
Jump into the command line, go to C:\Windows\System32 and run DDPEval D: /O:C:\DDPEval
This command runs the DDPEval tool against the D: drive and creates an output file named DDPEval on the C: drive.
The results are in, and I'm impressed: Windows believes it can save me 57%. Let's see what happens when we enable deduplication.
By default deduplication runs in the background as a low-priority process, and it is paused if it starts to impact system performance.
With this in mind, we are now going to configure and enable deduplication. Go back into Server Manager and select File and Storage Services, then Volumes, then Disks.
Select Disk 1, go down to Volumes at the bottom, right-click D: and select Configure Data Deduplication.
Enable data deduplication and change Deduplicate files older than (in days) to 0, then click Set Deduplication Schedule.
Next, tick Enable throughput optimization. The start time that works best for me is 22:00, and the task can last 6 hours. Click OK and then OK again.
As I'm really impatient, I'm going to run a PowerShell command to start it now!
Before we do, a quick check on things from the Volumes view in Server Manager.
As you can see, no deduplication has been run yet and we have 27.6GB disk space free.
Right then, PowerShell time. Let's run Start-DedupJob -Volume D: -Type Optimization, which will kick off the scheduled deduplication job.
If you want to check on progress, run Get-DedupJob.
Time for a cup of tea whilst this finishes off.
The results are in.
As you can see, we have saved a massive 61%! Now you might be thinking that's not really a fair test, as I actually just copied the folders and all the files within General, Pictures and PDF.
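A quick back-of-the-envelope check on that 61%, using the folder sizes listed at the start of this post:

```python
# Sanity check on the 61% figure using the folder sizes from this test (GB).
general, pdf, pictures = 9.29, 0.66, 1.21     # the three original folders
logical = 2 * (general + pdf + pictures)      # each folder duplicated once
print(f"Logical data on disk: {logical:.2f} GB")

# Duplicating every file guarantees at least 50% savings, since the second
# copy of each file deduplicates completely against the first. A reported
# 61% means roughly another 11% came from redundancy (and compression)
# inside the original files themselves.
reported = 0.61
print(f"Savings beyond the pure duplicates: {reported - 0.50:.0%}")
```

So the duplicate folders explain most, but not all, of the result.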
This time let’s remove the copied folders so we are just left with the three original folders.
Let's run the PowerShell cmdlet Start-DedupJob -Volume D: -Type Optimization and wait for the results.
I'm impressed again: we have a 23% space saving. Imagine how much this would save within a business!
When using deduplication there are, as always, a number of gotchas that I wanted to point out:
- Cannot be an operating system volume
- Can only be enabled on a per volume basis.
- Can be on shared storage, however the partition must be formatted as NTFS
- Cannot be removable drives
- Do not use on Exchange or SQL servers
- Overhead is 1 CPU core per deduplication job/schedule
- It can work with DFS-R shares, with targets having either deduplicated or non-deduplicated DFS-R shares (note: I haven't tested this; see http://technet.microsoft.com/en-us/library/cc773238.aspx#BKMK_074).
I performed some further testing with large files: I used the Windows Server 2012 ISO, which I copied twice, meaning it was using 6.88GB of space rather than 3.44GB. I wasn't able to find any issues with large files, and Server 2012 deduplicated the ISO, with savings going up from 2.72GB (on the last test) to 7.13GB on this test. I'm not 100% sure how Windows has done this, as the ISO is only 3.44GB in size yet our space savings have increased by 4.41GB!
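My best guess at the arithmetic, as a sketch: both ISO copies deduplicate down to a single set of chunks, and those chunks then shrink further through compression and redundancy inside the ISO itself, so the saved space can exceed the size of one copy.

```python
# Back-of-the-envelope check on the ISO numbers from this test (GB).
iso = 3.44                  # one Windows Server 2012 ISO
logical = 2 * iso           # two copies on disk before dedup
prev_saved = 2.72           # savings reported on the previous test
new_saved = 7.13            # savings reported after adding the ISOs
extra = new_saved - prev_saved
print(f"Extra savings from the ISOs: {extra:.2f} GB of {logical:.2f} GB logical")

# The second copy dedups completely (3.44 GB); the remainder would come
# from chunk compression and redundancy inside the ISO itself.
print(f"Beyond the duplicate copy: {extra - iso:.2f} GB")
```

On these numbers, roughly 0.97GB of the increase is savings within a single copy of the ISO, which is plausible for a compressible image.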
The deduplication feature within Server 2012 is excellent. It showed space savings of between 23% and 61% on a small amount of data. It can, and should, be enabled on file servers.