Data Deduplication in Windows Server

By Doug Lowe

Beginning with Windows Server 2012, Microsoft has included an innovative technology called data deduplication, which can dramatically reduce the amount of actual disk space required to store your data. Depending on the type of data, you can expect to save anywhere from 20 percent to more than 80 percent. At 20 percent savings, 10TB of data consumes only 8TB of disk storage. At 80 percent savings, 10TB consumes just 2TB.
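The arithmetic behind those figures is easy to check. Here is a minimal sketch in Python; the function name physical_size_tb is made up for this illustration and is not part of any Windows tool.

```python
def physical_size_tb(logical_tb, savings_percent):
    """Return the disk space (in TB) that data occupies after deduplication.

    logical_tb      -- size of the data as users see it
    savings_percent -- deduplication savings, from 0 to 100
    """
    if not 0 <= savings_percent <= 100:
        raise ValueError("savings_percent must be between 0 and 100")
    return logical_tb * (100 - savings_percent) / 100


# The two cases from the text: 10TB of data at 20 and 80 percent savings.
for pct in (20, 80):
    used = physical_size_tb(10, pct)
    print(f"{pct} percent savings: 10TB of data uses {used:g}TB of disk")
```

The savings you actually get depend on how much of your data is duplicated, so treat the percentage as something you measure after the fact, not something you can set.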

Data deduplication works by finding portions of files that are identical and storing just a single copy of the duplicated data on the disk. The technology required to find and isolate duplicated portions of files on a large disk is pretty complicated. Microsoft uses an algorithm called chunking, which scans data on the disk and breaks it into variable-size chunks, typically between 32KB and 128KB, with an average size of 64KB. These chunks are stored on disk in a hidden folder called the chunk store, which lives under the volume's System Volume Information folder. The actual files on the disk then contain pointers to individual chunks in the chunk store. If two or more files contain identical chunks, only a single copy of the chunk is placed in the chunk store, and all the files that share the chunk point to that one copy.
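To make the idea of a chunk store concrete, here is a simplified sketch in Python. It is not Microsoft's implementation: it cuts files into fixed 64KB chunks and uses SHA-256 hashes as the pointers, both of which are assumptions made for this illustration. The names ChunkStore, dedup_file, and read_file are hypothetical.

```python
import hashlib

CHUNK_SIZE = 64 * 1024  # fixed chunk size, a simplification for this sketch


class ChunkStore:
    """Holds one copy of each distinct chunk, keyed by its hash."""

    def __init__(self):
        self.chunks = {}  # hash (hex string) -> chunk bytes

    def put(self, chunk):
        digest = hashlib.sha256(chunk).hexdigest()
        # Store the chunk only if an identical one is not already present.
        self.chunks.setdefault(digest, chunk)
        return digest

    def get(self, digest):
        return self.chunks[digest]


def dedup_file(data, store):
    """Split data into chunks and return the list of pointers (hashes)."""
    return [store.put(data[i:i + CHUNK_SIZE])
            for i in range(0, len(data), CHUNK_SIZE)]


def read_file(pointers, store):
    """Rebuild the original file contents by following the pointers."""
    return b"".join(store.get(p) for p in pointers)


store = ChunkStore()
shared = b"A" * CHUNK_SIZE            # a chunk that both files contain
file1 = shared + b"B" * CHUNK_SIZE
file2 = shared + b"C" * CHUNK_SIZE

p1 = dedup_file(file1, store)
p2 = dedup_file(file2, store)

# Four chunks were written in total, but only three are distinct.
print("chunks in store:", len(store.chunks))
print("first chunk shared:", p1[0] == p2[0])
print("file1 reads back intact:", read_file(p1, store) == file1)
```

Fixed-size chunks keep the sketch short, but they miss duplicates that are shifted by even a single byte. That is one reason real deduplication engines pick chunk boundaries based on the content itself.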

Microsoft has tuned the chunking algorithm sufficiently that in most cases, users will have no idea that their data has been deduplicated. For typical workloads, access to the data is about as fast as if the data were not deduplicated. For performance reasons, data is not deduplicated as it is written. Instead, new files are written to the disk normally, and regularly scheduled deduplication jobs later scan the disk, applying the chunking algorithm to find chunks that can be deduplicated.
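The difference between deduplicating at write time and deduplicating later in a scheduled job can be sketched in a few lines of Python. This is a toy model only: the Volume class, its methods, and the fixed 64KB chunk size are all assumptions made for illustration, and the sketch never cleans up chunks that are no longer referenced.

```python
import hashlib


class Volume:
    """Toy volume: writes land unmodified, and a later job deduplicates them."""

    def __init__(self):
        self.raw_files = {}    # name -> bytes, written but not yet optimized
        self.optimized = {}    # name -> list of chunk hashes (pointers)
        self.chunk_store = {}  # hash -> chunk bytes

    def write(self, name, data):
        # The write path does no chunking, so writes stay fast.
        self.optimized.pop(name, None)
        self.raw_files[name] = data

    def read(self, name):
        if name in self.raw_files:
            return self.raw_files[name]
        return b"".join(self.chunk_store[h] for h in self.optimized[name])

    def run_dedup_job(self, chunk_size=64 * 1024):
        # The scheduled job: chunk every file that is still stored raw.
        for name in list(self.raw_files):
            data = self.raw_files.pop(name)
            pointers = []
            for i in range(0, len(data), chunk_size):
                chunk = data[i:i + chunk_size]
                digest = hashlib.sha256(chunk).hexdigest()
                self.chunk_store.setdefault(digest, chunk)
                pointers.append(digest)
            self.optimized[name] = pointers


payload = b"x" * (64 * 1024)
vol = Volume()
vol.write("a.bin", payload)
vol.write("b.bin", payload)
print("files waiting for the job:", len(vol.raw_files))

vol.run_dedup_job()
print("files waiting after the job:", len(vol.raw_files))
print("chunks stored for two identical files:", len(vol.chunk_store))
```

Reads work the same way before and after the job runs, which is why users generally do not notice when their files have been optimized.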

To use data deduplication, you must first install the Data Deduplication role service. In Server Manager, choose Add Roles and Features. When you get to the Server Roles page, expand the File and Storage Services role, expand File and iSCSI Services, and select Data Deduplication.


To configure data deduplication, open Server Manager, choose File and Storage Services, click Volumes, right-click the volume that you want to deduplicate, and then choose Configure Data Deduplication. The Deduplication Settings page appears.


From this page, you can enable data deduplication, exclude certain file types, and set a schedule for the deduplication jobs to run. Once deduplication is set up, give the deduplication job time to run. Soon enough, you’ll start to see the amount of free space on the volume grow as the data is deduplicated.
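If you would rather script these steps than click through Server Manager, Windows Server also provides PowerShell cmdlets for deduplication. The Python sketch below only builds the command lines so that you can review them first. The helper names dedup_commands and run, the drive letter E:, and the excluded extensions are assumptions for this example. The cmdlets are the standard deduplication cmdlets, but check the parameters against Microsoft's documentation for your server version before you run anything.

```python
import subprocess


def dedup_commands(volume, excluded_extensions=()):
    """Build PowerShell commands that mirror the Server Manager steps."""
    cmds = [
        # Install the Data Deduplication role service.
        "Install-WindowsFeature -Name FS-Data-Deduplication",
        # Turn deduplication on for the volume.
        f"Enable-DedupVolume -Volume '{volume}'",
    ]
    if excluded_extensions:
        # Extensions are written without a leading period (an assumption
        # based on how the settings page asks for them).
        ext_list = ",".join(f"'{ext}'" for ext in excluded_extensions)
        cmds.append(
            f"Set-DedupVolume -Volume '{volume}' -ExcludeFileType {ext_list}"
        )
    # Start an optimization job now instead of waiting for the schedule.
    cmds.append(f"Start-DedupJob -Volume '{volume}' -Type Optimization")
    # Report how much space has been saved so far.
    cmds.append(f"Get-DedupStatus -Volume '{volume}'")
    return cmds


def run(cmds):
    """Run each command with PowerShell (Windows Server only, as administrator)."""
    for cmd in cmds:
        subprocess.run(
            ["powershell.exe", "-NoProfile", "-Command", cmd], check=True
        )


# Print the commands for review; call run(...) yourself once they look right.
for line in dedup_commands("E:", ["tmp", "log"]):
    print(line)
```

Nothing in this sketch touches the server unless you call run, so it is safe to try on any machine that has Python installed.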