How to deploy Data Deduplication on Windows Server

Storage capacity directly affects both data costs and performance, so file server and backup administrators need every available advantage to meet growing user demands while staying within budget.

Data deduplication addresses this by eliminating redundant copies of data. It compares hashes of data chunks to find duplicates, stores each unique chunk once, and replaces every duplicate with a pointer to that single stored copy. The technique is most effective in environments with a lot of duplicate data, such as servers that store backup jobs or virtualization images.
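To make the hash-and-pointer idea concrete, here is a minimal PowerShell sketch. It is an illustration only, not how Windows implements the feature: it assumes fixed-size chunks for simplicity (Windows Server uses variable-size chunks and its own chunk store), and the function name Invoke-ChunkDedupSketch is hypothetical.

```powershell
# Conceptual sketch: fixed-size chunking + SHA-256 hashes + pointers.
# Invoke-ChunkDedupSketch is a hypothetical name, not a Windows cmdlet.
function Invoke-ChunkDedupSketch {
    param(
        [byte[]]$Data,
        [int]$ChunkSize = 4
    )
    $sha = [System.Security.Cryptography.SHA256]::Create()
    $store = @{}    # hash -> the single stored copy of that chunk
    $pointers = [System.Collections.Generic.List[string]]::new()  # the "file" as an ordered list of chunk hashes

    for ($offset = 0; $offset -lt $Data.Length; $offset += $ChunkSize) {
        # Copy the next chunk (the last one may be shorter).
        $length = [Math]::Min($ChunkSize, $Data.Length - $offset)
        $chunk = [byte[]]::new($length)
        [Array]::Copy($Data, $offset, $chunk, 0, $length)

        # Identify the chunk by its hash.
        $hash = [BitConverter]::ToString($sha.ComputeHash($chunk))
        if (-not $store.ContainsKey($hash)) {
            $store[$hash] = $chunk    # first occurrence: keep the bytes
        }
        $pointers.Add($hash)          # every occurrence: keep only a pointer
    }
    $sha.Dispose()

    [pscustomobject]@{
        TotalChunks  = $pointers.Count
        UniqueChunks = $store.Count
        Store        = $store
        Pointers     = $pointers
    }
}

# Usage: three identical chunks collapse into one stored copy.
$data = [System.Text.Encoding]::ASCII.GetBytes('ABCDABCDABCDWXYZ')
$result = Invoke-ChunkDedupSketch -Data $data -ChunkSize 4
'{0} chunks, {1} stored' -f $result.TotalChunks, $result.UniqueChunks   # 4 chunks, 2 stored

# Following the pointers rebuilds the original bytes.
$restored = [byte[]]($result.Pointers | ForEach-Object { $result.Store[$_] })
[System.Text.Encoding]::ASCII.GetString($restored)
```

Reading the data back means following the pointers into the store, which is why deduplicated files remain transparent to users even though duplicate chunks are stored only once.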

Microsoft introduced Data Deduplication in Windows Server 2012, and it remains part of current Windows Server releases. According to Microsoft, the feature can reduce storage use by up to 50% for user documents and by up to 95% for virtualization libraries.
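As a quick worked example of what those best-case figures mean, the hypothetical helper below computes the on-disk footprint left after deduplication for a given logical size and savings rate. Real savings depend on the data, so treat the results as upper-bound estimates.

```powershell
# Hypothetical helper: remaining on-disk size after deduplication.
# SavingsRate is a fraction, e.g. 0.95 means 95% of the space is reclaimed.
function Get-DedupFootprint {
    param(
        [double]$LogicalTB,
        [double]$SavingsRate
    )
    [Math]::Round($LogicalTB * (1 - $SavingsRate), 2)
}

Get-DedupFootprint -LogicalTB 10 -SavingsRate 0.50   # 10 TB of user documents -> 5
Get-DedupFootprint -LogicalTB 10 -SavingsRate 0.95   # 10 TB virtualization library -> 0.5
```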

Adding data deduplication to a storage infrastructure lowers storage costs and lets existing disks hold more data, helping file servers stay efficient as demand grows. This article covers how to install and configure Windows Server Data Deduplication, along with best practices and troubleshooting tips.

Deduplication use cases

Any Windows Server that stores large volumes of data is a good candidate for Data Deduplication. This includes file servers, backup storage servers, and virtualization hosts; most enterprises run at least one of these server types, if not all three.

Other deployments to consider, especially when each workload can be isolated on its own volume, include:

  • Software development shares.
  • Multimedia storage shares (note that deduplication may not always be effective for these file types).
  • Large data volumes.

Install Data Deduplication

Before proceeding, confirm that the server runs Windows Server 2012 or later, because Data Deduplication is not available in earlier releases. The feature requires an NTFS volume; ReFS volumes are also supported starting with Windows Server 2019.
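Both prerequisites can be checked from PowerShell. In this sketch the drive letter E is a placeholder for whichever data volume you plan to deduplicate.

```powershell
# Which Windows Server release is this? (Needs 2012 or later.)
(Get-CimInstance -ClassName Win32_OperatingSystem).Caption

# Which file system does the candidate data volume use? (E is a placeholder.)
Get-Volume -DriveLetter E |
    Select-Object DriveLetter, FileSystem, Size, SizeRemaining
```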

As Data Deduplication is not installed by default, follow these steps for installation:

  1. Open Server Manager.
  2. Select Manage and then Add Roles and Features.
  3. Click Next through the subsequent three pages.
  4. Expand File and Storage Services and then File and iSCSI Services.
  5. Select Data Deduplication.
  6. When prompted, select Add Features.
  7. Continue clicking Next through the remaining pages.
  8. On the Confirmation page, click Install.

Data Deduplication can also be installed from an elevated PowerShell session with the following cmdlet:

Install-WindowsFeature -Name FS-Data-Deduplication

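Whichever installation method you use, a quick check along these lines should confirm that the feature is present. It uses only the standard ServerManager and module-discovery cmdlets.

```powershell
# Install State should read "Installed".
Get-WindowsFeature -Name FS-Data-Deduplication

# The Deduplication module's cmdlets (Enable-DedupVolume, Get-DedupStatus, ...)
# should now be available.
Get-Command -Module Deduplication
```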

Configure and enable Data Deduplication

You manage Data Deduplication from the File and Storage Services node in Server Manager, which exposes each volume's deduplication settings and status. Related tools such as File Server Resource Manager (FSRM) add reporting and data-control features, such as quotas and file screens, that complement deduplication and help keep storage, and the backup jobs that depend on it, efficient.

To enable and configure deduplication on a volume, follow these steps:

  1. Expand File and Storage Services.
  2. Expand the Volumes node.
  3. Right-click the volume you wish to manage with data deduplication and select Configure Data Deduplication.
  4. Change the Data Deduplication setting from Disabled to General purpose file server.
  5. Specify the minimum file age for deduplication; the default is three days.
  6. Exclude file types as necessary.
  7. Select Set Deduplication Schedule.
  8. Check the boxes for Enable background optimization and Enable throughput optimization.
  9. Schedule deduplication for off-peak hours, typically overnight, and avoid windows already used by other jobs, such as backups.
  10. Click OK and Apply to activate deduplication with your specified settings.
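The same configuration can be scripted. The sketch below assumes E: is the data volume and uses a hypothetical schedule name; UsageType Default corresponds to the General purpose file server setting, and the schedule mirrors the throughput optimization window from step 9.

```powershell
# Enable deduplication on the data volume (E: is a placeholder).
# -UsageType Default = "General purpose file server" in Server Manager.
Enable-DedupVolume -Volume "E:" -UsageType Default

# Throughput optimization on weeknights at 11 p.m., for up to 6 hours.
# "NightlyThroughput" is a hypothetical schedule name.
New-DedupSchedule -Name "NightlyThroughput" -Type Optimization `
    -Days Monday,Tuesday,Wednesday,Thursday,Friday `
    -Start "23:00" -DurationHours 6
```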

Each deduplication type tunes the process for a particular workload. General purpose file server suits most administrators, but the full set of options is:

  • General purpose file server: standard file servers, such as team shares and user home folders.
  • Virtual Desktop Infrastructure (VDI) server: Hyper-V hosts that run VDI workloads.
  • Virtualized Backup Server: virtualized backup applications and the backup sets they store.
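In PowerShell, these options map to the -UsageType values Default, HyperV, and Backup. This sketch uses placeholder volumes F: and G: for the two specialized types, then lists every deduplicated volume with its usage type and savings.

```powershell
# Specialized usage types (volumes are placeholders).
Enable-DedupVolume -Volume "F:" -UsageType HyperV   # VDI server
Enable-DedupVolume -Volume "G:" -UsageType Backup   # Virtualized Backup Server

# Confirm the usage type and current savings for each deduplicated volume.
Get-DedupVolume | Format-Table Volume, UsageType, SavingsRate, SavedSpace
```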

Setting a minimum file age keeps deduplication from spending resources on files that change frequently. With a three-day setting, for example, only files that have not been modified for at least three days are processed. You can also exclude file types that are unlikely to benefit, such as database files, some multimedia formats, and highly compressed files. Files smaller than 32 KB are skipped automatically.
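Both settings can also be applied with Set-DedupVolume. The volume, the extension list, and the excluded folder below are examples only; adjust them to your own data.

```powershell
# Only process files untouched for 3+ days; skip selected types and a folder.
# E:, the extensions, and E:\SQLData are illustrative placeholders.
Set-DedupVolume -Volume "E:" `
    -MinimumFileAgeDays 3 `
    -ExcludeFileType "mdf","ldf","zip","mp4" `
    -ExcludeFolder "E:\SQLData"

# Review the resulting policy; MinimumFileSize shows the small-file threshold.
Get-DedupVolume -Volume "E:" |
    Format-List Volume, MinimumFileAgeDays, MinimumFileSize, ExcludeFileType, ExcludeFolder
```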

Monitor, optimize, and assess volumes for deduplication

Deduplication is worth enabling on a volume only if it meaningfully improves storage efficiency, because it adds load to the server's CPU, memory, and storage subsystem. The Data Deduplication Savings Evaluation Tool, ddpeval.exe, estimates how much space deduplication could reclaim on a volume.

After Data Deduplication is installed, this command-line tool is located in the \Windows\System32 directory. It does not run on system or boot volumes, or on volumes where deduplication is already enabled.


For example, to evaluate drive G: on the local system:

  • To evaluate the entire volume, type ddpeval.exe G:
  • To evaluate a specific directory, such as SalesData, type ddpeval.exe "G:\SalesData"

The tool also works across the network with a UNC path. For example, type ddpeval.exe "\\fileserver01\g$\SalesData"
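When several candidates need evaluating, a short loop can run ddpeval.exe against each one and collect the results in a single report. The paths and the report file are placeholders, and the sketch assumes the C:\Temp folder already exists.

```powershell
# Candidate locations to evaluate (placeholders); single quotes keep "g$" literal.
$paths = 'G:', 'G:\SalesData', '\\fileserver01\g$\SalesData'
$report = 'C:\Temp\ddpeval-report.txt'   # hypothetical report location

foreach ($path in $paths) {
    "=== $path ===" | Out-File -FilePath $report -Append
    & "$env:SystemRoot\System32\ddpeval.exe" $path | Out-File -FilePath $report -Append
}
```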

PowerShell offers another way to report on deduplication. The Get-DedupStatus cmdlet displays the current deduplication status of each enabled volume, including free space, saved space, and the number of optimized files.

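To see more than the default table for a single volume, the output can be expanded as in this sketch (E: is a placeholder); the last-optimization fields are useful for spotting jobs that did not complete.

```powershell
# Detailed status for one volume, including when and how the last job finished.
Get-DedupStatus -Volume "E:" |
    Format-List Volume, FreeSpace, SavedSpace, OptimizedFilesCount,
                InPolicyFilesCount, LastOptimizationTime, LastOptimizationResult
```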

Alternative products

Windows Server Data Deduplication is included with Windows Server at no extra cost and offers solid configuration and monitoring capabilities, but some organizations may find that other deduplication products better suit their needs.

Consider other tools if your organization runs a cloud or hybrid infrastructure or needs deduplication across both Windows and Linux file servers. Dedicated products can offer performance and automation advantages, and they typically build deduplication into broader backup and imaging services rather than offering it as a standalone feature.

Some noteworthy alternatives include:

  • Veeam Backup & Replication, which supports backup and recovery for hybrid and cloud environments and includes built-in deduplication.
  • Arcserve UDP, a comprehensive backup and recovery product that also includes built-in deduplication.
  • Acronis Cyber Protect, an image-based backup product with built-in deduplication.

Best practices and troubleshooting

Data Deduplication is a mature, widely used technology for controlling storage costs and improving backup efficiency. Even so, administrators should know how to troubleshoot it and follow a few best practices.

Troubleshooting

Consider the following tips to troubleshoot and optimize deduplication jobs:

  • Ensure sufficient memory is available for deduplication processes, with a recommended allocation of 1 GB of RAM per 1 TB of data.
  • Check whether other jobs are competing with deduplication for processor time.
  • Assess the storage system’s I/O performance.
  • Run ddpeval.exe before enabling deduplication to confirm it is worthwhile, and use Get-DedupStatus afterward to check the actual savings.
  • Review Event Viewer logs, particularly the Microsoft-Windows-Deduplication/Operational log.
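The memory rule of thumb in the first tip is easy to turn into a quick sizing check. The helper below is hypothetical; it assumes the jobs for all listed volumes may run at the same time (so their data is summed) and rounds up to whole gigabytes.

```powershell
# Hypothetical helper: RAM to budget for deduplication, using the
# ~1 GB per 1 TB rule of thumb. Assumes all listed volumes may be processed concurrently.
function Get-DedupRamEstimateGB {
    param(
        [double[]]$VolumeDataTB,
        [double]$GBPerTB = 1
    )
    $totalTB = ($VolumeDataTB | Measure-Object -Sum).Sum
    [int][Math]::Ceiling($totalTB * $GBPerTB)
}

Get-DedupRamEstimateGB -VolumeDataTB 4, 2.5   # 6.5 TB of data -> 7
```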

PowerShell can also help monitor deduplication. The Get-DedupJob, Get-DedupStatus, and Update-DedupStatus cmdlets provide more detailed information about running jobs and volume savings.
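A typical troubleshooting pass might combine these cmdlets with the operational event log, as in the sketch below. E: is a placeholder, and Update-DedupStatus rescans the volume, so it can take a while on large volumes.

```powershell
# Jobs that are queued or running right now.
Get-DedupJob

# Recompute the statistics for a volume (E: is a placeholder).
Update-DedupStatus -Volume "E:"

# Rerun optimization manually if a scheduled job was missed or failed.
Start-DedupJob -Volume "E:" -Type Optimization

# Most recent entries in the deduplication operational log.
Get-WinEvent -LogName "Microsoft-Windows-Deduplication/Operational" -MaxEvents 20 |
    Format-Table TimeCreated, Id, LevelDisplayName, Message -AutoSize -Wrap
```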

Best practices

Keep the following best practices in mind for future deduplication deployments:

  • Use the latest Windows Server release to get the newest deduplication features.
  • Avoid enabling Data Deduplication on system volumes; restrict its use to volumes containing data.
  • Ensure volumes have adequate free space to accommodate deduplication processes.
  • Prevent scheduling conflicts with backup and data replication software.
  • Schedule deduplication tasks during off-peak hours.
  • Test deduplication to measure its impact on performance and storage efficiency.
  • Consider compressing data after deduplication.
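Two of these practices, avoiding schedule conflicts and keeping enough free space, can be checked quickly from PowerShell. This sketch lists every deduplication schedule, including any created by default when the feature was enabled, so its windows can be compared with backup and replication jobs, and then shows free space on fixed volumes.

```powershell
# All deduplication schedules, with every property shown.
Get-DedupSchedule | Format-List *

# Free space on fixed volumes.
Get-Volume |
    Where-Object DriveType -eq 'Fixed' |
    Select-Object DriveLetter, FileSystemLabel, Size, SizeRemaining
```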

Damon Garn owns Cogspinner Coaction and provides freelance IT writing and editing services. He has authored multiple CompTIA study guides, including those for Linux+, Cloud Essentials+, and Server+, and contributes extensively to Informa TechTarget, The New Stack, and CompTIA Blogs.

Winsage