
Exploring Data Deduplication for the Enterprise

APRIL, 2018
If you work in an IT environment where you deal with data storage, backup or transfer, you have almost certainly heard the term data deduplication (dedup for short).

This article explores deduplication: what it is, what it does, how it's used and why it matters.

What is Data Deduplication?

Data deduplication optimizes the use of storage space by eliminating redundant copies of data. Instead of keeping every exact copy, the process removes each duplicate and replaces it with a reference to the original data.

To do this, stored data is analyzed to detect duplicate byte patterns. Each duplicate is then removed and replaced with a reference to the single retained copy.

This raises two questions: how often do these duplicate byte patterns occur, and how much of an impact can dedup have on storage efficiency?
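The detect-and-replace step described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular product works: the fixed 4096-byte chunk size, the use of SHA-256 as the fingerprint, and the `DedupStore` class are all assumptions made for the example.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size; real products vary


class DedupStore:
    """Toy in-memory store that keeps one copy of each unique chunk."""

    def __init__(self):
        self.chunks = {}  # SHA-256 hex digest -> chunk bytes

    def write(self, data):
        """Store data and return a list of chunk references (digests)."""
        refs = []
        for start in range(0, len(data), CHUNK_SIZE):
            chunk = data[start:start + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            # Only the first occurrence of a chunk is physically stored.
            self.chunks.setdefault(digest, chunk)
            refs.append(digest)
        return refs

    def read(self, refs):
        """Rebuild the original data by following the references."""
        return b"".join(self.chunks[ref] for ref in refs)

    def stored_bytes(self):
        """Physical bytes held after deduplication."""
        return sum(len(chunk) for chunk in self.chunks.values())


store = DedupStore()
data = b"A" * (3 * CHUNK_SIZE) + b"B" * CHUNK_SIZE  # four chunks, two unique
refs = store.write(data)
restored = store.read(refs)
```

Here four chunks are written but only two are kept, and reading back through the references reproduces the original bytes.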

How often do these duplicate byte patterns occur?

The same byte patterns can occur dozens, hundreds or even thousands of times, depending on the scale of the data. For reference, consider the number of times you make small changes to a document, a file or a PowerPoint presentation and save a new copy; each copy repeats most of the bytes of the previous one. Across a full data set, the amount of duplicate data tends to be enormous.

How much of an impact can dedup make in terms of storage efficiency?

The dataset or workload on the volume governs how much storage optimization you can achieve via deduplication. Datasets with a high dedup ratio can see space savings of up to 95%, which is equivalent to reducing storage space utilization by a factor of 20.
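The two figures above describe the same result: a 20:1 dedup ratio leaves 1/20 of the data on disk, so 1 - 1/20 = 95% of the space is saved. A small sketch of the conversion, with function names chosen only for this example:

```python
def savings_from_ratio(ratio):
    """Fraction of space saved for a dedup ratio of `ratio`:1."""
    if ratio < 1:
        raise ValueError("ratio must be at least 1")
    return 1.0 - 1.0 / ratio


def ratio_from_savings(savings):
    """Dedup ratio (N in N:1) for a given fraction of space saved."""
    if not 0 <= savings < 1:
        raise ValueError("savings must be in the range [0, 1)")
    return 1.0 / (1.0 - savings)


saved_at_20_to_1 = savings_from_ratio(20)   # about 0.95
ratio_at_95_pct = ratio_from_savings(0.95)  # about 20
```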

Besides optimizing the utilization of storage space, dedup also contributes to cost effectiveness. Because deduplicated data occupies less space, you need to buy and maintain less capacity. For example, if dedup cuts your storage consumption by 30%, your capacity costs drop by roughly the same proportion.

Using Data Deduplication – Use Case

To clarify the benefits of data deduplication, consider this use case. Let's say an email with a 1 MB attachment was sent to all of your employees. If you have 100 employees and all of them back up their data, that's 100 instances of the attachment. Without deduplication, 100 MB of data will be stored; with dedup, only 1 MB is stored. All of the other instances on the server are replaced by references pointing to the original 1 MB.
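The arithmetic of this use case can be reproduced with a short file-level sketch. It assumes the attachment is identical byte for byte in every mailbox and uses SHA-256 as the fingerprint; the `blobs` and `mailboxes` dictionaries are hypothetical stand-ins for a backup server's storage and index.

```python
import hashlib

MB = 1024 * 1024
attachment = bytes(MB)  # stand-in for the 1 MB attachment (all zero bytes)

blobs = {}      # digest -> attachment content, stored once
mailboxes = {}  # employee id -> digest (the reference)

for employee_id in range(100):
    digest = hashlib.sha256(attachment).hexdigest()
    blobs.setdefault(digest, attachment)  # stored on first sight only
    mailboxes[employee_id] = digest       # every later copy is a reference

# Size the backups would take without dedup versus what is actually stored.
logical_bytes = sum(len(blobs[d]) for d in mailboxes.values())
physical_bytes = sum(len(b) for b in blobs.values())
```

`logical_bytes` comes to 100 MB and `physical_bytes` to 1 MB, ignoring the small overhead of the references themselves.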

So far, we've established three things about data deduplication:

  • It reduces storage space usage by eliminating duplicate byte patterns.
  • Consequently, you need less storage space.
  • Storage costs fall accordingly.

Now let’s discuss where you can use dedup.

Where can you use Data Deduplication?

Storage appliances are great targets for data deduplication. This includes both physical appliances and virtual appliances. Storage appliances like Network Attached Storage (NAS), Hyper-converged Appliances and Storage Area Networks (SAN) should be paired with dedup services to effectively leverage the acquired storage space.

All StoneFly appliances deliver enterprise level dedup services to facilitate optimized utilization of available resources.

Backup appliances are also good targets for dedup services, as the use case above shows. If multiple team members hold the same copy of a file and each backs it up to guard against data loss, storage space is consumed unnecessarily. With dedup, this storage space utilization is optimized and backup costs are reduced.

How is Data Deduplication deployed?

How dedup services are implemented depends on the application and the vendor. For instance, the process differs between appliances that include deduplication services and standalone deduplication services.

Generally, there are two ways of deploying deduplication technology:

  • At the source.
  • At the target.


Deduplication at source

Here, deduplication happens at the source of the data, before the data is transferred. For instance, suppose you have a storage appliance that backs up data at scheduled intervals. The data first goes through dedup and is then sent for backup.

The benefit of dedup at source is that, besides saving storage space, it reduces bandwidth consumption, which amplifies the cost reduction. The downside is that the data has to be deduped before transmission, so the extra processing on the source system can slow down the transfer.
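A rough sketch of why dedup at source saves bandwidth: the source fingerprints its chunks, asks the target which fingerprints it lacks, and transmits only those chunks. The `BackupTarget` class, its `missing` and `receive` methods, and the 4096-byte chunk size are hypothetical names and values for this illustration, not a real protocol or vendor API.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size


def split_chunks(data):
    """Cut data into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]


class BackupTarget:
    """Toy backup target that stores chunks by fingerprint."""

    def __init__(self):
        self.chunks = {}

    def missing(self, digests):
        """Return the fingerprints the target does not hold yet."""
        return {d for d in digests if d not in self.chunks}

    def receive(self, digest, chunk):
        self.chunks[digest] = chunk


def backup_from_source(data, target):
    """Send only chunks the target lacks; return payload bytes sent."""
    chunks = split_chunks(data)
    digests = [hashlib.sha256(c).hexdigest() for c in chunks]
    needed = target.missing(digests)
    sent = 0
    for digest, chunk in zip(digests, chunks):
        if digest in needed:
            target.receive(digest, chunk)
            needed.discard(digest)  # do not resend repeats within one backup
            sent += len(chunk)
    return sent


target = BackupTarget()
first = b"A" * (2 * CHUNK_SIZE) + b"B" * CHUNK_SIZE
second = first + b"C" * CHUNK_SIZE  # same data plus one new chunk

sent_first = backup_from_source(first, target)
sent_second = backup_from_source(second, target)
```

In the second run only the new chunk crosses the network. The hashing done in `backup_from_source` is the extra work performed on the source side before anything is sent.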

Deduplication at target

In contrast to dedup at source, dedup at target takes place at the receiving end. In the setup described above, dedup at target happens on the backup appliance. Dedup at target is further divided into two types: in-line deduplication and post-process deduplication. In-line deduplication occurs before the backup is written, while post-process deduplication happens after the backup is completed.

The benefit of in-line deduplication is that it uses the space dedicated to backup data efficiently. The downside is that it increases the time the backup process takes. Post-process deduplication has the opposite trade-off: the backup completes sooner, but the full, undeduplicated data has to be held on the target until dedup runs.
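The difference in space usage between the two target-side approaches can be illustrated with a toy model. It assumes incoming chunks are fingerprinted with SHA-256 and that post-process dedup first lands the whole backup in a staging area; the function names are made up for this sketch, and it models space only, not timing.

```python
import hashlib


def chunk_id(chunk):
    return hashlib.sha256(chunk).hexdigest()


def inline_backup(chunks):
    """Dedup each chunk before it is written; return (store, peak bytes)."""
    store = {}
    for chunk in chunks:
        store.setdefault(chunk_id(chunk), chunk)  # duplicates never land
    peak = sum(len(c) for c in store.values())
    return store, peak


def post_process_backup(chunks):
    """Write everything first, dedup afterwards; return (store, peak bytes)."""
    staged = list(chunks)                # full backup written as-is
    peak = sum(len(c) for c in staged)   # space needed before dedup runs
    store = {}
    for chunk in staged:
        store.setdefault(chunk_id(chunk), chunk)
    return store, peak


incoming = [b"x" * 1000, b"x" * 1000, b"x" * 1000, b"y" * 1000]
inline_store, inline_peak = inline_backup(incoming)
post_store, post_peak = post_process_backup(incoming)
```

Both approaches end with the same deduplicated store, but the post-process variant needs room for the full backup before dedup runs.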

Conclusion – Simplify Data Deduplication with StoneFly’s Appliances

Deduplication is clearly valuable for optimized storage and backup. With StoneFly's appliances, you get enterprise-level deduplication services and can use the resources you have acquired effectively at reduced cost.

Instead of wrestling with complex dedup technology issues, set up StoneFly's appliances and let the experts take care of your data requirements for you.
