Dell DR series Administrator's Manual


Whenever a document is repeatedly backed up, the 0s and 1s stay the same because the file is simply being duplicated. The similarities between two files can be easily identified using block deduplication because the sequence of their 0s and 1s remains exactly the same. In contrast, online data has few exact duplicates. Instead, online data files may contain a lot of similarities between each file. For example, a majority of files that contribute to increased data storage requirements come pre-compressed by their native applications, such as:

• Images and video (such as the JPEG, MPEG, TIFF, GIF, PNG formats)
• Compound documents (such as .zip files, email, HTML, web pages, and PDFs)
• Microsoft Office application documents (including PowerPoint, MS Word, Excel, and SharePoint)

NOTE: The DR Series system experiences a reduced savings rate when the data it ingests is already compressed by the native data source. It is highly recommended that you disable data compression used by the data source. For optimal savings, the native data sources need to send data to the DR Series system in a raw state for ingestion.

Block deduplication is not as effective on existing compressed files because compression changes the 0s and 1s from the original format. Data deduplication is a specialized form of data compression that eliminates a lot of redundant data. The compression technique improves storage utilization, and it can be used in network data transfers to reduce the number of bytes that must be sent across a link. Using deduplication, unique chunks of data, or byte patterns, can be identified and stored during analysis. As the analysis continues, other chunks are compared to the stored copy and when a match occurs, the redundant chunk is replaced with a small reference that points to its stored chunk.
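
The chunk matching described above can be illustrated with a minimal Python sketch. It is not the DR Series implementation: the fixed chunk size, the use of SHA-256 digests as references, and the function names deduplicate and restore are all assumptions made for illustration.

```python
import hashlib

CHUNK_SIZE = 4  # hypothetical, tiny chunk size so the example is easy to follow


def deduplicate(data: bytes, store: dict[bytes, bytes]) -> list[bytes]:
    """Split data into fixed-size chunks, keep each unique chunk once in
    `store`, and return one reference (a SHA-256 digest) per chunk."""
    references = []
    for start in range(0, len(data), CHUNK_SIZE):
        chunk = data[start:start + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).digest()
        # Store the chunk only the first time its digest is seen.
        store.setdefault(digest, chunk)
        references.append(digest)
    return references


def restore(references: list[bytes], store: dict[bytes, bytes]) -> bytes:
    """Rebuild the original data by following each reference."""
    return b"".join(store[ref] for ref in references)


store: dict[bytes, bytes] = {}
refs = deduplicate(b"AAAABBBBAAAAAAAA", store)
print(len(refs), len(store))  # 4 references, 2 unique chunks stored
assert restore(refs, store) == b"AAAABBBBAAAAAAAA"
```

In this sketch the input contains four chunks but only two distinct ones, so two chunks are stored and the other two are represented by references alone.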
Deduplication reduces the amount of data that must be stored or transferred, which contributes to network savings. Network savings are achieved by the process of replicating data that has already undergone deduplication.

By contrast, standard file compression tools identify short repeated substrings inside individual files, whereas the intent of storage-based data deduplication is to inspect large volumes of data and identify large amounts of data, such as entire files or large sections of files, that are identical. Once this has been done, the system can store only one copy of the specific data. This copy is additionally compressed using single-file compression techniques. For example, an email system may contain 100 or more emails in which the same 1 megabyte (MB) file is sent as an attachment. The following shows how this is handled:

• Without data deduplication, each time that email system is backed up, all 100 instances of the same attachment are saved, which requires 100 MB of storage space.
• With data deduplication, only one instance of the attachment is actually stored (all subsequent instances are referenced back to the one saved copy), with the deduplication ratio being approximately 100 to 1. The unique chunks of data that represent the attachment are deduplicated at the block chunking level.

NOTE: The DR Series system does not support deduplication of any encrypted data, so there will be no deduplication savings derived from ingesting encrypted data. The DR Series system considers data that has already been encrypted to be unique and, as a result, cannot deduplicate it.

In cases where self-encrypting drives (SEDs) are used, when data is read by the backup application, it is decrypted by the SED or the encryption layer. This works in the same way as if you were opening an MS Word document that was saved on a SED. This means that any data stored on a SED can be read and deduplicated.
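
To see why encrypted input looks unique to a deduplicating system, consider the following rough Python sketch. Everything in it is an assumption for illustration: toy_encrypt is a deliberately simple, insecure stand-in for whatever cipher a backup application might use, the per-backup nonce models the idea that each encryption pass produces different bytes, and the chunk size is arbitrary.

```python
import hashlib

CHUNK_SIZE = 32  # hypothetical chunk size


def toy_encrypt(plaintext: bytes, key: bytes, nonce: bytes) -> bytes:
    """Toy stream cipher for illustration only (NOT secure): XOR the data
    with a keystream derived from SHA-256(key + nonce + counter)."""
    out = bytearray()
    for offset in range(0, len(plaintext), 32):
        counter = (offset // 32).to_bytes(8, "big")
        keystream = hashlib.sha256(key + nonce + counter).digest()
        block = plaintext[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(block, keystream))
    return bytes(out)


def unique_chunks(*streams: bytes) -> int:
    """Count distinct fixed-size chunks across all given byte streams."""
    seen = set()
    for stream in streams:
        for start in range(0, len(stream), CHUNK_SIZE):
            seen.add(hashlib.sha256(stream[start:start + CHUNK_SIZE]).digest())
    return len(seen)


document = bytes(range(256))  # 8 distinct 32-byte chunks
key = b"hypothetical-backup-key"

# Two backups of the same unencrypted document share every chunk.
plain_unique = unique_chunks(document, document)

# The same document, encrypted with a different nonce for each backup.
enc_unique = unique_chunks(
    toy_encrypt(document, key, b"backup-1"),
    toy_encrypt(document, key, b"backup-2"),
)
print(plain_unique, enc_unique)
```

The second backup of the unencrypted document adds no new chunks, while the two encrypted backups have no chunks in common even though the underlying document is identical.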
If you enable encryption in the backup software, you will lose deduplication savings because each time the data is encrypted, the DR Series system considers it to be unique.

Replication: Replication is the process by which the same key data is saved from multiple storage devices, with the goal of maintaining consistency between redundant resources in data storage environments. Data replication improves the level of fault-tolerance, which improves the reliability of maintaining saved data, and permits accessibility to the same stored data. The DR Series system uses an active form of replication that lets you configure a primary-backup scheme. During replication, the system processes data storage requests from a specified source to a specified destination (also known as a target) that acts as a replica of the original source data.
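
The network savings that come from replicating already-deduplicated data can be pictured with a small Python sketch. This is a hypothetical model, not the DR Series replication protocol: the source and target are plain dictionaries keyed by chunk digest, and the replicate function and its byte counting are assumptions made for illustration.

```python
import hashlib


def chunk_digest(chunk: bytes) -> bytes:
    """Reference for a chunk (SHA-256 is an assumption of this sketch)."""
    return hashlib.sha256(chunk).digest()


def replicate(source: dict[bytes, bytes], target: dict[bytes, bytes]) -> int:
    """Copy to the target only the chunks it does not already hold and
    return the number of chunk bytes that had to be sent."""
    sent = 0
    for digest, chunk in source.items():
        if digest not in target:
            target[digest] = chunk
            sent += len(chunk)
    return sent


source = {chunk_digest(c): c for c in (b"alpha", b"beta", b"gamma")}
target = {chunk_digest(b"alpha"): b"alpha"}  # target already holds one chunk

first = replicate(source, target)   # sends only "beta" and "gamma"
second = replicate(source, target)  # nothing new to send
print(first, second)  # 9 0
```

After the first pass the target holds the same chunks as the source, so a repeated pass transfers no chunk data.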