Does Backup Need a File System of its Own? A Deep Dive Into VembuHIVE™ File Systems

Vembu
Jayashree Subramanian
Vembu Technologies
December 2014
Executive Summary
Smart File System

Two interesting trends in the backup industry have created the need for a smart file system. First, there is a rising demand for more sophisticated uses of backup data than traditional recovery alone. Think how interesting it would be to share and collaborate on the files residing inside your disk image backup without having to mount or boot the image. This is possible only if the file system is able to read the image file bit by bit and understand what files are stored inside it.

Second, the demand for online backup has led service providers to host their backups on cloud infrastructure. This means that backup applications could potentially harness the cluster file system and computing capabilities of the cloud. For example, it is possible to dramatically improve the read/write speed of backup data by storing it on SAN/NAS and distributing the operations across a large cluster of servers.

This opens up new avenues for intelligent uses of backup data, such as big data analytics. However, traditional file systems (NTFS, EXT, FAT, etc.) and modern cloud file systems were not designed for backup applications, and the file formats (VHD, VMDK) used by backup products do not exploit the power of the cloud. No single file system can be a panacea for all applications, which is why Vembu Technologies developed its own cloud file system, called VembuHIVE™.

WHAT IS A FILE SYSTEM?

In computing, a file system (or filesystem) is used to control how data is stored and retrieved. Without a file system, information placed in a storage area would be one large body of data with no way to tell where one piece of information stops and the next begins.

SOURCE: Wikipedia

What is VembuHIVE™?
Efficient Cloud File System

VembuHIVE™ is an efficient cloud file system designed for large-scale backup and disaster recovery (BDR) applications, with support for advanced use cases. VembuHIVE™ can be thought of as a file system of file systems with built-in version control, encryption, deduplication, and error correction. During the backup, the data present in the backup files or image is separated from all the bookkeeping associated with it (i.e., its metadata) and stored as objects.
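The idea of separating data from its bookkeeping can be illustrated with a minimal sketch. This is not the VembuHIVE™ implementation, whose internals are not described here; the `ObjectStore` class, its field names, and the choice of SHA-256 as an object key are all assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ObjectStore:
    """Hypothetical store: data objects keyed by SHA-256, metadata kept apart."""
    objects: dict = field(default_factory=dict)   # digest -> raw bytes
    metadata: dict = field(default_factory=dict)  # path -> bookkeeping record

    def put_file(self, path: str, content: bytes, mtime: float) -> str:
        """Store the content as an object and record its metadata separately."""
        digest = hashlib.sha256(content).hexdigest()
        self.objects.setdefault(digest, content)   # data object, stored once
        self.metadata[path] = {                    # bookkeeping only, no data
            "object": digest,
            "size": len(content),
            "mtime": mtime,
        }
        return digest

    def get_file(self, path: str) -> bytes:
        """Resolve a path through its metadata to the underlying data object."""
        return self.objects[self.metadata[path]["object"]]
```

Because the metadata only points at objects, the same data can in principle be presented under a different metadata layout without copying it, which is the property the "file system of file systems" description relies on.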

VembuHIVE™ manages the metadata smartly through its patent-pending technology, in a way that is agnostic to the file system of the backup, which is why we call VembuHIVE™ a file system of file systems. This helps the backup application instantly associate the data in VembuHIVE™ with any file system metadata, thereby allowing on-demand file or image restores in many possible file formats. Both the data storage and the metadata storage harness the cluster file system, computing, and storage.

This is a powerful concept that addresses interesting use cases not just in the backup and recovery domain but also in other domains, such as big data analytics.

The key to the design of VembuHIVE™ is its novel mechanism to capture and generate appropriate metadata and store it intelligently in a cloud infrastructure. The incremental data (the changes with respect to a previous version of the same backup) are treated like versions in a version control system (CVS, Git). This way of capturing data and generating metadata provides seamless support for a wide range of complex restore use cases.

Use Cases of VembuHIVE™
Built-In Version Control and Point-In-Time Restores

During an incremental backup, VembuHIVE™ stores only the blocks changed since the latest backup, similar to versions in a version control system. Because of this, and because the smart metadata management is flexible enough to expose the underlying data in multiple ways, VembuHIVE™ exposes every incremental as a virtual full backup. That is, restoring a backup from any timestamp does not require merging all the changes into a previous full backup. A point-in-time full is available for every timestamp at which an incremental backup was taken. These backup versions can be instantly booted or mounted without any tedious merges.
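One way to picture a "virtual full" is a per-version block map: each incremental stores only its changed blocks, but its map inherits every unchanged entry from the previous version. The sketch below is a simplified illustration under that assumption, not the product's actual metadata format; `VersionedImage` and its methods are hypothetical names.

```python
from typing import Dict, List


class VersionedImage:
    """Hypothetical incremental store in which every version reads as a full."""

    def __init__(self) -> None:
        self.blocks: List[bytes] = []             # append-only block storage
        self.versions: List[Dict[int, int]] = []  # block index -> slot in self.blocks

    def backup(self, changed: Dict[int, bytes]) -> int:
        """Store only the changed blocks and return the new version number."""
        # Copy the previous map (metadata only), then repoint the changed blocks.
        block_map = dict(self.versions[-1]) if self.versions else {}
        for index, data in changed.items():
            self.blocks.append(data)
            block_map[index] = len(self.blocks) - 1
        self.versions.append(block_map)
        return len(self.versions) - 1

    def read_full(self, version: int) -> bytes:
        """Read a complete image for any version without merging block data."""
        block_map = self.versions[version]
        return b"".join(self.blocks[block_map[i]] for i in sorted(block_map))
```

For example, after `v0 = img.backup({0: b"AAAA", 1: b"BBBB"})` and `v1 = img.backup({1: b"CCCC"})`, both `img.read_full(v0)` and `img.read_full(v1)` return complete images, while only three blocks are physically stored.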

Built-In Error Correction Techniques for Reliability

A parity file (additional redundancy) is added to each data chunk in VembuHIVE™ using advanced error correction techniques. In the event of data corruption, the information in the parity file is used to fix errors in the VembuHIVE™ file storage. VembuHIVE™ also maintains such parity information at the backup file or disk image level, chunk level, repository level, and client or backup level. These capabilities are not provided by existing file systems.
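The specific error correction technique is not named above. As a stand-in, the following sketch uses plain XOR parity across equal-length chunks, combined with SHA-256 digests to locate the damaged chunk. It can rebuild at most one damaged chunk per group, so it illustrates only the principle of parity-based repair; the function names are hypothetical.

```python
import hashlib
from functools import reduce
from typing import List


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(chunks: List[bytes]) -> bytes:
    """Parity block: XOR of all chunks (assumed to have equal length)."""
    return reduce(xor_bytes, chunks)


def repair(chunks: List[bytes], digests: List[str], parity: bytes) -> List[bytes]:
    """Detect a damaged chunk by its digest and rebuild it from the parity."""
    bad = [i for i, c in enumerate(chunks)
           if hashlib.sha256(c).hexdigest() != digests[i]]
    if not bad:
        return list(chunks)
    if len(bad) > 1:
        raise ValueError("XOR parity can rebuild only one damaged chunk")
    # XOR of the parity with every intact chunk yields the missing chunk.
    good = [c for i, c in enumerate(chunks) if i != bad[0]]
    fixed = list(chunks)
    fixed[bad[0]] = reduce(xor_bytes, good, parity)
    return fixed
```

Production systems typically use stronger codes (for example, Reed-Solomon) that tolerate multiple failures; XOR is used here only because it fits in a few lines.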

Mail/Document/File Level Restores

VembuHIVE™ is intelligent enough to understand the way content is organized inside the backup data and to interpret it in multiple ways, irrespective of where it came from (BDR, file backup, or virtual machine backup), and thus can perform on-demand granular restores.

Deduplication for Storage Reduction

Beyond storage capacity itself, the more data there is to manage, the greater the impact and costs associated with provisioned servers, network bandwidth, and even the human resources needed to manage the infrastructure. In the face of high-volume data growth, backup and restore products still need to meet recovery time and recovery point objectives (RTOs and RPOs). Vembu’s innovative, global, variable-length, block-level, client- and server-based deduplication technology provides dramatic savings in storage cost and bandwidth.
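Variable-length deduplication is commonly built on content-defined chunking: chunk boundaries are chosen from the data itself, so an insertion early in a file does not shift every later boundary. The sketch below shows the general technique with a simple polynomial rolling hash; it is not Vembu's algorithm, and the window size, mask, chunk limits, and function names are all assumed values chosen for illustration.

```python
import hashlib
from typing import Dict, Iterator, List

WINDOW = 16            # bytes covered by the rolling hash (assumed)
MASK = 0x3F            # cut when the low 6 bits are zero (about 64-byte chunks)
MIN_CHUNK = 16         # never cut before a full window has been hashed
MAX_CHUNK = 256        # force a cut so chunks stay bounded
BASE = 257
MOD = 1_000_000_007


def chunk(data: bytes) -> Iterator[bytes]:
    """Split data at content-defined boundaries using a rolling hash."""
    top = pow(BASE, WINDOW - 1, MOD)   # weight of the byte leaving the window
    start, h = 0, 0
    for i, byte in enumerate(data):
        if i - start >= WINDOW:        # window full: drop its oldest byte
            h = (h - data[i - WINDOW] * top) % MOD
        h = (h * BASE + byte) % MOD
        length = i - start + 1
        if (length >= MIN_CHUNK and (h & MASK) == 0) or length >= MAX_CHUNK:
            yield data[start:i + 1]
            start, h = i + 1, 0
    if start < len(data):
        yield data[start:]             # trailing partial chunk


def dedup_store(data: bytes, store: Dict[str, bytes]) -> List[str]:
    """Store each unique chunk once; return the digest list (the 'recipe')."""
    recipe = []
    for piece in chunk(data):
        digest = hashlib.sha256(piece).hexdigest()
        store.setdefault(digest, piece)
        recipe.append(digest)
    return recipe


def restore(recipe: List[str], store: Dict[str, bytes]) -> bytes:
    """Rebuild the original data from its recipe."""
    return b"".join(store[d] for d in recipe)
```

Sharing a single `store` across many clients and backups is what makes deduplication "global" in this sketch, and computing the digests on the client before upload is one way the bandwidth savings mentioned above could arise.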