Redefining Data Protection on a Virtual File System
This paper examines how to use Vembu BDR to implement distributed backup and disaster recovery (DR) operations in a centrally managed data protection environment, with an ingenious twist. Rather than storing image backups of VMs and block-level backups of physical and VM guest host systems as a collection of backup files, Vembu BDR uses a document-oriented database, dubbed VembuHIVE, as its backup repository and virtualizes that database as a file system.
Documents in a document-oriented database encapsulate information encoded as key-value pairs in a format such as XML or JavaScript Object Notation (JSON). Like a file, a document can store any data without following a strict schema. In addition, every document in a document-oriented database can be retrieved by its unique key and can be queried on its content using a query language suited to its encoding format.
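To make the two access paths concrete, here is a minimal sketch of a toy in-memory document store: schema-free JSON documents, retrieval by unique key, and a simple content query. The class and method names (`DocumentStore`, `put`, `get`, `find`) are hypothetical and unrelated to VembuHIVE; the content query uses plain Python field matching rather than an XML or JSON query language.

```python
import json
from typing import Any


class DocumentStore:
    """Hypothetical toy store: schema-free JSON documents addressed by a unique key."""

    def __init__(self) -> None:
        self._docs: dict[str, str] = {}  # key -> JSON-encoded document

    def put(self, key: str, doc: dict[str, Any]) -> None:
        # Documents need not share a schema; each is stored as a JSON string.
        self._docs[key] = json.dumps(doc)

    def get(self, key: str) -> dict[str, Any]:
        # Retrieval by unique key.
        return json.loads(self._docs[key])

    def find(self, **criteria: Any) -> list[str]:
        # Content query: keys of documents whose fields match every criterion.
        matches = []
        for key, raw in self._docs.items():
            doc = json.loads(raw)
            if all(doc.get(field) == value for field, value in criteria.items()):
                matches.append(key)
        return sorted(matches)


store = DocumentStore()
store.put("vm1/disk0/blk42", {"vm": "exchange", "disk": 0, "offset": 42})
store.put("vm2/config", {"vm": "sql", "cpus": 4})  # different fields, no schema
print(store.get("vm2/config"))    # {'vm': 'sql', 'cpus': 4}
print(store.find(vm="exchange"))  # ['vm1/disk0/blk42']
```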
Moreover, because every document is addressed by its key, a document-oriented database scales out through the simple addition of storage and compute resources. A number of large commercial web sites, including eBay, have leveraged this scalability.
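One common way key-addressed stores spread documents across nodes is consistent hashing. The sketch below is an illustration of that general technique only; the source does not say that VembuHIVE or eBay use it, and the node names and the `HashRing` class are hypothetical. It shows why adding a node is cheap: only keys that now fall to the new node move.

```python
import bisect
import hashlib


def _h(value: str) -> int:
    # Stable 64-bit ring position derived from SHA-256.
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring mapping document keys to storage nodes."""

    def __init__(self, nodes: list[str], vnodes: int = 64) -> None:
        self._vnodes = vnodes
        self._ring: list[tuple[int, str]] = []
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each node owns several ring points ("virtual nodes") to even out load.
        for i in range(self._vnodes):
            bisect.insort(self._ring, (_h(f"{node}#{i}"), node))

    def node_for(self, key: str) -> str:
        # A key belongs to the first ring point clockwise from its hash.
        idx = bisect.bisect(self._ring, (_h(key), ""))
        return self._ring[idx % len(self._ring)][1]


keys = [f"doc-{i}" for i in range(1000)]
ring = HashRing(["node-a", "node-b", "node-c"])
before = {k: ring.node_for(k) for k in keys}
ring.add_node("node-d")
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved")
```

Every relocated key lands on the new node; all other keys stay where they were, so no global reshuffle is needed.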
During a backup, the VembuBDR service, which handles all backup and recovery functions on the Vembu BDR server, deduplicates and compresses data from image- and block-based backups. Next, VembuBDR encodes the processed data with content metadata and streams the resulting data and metadata as documents into the VembuHIVE document-oriented database using very large data blocks. During a full backup of our Exchange VM, the VembuBDR service streamed processed data into VembuHIVE in blocks that averaged just under 3 MB.
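The deduplicate, compress, and encode pipeline can be sketched as follows. This is a minimal illustration under stated assumptions, not Vembu's implementation: it assumes fixed-size blocks, SHA-256 fingerprints for deduplication, and zlib compression, and the document fields (`key`, `offset`, `length`, `payload`) are hypothetical content metadata.

```python
import hashlib
import zlib

# Hypothetical 3 MiB block size, close to the average block size reported above.
BLOCK_SIZE = 3 * 1024 * 1024


def to_documents(data: bytes, seen: set[str], block_size: int = BLOCK_SIZE) -> list[dict]:
    """Split a backup stream into blocks, skip duplicate payloads, compress
    new blocks, and wrap each block in a document with content metadata."""
    docs = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digest = hashlib.sha256(block).hexdigest()  # content fingerprint
        doc = {"key": digest, "offset": offset, "length": len(block)}
        if digest not in seen:
            # First occurrence of this content: store the compressed payload.
            seen.add(digest)
            doc["payload"] = zlib.compress(block)
        # A duplicate keeps only metadata that points at the existing payload.
        docs.append(doc)
    return docs


seen: set[str] = set()
stream = b"A" * 8 + b"B" * 8 + b"A" * 8
docs = to_documents(stream, seen, block_size=8)
print(["payload" in d for d in docs])  # [True, True, False]
```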
Backup Specific, Restore Anywhere
Replacing the structural metadata related to a VM host’s file system with content metadata before VembuBDR commits the data to a document in VembuHIVE enables Vembu to virtualize VembuHIVE as a file system with respect to backup documents. With VembuHIVE acting as a virtual file system, Vembu can extend the functionality of Vembu BDR by introducing modules that mimic advanced OS file system utilities, providing such features as deduplication, error correction, and version control.
In particular, by applying formatting utilities to VembuHIVE documents, Vembu BDR can present the full disk image associated with a VM backup in multiple disk formats, such as .vhd, .vhdx, .vmdk, and .img, on a virtual drive created on the Vembu BDR server. More importantly, Vembu BDR can leverage on-demand presentation of disk images with full read/write access in a number of significant ways, including eliminating the need, common to many data protection packages, to run backups directly on a VM in order to protect and recover application-level data items.
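A minimal sketch of how a point-in-time disk view can be assembled from versioned block documents, assuming each document is keyed by a hypothetical `(block_index, backup_version)` pair. It produces only a raw, `.img`-style image, which is simply the blocks concatenated; container formats such as .vhd, .vhdx, and .vmdk add their own headers and metadata, which this sketch does not attempt.

```python
def image_at(docs: dict[tuple[int, int], bytes], num_blocks: int,
             block_size: int, version: int) -> bytes:
    """Assemble a raw disk image as of `version` from block documents
    keyed by (block_index, backup_version). Hypothetical layout."""
    out = bytearray()
    for index in range(num_blocks):
        # Latest copy of this block written at or before the requested version.
        candidates = [v for (i, v) in docs if i == index and v <= version]
        if candidates:
            out += docs[(index, max(candidates))]
        else:
            out += bytes(block_size)  # never backed up: present zeros
    return bytes(out)


docs = {(0, 1): b"AAAA", (1, 1): b"BBBB", (1, 2): b"bbbb"}
print(image_at(docs, 3, 4, 1))  # b'AAAABBBB\x00\x00\x00\x00'
print(image_at(docs, 3, 4, 2))  # b'AAAAbbbb\x00\x00\x00\x00'
```

Because every version stays addressable, choosing a different `version` yields a different restore point without copying any data.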
By mounting the logical disks of a vSphere VM on a virtual drive as local disk files, Vembu BDR can implement application-level backup and recovery functions that would typically require a full backup agent installed on the original VM’s guest OS. These disk images can also be used to instantly boot a backed-up VM as a Hyper-V VM, regardless of the original VM’s host. Vembu Instant-boot also eliminates several overhead tasks: mounting a network datastore containing read-only pointers to backup data, remapping disk writes to a cache or redo logs, and consolidating the pointers and logs into a standard configuration.
For an Instant-boot of a VM image, the VembuBDR service creates a persistent document within VembuHIVE that can be read, modified, and saved. We recovered our Exchange VM by choosing a backup time and booting the corresponding image through a fully automated recovery process that completed in well under five minutes. VembuBDR configures the new VM with the local server’s Hyper-V defaults, which we then customized in Hyper-V Manager. With this process, we complied with an SLA requiring Exchange to be restored in about five minutes to a state representing a loss of no more than 30 minutes of email processing.
Backups and Business Continuity
To implement host-level VM image backups, often dubbed agentless backups, in a vSphere virtual infrastructure (VI), Vembu BDR uses VMware application programming interfaces (APIs), including the vSphere Storage APIs for Data Protection (VADP). In particular, VADP provides a snapshot-based framework for VM backup, which Vembu BDR leverages through the latest release of the VMware Virtual Disk Development Kit (VDDK 5.5) to access, manipulate, and transfer VM data.
By combining tight vSphere integration for VM image backups with unified block-level OS and application backups of physical systems, Vembu BDR provides critical business value to any CIO working with line of business (LoB) executives. For LoB executives, the most important function of IT is ensuring business continuity for key business applications. Moreover, these executives drive the growing demand on IT to comply with a service level agreement (SLA) for business continuity. Pivotal components of such an SLA are a Recovery Point Objective (RPO), which limits the amount of data that can be lost, and a Recovery Time Objective (RTO), which limits the amount of time taken to recover after a system outage.
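The RPO and RTO limits reduce to simple arithmetic. The sketch below assumes that a restore point is the snapshot taken when a backup starts, so if an outage strikes just before the next backup finishes, the loss is one full backup interval plus that backup's duration. All numbers are hypothetical; the 30-minute RPO and 5-minute RTO echo the Exchange SLA described earlier.

```python
from dataclasses import dataclass


@dataclass
class Sla:
    rpo_minutes: float  # maximum tolerable data loss
    rto_minutes: float  # maximum tolerable time to restore service


def worst_case_loss(interval_minutes: float, backup_duration_minutes: float) -> float:
    # Outage just before the next backup completes: everything since the
    # previous snapshot is lost (one interval plus the in-flight backup).
    return interval_minutes + backup_duration_minutes


def meets_sla(sla: Sla, interval_minutes: float, backup_duration_minutes: float,
              restore_minutes: float) -> bool:
    return (worst_case_loss(interval_minutes, backup_duration_minutes) <= sla.rpo_minutes
            and restore_minutes <= sla.rto_minutes)


sla = Sla(rpo_minutes=30, rto_minutes=5)
print(meets_sla(sla, interval_minutes=20, backup_duration_minutes=5, restore_minutes=4))  # True
print(meets_sla(sla, interval_minutes=30, backup_duration_minutes=5, restore_minutes=4))  # False
```

The second schedule fails even though backups run every 30 minutes, because backup duration also counts against the RPO.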
Adding Application Awareness
For data protection, a VI provides IT with greater flexibility; however, a VI simultaneously presents IT with radically different logical constructs from a typical physical infrastructure. A unique duality characterizes a VI. From a physical perspective, a VI is a collection of host servers running a common hypervisor and supporting a set of applications. From a logical perspective, each hypervisor application is a VM running a distinct OS and hosting its own set of applications.
While VI management software attempts to make VM duality transparent, data protection operations remain difficult for IT administrators to master. IT can provide highly efficient hypervisor-level data protection by backing up VMs as unique entities. On its own, however, such a scheme fails to support the needs of LoB users, who focus exclusively on data objects associated with the applications running within a VM, such as a user’s mailbox in an Exchange mailbox database.
For an IT administrator to perform data protection tasks such as application data recovery and log truncation, a host-level VM backup must invoke APIs within the guest OS to quiesce application I/O on the VM by committing all current transactions and holding new ones. Application quiescence produces an application-consistent backup, rather than a merely crash-consistent one, within the guest OS. For a host-level backup of a VM running Exchange, Microsoft explicitly recommends using the Windows Volume Shadow Copy Service (VSS) and the Exchange VSS writer to quiesce Exchange, truncate logs, and avoid data loss.
To quiesce VM guest OS applications, Vembu provides Appaware, a VMware Tools extension, which is frequently referred to as a VSS requestor agent. IT administrators install Appaware on any VM running an application that requires log truncation after a backup. Installing a VSS requestor agent in the guest OS is a common way to call VSS APIs; some competitors, such as Veeam, instead download and install a VSS requestor agent at run time and remove it after the backup completes.
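The requestor's job is mostly about ordering. The following toy simulation shows the sequence a requestor drives: freeze the application, take the snapshot, thaw immediately, copy the data, and report success so the writer may truncate its logs. It loosely mirrors the real VSS flow (`IVssBackupComponents::DoSnapshotSet`, the writer's `OnFreeze`/`OnThaw`, and `BackupComplete`), but it calls no Windows API; `ToyWriter` and `run_backup` are hypothetical.

```python
from typing import Callable


class ToyWriter:
    """Hypothetical stand-in for an application VSS writer (Exchange-like)."""

    def __init__(self) -> None:
        self.frozen = False
        self.pending_logs = ["log1", "log2", "log3"]
        self.events: list[str] = []

    def on_freeze(self) -> None:
        # Flush in-flight transactions and hold new writes.
        self.frozen = True
        self.events.append("freeze")

    def on_thaw(self) -> None:
        # Resume normal I/O once the snapshot exists.
        self.frozen = False
        self.events.append("thaw")

    def on_backup_complete(self, succeeded: bool) -> None:
        # Logs are truncated only after a successful backup.
        self.events.append("backup_complete")
        if succeeded:
            self.pending_logs.clear()


def run_backup(writer: ToyWriter, copy_data: Callable[[list[str]], bool]) -> bool:
    """Toy requestor: freeze, snapshot, thaw, copy, then report the outcome."""
    writer.on_freeze()
    snapshot = list(writer.pending_logs)  # point-in-time copy of the writer's state
    writer.on_thaw()                      # freeze window ends right after the snapshot
    succeeded = copy_data(snapshot)
    writer.on_backup_complete(succeeded)
    return succeeded


ok_writer = ToyWriter()
run_backup(ok_writer, lambda snap: True)
print(ok_writer.events, ok_writer.pending_logs)  # ['freeze', 'thaw', 'backup_complete'] []
```

If the copy fails, `on_backup_complete(False)` leaves the logs in place, which is what protects against data loss.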
Critical RPO and RTO Success Factors
Minimizing data loss for an application means maximizing the frequency of its backups. To meet an aggressive RPO for a critical application, IT operations must be able to schedule frequent backups that run while the application is active throughout the work day. To support fast incremental VM backups with minimal impact on application processing, vSphere implements Changed Block Tracking (CBT), which maps every data block modified since the previous backup so that an incremental backup copies only those blocks. In addition, the updated VMware VDDK 5.5 significantly reduces the overhead associated with an ESX copy-on-write (CoW) snapshot.
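The idea behind CBT fits in a few lines: record each written block index as the write happens, so an incremental backup can copy exactly those blocks without rescanning the disk. This toy `TrackedDisk` class is hypothetical; vSphere exposes its real CBT data through its own APIs, which this sketch does not model.

```python
class TrackedDisk:
    """Hypothetical toy virtual disk with changed block tracking."""

    def __init__(self, num_blocks: int, block_size: int = 4) -> None:
        self.blocks = [bytes(block_size) for _ in range(num_blocks)]
        self.changed: set[int] = set()  # block indexes written since the last backup

    def write(self, index: int, data: bytes) -> None:
        self.blocks[index] = data
        self.changed.add(index)  # record the change instead of rescanning later

    def incremental_backup(self) -> dict[int, bytes]:
        # Copy only the blocks reported as changed, then reset tracking.
        delta = {i: self.blocks[i] for i in sorted(self.changed)}
        self.changed.clear()
        return delta


disk = TrackedDisk(1000)
disk.write(7, b"abcd")
disk.write(42, b"wxyz")
disk.write(7, b"ABCD")  # rewriting a block is still one changed block
delta = disk.incremental_backup()
print(sorted(delta))  # [7, 42]
```

Only 2 of 1,000 blocks are copied, and a second incremental backup with no intervening writes copies nothing.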
For VM logical disks on ESX datastores, CoW snapshots are highly space efficient. To represent a snapshot of a logical disk, an ESX host can instantly create an empty file in the VM’s datastore. The host writes data into the snapshot only when data is written to the logical disk: it reads the current data, writes that data to the snapshot, and then writes the new data to the original location. For a VM, the presence of a CoW snapshot therefore results in three logical I/O operations for each new write to the file representing a VM logical disk.
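The three-I/O cost can be counted with a textbook CoW model. In this hypothetical sketch, the snapshot starts empty, and the first write to a block after the snapshot triggers read-old, write-old-to-snapshot, and write-new-in-place; later writes to an already-preserved block cost one I/O. This is a simplified model for illustration, not a description of ESX internals.

```python
class CowDisk:
    """Hypothetical textbook copy-on-write snapshot: preserve old data before overwriting."""

    def __init__(self, blocks: list[bytes]) -> None:
        self.base = list(blocks)
        self.snapshot: dict[int, bytes] = {}  # starts empty, so creation is instant
        self.io_count = 0

    def write(self, index: int, data: bytes) -> None:
        if index not in self.snapshot:
            old = self.base[index]        # I/O 1: read current data
            self.snapshot[index] = old    # I/O 2: write it into the snapshot
            self.io_count += 2
        self.base[index] = data           # I/O 3: write new data in place
        self.io_count += 1

    def read_snapshot(self, index: int) -> bytes:
        # Snapshot view: the preserved copy if the block changed, else the live block.
        return self.snapshot.get(index, self.base[index])


cow = CowDisk([b"a", b"b"])
cow.write(0, b"A")
print(cow.io_count)  # 3
```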
The write overhead associated with a CoW snapshot escalates dramatically when a business-critical application with a high level of I/O activity is running on a VM with a Windows guest OS. As part of the backup process, a VSS requestor agent on the VM needs to invoke VSS to quiesce the application and create snapshots of logical disks. In this process, most VSS requestor agents compound the write overhead by also implementing the Windows Server guest OS snapshots as CoW snapshots, which are encapsulated within the ESX host CoW snapshot.
Lowering Snapshot Overhead Through Redirection
Vembu BDR leverages all of the VADP optimization features when performing an ESX CoW snapshot of the VM; however, Vembu’s Appaware agent uses VSS to implement Redirect-on-Write (RoW) snapshots, rather than CoW snapshots, within a Windows guest OS, which imposes less total I/O overhead on an incremental VM backup. While an RoW snapshot provides the same space efficiency as a CoW snapshot, it critically does not double the number of logical write operations.
Like a CoW snapshot, an RoW snapshot starts as an empty container. An RoW snapshot, however, does not copy existing data into a snapshot file before writing new data to the original location. Instead, the RoW process writes new data directly to the snapshot file and sets up a pointer that redirects access around the old data, which remains in place. Given the lower overhead of an active RoW snapshot, a number of vendors, including NetApp, have adopted this scheme.
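For comparison with the CoW model, here is the same kind of hypothetical toy for redirect-on-write. A write lands once in the redirect area, and the map entry acts as the pointer; the original block stays in place as the point-in-time copy. As a simplifying assumption, the cost of updating pointer metadata is not counted.

```python
class RowDisk:
    """Hypothetical textbook redirect-on-write snapshot: new data goes to a new location."""

    def __init__(self, blocks: list[bytes]) -> None:
        self.base = list(blocks)              # point-in-time data, left in place
        self.redirect: dict[int, bytes] = {}  # starts empty, like a CoW snapshot
        self.io_count = 0

    def write(self, index: int, data: bytes) -> None:
        # One data write; the dict entry doubles as the redirect pointer.
        self.redirect[index] = data
        self.io_count += 1

    def read_live(self, index: int) -> bytes:
        # Follow the pointer if the block was redirected, else read the original.
        return self.redirect.get(index, self.base[index])

    def read_snapshot(self, index: int) -> bytes:
        return self.base[index]


row = RowDisk([b"a", b"b"])
row.write(0, b"A")
print(row.io_count)  # 1, versus 3 for the first write under the CoW model
```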
For a highly active VM, even when using the new VMware VDDK 5.5, removing an ESX snapshot that encapsulates a CoW VSS snapshot can take twice as long as copying the CBT data in an incremental VM backup. In contrast, the overhead of RoW snapshots manifests only when unwinding pointers to remove a long chain of snapshots. Since Appaware leaves only one RoW VSS snapshot open for log truncation during an ESX snapshot, there is no chain of RoW snapshots to unwind when completing an incremental VM backup. Consequently, the issue of RoW overhead extending the time window of an incremental VM backup using Vembu BDR is moot.
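A simplified model shows why chain length, not RoW itself, drives the overhead. In this hypothetical sketch, each snapshot layer is a redirect map, and resolving a block walks the chain from newest to oldest, so a block that was never redirected costs one lookup per layer. With the single open snapshot described above, the walk is at most one layer deep.

```python
def resolve(chain: list[dict[int, bytes]], base: list[bytes], index: int) -> tuple[bytes, int]:
    """Read a block through a chain of RoW redirect maps, newest layer first.
    Returns the data and the number of pointer lookups performed."""
    hops = 0
    for layer in reversed(chain):
        hops += 1
        if index in layer:
            return layer[index], hops
    return base[index], hops


base = [b"a", b"b"]
single = [{0: b"A"}]                       # one open snapshot
long_chain = [{} for _ in range(4)] + [{0: b"A"}]  # five stacked snapshots
print(resolve(single, base, 1))      # (b'b', 1)
print(resolve(long_chain, base, 1))  # (b'b', 5)
```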