Today’s enterprise workloads require more availability and uptime than ever before. Organizations need to provide data availability and meet aggressive SLAs to satisfy customer and business-critical data demands. Vendors now offer technology solutions that help businesses keep pace with these stringent demands.

When thinking about overall disaster recovery and availability, businesses must consider data availability not just after a single workload or host failure, but also after the loss of an entire site. This matters most for businesses with sites in disaster-prone areas such as coastal regions. For business continuity, they must consider how data and business-critical workloads can remain highly available even if a primary site fails.

Table of Contents

  1. VMware Stretched Cluster Configurations and Goals
  2. VMware vSAN Stretched Clusters
  3. High Availability is not Disaster Recovery
  4. Concluding Thoughts

Traditional clusters provide high availability when a host fails, but how can clusters be made highly available when an entire site fails?

Stretched clusters have been around for a while. However, vendors such as VMware are making it easier than ever to build a working stretched cluster solution. Many of the moving parts and capabilities that make this achievable are built right into today’s hypervisor solutions such as VMware’s vSphere.

Let’s take a look at high-availability with VMware vSphere Stretched Clusters and see how this solution allows organizations to solve very challenging high-availability problems in the enterprise.

VMware Stretched Cluster Configurations and Goals

The primary goal of a “stretched” cluster configuration is to extend the availability and benefits that customers get from a VMware HA-enabled cluster in a local site to geographically dispersed sites. Rather than only protecting against a host failure, the cluster can lose an entire site and remain highly available. A stretched cluster essentially stretches compute, network, and storage between sites instead of just between hosts in a cluster.

An added benefit of a stretched cluster configuration is that it makes use of hardware at multiple locations. A traditional DR environment, where standby hardware only receives warm data from production and is not actively utilized, is a hard sell from a fiscal perspective. It is much easier to convince business stakeholders when the additional hardware provides availability and is also actively used for capacity and performance.

VMware offers two “stretched” cluster solutions worth noting: one built on traditional storage infrastructure, and a software-defined approach that uses VMware vSAN.

Let’s look at both approaches and how they are architected.

First, let’s look at the stretched cluster configuration utilizing traditional storage.

A VMware vSphere Metro Storage Cluster (vMSC) is a special storage configuration that combines replication with array-based clustering. vMSC configurations are typically deployed where the distance between datacenters is limited, such as datacenters in the same metropolitan area or on the same campus. This is mainly due to the strict latency requirements of metro storage clusters.

What are the requirements and considerations for the VMware vSphere Metro Storage Cluster?

  • No special licenses are required. Note: if automated workload balancing and placement are needed, vSphere DRS and Storage DRS are required, and these need a VMware vSphere Enterprise Plus license.
  • Storage connectivity using Fibre Channel, iSCSI, NFS, and FCoE is supported.
  • The maximum supported latency between sites for the vSphere ESXi management networks is 10 ms round-trip time (RTT).
  • vSphere vMotion and vSphere Storage vMotion have a maximum supported latency of 150 ms as of vSphere 6.0, but keep in mind this limit is not for stretched cluster use. (This also requires an Enterprise Plus license.)
  • Synchronous storage replication links have a maximum supported latency of 10 ms RTT. Customers should check with their specific storage vendor, which may publish or support different latency requirements for a successful vMSC.
    • A common maximum RTT for storage systems is 5 ms.
  • The vSphere vMotion network requires 250 Mbps of dedicated bandwidth per concurrent vMotion session.
  • For Fault Tolerance (FT), only legacy FT is supported; SMP FT is not supported on vMSC.
    • Note that when a DRS VM/Host rule is created for a VM, both the primary and the secondary FT VM will respect the rule.
  • Storage I/O Control is not supported on a vMSC-enabled datastore.
    • Note that the SDRS I/O Metric setting enables Storage I/O Control, so this feature needs to be disabled.
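
As a rough illustration, the latency and bandwidth limits in the list above can be turned into a simple pre-flight check. The following is a minimal Python sketch, not a VMware tool or API: the `SiteLink` class, the `check_vmsc_link` function, and the measurement values are hypothetical names and numbers introduced here, and only the thresholds themselves come from the list above.

```python
from dataclasses import dataclass

# Thresholds taken from the requirements list above.
# A storage vendor may publish stricter values.
MAX_MGMT_RTT_MS = 10.0
MAX_STORAGE_RTT_MS = 10.0
VMOTION_MBPS_PER_SESSION = 250.0


@dataclass
class SiteLink:
    """Hypothetical measurements of the inter-site link (not a VMware API)."""
    mgmt_rtt_ms: float
    storage_rtt_ms: float
    vmotion_bandwidth_mbps: float
    concurrent_vmotions: int


def check_vmsc_link(link, vendor_storage_rtt_ms=MAX_STORAGE_RTT_MS):
    """Return a list of violations; an empty list means the link passes."""
    problems = []
    if link.mgmt_rtt_ms > MAX_MGMT_RTT_MS:
        problems.append(
            f"management RTT {link.mgmt_rtt_ms} ms exceeds {MAX_MGMT_RTT_MS} ms"
        )
    # Use the stricter of the general limit and the vendor's published limit.
    storage_limit = min(MAX_STORAGE_RTT_MS, vendor_storage_rtt_ms)
    if link.storage_rtt_ms > storage_limit:
        problems.append(
            f"storage RTT {link.storage_rtt_ms} ms exceeds {storage_limit} ms"
        )
    # Each concurrent vMotion session needs its own dedicated bandwidth.
    needed_mbps = link.concurrent_vmotions * VMOTION_MBPS_PER_SESSION
    if link.vmotion_bandwidth_mbps < needed_mbps:
        problems.append(
            f"vMotion bandwidth {link.vmotion_bandwidth_mbps} Mbps is below "
            f"the {needed_mbps} Mbps needed"
        )
    return problems


if __name__ == "__main__":
    link = SiteLink(mgmt_rtt_ms=12.0, storage_rtt_ms=7.0,
                    vmotion_bandwidth_mbps=400.0, concurrent_vmotions=2)
    for problem in check_vmsc_link(link, vendor_storage_rtt_ms=5.0):
        print(problem)
```

Passing the vendor limit as a separate parameter reflects the point that a storage vendor's supported RTT, such as the common 5 ms figure, can be tighter than the general 10 ms limit.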

A key consideration with the VMware vSphere Metro Storage Cluster solution lies in the storage requirements and configuration. Traditional synchronous storage replication creates a “primary” and “secondary” relationship between the LUNs being accessed. This does not work for vMSC configurations, because vMSC must be able to read from and write to both storage locations at the same time. Additionally, all disk writes happen synchronously. This explains the very aggressive latency requirements and high bandwidth needs of a successful vMSC configuration. As distance increases, building a successful vMSC cluster becomes less feasible, since latency generally increases with distance.
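
To make the relationship between distance and latency concrete, here is a small back-of-the-envelope Python sketch. It assumes that light travels through optical fiber at roughly 200 km per millisecond (about two thirds of its speed in a vacuum), and it treats equipment and routing delay as a single hypothetical `overhead_ms` value. The function names are illustrative only, and real fiber paths are usually longer than the straight-line distance between sites.

```python
# Assumption: signal speed in optical fiber is about 200 km per millisecond.
FIBER_KM_PER_MS = 200.0


def min_rtt_ms(fiber_km):
    """Lower bound on round-trip time over a fiber path of the given length."""
    return 2.0 * fiber_km / FIBER_KM_PER_MS


def max_fiber_km(rtt_budget_ms, overhead_ms=0.0):
    """Longest fiber path that fits in an RTT budget after fixed overhead."""
    usable_ms = max(rtt_budget_ms - overhead_ms, 0.0)
    return usable_ms * FIBER_KM_PER_MS / 2.0


if __name__ == "__main__":
    print(min_rtt_ms(100.0))                      # propagation delay only
    print(max_fiber_km(5.0, overhead_ms=1.0))     # with a 5 ms RTT budget
```

Under these assumptions, every 100 km of fiber adds about 1 ms of round-trip time before any switching, routing, or array processing delay is counted, which is why metro storage clusters tend to stay within a metropolitan area.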

VMware vSAN Stretched Clusters

VMware vSAN is a software-defined storage solution that pools together storage resources from multiple servers into a single shared data store, enhancing storage efficiency and simplifying management. It’s a key component of VMware’s hyper-converged infrastructure, offering scalable and resilient storage for virtualized environments. VMware vSAN optimizes storage performance and reliability while reducing hardware costs.

The VMware vSAN Stretched Cluster configuration is based on the VMware vSAN software-defined storage architecture. Among VMware vSphere-based stretched clusters, vSAN is the platform customers most commonly use for stretching workloads. VMware vSAN provides an easy, built-in mechanism to build and enable stretched clustering, which gives customers a more “push button” way to architect these clusters within the hypervisor itself.

In the VMware vSAN configuration, the “stretched” cluster allows vSAN storage to be extended from a single site into two sites for increasing the availability of workloads and providing intersite load balancing. Very similar to the requirements of the vSphere Metro Storage Cluster, the vSAN stretched cluster generally requires the distance between the stretched data centers to be limited due to the low-latency requirements of the solution.

In comparing the complexity of the two solutions, vSAN and vMSC, the vSAN stretched cluster solution is extremely easy to achieve as the configuration is carried out right from within the vSphere client. There is no specialized replication that must be configured or LUNs provisioned due to the way that vSAN works. Since vSAN is a specialized object store, the objects are simply synchronized between the storage at both sites and kept in sync. In the vSAN stretched cluster implementation, one site is designated the preferred site while the other site by default becomes the secondary or non-preferred site.

In the vSAN stretched cluster configuration, two sites host the actual data and a third hosts a witness host. The witness host needs to reside in a third site and is the piece of the vSAN stretched infrastructure that holds the witness components of the virtual machine objects. These consist only of metadata, not the actual storage objects of the virtual machines themselves.
This special witness component is the quorum mechanism that serves as the tiebreaker for datastore availability. A witness node typically forms a vSAN cluster with the preferred site when the preferred and secondary sites are isolated from one another. If the preferred site becomes isolated from both the secondary site and the witness, the witness node forms a vSAN cluster with the secondary site. Objects are then resynchronized when the preferred site comes back online.
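
The tiebreaker behavior can be sketched as a small decision function. This is a deliberately simplified Python model of the rules described in this section, not vSAN's actual implementation. The `Outcome` enum, the `resolve_partition` function, and the three boolean link flags are hypothetical names introduced for illustration.

```python
from enum import Enum


class Outcome(Enum):
    BOTH_SITES = "both data sites remain in one cluster"
    PREFERRED = "witness forms a cluster with the preferred site"
    SECONDARY = "witness forms a cluster with the secondary site"
    NO_QUORUM = "no partition holds a majority"


def resolve_partition(pref_sec_up, pref_wit_up, sec_wit_up):
    """Simplified model: decide which partition keeps the datastore.

    Each argument says whether the link between two of the three
    locations (preferred, secondary, witness) is currently up.
    """
    if pref_sec_up:
        # The data sites can still see each other, so nothing to arbitrate.
        return Outcome.BOTH_SITES
    if pref_wit_up:
        # Data sites are isolated from each other; the preferred site wins.
        return Outcome.PREFERRED
    if sec_wit_up:
        # The preferred site is cut off from both other locations.
        return Outcome.SECONDARY
    return Outcome.NO_QUORUM


if __name__ == "__main__":
    # Inter-site link down, witness reachable from both data sites.
    print(resolve_partition(False, True, True).value)
    # Preferred site isolated from the secondary site and the witness.
    print(resolve_partition(False, False, True).value)
```

The ordering of the checks encodes the preference: when both data sites can still reach the witness, the preferred site is chosen, and the secondary site only takes over when the preferred site is unreachable.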

In this way, availability to vSAN datastores is maximized and organizations can effectively build out stretched clusters using the built-in vSAN stretched cluster functionality.

Configuring a VMware vSAN Stretched Cluster configuration (image courtesy of VMware)


High Availability is not Disaster Recovery

As a quick digression from architecting stretched clusters, it is important to note that while stretched cluster technology can certainly be a part of the overall disaster recovery plan for organizations, it should not itself be the sole means of disaster recovery. Disaster recovery involves much more than high availability. While stretched clusters can provide high-availability at a site-level, the data itself needs to be protected with backups.

Why?

High availability does not protect your data from accidental or intentional deletion. Additionally, threats to data such as ransomware are not prevented or remediated by high-availability mechanisms such as stretched clustering. Proper backups, such as those provided by Vembu BDR Suite, used alongside high-availability mechanisms, protect an organization’s most valuable asset: its data.

Concluding Thoughts

Organizations are under more pressure than ever to provide constant uptime for production workloads and business-critical data. Vendors such as VMware provide built-in tools that let businesses create a highly available infrastructure that protects against failures, even at the site level. The VMware vSphere Metro Storage Cluster uses traditional SAN technology for replication and array-based data synchronization.

Many businesses today will most likely provision a software-defined stretched cluster using VMware’s vSAN technology. The stretched clustering mechanism in vSAN provides an almost “push button” approach to creating a stretched cluster, which takes much of the heavy lifting out of the configuration process. Regardless of the type of stretched cluster, latency restrictions apply, and these are affected by the distance between the cluster sites. Stretched clusters typically reside in the same metropolitan area or on the same campus. Organizations can use either of these VMware solutions to build effective stretched clusters for highly available infrastructure, keeping in mind that neither is a replacement for a proper backup solution and an overall disaster recovery methodology.
