When designing any system for high availability, one of the fundamentals of engineering is to provide redundancy. Redundancy ensures that when failures occur (and they will), there are components or mechanisms in place to take over for the failed component or system.
Windows Server includes several redundancy mechanisms out of the box that can help protect business-critical systems housing production workloads. Redundancy is key to ensuring the uptime of these resources when minutes or even seconds count. Ideally, with properly designed redundancy woven throughout the system, end users and business stakeholders should not even notice there was an issue.
In this post, we will take a look at the native Windows Server redundancy mechanisms for high availability and see how each one contributes to high availability in the enterprise datacenter.
Native Windows Server Redundancy Mechanisms for High Availability
In this listing of features, we will take a look at the following redundancy mechanisms built into Windows Server 2019 and how they help ensure uptime and high availability of systems:
- Failover Clustering
- Guest Clustering
- Network Load Balancing (NLB)
- Storage Spaces Direct (S2D)
With each of these mechanisms, Windows Server provides the ability to increase the reliability not only of systems and the underlying Windows infrastructure but also of the applications, and this is key. In today’s web-driven world, getting as close as possible to the never quite achievable 100% uptime is what most businesses are striving for. Customers and business stakeholders depend on systems, infrastructure, and ultimately the applications to be available whenever and wherever they need them.
Let’s take a look at each of these Windows Server features and see how they extend the ability of Windows Server to provide high availability and redundancy so that it can handle failures and load.
Failover Clustering
Microsoft has engineered most of its enterprise technologies around the Failover Clustering feature in Windows Server. Failover clustering aggregates the resources of a number of physical hosts and presents those hosts as a single virtual entity that is able to house workloads. These workloads can be anything from Hyper-V virtual machines to SQL Server, Exchange Server, and even file services that need to be highly available.
Failover clustering allows multiple physical Windows Server nodes to share storage between them. Each member of the Windows Server Failover Cluster provides compute resources to the cluster. Using a Hyper-V Failover Cluster as an example, if one of the physical Hyper-V hosts fails, the other healthy hosts in the Hyper-V cluster are able to assume ownership of the virtual machines, restart them, and automatically bring those virtual machines back up to an operational state. This provides exceptional redundancy for the underlying physical hosts that are providing compute resources for the Hyper-V guest virtual machines.
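To make the failover behavior described above more concrete, here is a minimal Python sketch of the idea. It is only a toy model and not how Windows Server Failover Clustering is implemented. The host names, VM names, and the "fewest VMs wins" placement policy are all assumptions made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    healthy: bool = True
    vms: list[str] = field(default_factory=list)


def fail_over(hosts: list[Host], failed_name: str) -> dict[str, str]:
    """Mark one host as failed and restart its VMs on the surviving hosts.

    Returns a mapping of VM name -> new host name.
    """
    failed = next(h for h in hosts if h.name == failed_name)
    failed.healthy = False
    survivors = [h for h in hosts if h.healthy]
    if not survivors:
        raise RuntimeError("no healthy hosts left to take over the workloads")

    placements: dict[str, str] = {}
    orphaned, failed.vms = failed.vms, []
    for vm in orphaned:
        # Illustrative policy (an assumption): pick the survivor with the fewest VMs.
        target = min(survivors, key=lambda h: len(h.vms))
        target.vms.append(vm)
        placements[vm] = target.name
    return placements


# Hypothetical three-node Hyper-V cluster.
cluster = [
    Host("hv01", vms=["sql-vm", "web-vm"]),
    Host("hv02", vms=["file-vm"]),
    Host("hv03"),
]
moved = fail_over(cluster, "hv01")
print(moved)
```

The point of the sketch is simply that the workloads outlive the host: every VM that was running on the failed node ends up owned by a healthy node, at the cost of a restart.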
Guest Clustering
Windows Failover Clustering even works in a “nested” fashion. In other words, Windows Failover Clustering nodes can themselves run as VMs inside a Hyper-V environment.
What redundancy is gained by guest clustering? As discussed, Failover Clustering of the physical Windows Server nodes provides redundancy at the physical host layer, ensuring that a failed host does not bring down business-critical resources for more than a short period of time. A healthy host picks the workloads up and brings them back online.
However, if even this brief period where the guest virtual machines are unavailable is unacceptable to the business, guest clustering takes the concept a step further by clustering the applications running inside the guest virtual machines. Consider a guest cluster of VMs running SQL Server or another business-critical application. If a physical host goes down, the physical failover cluster restarts the affected VMs on a healthy host. While the primary VM is restarting, another guest cluster node assumes the application workload, so the application stays available. Guest clustering therefore provides redundancy not only for VMs but also for the applications running in them.
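The two layers can be sketched roughly as follows. This is a simplified illustration in Python, not the actual cluster logic. The node and host names are hypothetical, and it assumes the guest cluster nodes have been placed on different physical hosts, since a guest cluster whose nodes all share one host gains nothing when that host fails.

```python
from __future__ import annotations

# Hypothetical placement: guest cluster node (VM) -> physical Hyper-V host.
vm_to_host = {"sql-node1": "hv01", "sql-node2": "hv02"}


def pick_role_owner(current_owner: str, failed_hosts: set[str]) -> str:
    """Return the guest node that should own the application role."""
    if vm_to_host[current_owner] not in failed_hosts:
        return current_owner  # the owner's host is fine, nothing to do
    for node, host in vm_to_host.items():
        if host not in failed_hosts:
            return node  # first node whose host is still running
    raise RuntimeError("every guest cluster node is on a failed host")


owner = pick_role_owner("sql-node1", failed_hosts={"hv01"})
print(owner)  # sql-node2
```

While the host-level cluster is still restarting `sql-node1` elsewhere, the application role is already being served by `sql-node2`.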
Network Load Balancing (NLB)
What if there is a stateless application that needs to have redundancy only from a network perspective?
Failover Clustering may be overkill in this case. Network Load Balancing allows several servers that do not rely on one another, generally running web applications, to load balance traffic at the network level. The Windows Server NLB functionality load balances traffic in software, sharing the load across the participating servers.
Typically, Windows Server NLB is implemented to provide redundancy for web traffic, such as with the IIS role, but it can also be used in conjunction with FTP, remote access, and proxy servers. NLB is a great way to get an “elastic” solution on-premises. What do we mean? Elasticity comes from the ability to add or remove NLB servers as needed for maintenance, load, or other reasons. With NLB, all of these functions can be performed without a maintenance period.
NLB in Windows Server works by using dedicated (static) IP addresses on the participating Windows Servers and then adding a shared virtual IP address that is configured on each participating server. The public-facing DNS record points to the virtual IP address instead of the dedicated IP addresses assigned to each server. NLB then load balances the traffic between the participating servers in the NLB cluster. This provides redundancy and performance benefits in times of heavy load.
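As a rough illustration of spreading clients across servers behind one virtual IP, the Python sketch below maps each client address to one participating server with a hash. This is not the actual algorithm Windows NLB uses. The hash-and-modulo scheme and all IP addresses are assumptions chosen only to show the idea, including what happens when a server is removed for maintenance.

```python
from __future__ import annotations

import hashlib


def pick_server(client_ip: str, servers: list[str]) -> str:
    """Map a client IP to one of the participating servers."""
    if not servers:
        raise ValueError("no servers available")
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


servers = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]  # dedicated IPs (made up)
clients = [f"203.0.113.{n}" for n in range(1, 101)]

before = {c: pick_server(c, servers) for c in clients}

# Take one server out for maintenance; clients are spread over the rest.
remaining = [s for s in servers if s != "10.0.0.12"]
after = {c: pick_server(c, remaining) for c in clients}

print(sorted(set(before.values())))
print(sorted(set(after.values())))
```

Clients keep connecting to the same virtual IP throughout; only the set of servers answering behind it changes, which is where the elasticity comes from.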
Storage Spaces Direct (S2D)
Storage Spaces Direct is a software-defined storage technology introduced in Windows Server 2016 for running today’s workloads such as Hyper-V. It has matured even further in Windows Server 2019 with enhanced functionality. Storage Spaces Direct provides a software-defined solution built on top of commodity storage that pools the storage contributed by the participating hosts in the Storage Spaces Direct cluster into a virtual datastore.
The underlying fault tolerance mechanism with S2D is mirroring, in two-way and three-way forms. In two-way mirroring, two copies of all data are kept on different servers, so if you lose a host in the S2D-backed Hyper-V cluster, you can continue operating and virtual machines remain operational. In three-way mirroring, three copies of all data are kept, which allows the data to survive two simultaneous hardware failures (drive or server). Three-way mirroring requires a minimum of three servers.
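The trade-off between the two mirror types is easy to work out: keeping N copies means only 1/N of the raw capacity is usable, while N - 1 copies can be lost. The short Python sketch below does that arithmetic. The 48 TB raw capacity is an arbitrary example figure, and the function name is made up for illustration.

```python
from __future__ import annotations


def mirror_summary(copies: int, raw_tb: float) -> dict[str, float]:
    """Usable capacity and tolerated failures for an N-way mirror."""
    if copies < 2:
        raise ValueError("a mirror keeps at least two copies")
    return {
        "usable_tb": raw_tb / copies,
        "efficiency_pct": round(100 / copies, 1),
        "failures_tolerated": copies - 1,
    }


two_way = mirror_summary(copies=2, raw_tb=48)
three_way = mirror_summary(copies=3, raw_tb=48)
print(two_way)    # usable_tb 24.0, efficiency_pct 50.0, failures_tolerated 1
print(three_way)  # usable_tb 16.0, efficiency_pct 33.3, failures_tolerated 2
```

In other words, the extra failure that three-way mirroring tolerates costs a third copy of everything.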
In addition to mirroring, parity encoding, often called erasure coding, provides fault tolerance using mathematical algorithms rather than full copies of the data. Windows Server 2016 introduced local reconstruction codes (LRC), which split the encoding into smaller groups to reduce the overhead of rebuilds after failures.
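The simplest way to see how parity can rebuild lost data is single XOR parity, sketched below in Python. This is only the most basic form of the idea. The parity and LRC schemes in S2D are more sophisticated, and the block contents and four-drive layout here are invented for illustration.

```python
from __future__ import annotations

from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


data = [b"AAAA", b"BBBB", b"CCCC"]  # three data blocks on three drives
parity = xor_blocks(data)           # parity block stored on a fourth drive

# Simulate losing the second block, then rebuild it from the rest plus parity.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt)  # b'BBBB'
```

Four blocks of storage protect three blocks of data here, which is why parity is more space-efficient than mirroring, at the cost of extra computation on writes and rebuilds.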
Mirror-accelerated parity was also introduced in Windows Server 2016. It essentially uses mirroring to accelerate erasure coding. With mirror-accelerated parity, an S2D volume can be part mirror and part parity. Writes land on the mirrored portion first and are gradually moved into the parity portion later.
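The write path of a part-mirror, part-parity volume can be pictured with the toy Python model below. It is a sketch under loud assumptions: the class name, the block-count capacity, and the "move the oldest block first" rotation policy are all hypothetical and are not taken from how S2D decides what to move or when.

```python
from __future__ import annotations

from collections import deque


class TieredVolume:
    """Toy volume with a small mirror tier in front of a parity tier."""

    def __init__(self, mirror_capacity: int) -> None:
        self.mirror_capacity = mirror_capacity  # hypothetical size, in blocks
        self.mirror: deque[str] = deque()
        self.parity: list[str] = []

    def write(self, block: str) -> None:
        # New writes always land in the mirror tier first.
        self.mirror.append(block)
        # When the mirror tier is over capacity, rotate the oldest blocks out.
        while len(self.mirror) > self.mirror_capacity:
            self.parity.append(self.mirror.popleft())


vol = TieredVolume(mirror_capacity=2)
for name in ["b1", "b2", "b3", "b4"]:
    vol.write(name)
print(list(vol.mirror), vol.parity)  # ['b3', 'b4'] ['b1', 'b2']
```

The fast mirror tier absorbs incoming writes, while older data settles into the more space-efficient parity tier.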
S2D contains many redundancy technologies that make it an extremely robust solution for running today’s virtualized workloads on top of Hyper-V software-defined storage.
Windows Server 2019 includes many native redundancy mechanisms for high availability. From failover clustering and guest clustering to NLB and S2D, Windows Server provides an extremely resilient, redundant, and performant platform for business-critical workloads that require strict uptime SLAs. With these out-of-the-box features, organizations have cost-effective, native means to increase their resiliency.