Today’s “always-on” and “always available” IT environments require a highly available infrastructure design. Simply put in other words- you don’t want to have a single point of failure in your business-critical infrastructure. If so, it quickly becomes apparent that you are only as good as your weakest link. Redundancy needs to exist throughout the environment, including your physical infrastructure and applications.
When it comes to your hosts that run applications and services, you want to ensure you have multiple hosts that can provide redundancy when one of your hosts fails. This is true when you have hosts running applications like MS SQL Server or if you have hosts that are serving as virtual hosts for your virtualized environments.
In the world of Microsoft Windows, Windows Server Failover Cluster is the mechanism used for hosting applications and hypervisors in a highly available way so as to provide minimum disruptions when there is a failure in the environment.
Windows Server 2019 provides the most advanced Windows Server to date; as it provides the most advanced features in terms of Windows Failover capabilities. In this article let’s take a look at Windows Failover Clustering in general, What is it? What is Hyper-V Failover Cluster?, How do they work? What is new in Windows Server 2019 Failover Cluster along with all its use cases? as well as the implementation of the Windows Failover Cluster in Windows Server 2019.
What is Windows Failover Clustering?
In order to increase high-availability for your business-critical data using Windows Server, you need Failover Clustering in Windows Server to achieve it.
What is a failover cluster?
The failover cluster in Windows Server is a group of independent nodes that work together as a single logical unit to increase the availability and scalability of the roles and services that run on top of the failover cluster; this can be a well-known application like MS SQL Server or virtualization specific roles like Hyper-V.
A failover cluster is generally connected together via physical means as well as via software or at the application layer. The general advantage of running these applications on top of a failover cluster is if one or more cluster nodes fail, other nodes are there to pick up the application, service, or role of the failed node. This process is referred to as failover and hence it is named as “failover cluster”.
Hyper-V Failover Cluster
To put it in a simple way, A group of similar Hyper-V Servers/ Windows Servers enabled with Hyper-V roles connected and working together in order to balance the situation when one of the connected servers running any application, service or role goes down or fails; the hyper-v servers are referred as nodes in tech terms.
Working Mechanism of Failover Cluster
Failover clusters have an internal monitoring mechanism that constantly queries the nodes that are members of the failover clusters to ensure they are healthy, alive, and are functioning properly; this is referred to as “heartbeat”. Sometimes if a node becomes unhealthy, the clustered service or application that exists on the unhealthy node may be restarted to resolve the issue or completely fail over as mentioned above.
Failover Clusters provide another extremely powerful storage mechanism called Cluster Shared Volume (CSV). CSV allows providing shared storage between cluster nodes so that a consistent, distributed namespace is provided to access shared storage from all nodes. This provides the necessary multi-connected storage functionality for Hyper-V machines also.
Quorum in a failover clustering configuration is a special voting mechanism that helps to prevent a situation called split-brain.
What is split-brain? – A split-brain situation can develop in the case of a failure or impairment of any one of the cluster nodes. As mentioned above, there is a special type of communication that is maintained between the nodes of a Windows Server Failover Cluster. If anything happens then the nodes in the failover cluster are separated from one another, then the nodes may assume they need to take ownership of cluster resources.
Read more: Understanding Quorum in a Failover cluster
What is new in Windows Server 2019 Failover Cluster?
With each new release of Windows Server, Microsoft generally adds features and functionalities to the Failover Cluster feature. Windows Server 2019 is no exception; there have been some really great feature improvements with Windows Server 2019 that make the Failover Cluster feature even more powerful than previous releases.
Let’s check that out:
- Cluster sets – One of the great new features of the Windows Server 2019 software-defined data center solution is the ability to form cluster sets.
- Azure-aware clusters – Microsoft is making it easier and easier to run workloads inside the Azure IaaS environment. With Azure-aware clusters, Windows Failover Clusters can now detect while they are running inside Azure. When they are running in the Azure IaaS environment, they automatically optimize themselves to provide proactive failover and logging of Azure planned maintenance events, etc. Additionally, you no longer have to configure the load balancer with a dynamic network name.
- Cross-domain cluster migration – Windows Failover Clusters that are running on top of Windows Server 2019 can now be moved between different Active Directory domains. This is a long-requested feature for Windows Failover Clusters that opens many possibilities and eases the pains of domain consolidations, mergers, etc.
- USB witness – With the USB witness configuration, the two-node Windows Failover Cluster, you can use a simple USB device that is attached to a commodity network device such as a router, etc to provide the witness component for a quorum, this is called true two-node.
- Cluster infrastructure improvements – With Windows Server 2019, the CSV cache is enabled by default to boost the performance of virtual machines running on Cluster Shared Volumes. Additionally, there have been enhancements to allow the Windows Failover Cluster to have more ability and logic to detect problems with the cluster and to automatically repair it. This includes partitioned nodes and the use of network route detection also.
- Cluster Aware Updating supports Storage Spaces Direct – A great improvement to Windows Failover Clusters in Windows Server 2012 was the Cluster Aware Updating feature. This automates the process of applying software updates on clustered servers while maintaining the availability of the roles that are housed on the cluster. This feature has been improved with each release of Windows Server. Now with Windows Server 2019, the CAU feature recognizes and is integrated with Storage Spaces Direct (S2D). The feature orchestrates restarts of all servers in the cluster for maintenance operations including updates.
- File share witness enhancements – New enhancements, improvements, and failsafe with the file share witness have been implemented with Windows Server 2019. This includes using the file share witness in poor internet-connected remote locations, lack of shared drives, lack of a domain controller such as in a DMZ, workgroup or cross-domain clusters, as well as blocking DFS file shares.
- Cluster hardening – Communication within the cluster over SMB for CSV volumes and S2D leverages certificates to make the communication as secure as possible and eliminates the dependencies on New Technology LAN Manager (NTLM).
- Failover cluster no longer uses NTLM authentication – NTLM has gone by the way-side with Windows Server 2019 Failover Cluster authentication. With Windows Server 2019, failover clusters use Kerberos and certificate-based authentication exclusively.
What is a cluster set? – A loosely-coupled grouping of multiple failover clusters including compute, storage, and hyper-converged infrastructure that enables you to move virtual machines between clusters and different cluster sets.
With this newest version of Failover Clustering available in Windows Server 2019, there are many new enhancements to make note of. Perhaps one of the most common roles that are hosted on a Windows Server Failover Cluster is Hyper-V. This allows running virtual machines in a highly-available way; specific to Hyper-V.
Prerequisites for Installing the Windows Failover Clustering
Before installing the Windows feature component, you need to verify the prerequisites. What are those? The following are generally noted by Microsoft as the prerequisites of installing the Windows Failover Clustering feature and those specific to Hyper-V:
- Install the same version of Windows Server for all Failover Cluster nodes
- Have servers with the same or similar hardware configurations
- Make sure storage and network components are adequate for connections, etc
- Shared storage – Failover clustering requires shared storage, either in the form of Storage Spaces Direct (S2D) or shared storage. The shared storage can be traditional shared storage via SAN devices with iSCSI and NFS targets as well as with new software-defined approaches such as Storage Spaces Direct.
- Attached storage should contain multiple physical disks that are configured in a way that provides redundancy. Some configurations may utilize a disk or logical storage as a disk witness
- Basic, not dynamic disks are supported
- With Cluster Shared Volumes, use NTFS- with S2D, it is recommended to use ReFS
- Especially for software-defined Windows Failover Cluster solutions (i.e. Storage Spaces Direct, etc), use WSSD certified hardware solutions
- If you are running specialized Windows Failover Clusters such as Storage Spaces Direct (S2D), you have to pay close attention to the hardware requirements as S2D has very specific requirements
- For Hyper-V specific clusters, the cluster servers must support the hardware requirements for the Hyper-V role which includes having processors with hardware-assisted virtualization. This includes Intel Virtualization Technology (Intel VT) or AMD Virtualization (AMD-V) technology. Additionally, Hardware-enforced DEP must also be enabled.
How to Implement Failover Clustering in Windows Server?
In the following walkthrough, let’s take a look at the implementation of failover clustering in Windows Server 2019. This involves a few steps as that of the following:
- Installing the same version of Windows Server and patches on at least two failover cluster nodes.
- Deciding on domain-joined, multi-domain, or workgroup clusters.
- Configuring shared storage between the failover cluster nodes.
- Installing the Failover Cluster feature and Role services you want to cluster (Hyper-V, etc…).
- Testing the cluster configuration.
- Creating the Failover Cluster.
- Configuring quorum.
You can find brief explanations for the above-said steps in the following sections:
Install Windows Server 2019 and Patches
Let’s skip past the point of installing Windows Server as we have already installed Windows Server 2019 on two failover cluster hosts. One important point to note is that you want to make sure your failover cluster hosts are running the same version of Windows Server and also are at the same patch level. This ensures that everything between the hosts operates consistently and there is no unexpected behavior or variance between your nodes.
Ensure your failover cluster hosts are running the same Windows version and patch level.
Active Directory Domain Join?
With Windows Server 2016, Microsoft opened up some new and very exciting capabilities with Windows Server Failover Clusters in the realm of providing domain join flexibility.
Starting with Windows Server 2016 and extended to 2019, you can have clusters that are domain-joined, cross-domain joined (multi-domain), or workgroup clusters. For the lab walkthrough, we are using the typical domain-joined cluster configuration. Just note the other options that are now available.
Both cluster hosts are joined to the domain.
Below, there are two shared drives that are mounted via iSCSI connections to SAN storage. As you can see, there is a volume mounted for quorum purposes and a larger volume that will be used to actually store data for Hyper-V virtual machines.
You will want to ensure all cluster hosts have connectivity to the shared volumes so that cluster failover, quorum, and other processes function as needed.
When the cluster is created, the cluster wizard is generally effective at choosing the disk you want to use for quorum purposes (smallest disk, etc). However, you can manually assign the disk for a quorum as well in the Configure Cluster Quorum Wizard found in the Failover Cluster Manager.
Right-click on the Failover Cluster name in the Failover Cluster Manager > More Actions > Configure Cluster Quorum Settings.
Manually assigning quorum
Mounting two volumes for failover cluster shared storage.
Install Hyper-V and Other Roles to Cluster
Since these two nodes will serve as Hyper-V hosts, we will install the Hyper-V Role on each host to get ready to add both to the failover cluster for hosting virtual machines. You can use Server Manager to install Roles/Features, however, PowerShell is a great way to quickly and easily install Windows Server components such as roles and features. To install Hyper-V, use the following one-liner.
- Install-WindowsFeature -Name Hyper-V -IncludeAllSubFeature-IncludeManagementTools -Restart
Installing the Hyper-V role on the failover cluster hosts
Install Failover Clustering Feature
Let’s now install the Failover Clustering feature as well as the management tools (Failover Cluster Manager) to manage the failover clustering feature. Again, PowerShell is a great way to do this. Use the following PowerShell one-liner:
- Install-WindowsFeature –Name Failover-Clustering –IncludeManagementTools
Install the Failover Clustering feature.
Testing the Failover Cluster Configuration
One of the tools provided by Microsoft that is helpful when configuring a Windows Server Failover Cluster is the Validate Configuration tool that can be found in the Failover Cluster Manager. This helps to shed light on any issues with the configuration before creating the cluster.
The validation runs extensive tests on very common problem areas of cluster configurations, including network configuration as well as storage configuration. It ensures that the shared storage configured meets the requirements needed by Windows Server failover clustering, including iSCSI Persistent reservations.
Using the Validate Cluster tool to validate the failover cluster configuration,
you can also use PowerShell to run the validation against the prospective failover cluster hosts.
- Test-Cluster < node1 >,< node2 >
Running the Test-Cluster cmdlet against your cluster hosts before creating the failover cluster.
The Validation process creates a report located on the cluster host it was running from, The Validation report is created in the C:\Windows\Cluster\Reports directory on the failover cluster host.
Viewing the Validation report created
The validation process creates a very detailed report divided up into the major sections of validation. This includes the Hyper-V configuration, inventory, network, storage, system configuration, etc. You will want to make note of any errors or warnings in the report to make sure these are corrected before proceeding with the failover cluster configuration.
Viewing the failover cluster validation report
Creating the Failover Cluster
Once you have verified the cluster configuration and resolved any issues found, you are ready to create the failover cluster. Creating the Windows Server Failover Cluster is easily accomplished in PowerShell:
- New-Cluster -Name HyperVCluster -node < node1 >,< node2 > -staticAddress < IP Address >
Creating a new Windows Server Failover Cluster
After the cluster is created, you can verify that the Active Directory object was created and that you see the cluster in the Failover Cluster Manager.
Thus, a New Failover Cluster computer object is created in Active Directory.
Windows Server Failover Clusters provide a very robust and resilient platform to run business-critical applications and services. With each release of the Windows Server platform, failover clustering has continued to be enhanced. And in the release of Windows Server 2019 has the most feature-rich failover clustering platform to date.
No matter how resilient your platform from a high-availability perspective, you must ensure your data is protected. This means you should have effective backups of your mission-critical data running on your Windows Server Failover Clusters, including Hyper-V virtual machines.
Once such an effective backup solution in the market is the Vembu BDR Suite which allows you to ensure complete data-availability for your Hyper-V environments that include standalone Hyper-V hosts to multiple Windows Server Failover Clusters hosting the Hyper-V role.
In Vembu BDR Suite, even when your virtual machines are moved to another Hyper-V host, the backups will continue to function without interruptions. Used in conjunction with the native high availability features within Hyper-V, the Vembu Backup for Microsoft Hyper-V ensures data availability for your Hyper-V production workloads even at times of disasters.
Vembu BDR Suite provides a very robust backup solution with the enterprise-class features for protecting your Hyper-V clusters effectively at a surprising price range. Download a free, fully-featured trial of Vembu BDR Suite.