VMware SRM authentication – What changed
Site Recovery Manager is VMware’s well-known disaster recovery orchestration software, and it has been around for many years now. The product keeps getting better with every release, but the shiny new features often steal the spotlight from less-known, under-the-hood improvements. The authentication mechanism of SRM was one of those when version 6 came out. Although it is no longer new, it is still worth discussing to understand how it used to work and how it works now.
VMware SRM Versions Comparison
In previous versions, all communications between SRM ⬄ vCenter and SRM ⬄ SRM were authenticated either by public key certificates or by stored credentials, chosen during the installation process. Certificate-based authentication is obviously the recommended method. However, not everyone has an in-house certificate authority, and public CAs have not issued certificates for internal server names since November 1, 2015, so credential-based authentication was the default.
Anyway, with certificate-based authentication, the same certificate was used to communicate with both vCenter and the other SRM instance. Its requirements were therefore pretty strict and could make it complicated to implement.
In version 6.0, VMware integrated SRM with vCenter Single Sign-On (SSO), which led to simpler and cleaner communication between the components. Rather than using certificates or credentials, Site Recovery Manager now leverages solution users to establish secure communications with external services such as the Platform Services Controller (PSC) and vCenter Server.
SRM generates the solution user during installation and ties it to that SRM instance. The installer assigns a private key and a certificate to the solution user and registers it in vCenter Single Sign-On. You can see the solution user on the SSO page of the vCenter web client. Do not attempt to change anything on it; it is strictly reserved for internal use.
When you pair two sites, SRM creates a second solution user to authenticate communications with the remote SRM instance, unless you are using Enhanced Linked Mode.
Site Recovery Manager still requires an SSL/TLS certificate to serve as the endpoint certificate for all TLS connections established to it. The certificate can be either self-generated or a custom one from your internal PKI. The good news is that all the changes mentioned above relaxed the requirements for the certificate signing request.
Certificate error: Failed to connect to vCenter Server
A quick word about an error I encountered not too long ago while installing SRM:
Failed to connect to vCenter Server at https://vCenter_FQDN:443/sdk. Reason:
com.vmware.vim.vmomi.core.exception.CertificateValidationException: Server certificate chain is not trusted and thumbprint doesn’t match.
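The wording of the exception suggests that the client would accept the vCenter certificate if either check passed. It fails only when the chain does not validate against its trust store and the certificate's thumbprint also differs from the one it has on record. The following is a minimal Python sketch of that reading, not SRM's actual code. It assumes colon-separated, upper-case SHA-1 thumbprints. `sha1_thumbprint`, `accept_certificate` and `probe` are hypothetical helpers, not part of SRM or any VMware SDK.

```python
import hashlib
import socket
import ssl


def sha1_thumbprint(der_cert: bytes) -> str:
    """Colon-separated, upper-case SHA-1 thumbprint of a DER-encoded certificate."""
    digest = hashlib.sha1(der_cert).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def accept_certificate(der_cert: bytes, chain_trusted: bool, pinned_thumbprint: str) -> bool:
    """Hypothetical client rule: accept if the chain is trusted OR the thumbprint matches the pin."""
    if chain_trusted:
        return True
    return sha1_thumbprint(der_cert) == pinned_thumbprint.upper()


def probe(host: str, port: int = 443, timeout: float = 5.0) -> tuple[bytes, bool]:
    """Fetch the server's leaf certificate and report whether its chain verifies locally."""
    # get_server_certificate() does not validate the chain when no CA file is given,
    # so the certificate can be inspected even if it is untrusted.
    der = ssl.PEM_cert_to_DER_cert(ssl.get_server_certificate((host, port)))
    ctx = ssl.create_default_context()  # system trust store plus hostname checking
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with ctx.wrap_socket(sock, server_hostname=host):
                trusted = True
    except ssl.SSLCertVerificationError:
        trusted = False
    return der, trusted
```

For example, `der, trusted = probe("vCenter_FQDN")` followed by `accept_certificate(der, trusted, old_thumbprint)` returns `False` when vCenter serves a new, untrusted certificate while the client still pins the old thumbprint. That is the combination the message describes.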
If you have custom certificates installed on vCenter, you may encounter a certificate error when clicking on the site in the SRM web client plugin. This is likely to be a symptom of the service registration not being up to date in the lookup service.
In short, the lookup service registers the location of vSphere components so they can find and securely communicate with each other. It is a sort of directory that records, among other things, each service's URL and an sslTrust string (the Base64-encoded certificate). It seems that when the PSC UI (https://vCenter/psc) is used to replace the certificate, the lookup service is not updated. As a result, external services such as SRM keep using the information from the service registration, which still contains the old certificate, hence the error.
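To confirm this symptom before applying the fix, you can compare the certificate recorded in the registration with the one vCenter actually serves. The following is a minimal Python sketch. It assumes that the sslTrust value is Base64-encoded DER, possibly wrapped or PEM-framed, and that you have already copied it out of the lookup service registration. The helper names are hypothetical.

```python
import base64
import hashlib
import ssl


def der_from_ssl_trust(ssl_trust: str) -> bytes:
    """Decode an sslTrust value (assumed Base64 DER), ignoring line breaks and PEM framing."""
    body = "".join(
        line.strip()
        for line in ssl_trust.strip().splitlines()
        if not line.strip().startswith("-----")
    )
    return base64.b64decode(body, validate=True)


def sha256_fingerprint(der: bytes) -> str:
    """Hex SHA-256 fingerprint, handy for eyeballing two certificates side by side."""
    return hashlib.sha256(der).hexdigest()


def registration_is_stale(ssl_trust: str, served_pem: str) -> bool:
    """True when the registered certificate differs from the one the server presents."""
    return der_from_ssl_trust(ssl_trust) != ssl.PEM_cert_to_DER_cert(served_pem)


def check_vcenter(host: str, ssl_trust: str, port: int = 443) -> bool:
    """Fetch vCenter's current certificate (no chain validation) and compare it with sslTrust."""
    served_pem = ssl.get_server_certificate((host, port))
    print("registered:", sha256_fingerprint(der_from_ssl_trust(ssl_trust)))
    print("served:    ", sha256_fingerprint(ssl.PEM_cert_to_DER_cert(served_pem)))
    return registration_is_stale(ssl_trust, served_pem)
```

If `check_vcenter("vCenter_FQDN", "<sslTrust value>")` returns `True` after a certificate replacement done through the PSC UI, that matches the situation described here. The comparison is byte-for-byte on the DER encoding, so it does not depend on which thumbprint algorithm a given component uses.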
The fix is a bit cumbersome but works pretty well; you will find the complete procedure in the KBs below.
The certificate-manager tool provided in vCenter is supposed to make all the necessary changes, so I would recommend using it rather than the shiny UI in the future. Strange bug if you ask me: why offer the possibility to replace the certificate in the UI if it does a half-baked job?