Issue with vSphere vCLS VMs

This short post covers an issue in VMware vCenter that causes vSphere Cluster Services (vCLS) VMs to fail to deploy. As a result, cluster functions such as Distributed Resource Scheduler (DRS) do not work. The post shows the solution that worked for us.

The customer's environment already had clusters with functioning DRS. When a new cluster was deployed, everything worked fine at first. But as soon as DRS and HA were enabled, an error appeared in vCenter.

[vSphere DRS functionality was impacted due to unhealthy state vSphere Cluster Services caused by the unavailability of vSphere Cluster Service VMs. vSphere Cluster Service VMs are required to maintain the health of vSphere DRS.]

Troubleshooting

This means that vSphere could not deploy the vCLS VMs in the new cluster. Unfortunately, we were not able to find the root cause. Here is what we tried in order to resolve the issue:

  • Deleted and re-created the cluster. We tried creating the cluster and enabling HA and DRS in different orders. No matter what we tried, we always got the same error.
  • Reinstalled the hosts, since they had previously been used in other clusters.
  • Attempted to switch Retreat Mode on and off. Here things got weird. For the new cluster, Retreat Mode changed nothing. That made sense, since nothing had worked so far. But when we tried switching the mode for other clusters too, nothing changed there either. So neither the provisioning nor the deletion of vCLS VMs worked.
  • Analyzed logs. We found that it could not be a host problem: the ESXi hosts never received a request to create these VMs. But we did not find a hint in the vCenter logs either.
  • Checked the Security Token Service (STS) certificate. An expired STS certificate can cause a variety of issues. In this case, the certificate was valid.
  • Re-created the STS certificate using the fixsts script, even though it was valid. This did not help either.
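For readers who have not used Retreat Mode before: it is toggled through a vCenter advanced setting whose name contains the managed object ID of the cluster. The following is only a minimal Python sketch of how that setting name is put together. It assumes the key format config.vcls.clusters.<cluster ID>.enabled that I know from vCenter 7.x; the function name retreat_mode_key and the ID domain-c8 are made up for illustration and do not come from the customer environment.

```python
import re


def retreat_mode_key(cluster_moid: str) -> str:
    """Build the advanced-setting name used for Retreat Mode.

    cluster_moid is the cluster's managed object ID, e.g. "domain-c8"
    (visible in the browser URL when the cluster is selected).
    Assumption: key format as known from vCenter 7.x.
    """
    # Reject anything that does not look like a cluster ID.
    if not re.fullmatch(r"domain-c\d+", cluster_moid):
        raise ValueError(f"not a cluster managed object ID: {cluster_moid!r}")
    return f"config.vcls.clusters.{cluster_moid}.enabled"


if __name__ == "__main__":
    # "domain-c8" is a hypothetical example ID.
    # Value "false" turns Retreat Mode on (vCLS VMs get removed),
    # value "true" turns it off again (vCLS VMs get re-deployed).
    print(retreat_mode_key("domain-c8"), "= false")
```

In a healthy vCenter, setting this key to false should remove the vCLS VMs of that cluster within a few minutes, and setting it back to true should bring them back. In our case neither happened.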

Solution to the problem

After all the setbacks, we finally found a solution. Re-creating the solution user certificates restored the ability to deploy vCLS VMs. How you re-create the solution user certificates depends on how you handle certificates in vCenter. In this case (and I guess this is the case for most environments) the machine certificate had been replaced by a certificate signed by a local CA. All other certificates had not been replaced. With this setup, the solution user certificates can easily be replaced in Certificate Manager.
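Before replacing anything, it can help to look at the current solution user certificates first. The Python sketch below is not what we ran at the customer; it only illustrates the idea under some assumptions: the store names are the solution user stores I know from vCenter 7.x (hvc and wcp may be missing on older versions), the vecs-cli path is the usual one on the vCenter Server Appliance, and the helper names list_command and days_until_expiry are my own. The expiry helper expects a date string in the OpenSSL style, such as "Jan  5 09:34:43 2018 GMT".

```python
import ssl
import time

# Assumption: solution user stores as known from vCenter 7.x.
SOLUTION_USER_STORES = [
    "machine", "vsphere-webclient", "vpxd", "vpxd-extension", "hvc", "wcp",
]

# Assumption: usual location of vecs-cli on the vCenter Server Appliance.
VECS_CLI = "/usr/lib/vmware-vmafd/bin/vecs-cli"


def list_command(store):
    """Return the vecs-cli call that prints the entries of one store as text."""
    return [VECS_CLI, "entry", "list", "--store", store, "--text"]


def days_until_expiry(not_after, now=None):
    """Days until a certificate expires; negative if it already has.

    not_after is the "Not After" date in OpenSSL style,
    e.g. "Jan  5 09:34:43 2018 GMT".
    """
    expiry = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return (expiry - now) / 86400


if __name__ == "__main__":
    # Only prints the commands; run them in a shell on the appliance.
    for store in SOLUTION_USER_STORES:
        print(" ".join(list_command(store)))
```

Keep in mind that in our case no certificate was obviously expired, so a clean result from such a check does not rule out the problem described here.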

If your company has stricter security guidelines, you may have to follow one of these instructions:

If you are unsure about your current configuration and the implications of this solution, open a support ticket for clarification!
