In this post, we walk through the best practices for configuring vSphere HA on a vSAN-enabled cluster.
Configuration Order
vSAN and vSphere HA must be configured in a specific order:
– vSAN must be enabled before vSphere HA is enabled.
– vSphere HA must be disabled before vSAN is disabled.
Isolation Addresses & Response
In a traditional vSphere environment, HA traffic traverses the management network. In a vSAN-enabled cluster, however, HA traffic traverses the vSAN network.
The default isolation address is the IP address of the management network's default gateway. If a host can reach neither its isolation address nor its heartbeat datastore, it enters the isolated state and triggers the configured isolation response (power off and restart VMs, or leave VMs powered on).
Consider the following configuration for each cluster type:
vSAN Stretched Cluster
– Configure the default gateway of the vSAN network as your isolation address in the HA advanced settings, as below:
das.usedefaultisolationaddress = false
das.isolationaddress0 = <gateway on vSAN network>
For a stretched cluster, it is recommended to add two isolation addresses, one from each site (das.isolationaddress0 and das.isolationaddress1).
– Set the host isolation response to "Power off and restart VMs".
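The advanced options for a stretched cluster can also be generated programmatically. Below is a minimal Python sketch, assuming a hypothetical helper named build_isolation_options and placeholder gateway addresses (192.0.2.1 and 198.51.100.1 are documentation-range examples, not real values). It only builds the key/value pairs and does not connect to vCenter.

```python
def build_isolation_options(gateways):
    """Build vSphere HA advanced options for custom isolation addresses.

    `gateways` is a list of gateway IPs on the vSAN network,
    for example one per site in a stretched cluster.
    (Hypothetical helper for illustration only.)
    """
    if not gateways:
        raise ValueError("at least one isolation address is required")
    # HA uses numbered keys das.isolationaddress0 .. das.isolationaddress9.
    if len(gateways) > 10:
        raise ValueError("at most 10 isolation addresses are supported")

    # Stop HA from using the management network default gateway.
    options = {"das.usedefaultisolationaddress": "false"}
    for index, address in enumerate(gateways):
        options[f"das.isolationaddress{index}"] = address
    return options


# Placeholder addresses: one vSAN-network gateway per site.
opts = build_isolation_options(["192.0.2.1", "198.51.100.1"])
for key, value in opts.items():
    print(f"{key} = {value}")
```

The printed pairs can then be entered manually in the HA advanced settings of the cluster.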
vSAN 2-node ROBO cluster
– The two nodes are most likely connected directly back-to-back without a network switch, so the vSAN network has no default gateway. You don't need to configure an isolation address for a 2-node ROBO cluster, as this scenario is already handled by vSAN.
– Disable the Host isolation response in this case. This will leave VMs powered on.
HA Heartbeat Datastores
A vSAN environment typically has only vSAN storage, and a vSAN datastore cannot be used for datastore heartbeating, so there are no heartbeat datastores.
In a vSAN-only environment, you can disable heartbeat datastores by selecting "Use datastores only from the specified list" in the HA interface without selecting any datastore, and then setting "das.ignoreInsufficientHbDatastore = true" in the advanced HA settings to suppress the resulting warning.
Admission Control
When you reserve capacity for your vSphere HA cluster with an admission control policy, coordinate that setting with the Primary level of failures to tolerate (PFTT) setting in the vSAN rule set: the number of failures vSAN tolerates must not be lower than the capacity reserved by vSphere HA admission control. In other words, HA should not reserve capacity for more host failures than vSAN can tolerate.
For example, in an 8-node vSAN cluster configured with FTT=2, the cluster resource percentage reserved by admission control should not exceed 25% (2 of 8 hosts).
For an active/active vSAN stretched cluster running virtual machines at both data sites, the recommendation is to set admission control to 50%. This leaves 50% of the cluster's CPU and memory resources free, and should ensure that one data site can run all the virtual machines if the other site fails completely.
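The 25% limit is simply the share of hosts that the FTT value covers. Here is a small Python sketch of that arithmetic, assuming equally sized hosts and a hypothetical function name:

```python
def max_ha_reserved_percentage(host_count, failures_to_tolerate):
    """Upper bound for the HA 'cluster resource percentage' setting.

    Assumes all hosts contribute equal capacity, so tolerating N host
    failures corresponds to N / host_count of the cluster resources.
    (Hypothetical helper for illustration only.)
    """
    if host_count <= 0:
        raise ValueError("host_count must be positive")
    if not 0 <= failures_to_tolerate <= host_count:
        raise ValueError("failures_to_tolerate must be between 0 and host_count")
    return 100.0 * failures_to_tolerate / host_count


# The 8-node, FTT=2 cluster from the example.
limit = max_ha_reserved_percentage(8, 2)
print(limit)  # 25.0
```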
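As a rough sanity check of the 50% recommendation, the sketch below tests whether the total VM demand fits into the capacity that remains after the reservation. It assumes two equally sized data sites, and the function name and the memory figures (in GB) are hypothetical.

```python
def surviving_site_can_run_all(total_capacity, total_demand,
                               reserved_percentage=50.0):
    """Return True if total VM demand fits in the unreserved capacity.

    With two equally sized sites and 50% reserved, the unreserved
    capacity equals the capacity of a single site.
    (Hypothetical helper for illustration only.)
    """
    usable = total_capacity * (1 - reserved_percentage / 100.0)
    return total_demand <= usable


# Hypothetical cluster: 1024 GB of memory across both sites.
fits = surviving_site_can_run_all(total_capacity=1024, total_demand=480)
too_much = surviving_site_can_run_all(total_capacity=1024, total_demand=600)
print(fits)      # True: 480 GB fits in the 512 GB of one site
print(too_much)  # False: 600 GB exceeds 512 GB
```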
Summary of recommended HA settings for vSAN Stretched Cluster
Below is a table listing the recommended HA settings for a vSAN stretched cluster:
Hope this post is informative,
Mohamad Alhussein