ESXi host lost management
I recently saw some very strange behavior on an ESXi host. I had reviewed the networking configuration of an ESXi 5.1 U3 environment that had been changed repeatedly over the years, and I suggested some adaptations and improvements. We set up redundant pSwitch connections and dual vMotion ports. Management and vMotion ports share a single standard vSwitch with two uplinks, and VM traffic uses a second vSwitch with four uplinks. The pSwitches are Cisco, with trunks/EtherChannel configured. On two hosts the new configuration worked fine, but on the third host we had a lot of trouble. After removing a vmkernel port from the second vSwitch (the one used by VMs), we lost the connection to management. Here are some facts:
- The IP address of the deleted port was not used for management.
- The deleted IP address was not registered in DNS.
- The host was no longer able to ping its own local IP addresses.
- SSH did not respond either.
- After shutting the link down and bringing it up again, this error was logged in vmkernel.log continuously (approx. twice a second):
cpu11:12784)WARNING: LinNet: netdev_tx_internal:2253:Attempting Tx on device that is already down/closing
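To gauge how often the warning repeats, a small script can count the matching lines per second. This is only a minimal sketch: it assumes each vmkernel.log line starts with an ISO-style timestamp (the excerpt above omits it), and the function name `count_tx_warnings` and the sample lines are made up for illustration.

```python
import re
from collections import Counter

# Message text taken from the log line above.
WARNING_TEXT = "Attempting Tx on device that is already down/closing"
# Assumed line prefix, e.g. "2015-03-10T09:15:01.123Z cpu11:12784)WARNING: ..."
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})")


def count_tx_warnings(lines):
    """Return a Counter mapping each second to the number of matching warnings."""
    per_second = Counter()
    for line in lines:
        if WARNING_TEXT not in line:
            continue
        match = TIMESTAMP.match(line)
        # Lines without a recognizable timestamp are grouped under "unknown".
        key = match.group(1) if match else "unknown"
        per_second[key] += 1
    return per_second


# Made-up sample lines for illustration only.
sample = [
    "2015-03-10T09:15:01.120Z cpu11:12784)WARNING: LinNet: netdev_tx_internal:2253:"
    "Attempting Tx on device that is already down/closing",
    "2015-03-10T09:15:01.640Z cpu11:12784)WARNING: LinNet: netdev_tx_internal:2253:"
    "Attempting Tx on device that is already down/closing",
    "2015-03-10T09:15:02.150Z cpu3:8200)some unrelated message",
]
for second, hits in sorted(count_tx_warnings(sample).items()):
    print(second, hits)
```

To use it on real data, pass the lines of a copy of vmkernel.log instead of `sample`. A steady count of about two per second would match what we observed.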
Here are some recommendations:
- Do not use any load balancing policy other than Route based on the originating virtual port on management network vmkernel ports.
- Set Route based on the originating virtual port as the default load balancing policy on the vSwitch if you use the vSwitch just for management and vMotion. Check whether the default is overridden at port group level.
- If you use a vSwitch just for management and vMotion, remove every port group you do not need on the vSwitch.
- Do not use trunks/EtherChannel on pSwitches for management network vmkernel ports.
- Enter maintenance mode on the host you are about to reconfigure. That way, if your last resort turns out to be a reboot, no VM has to be shut down.
- If you cannot reach the host's management interface over the network but are still able to ping its IP address, you can try the following:
  - Remove every pNIC from the vSwitch
  - Double-check that there are no trunks on the pSwitch for these uplinks
  - Add just one uplink to the vSwitch and try again
  - If this does not work, try another uplink
  - Restart the management network in between
- If you cannot reach management over the network, you may also be affected by a known issue with Broadcom 5719/5720 NICs using the tg3 driver.
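The two load balancing recommendations above can be checked with a short script. The following is a sketch only. It assumes you have saved the text printed by `esxcli network vswitch standard policy failover get -v vSwitch0` and by the corresponding `portgroup policy failover get` command, and that this text contains a `Load Balancing:` line in which `srcport` stands for Route based on the originating virtual port. The sample text, the vMotion port group name, and the helper names are invented for illustration.

```python
# Assumption: esxcli reports "srcport" for Route based on the originating virtual port.
EXPECTED_POLICY = "srcport"


def parse_policy(text):
    """Parse 'Key: Value' lines into a dict of stripped strings."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def find_policy_violations(outputs):
    """Return (name, policy) pairs whose load balancing differs from EXPECTED_POLICY.

    `outputs` maps a vSwitch or port group name to its saved policy text.
    """
    violations = []
    for name, text in outputs.items():
        policy = parse_policy(text).get("Load Balancing")
        if policy != EXPECTED_POLICY:
            violations.append((name, policy))
    return violations


# Hypothetical saved outputs; the real output contains more fields.
saved = {
    "vSwitch0": "   Load Balancing: srcport\n   Failback: true\n",
    "Management Network": "   Load Balancing: iphash\n   Failback: true\n",
    "vMotion-1": "   Load Balancing: srcport\n",
}
for name, policy in find_policy_violations(saved):
    print(f"{name}: load balancing is {policy!r}, expected {EXPECTED_POLICY!r}")
```

With the made-up data above, only the Management Network port group is reported, which is the kind of port group level override the second recommendation warns about.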
I hope this saves someone some time when troubleshooting! If you have more recommendations, please let me know.