[HP] bugs you should know about
This is a list of bugs in HPE hardware and software. These are issues that could harm the availability of your environment. The list is updated irregularly. See this post for VMware bugs you should know about.
With a certain combination of NIC driver (bnx2x) and firmware version, updating the driver to version 2.713.30 on ESXi hosts can physically kill a NIC. Affected NIC driver bundle: HPE QLogic NX2 1/10/20 GbE Multifunction Drivers for VMware vSphere 5.5, 6.0, and 6.5 (adapters: HP Flex-10 53x, HP Ethernet 53x and HP StoreFabric CN1100R). According to another report, all NICs using the QLogic/Broadcom 578x0 chipset can be affected! ESXi hosts installed or updated using the HPE custom image from July are affected.
For detailed information click here.
For installation or updates, use the latest HPE custom image (HPE Custom Image for ESXi 6.5 Install CD, HPE Custom Image for ESXi 6.5U1 Install CD), release date 2017-10-06. The firmware of the host should also be updated using the current SPP.
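To see whether a host carries the driver version named above, the installed VIB list can be filtered. This is a minimal sketch, not a complete check: the function name `check_bnx2x` is made up for illustration, it assumes that `esxcli software vib list` prints the VIB name in the first column and the version in the second, and it does not look at the NIC firmware at all.

```shell
#!/bin/sh
# Sketch: flag the bnx2x driver version named in the text (2.713.30).
# Reads the output of `esxcli software vib list` on stdin.
# Assumption: column 1 is the VIB name, column 2 is its version.
check_bnx2x() {
  awk '
    tolower($1) ~ /bnx2x/ {
      found = 1
      if ($2 ~ /^2\.713\.30/) {
        print "AFFECTED " $2
      } else {
        print "ok " $2
      }
    }
    END { if (!found) print "no bnx2x VIB found" }
  '
}

# Usage on an ESXi host:
#   esxcli software vib list | check_bnx2x
```

A host reported as AFFECTED still needs the firmware side checked by hand, since the problem only occurs with certain driver and firmware combinations.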
[since September 2015]
When ESXi is installed on a flash drive (SD, USB) and only the firmware of the iLO board is being updated, ESXi loses its boot device. This alarm is shown by vCenter in the summary tab of the host.
After this firmware update, no changes to the host will be saved! Apart from this, there is no problem for the running host and its VMs.
To solve the problem, just reboot the host. Because no changes get written to the flash drive after it is lost, the host will probably boot with an old password for the vpxuser account. vCenter changes the password of this account regularly, so the host will not reconnect automatically after the reboot. Here are a few steps that may be necessary to reconnect the host without errors:
- Check whether there are VMs registered to the host. You will see a VM named after the path to its vmx file. This is likely, because the host starts with the VMs that were registered at the moment it lost its boot device. To unregister them, use vSphere Client, Host Client, PowerCLI, vim-cmd, …
- Delete the vpxuser account on the host. It gets re-created when you manually reconnect the host to vCenter.
- Try to reconnect the host in vCenter manually.
- If it's not working, restart the management agents on the host and try again.
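If you go the vim-cmd route, the unregister step can be prepared as a list of commands to review before anything is run. This is only a sketch: the helper name `unregister_candidates` is invented for illustration, and it assumes that data rows of `vim-cmd vmsvc/getallvms` start with a numeric Vmid followed by the VM name.

```shell
#!/bin/sh
# Sketch: turn `vim-cmd vmsvc/getallvms` output (stdin) into a list of
# unregister commands for manual review. Nothing is executed here.
# Assumption: data rows start with a numeric Vmid in column 1; only the
# first word of the VM name is echoed back as a hint.
unregister_candidates() {
  awk '$1 ~ /^[0-9]+$/ { print "vim-cmd vmsvc/unregister " $1 "  # " $2 }'
}

# Usage on the ESXi host after the reboot:
#   vim-cmd vmsvc/getallvms | unregister_candidates
# Review the list and run only the lines for stale entries. If the
# reconnect still fails, restart the management agents:
#   /etc/init.d/hostd restart
#   /etc/init.d/vpxa restart
```

Printing the commands instead of executing them is deliberate, because the host cannot tell stale registrations from valid ones.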
In my opinion, it is best practice not to update just a single piece of hardware. Use the Service Pack for ProLiant (SPP) instead.
Because of a misbehavior in iLO (firmware before 2.20), the internal SD card can suddenly stop working on HP ProLiant Gen9 servers. On ESXi hosts, the following error may occur if ESXi is installed on this device:
Lost connectivity to the device mpx.vmhba32:C0:T0:L0 backing the boot filesystem /vmfs/devices/disk/mpx.vmhba32:C0:T0:L0. As a result, host configuration changes will not be saved to persistent storage.
Upgrade iLO Firmware to at least 2.20.
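One way to check the installed iLO firmware without logging in is to read the version from the XML summary that iLO serves. The sketch below is built on assumptions: that the iLO answers on `/xmldata?item=all` and reports the firmware in a `<FWRI>` element, and that versions are two-decimal numbers that can be compared numerically (as with 2.10, 2.20, 2.30). The function name `ilo_fw_ok` and the host name in the usage line are placeholders.

```shell
#!/bin/sh
# Sketch: compare the iLO firmware version against the 2.20 minimum.
# Reads the iLO XML summary on stdin.
# Assumption: the version is reported as <FWRI>x.yy</FWRI>.
ilo_fw_ok() {
  fw=$(sed -n 's:.*<FWRI>\([0-9.]*\)</FWRI>.*:\1:p' | head -n 1)
  if [ -z "$fw" ]; then
    echo "unknown"
    return 2
  fi
  # Numeric comparison; valid only for the two-decimal scheme assumed above.
  if awk -v v="$fw" 'BEGIN { exit !(v + 0 >= 2.20) }'; then
    echo "ok $fw"
  else
    echo "upgrade $fw"
    return 1
  fi
}

# Usage (ilo.example.local is a placeholder):
#   curl -sk "https://ilo.example.local/xmldata?item=all" | ilo_fw_ok
```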
Again there is a problem with hp-ams packages! This time the process hangs when doing a restart of ESXi management agents on ProLiant G5, G6 and G7. ESXi versions 5.x and hp-ams versions 9.5, 9.6 and 10.0 are affected.
To be honest, this is not really a bug, because hp-ams is at the moment only supported on Gen8 servers. So the solution is to uninstall the software:
- enable maintenance mode
- Stop it by executing
/etc/init.d/hp-ams.sh stop
- remove it by running
esxcli software vib remove -n hp-ams
- reboot and exit maintenance mode
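The steps above can be sketched as a small script. It is a hedged outline rather than a tested procedure: the `run` wrapper and the `remove_hp_ams` function are illustrative names, the stop script path is the one used elsewhere in this post, and by default the commands are only printed so they can be checked first.

```shell
#!/bin/sh
# Sketch of the removal sequence. By default commands are only printed;
# set DRY_RUN=0 on the ESXi host to actually execute them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

remove_hp_ams() {
  run esxcli system maintenanceMode set --enable true &&
  run /etc/init.d/hp-ams.sh stop &&
  run esxcli software vib remove -n hp-ams &&
  echo "Reboot the host, then leave maintenance mode with:" &&
  echo "  esxcli system maintenanceMode set --enable false"
}

# Usage:
#   remove_hp_ams              # print the commands
#   DRY_RUN=0 remove_hp_ams    # execute them on the host
```

The reboot is left as a manual step on purpose, so the script never restarts a host on its own.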
It seems to be a VMware problem that is fixed with vSphere 5.5 U2. It may cause a PSoD on an ESXi host under certain conditions. The errors logged in vmkernel.log look like:
DMAR Fault IOMMU
IOMMU context entry dump for ...
I had the problem on a ProLiant server during the upgrade process to 5.5 U2. After installing the updates, the host ran into a PSoD during reboot. Thanks to the fallback feature during updates, the host booted the last working software profile.
The server in my case does not exactly match the description in the KB article, but the errors were nearly the same. I resolved the issue by updating the firmware of the server to SPP 2014.09.0. You can find the link to the current SPP here.
On a VMware ESXi host you can observe:
- cannot perform vMotion
- cannot start services such as SSH
- when trying to restart management agents you see an error that a process can’t fork
- on the console of the host, pressing ALT+F1, you can see can’t fork all over the screen
- in vmkernel.log you can see warnings like
WARNING: Heap: 2677: Heap globalCartel-1 already at its maximum size. Cannot expand.
and/or
WARNING: Heap: 3058: Heap_Align(globalCartel-1, 136/136 bytes, 8 align) failed.
- Veeam backup jobs fail with errors like:
Error: Client error: File does not exist or locked. VMFS path: [[datastore] path_to_vmx_file.vmx
Please, try to download specified file using connection to the ESX server where the VM registered.
Failed to create NFC download stream. NFC path: [nfc://conn:vC-server,nfchost:host-nn,stg:datastore-n@path_to_vmx_file.vmx]
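To confirm the heap symptom on a host, the log can be searched for the two warnings quoted above. A minimal sketch, assuming the log is at /var/log/vmkernel.log unless another path is passed; `heap_warnings` is a made-up helper name.

```shell
#!/bin/sh
# Sketch: count the globalCartel heap warnings quoted above in a log file.
# Assumption: the log is /var/log/vmkernel.log unless a path is given.
# Note: grep -c prints 0 and returns a non-zero status when nothing matches.
heap_warnings() {
  log="${1:-/var/log/vmkernel.log}"
  grep -c -E 'Heap globalCartel-[0-9]+ already at its maximum size|Heap_Align\(globalCartel-[0-9]+,.*failed' "$log"
}

# Usage on the ESXi host:
#   heap_warnings
#   heap_warnings /path/to/copied/vmkernel.log
```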
This behavior can be caused by HP Agentless Management Services (AMS). These versions are affected:
To resolve the issue you can:
- Stop the service by executing (the next reboot will start the service again)
/etc/init.d/hp-ams.sh stop
- Uninstall the service by running
esxcli software vib remove -n hp-ams
- Upgrade to at least version hp-ams 550.10.x using Update Manager or esxcli
You can find more information in the VMware KB here.
There is a memory leak in the HP NIC Management Agent. Every now and then there are problems with HP Management Agents; years ago the SCSI Agent caused a lot of failed backups. According to an HP advisory, NIC Agent versions 9.4, 9.5 and 9.6 may allocate 5 MB of memory per hour on Windows 2012 and 2012 R2 servers. Admittedly, this can only happen when Windows is installed directly on HP ProLiant servers. Because of virtualization, Windows most often runs as a VM, but think about Hyper-V or, in my case, DataCore SANsymphony: the NIC Agent process consumed more than 2 GB of memory and caused a real performance impact. A mirror link could use approx. 2 Gbit of a 10 Gbit link in one direction, while the other direction on the same link could use approx. 9 Gbit. After disabling the NIC Agent in the Windows system settings, 9 Gbit could be used in both directions.
- Disable NIC Agent in Windows system settings
- Upgrade to a version that is not affected
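To put the leak rate into perspective, here is the arithmetic as a small sketch. It assumes a constant leak at the 5 MB per hour named in the advisory and takes 2 GB as 2048 MB; `hours_to_leak` is just an illustrative helper.

```shell
#!/bin/sh
# Worked arithmetic: hours until a steady leak reaches a given size.
# Figures from the text: about 5 MB per hour, about 2 GB observed.
# Assumption: the leak rate is constant; integer division rounds down.
hours_to_leak() {
  rate_mb_per_hour=$1
  target_mb=$2
  echo $(( target_mb / rate_mb_per_hour ))
}

hours=$(hours_to_leak 5 2048)    # 2 GB taken as 2048 MB
echo "$hours hours, about $(( hours / 24 )) days"
```

At that rate, the agent needs roughly 17 days of uptime to reach 2 GB, so the effect only shows on servers that run for weeks without a restart of the agent.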