[HP] bugs you should know about

Posted: October 25, 2014/Under: DataCore, HPE, HPE & VMware/By: vNote42

This is a list of bugs in HPE hardware and software. These are issues that could harm the availability of your environment. The list is updated irregularly. See this post for VMware bugs you should know about.

[October 2017]

PROBLEM

With a certain combination of NIC driver (bnx2x) and firmware version, a driver update (version 2.713.30) on ESXi hosts can physically kill a NIC. Affected NIC driver bundle: HPE QLogic NX2 1/10/20 GbE Multifunction Drivers for VMware vSphere 5.5, 6.0, and 6.5 -> HP Flex-10 53x, HP Ethernet 53x and HP StoreFabric CN1100R. According to another report, all NICs using the QLogic/Broadcom 578x0 chipset can be affected! ESXi hosts installed or updated using the HPE custom image from July are affected.

For detailed information click here.

SOLUTION

For installation or update, use the latest HPE custom image (HPE Custom Image for ESXi 6.5 Install CD, HPE Custom Image for ESXi 6.5U1 Install CD), release date: 2017-10-06. The host firmware should also be updated using the current SPP.

[since September 2015]

PROBLEM

When ESXi is installed on a flash drive (SD, USB) and only the firmware of the iLO board is updated, ESXi loses its boot device. vCenter shows this alarm on the Summary tab of the host.

From the moment of this firmware update, no changes to the host are saved! Apart from this, the running host and its VMs are not affected.

SOLUTION

To solve the problem, just reboot the host. Because no changes are written to the flash drive after the host loses it, the host will probably boot with an old password for the vpxuser account. vCenter changes the password of this user regularly, so the host will not reconnect automatically after the reboot. Here are a few steps that may be necessary to reconnect the host without errors:

Check whether there are VMs registered to the host that should not be there. Such a VM is shown with the path to its vmx file as its name. This is likely, because the host starts with the VMs that were registered at the moment it lost its boot device. To unregister them, use vSphere Client, Host Client, PowerCLI, vim-cmd, …
Delete the vpxuser account on the host. It is re-created when you manually reconnect the host to vCenter.
Try to reconnect the host in vCenter manually.
If that does not work, restart the management agents on the host and try again.

In my opinion, it is best practice not to update just a single piece of hardware. Use the Service Pack for ProLiant (SPP) instead.

[September 2015]

PROBLEM

Because of a misbehavior in iLO (firmware before 2.20), the internal SD card can suddenly stop working on HP ProLiant Gen9 servers. If ESXi is installed on this device, the following error may occur on the host:

Lost connectivity to the device mpx.vmhba32:C0:T0:L0 backing the boot filesystem /vmfs/devices/disk/mpx.vmhba32:C0:T0:L0. As a result, host configuration changes will not be saved to persistent storage.
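
If you collect host events or logs centrally, a small filter can pull out the affected device name. The following is only a minimal sketch: the pattern is modelled on the single message quoted above, and the function name `find_lost_boot_devices` is made up for this example. Other ESXi builds may word the message differently.

```python
import re

# Pattern modelled on the one message quoted in this post (an assumption,
# not an official message format).
LOST_BOOT_DEVICE = re.compile(
    r"Lost connectivity to the device (?P<device>\S+) backing the boot filesystem"
)


def find_lost_boot_devices(lines):
    """Return the device names found in 'lost connectivity' boot-device messages."""
    devices = []
    for line in lines:
        match = LOST_BOOT_DEVICE.search(line)
        if match:
            devices.append(match.group("device"))
    return devices


sample = [
    "Lost connectivity to the device mpx.vmhba32:C0:T0:L0 backing the boot "
    "filesystem /vmfs/devices/disk/mpx.vmhba32:C0:T0:L0. As a result, host "
    "configuration changes will not be saved to persistent storage.",
    "some unrelated event",
]
print(find_lost_boot_devices(sample))
```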

SOLUTION

Upgrade iLO Firmware to at least 2.20.
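
To check a list of servers against this minimum, a numeric version comparison is enough. This is a rough sketch that assumes the firmware version is available as a plain dotted string such as "2.10". How you obtain that string (iLO web interface, inventory tool, ...) is left out, and the helper names are hypothetical.

```python
def parse_version(text):
    """Turn a dotted version string like '2.20' into a tuple of ints: (2, 20)."""
    return tuple(int(part) for part in text.strip().split("."))


# Minimum iLO firmware named in this post.
MIN_ILO_FIRMWARE = parse_version("2.20")


def ilo_needs_upgrade(installed):
    """True if the installed iLO firmware is older than 2.20."""
    return parse_version(installed) < MIN_ILO_FIRMWARE


for version in ("2.10", "2.20", "2.30"):
    print(version, "needs upgrade:", ilo_needs_upgrade(version))
```

Comparing tuples of integers avoids the trap of comparing the strings directly or as floats.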

[November 2014]

PROBLEM (2)

Again there is a problem with hp-ams packages! This time the process hangs when doing a restart of ESXi management agents on ProLiant G5, G6 and G7. ESXi versions 5.x and hp-ams versions 9.5, 9.6 and 10.0 are affected.

SOLUTION (2)

To be honest, this is not really a bug, because hp-ams is at the moment only supported on Gen8 servers. So the solution is to uninstall the software:

Enable maintenance mode
Stop the service by executing /etc/init.d/hp-ams.sh stop
Remove it by running esxcli software vib remove -n hp-ams
Reboot and exit maintenance mode

PROBLEM (1)

It seems to be a VMware problem that is fixed with vSphere 5.5 U2. It may cause a PSoD on an ESXi host under certain conditions. The errors logged in vmkernel.log look like:

DMAR Fault IOMMU
IOMMU context entry dump for ...

I had the problem on a ProLiant server during the upgrade to 5.5 U2. After installing the updates, the host ran into a PSoD during reboot. Thanks to the fallback feature during updates, the host booted the last working software profile.

SOLUTION (1)

The server in my case did not exactly match the description in the KB article, but the errors were nearly the same. I resolved the issue by updating the server firmware to SPP 2014.09.0. You can find a link to the current SPP here.

[September 2014]

Problem

On a VMware ESXi host you can observe:

cannot perform vMotion
cannot start services such as SSH
when trying to restart management agents, you see an error that a process can’t fork
on the console of the host, after pressing ALT+F1, you can see can’t fork all over the screen
in vmkernel.log you can see warnings like
WARNING: Heap: 2677: Heap globalCartel-1 already at its maximum size. Cannot expand.
and/or
WARNING: Heap: 3058: Heap_Align(globalCartel-1, 136/136 bytes, 8 align) failed.
Veeam backup jobs fail with errors like:
Error: Client error: File does not exist or locked. VMFS path: [[datastore] path_to_vmx_file.vmx Please, try to download specified file using connection to the ESX server where the VM registered. Failed to create NFC download stream. NFC path: [nfc://conn:vC-server,nfchost:host-nn,stg:datastore-n@path_to_vmx_file.vmx]
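
To find out quickly whether a host shows these heap warnings, you can scan a copy of vmkernel.log. This is just a sketch: the regular expression is built from the two warning lines quoted above only, and the function name `heaps_with_warnings` is made up for this example.

```python
import re

# Matches the two warning shapes quoted in this post and captures the heap name.
HEAP_WARNING = re.compile(
    r"WARNING: Heap: \d+: "
    r"(?:Heap (?P<full>\S+) already at its maximum size"
    r"|Heap_Align\((?P<align>[^,]+),)"
)


def heaps_with_warnings(lines):
    """Count heap warnings per heap name, e.g. {'globalCartel-1': 2}."""
    counts = {}
    for line in lines:
        match = HEAP_WARNING.search(line)
        if match:
            name = match.group("full") or match.group("align")
            counts[name] = counts.get(name, 0) + 1
    return counts


sample_log = [
    "WARNING: Heap: 2677: Heap globalCartel-1 already at its maximum size. Cannot expand.",
    "WARNING: Heap: 3058: Heap_Align(globalCartel-1, 136/136 bytes, 8 align) failed.",
    "some unrelated log line",
]
print(heaps_with_warnings(sample_log))
```

For a real log file, pass the open file object instead of `sample_log`, since the function only iterates over lines.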

Solution

This behavior can be caused by HP Agentless Management Services (AMS). These versions are affected:

hp-ams 500.9.6.0-12.434156
hp-ams 550.9.6.0-12.1198610

To resolve the issue you can:

Stop the service by executing /etc/init.d/hp-ams.sh stop (the next reboot will start the service again)
Uninstall the service by running esxcli software vib remove -n hp-ams
Upgrade to at least hp-ams 500.10.x or hp-ams 550.10.x, respectively, using Update Manager or esxcli
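
If you have the installed hp-ams version strings of your hosts in a list, the check against the versions above can be scripted. This is only a sketch: it assumes the version string has the layout "<ESXi release>.<agent version>-<build info>", which is inferred from the two strings listed in this post and not from any HP documentation. The function names and the 10.x example string are hypothetical.

```python
# The two builds listed as affected in this post.
AFFECTED_BUILDS = {"500.9.6.0-12.434156", "550.9.6.0-12.1198610"}


def agent_version(vib_version):
    """Extract the agent version, e.g. '500.9.6.0-12.434156' -> (9, 6, 0).

    Assumes the first dotted field is the ESXi release (500/550).
    """
    main_part = vib_version.split("-", 1)[0]
    fields = main_part.split(".")
    return tuple(int(field) for field in fields[1:])


def is_listed_as_affected(vib_version):
    """True if the version is one of the two builds named in this post."""
    return vib_version in AFFECTED_BUILDS


def meets_minimum(vib_version):
    """True if the agent version is 10.x or newer."""
    return agent_version(vib_version)[0] >= 10


# The second entry is a made-up 10.x version string for illustration only.
for version in ("500.9.6.0-12.434156", "550.10.0.0-0.0"):
    print(version, is_listed_as_affected(version), meets_minimum(version))
```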

You can find more information in the VMware KB here.

[August 2014]

Problem

There is a memory leak in the HP NIC Management Agent. Now and then there are problems with HP Management Agents. Years ago the SCSI Agent caused a lot of failed backups. According to the HP advisory, NIC Agent versions 9.4, 9.5 and 9.6 may allocate 5 MB of memory per hour on Windows 2012 and 2012 R2 servers. Yes, this can only happen when Windows is installed directly on HP ProLiant servers. Because of virtualization, Windows most often runs as a VM, but think about Hyper-V or, in my case, DataCore SANsymphony: the NIC Agent process consumed more than 2 GB of memory and caused a real performance impact. A mirror link could use only approx. 2 Gbit of a 10 Gbit link in one direction, while the other direction on the same link could use approx. 9 Gbit. After disabling the NIC Agent in Windows system settings, 9 Gbit could be used in both directions.
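
To get a feeling for how long such a leak takes to become a problem, here is a back-of-the-envelope calculation. It assumes the leak runs at a constant 5 MB per hour (the rate from the advisory) and that 1 GB = 1024 MB. A real process will not leak that evenly.

```python
LEAK_MB_PER_HOUR = 5      # rate named in the HP advisory
OBSERVED_MB = 2 * 1024    # "more than 2 GB", assuming 1 GB = 1024 MB

hours = OBSERVED_MB / LEAK_MB_PER_HOUR
days = hours / 24

print(f"{hours:.1f} hours, about {days:.0f} days")
# prints "409.6 hours, about 17 days"
```

So under these assumptions a server reaches the 2 GB mark after roughly two and a half weeks of uptime.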

Solution

You can:

Disable NIC Agent in Windows system settings
Upgrade to a version that is not affected
