Storage vMotion between arrays of the same vendor
Recently I had to investigate an interesting behavior of ESXi hosts during Storage vMotion between two arrays of the same vendor, in this case HDS (G600 → G700/F700). In this post I list the observed symptoms and a very simple way to boost performance in this situation.
During the Storage vMotion processes, the following symptoms can be seen:
- VAAI Clone errors in esxtop
[to show the VAAI counters in esxtop: press ‘u’ (to show devices), then ‘f’ (to select counters); option ‘O’ must be selected]
All VAAI metrics are zero except Clone_F, which is the failure counter for the VAAI primitive XCopy. Furthermore, Clone_F increases continuously.
- IO errors in esxtop
[to show the error counters in esxtop: press ‘u’ (to show devices), then ‘f’ (to select counters); option ‘L’ must be selected]
Besides the VAAI errors, a very high rate of failed writes (per second) is shown.
- Tons of SCSI sense codes in the log
There is a massive amount of SCSI sense codes in the log. Basically, these codes mean that an IO has to be retried because of a problem:
H:0x0 D:0x8 P:0x0 (device status 0x8: BUSY)
H:0x7 D:0x0 P:0x0 (host status 0x7: ERROR)
- Path errors in vCenter.
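To get an overview of which sense code triplets dominate the log, a small shell sketch like the following can help. It assumes the triplets appear in the form "H:0x.. D:0x.. P:0x.." in /var/log/vmkernel.log (the usual log location on ESXi); the function names are hypothetical, and the decoding tables cover only a few common host and device status values, not the complete lists.

```sh
# Hypothetical helpers (not part of ESXi). Source this file, then run:
#   summarize_sense_codes < /var/log/vmkernel.log

# Decode a few common ESXi host status values (H:); incomplete on purpose.
decode_host() {
  case "$1" in
    0x0) echo "OK" ;;
    0x1) echo "NO_CONNECT" ;;
    0x2) echo "BUS_BUSY" ;;
    0x3) echo "TIMEOUT" ;;
    0x5) echo "ABORT" ;;
    0x7) echo "ERROR" ;;
    0x8) echo "RESET" ;;
    *)   echo "unknown" ;;
  esac
}

# Decode a few common SCSI device status values (D:); incomplete on purpose.
decode_device() {
  case "$1" in
    0x0)  echo "GOOD" ;;
    0x2)  echo "CHECK CONDITION" ;;
    0x8)  echo "BUSY" ;;
    0x18) echo "RESERVATION CONFLICT" ;;
    0x28) echo "TASK SET FULL" ;;
    *)    echo "unknown" ;;
  esac
}

# Extract every H/D/P triplet from stdin, count identical triplets,
# and print them most frequent first with a short decoding.
summarize_sense_codes() {
  grep -oE 'H:0x[0-9a-fA-F]+ D:0x[0-9a-fA-F]+ P:0x[0-9a-fA-F]+' |
    sort | uniq -c | sort -rn |
    while read -r count h d p; do
      printf '%s x %s %s %s -> host=%s device=%s\n' \
        "$count" "$h" "$d" "$p" \
        "$(decode_host "${h#H:}")" "$(decode_device "${d#D:}")"
    done
}
```

The plugin status (P:) is printed but not decoded, since in this case it was always 0x0.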
Despite all this, the Storage vMotion itself worked.
It seems to me that ESXi always tries to use VAAI to transfer the data first. The XCopy attempt fails and returns an error, so the software data mover steps in and copies the data through the ESXi storage stack. VMware does not support VAAI between different arrays (read here), so this should not happen at all. Maybe ESXi is not able to tell the two arrays apart: for these arrays, the vendor and model names are identical.
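To check whether the source and target LUNs really report the same identity, the Vendor and Model fields of esxcli storage core device list can be compared. The following is a hypothetical helper (list_vendor_model is not a VMware tool), sketched under the assumption that the output lists each device name in column 1 followed by indented "Key: value" fields, with Vendor appearing before Model.

```sh
# Hypothetical helper (not part of ESXi): read the output of
#   esxcli storage core device list
# from stdin and print "device | vendor | model" for every device.
list_vendor_model() {
  awk '
    # Device name lines (e.g. naa.xxx) start in column 1.
    /^[^ \t]/ { dev = $1; vendor = ""; next }
    # Indented "Vendor:" field: keep the value without padding.
    /^[ \t]+Vendor:/ {
      sub(/^[ \t]+Vendor:[ \t]*/, ""); sub(/[ \t]+$/, "")
      vendor = $0; next
    }
    # Indented "Model:" field: print one line per device.
    /^[ \t]+Model:/ {
      sub(/^[ \t]+Model:[ \t]*/, ""); sub(/[ \t]+$/, "")
      print dev " | " vendor " | " $0
    }
  '
}

# Usage on the ESXi shell:
#   esxcli storage core device list | list_vendor_model
```

If the source and destination devices print identical vendor and model strings, that would at least be consistent with the suspicion above; it does not prove how ESXi decides internally.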
The solution is quite simple: tell ESXi explicitly to use the software data mover instead of VAAI XCopy. This can easily be done by setting the ESXi advanced parameter that controls XCopy to 0. This works without a reboot and even during a running Storage vMotion. Keep in mind that this is a host-global setting, so when the migration is finished, reset the value back to 1. Here you can read more about setting advanced parameters using the command line.
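On the command line, the change can be sketched as below. This assumes the parameter meant here is /DataMover/HardwareAcceleratedMove, the advanced setting that enables (1, the default) or disables (0) the XCopy offload; the wrapper function names are hypothetical, only the esxcli calls themselves are standard.

```sh
# Hypothetical wrappers around esxcli; source this file on the ESXi host
# that performs the Storage vMotion.
# Assumption: the parameter in question is /DataMover/HardwareAcceleratedMove.
XCOPY_OPTION=/DataMover/HardwareAcceleratedMove

# Show the current and default value of the setting.
show_xcopy_setting() {
  esxcli system settings advanced list -o "$XCOPY_OPTION"
}

# Force the software data mover. Host-global: affects every clone or
# migration on this host, not only the running one.
disable_xcopy() {
  esxcli system settings advanced set -o "$XCOPY_OPTION" -i 0
}

# Re-enable XCopy once the migration between the arrays is finished.
restore_xcopy() {
  esxcli system settings advanced set -o "$XCOPY_OPTION" -i 1
}
```

Calling disable_xcopy right before (or during) the migration and restore_xcopy right afterwards keeps the window in which the host-global change is active as short as possible.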
Once the setting is changed, none of the symptoms mentioned above occur any more, and performance increases as well! I have seen throughput improve by much more than 50%.
- This behavior was observed on ESXi 6.0 hosts.
- As mentioned, this occurred on HDS arrays. I don’t know whether it happens with other vendors too. At least there seems to be a dedicated VAAI plugin for HDS: VMW_VAAIP_HDS. I will check during the next migrations.
- To show the supported VAAI primitives per device, use the command:
esxcli storage core device vaai status get
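To pick out only the XCopy-related information (VAAI plugin and Clone Status) per device, the output of that command can be filtered as in this sketch. The helper name is hypothetical, and it assumes the output lists each device name in column 1 followed by indented fields, with "VAAI Plugin Name" appearing before "Clone Status".

```sh
# Hypothetical helper: read the output of
#   esxcli storage core device vaai status get
# from stdin and print "device plugin clone-status" per device.
vaai_clone_status() {
  awk '
    # Device name lines start in column 1.
    /^[^ \t]/ { dev = $1; plugin = "-"; next }
    # Remember the VAAI plugin; an empty name is shown as "-".
    /^[ \t]+VAAI Plugin Name:/ {
      sub(/^[ \t]+VAAI Plugin Name:[ \t]*/, ""); sub(/[ \t]+$/, "")
      plugin = ($0 == "") ? "-" : $0; next
    }
    # Clone Status is the XCopy primitive: print one line per device.
    /^[ \t]+Clone Status:/ {
      sub(/^[ \t]+Clone Status:[ \t]*/, ""); sub(/[ \t]+$/, "")
      print dev, plugin, $0
    }
  '
}

# Usage on the ESXi shell:
#   esxcli storage core device vaai status get | vaai_clone_status
```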