"Unit voltage is out of range" Status Warning after Installing Firmware Image on QM9700

I installed the firmware image (image-X86_64-3.10.4006.img) on my QM9700 InfiniBand switch, and after rebooting the switch I received the following active alert under Status:

"Unit voltage is out of range"

Does anyone know what this alert means?

Reverting to my previous, working firmware version (image-X86_64-3.11.3002.img) on the QM9700 clears this status alert.

Can you please upgrade it to the latest GA and see if it resolves the issue?

If the issue persists, the switch may require a CPLD upgrade.
You would have to contact NVIDIA support to handle it further.

Thank you for your reply, dwaxman, but I don’t understand your acronym. What do you mean by “GA”?

I also tried upgrading to the newer firmware (image-X86_64-3.11.3006.img), and it gave me a different status alert:

"Some of the modules are not programmed with the default FW ()!"

I wish NVIDIA would publish a document explaining what all these status alerts really mean. There is also no mention or documentation of having to upgrade the flash, SRAM, or PLD on the QM9700 switch.

How do you typically reach someone from NVIDIA support directly to resolve all these issues?

General Availability: In the context of software and technology, “GA” means that a product has been fully developed and tested and is now available for purchase or download by the general public. It marks the transition from beta or testing phases to a stable release.

Regarding the issue with the latest GA: have you followed the documented upgrade procedure for MLNX-OS? Upgrading to the latest GA may only be supported from certain versions.

Dear, has the power supply been installed?

I can’t match it to any obvious known problem.
There was a voltage-related bug in MLNX-OS version 3.10.3100, but it was resolved in 3.10.4006. It looks like the error was reported just as the switch was finishing its reboot. If, while running 3.10.4006, the error occurred only once and is not reported again, it can be ignored. You can confirm the voltage status with the “show voltage” CLI command.

You mentioned that going back to 3.11.3002, which is a newer version than 3.10.4006, cleared the voltage error. That suggests the newer MLNX-OS software fixed the problem (if it was in fact a persistent problem on 3.10.4006).
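Which image counts as "newer" depends on comparing each dotted version component as an integer, most significant first: 3.11.3002 is newer than 3.10.4006 because its minor number (11) is higher, even though its build number (3002) is lower. The helpers here are a hypothetical sketch, not an NVIDIA tool, and assume MLNX-OS image files follow the `image-<arch>-<major>.<minor>.<build>.img` pattern seen in this thread.

```python
import re

# Image filenames mentioned in this thread.
IMAGES = [
    "image-X86_64-3.10.4006.img",
    "image-X86_64-3.11.3002.img",
    "image-X86_64-3.11.3006.img",
]

# Assumed naming pattern: image-<arch>-<major>.<minor>.<build>.img
IMAGE_RE = re.compile(r"^image-([A-Za-z0-9_]+)-(\d+)\.(\d+)\.(\d+)\.img$")


def parse_version(filename: str) -> tuple:
    """Hypothetical helper: extract (major, minor, build) as integers."""
    match = IMAGE_RE.match(filename)
    if match is None:
        raise ValueError(f"unrecognized image filename: {filename!r}")
    _arch, major, minor, build = match.groups()
    return (int(major), int(minor), int(build))


def newest_image(filenames: list) -> str:
    """Hypothetical helper: pick the filename with the highest version."""
    # Tuples compare element by element: major, then minor, then build.
    return max(filenames, key=parse_version)


if __name__ == "__main__":
    for name in sorted(IMAGES, key=parse_version):
        print(parse_version(name), name)
    print("newest:", newest_image(IMAGES))
```

This only orders version numbers; it says nothing about which upgrade paths are supported, which still has to be checked against NVIDIA's release notes for the target version.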

Unless you must run the switch on a specific MLNX-OS version (for example, to match other switches in the cluster), we strongly recommend using the latest available MLNX-OS version.

The “Some of the modules are not programmed with the default FW ()!” error you saw when trying 3.11.3006 is also not a common or known error. I’m not sure what caused it.

I recommend the following:

  1. Reset the switch to its defaults (erasing all configuration) with the “reset factory keep-basic” CLI command. You may need to re-apply the management IP address and mask afterwards over a console connection.
  2. If needed, re-apply the mgmt0 IP address and mask to regain SSH and web access.
  3. Install the latest MLNX-OS version available for the QM9700 on our website.
  4. If problems persist, you will need to open an NVIDIA troubleshooting case for further investigation. Contact details can be found here:
    NVIDIA Enterprise Customer Support
