RTX A5000 stuck at 400-500MHz due to HW Power Brake Slowdown on Ubuntu 20.04.3

nvidia-bug-report.log.gz (2.2 MB)
I have an issue with the SM clocks on my RTX A5000. Under CUDA loads and OpenGL benchmarks, the GPU switches to performance state P2, but the SM clock stays below 500 MHz. The reason reported for this is:
HW Power Brake Slowdown : Active
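The same flag can also be read programmatically. NVML exposes the current throttle reasons as a bitmask, and "HW Power Brake Slowdown" corresponds to the `nvmlClocksThrottleReasonHwPowerBrakeSlowdown` bit (0x80 in `nvml.h`). The following is a minimal sketch that assumes the `pynvml` bindings (package `nvidia-ml-py`) are installed. The helper names `decode_reasons` and `read_gpu` are hypothetical, and newer drivers may report these bits under the "clocks event reasons" naming.

```python
from typing import Dict, List

# Bit values as defined in nvml.h (nvmlClocksThrottleReason*).
THROTTLE_REASONS: Dict[int, str] = {
    0x001: "GPU Idle",
    0x002: "Applications Clocks Setting",
    0x004: "SW Power Cap",
    0x008: "HW Slowdown",
    0x010: "Sync Boost",
    0x020: "SW Thermal Slowdown",
    0x040: "HW Thermal Slowdown",
    0x080: "HW Power Brake Slowdown",
    0x100: "Display Clock Setting",
}


def decode_reasons(mask: int) -> List[str]:
    """Return the names of all throttle reasons whose bit is set in mask."""
    return [name for bit, name in sorted(THROTTLE_REASONS.items()) if mask & bit]


def read_gpu(index: int = 0) -> dict:
    """Hypothetical helper: query one GPU via pynvml (imported lazily)."""
    import pynvml  # requires nvidia-ml-py and a working NVIDIA driver

    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(index)
        mask = pynvml.nvmlDeviceGetCurrentClocksThrottleReasons(handle)
        return {
            "sm_clock_mhz": pynvml.nvmlDeviceGetClockInfo(handle, pynvml.NVML_CLOCK_SM),
            "power_w": pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0,  # mW -> W
            "limit_w": pynvml.nvmlDeviceGetEnforcedPowerLimit(handle) / 1000.0,  # mW -> W
            "reasons": decode_reasons(mask),
        }
    finally:
        pynvml.nvmlShutdown()
```

For example, `decode_reasons(0x81)` returns `['GPU Idle', 'HW Power Brake Slowdown']`. On an affected card under load, `read_gpu(0)["reasons"]` would be expected to include "HW Power Brake Slowdown".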

However, the card never reaches its 230 W power limit, and on the system side I have a 1.6 kW power supply for four RTX cards.
Unfortunately, I have not been able to resolve this issue. I would be grateful for any help.
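The key diagnostic here is that the brake engages while power draw is well below the 230 W limit. When a card hits its own limit, nvidia-smi reports "SW Power Cap", whereas "HW Power Brake Slowdown" indicates an externally asserted brake signal. Below is a minimal sketch of checking logged samples for this pattern. It assumes rows produced by something like `nvidia-smi --query-gpu=clocks.sm,power.draw,power.limit,clocks_throttle_reasons.hw_power_brake_slowdown --format=csv,noheader,nounits -l 1`. The field names and the "Active"/"Not Active" values are assumptions based on drivers from this period, and `flag_external_brake` is a hypothetical helper.

```python
from typing import List, NamedTuple


class Sample(NamedTuple):
    sm_clock_mhz: int
    power_w: float
    limit_w: float
    brake_active: bool


def parse_line(line: str) -> Sample:
    """Parse one CSV row: clocks.sm, power.draw, power.limit, hw_power_brake_slowdown."""
    clock, power, limit, brake = (field.strip() for field in line.split(","))
    return Sample(int(clock), float(power), float(limit), brake == "Active")


def flag_external_brake(lines: List[str]) -> List[Sample]:
    """Return samples where the power brake is active although draw is below the limit."""
    samples = [parse_line(row) for row in lines if row.strip()]
    return [s for s in samples if s.brake_active and s.power_w < s.limit_w]
```

Rows in which a field reads `[N/A]` would need extra handling. This sketch assumes every field is numeric except the brake status.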

Same issue, same motherboard. Did you find a solution in the end?

HW Power Brake is a mainboard issue. Please check for a BIOS upgrade, try a different PCIe slot if possible, and contact the mainboard vendor's support.

Thank you for your input. Unfortunately, I have tried all PCIe slots and I am already on the latest BIOS/firmware. I have contacted Asus but haven’t received a useful reply from them yet. I guess they are still working on a BIOS update…

I didn’t get much further with Asus; they just tell me to install Windows… In the end, after consulting several technical documents, I masked off pin 30 on the PCIe connector with some insulation tape. With this pin disconnected, the motherboard assumes the card does not support power braking. Everything works fine now, but this is not a solution befitting such a high-end motherboard and graphics card. As an interim solution, perhaps NVIDIA could provide an option in nvidia-smi to simply ignore the power brake pin in such cases.

Does disabling (or enabling) BMC support in the BIOS have any influence on PWRBRK?

I'm running some heavy CUDA calculations right now and will try once they are finished. Completely disabling the BMC with the hardware switch had no effect.

I removed the tape to check. Unfortunately, it makes no difference with or without BMC support, but I do appreciate your effort. I'm wondering whether this mainboard actually requires some special power supply with “PSUSMB” support. If so, it is not well documented.

Hey, I'm having a similar issue with the same board.
I'm running two A6000s and they won't go above 480 MHz. However, when I benchmark those GPUs with the renderer I'm using, they finish within the predicted time estimate (even faster). So I wonder whether this is an nvidia-smi problem, i.e. whether it is reporting the wrong clocks.
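One way to test whether the reported clocks are wrong, rather than reflecting a single momentary reading, is to sample them repeatedly while the render runs and then summarize the results. If the peak SM clock never rises above about 480 MHz during the busiest phase while render times still match expectations, that would support the misreporting theory. Below is a minimal sketch. The polling loop assumes `pynvml` (package `nvidia-ml-py`) is installed, and `summarize` and `sample_clocks` are hypothetical names.

```python
import time
from statistics import mean
from typing import Dict, List, Tuple


def summarize(samples: List[Tuple[int, bool]]) -> Dict[str, float]:
    """samples: (sm_clock_mhz, brake_active) pairs collected during a run."""
    clocks = [clock for clock, _ in samples]
    return {
        "min_mhz": min(clocks),
        "max_mhz": max(clocks),
        "mean_mhz": mean(clocks),
        "brake_fraction": sum(1 for _, brake in samples if brake) / len(samples),
    }


def sample_clocks(index: int = 0, seconds: float = 30.0,
                  interval: float = 0.5) -> List[Tuple[int, bool]]:
    """Hypothetical helper: poll the SM clock and the power-brake bit via pynvml."""
    import pynvml  # requires nvidia-ml-py and a working NVIDIA driver

    brake_bit = 0x80  # nvmlClocksThrottleReasonHwPowerBrakeSlowdown in nvml.h
    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(index)
        samples = []
        deadline = time.monotonic() + seconds
        while time.monotonic() < deadline:
            clock = pynvml.nvmlDeviceGetClockInfo(handle, pynvml.NVML_CLOCK_SM)
            mask = pynvml.nvmlDeviceGetCurrentClocksThrottleReasons(handle)
            samples.append((clock, bool(mask & brake_bit)))
            time.sleep(interval)
        return samples
    finally:
        pynvml.nvmlShutdown()
```

Calling `summarize(sample_clocks(0, seconds=60))` while a render is in progress should show whether the card ever leaves the ~480 MHz range and how often the brake bit is set.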