WARNING: infoROM is corrupted at gpu 0000:3E:00.0 on NVIDIA RTX A5000

While running nvidia-smi to monitor my GPU’s utilisation during a machine learning workload today, I noticed a warning that I had never seen before:

WARNING: infoROM is corrupted at gpu 0000:3E:00.0.

This warning persists after a reboot. What I have found on this forum suggests that infoROM corruption persisting after a reboot likely means broken hardware. Are there any steps I can take to troubleshoot this further? The card is not even six months old and was working perfectly last week, so if this is indeed a hardware defect, I assume it would be covered under warranty from PNY. What steps would I need to take to obtain a replacement unit?

I am running driver version 510.47.03 on Ubuntu 20.04.3 LTS (GNU/Linux 5.13.0-35-generic x86_64).

I have uninstalled then reinstalled nvidia-driver-510 and the issue persists.
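
In case it is useful for documenting that the warning survives reboots and driver reinstalls, here is a minimal Python sketch that scans nvidia-smi output for the warning and lists the affected PCI bus IDs. The function names are hypothetical, and the pattern assumes the warning is worded exactly as quoted above; other driver versions may phrase it differently.

```python
import re
import subprocess
from typing import List

# Assumption: the warning text matches the line quoted in this thread.
WARNING_RE = re.compile(
    r"WARNING: infoROM is corrupted at gpu "
    r"([0-9A-Fa-f]{4,8}:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}\.[0-9A-Fa-f])"
)


def find_corrupted_inforom(output: str) -> List[str]:
    """Return the PCI bus IDs of GPUs flagged with the infoROM warning."""
    return WARNING_RE.findall(output)


def run_nvidia_smi() -> str:
    """Run plain nvidia-smi and return stdout and stderr combined.

    Returns an empty string if nvidia-smi is not installed.
    """
    try:
        result = subprocess.run(
            ["nvidia-smi"], capture_output=True, text=True, check=False
        )
    except FileNotFoundError:
        return ""
    # The warning may be printed on either stream, so scan both.
    return result.stdout + result.stderr


def check_gpus() -> None:
    """Print which GPUs, if any, currently report the warning."""
    flagged = find_corrupted_inforom(run_nvidia_smi())
    if flagged:
        for bus_id in flagged:
            print("infoROM warning reported for GPU " + bus_id)
    else:
        print("No infoROM warning found in nvidia-smi output.")
```

Calling check_gpus() after each reboot and saving the output with a timestamp would give a simple record to attach to a warranty request. It only detects the warning; it does not repair anything.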

For anyone stumbling upon this in the future: PNY ended up issuing an RMA, so it seems this was indeed a hardware issue.

Thank you @utilisateur1907 for following up and sharing how you resolved this issue. I hope the downtime caused by the RMA will not be too big a disruption for you.

