While running nvidia-smi to monitor my GPU’s utilisation during a machine learning workload today, I noticed a warning that I had never seen before:
WARNING: infoROM is corrupted at gpu 0000:3E:00.0
This warning persists after a reboot. From what I have found on this forum, infoROM corruption that persists after a reboot likely means the hardware is faulty. Are there any steps I can take to troubleshoot this further? The card is not even six months old and was working perfectly last week, so if this is indeed a hardware defect, I assume it would be covered under PNY's warranty. What would I need to do to obtain a replacement unit?
I am running driver version 510.47.03 on Ubuntu 20.04.3 LTS (GNU/Linux 5.13.0-35-generic x86_64).