The inforom is a non-volatile storage device on the GPU. It is used to store various data. There is no public specification for its contents.
“Corrupted” means the inforom did not pass some sort of sanity check (e.g. a checksum). Therefore the GPU driver won’t use or trust its contents.
There is no publicly available utility to fix this. The card is damaged. Unless it is under warranty, there isn’t anything you can do to repair it. However, as you are aware, some aspects of the card functionality are still operational. There is no public specification for the behavior of the card with a corrupted inforom.
After some tests: this warning ONLY appears after Linux hibernation, and after I wake the computer up I do get some corrupted UI elements in a few applications.
BUT before hibernation the card works 100% correctly and no warning is displayed.
Is there a way I could stress test the card to confirm that the hardware really is corrupted, or whether there is simply a bug that corrupts a memory address during the hibernation process?
Yeah, it does sound like that, so it might just be an issue with the CUDA/NVIDIA driver itself. In that case, where should I submit a bug report?
I have had this issue on CentOS and Fedora 26, 27, 28 and 29, with multiple different NVIDIA/CUDA drivers (the GUI is corrupted after hibernation, which might be related to the warning mentioned above).
Also, the two sites show different statuses that do not match each other.
My second question: I got an email (referring to partners.nvidia.com, which I do not have access to) saying the status on that site had changed to “will not fix”, without an explanation. Could any developer here give some insight into why it won’t be fixed?
The “will not fix” status refers to a specific driver branch (R415).
Our internal testing indicates that this issue is fixed in 418.76 and later. I suggest you move forward and retest with a newer/later driver in the R418 branch, 418.76 or later.
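To check whether an installed driver is at or past 418.76, something like the following may help. It is only a minimal sketch: the helper names (parse_version, driver_at_least, installed_driver_version) are made up for illustration, and it assumes nvidia-smi is on the PATH and that the first GPU's driver version is the one of interest.

```python
import subprocess


def parse_version(text):
    """Turn a driver version string such as '418.76' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_at_least(installed, minimum="418.76"):
    """Return True if the installed driver version is >= the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def installed_driver_version():
    """Ask nvidia-smi for the driver version (needs an NVIDIA driver installed)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    # One line per GPU; all GPUs share the same driver, so take the first.
    return out.splitlines()[0].strip()


# Example use on a machine with the driver installed:
#   version = installed_driver_version()
#   print(version, "ok" if driver_at_least(version) else "older than 418.76")
```

Comparing the versions as tuples of integers rather than as strings or floats avoids surprises with versions such as 418.9 versus 418.76.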
Quick question: I have been using the drivers bundled with CUDA, and the latest (fixed) drivers are not the “CUDA” ones. Can I mix the two, or do I need to wait for a CUDA update? The driver bundled with CUDA is currently 418.67.
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
WARNING: infoROM is corrupted at gpu 0000:0B:00.0
Is there any solution for this error?
We are already using the latest driver mentioned in this post, but the issue still remains.
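For anyone who wants to check for this warning from a script (for example before and after hibernation), here is a rough sketch. The warning text is taken from the nvidia-smi output quoted in this thread; the function names are made up for illustration, and because it is an assumption whether the warning goes to stdout or stderr, both are captured.

```python
import re
import subprocess

# Pattern based on the warning line quoted in this thread.
INFOROM_WARNING = re.compile(r"WARNING: infoROM is corrupted at gpu (\S+)")


def corrupted_gpus(smi_output):
    """Return the PCI bus IDs of GPUs reported with a corrupted infoROM."""
    return INFOROM_WARNING.findall(smi_output)


def run_nvidia_smi():
    """Run nvidia-smi and return stdout and stderr combined as one string."""
    result = subprocess.run(["nvidia-smi"], capture_output=True, text=True)
    return result.stdout + result.stderr


# Example use on a machine with the driver installed:
#   print(corrupted_gpus(run_nvidia_smi()))
```

An empty list only means the warning was not printed on that run; it says nothing about whether the hardware is healthy.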
I encountered the same problem on Windows (local machine) when interrupting a deep learning run in JupyterLab. Possibly this is because Windows cannot grant permissions the way sudo does on Linux? My Linux machine does not have this problem. Solving it on Windows is easy for me: just reboot the system and rerun the algorithm. My GPU is a 3090 (350 W, 24 GB).