Warning message from nvidia-smi

Dear forum,

I use several K40 cards in a CentOS 6 workstation, and I have a question about the nvidia-smi report.
It always shows this warning:

+------------------------------------------------------+
| NVIDIA-SMI 340.65     Driver Version: 340.65         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K40c          Off  | 0000:02:00.0     Off |                    0 |
| 23%   34C    P0    62W / 235W |     23MiB / 11519MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K40c          Off  | 0000:03:00.0     Off |                    0 |
| 23%   30C    P0    61W / 235W |     23MiB / 11519MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K40c          Off  | 0000:82:00.0     Off |                    0 |
| 23%   30C    P0    61W / 235W |     23MiB / 11519MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla K40c          Off  | 0000:83:00.0     Off |                    0 |
| 23%   29C    P0    67W / 235W |     23MiB / 11519MiB |     77%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Compute processes:                                               GPU Memory |
|  GPU       PID  Process name                                     Usage      |
|=============================================================================|
|  No running compute processes found                                         |
+-----------------------------------------------------------------------------+
WARNING: infoROM is corrupted at gpu 0000:03:00.0

I reinstalled the OS, changed the CUDA driver, and changed servers, but the warning was not fixed.
The warning is caused by a single card, which otherwise seems to work well.
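
To confirm which card the warning points at, the bus ID in the warning line can be matched against the Bus-Id column of the table. The following is a minimal Python sketch, assuming the nvidia-smi output has been captured as text. The function name `corrupted_gpus` and both regular expressions are illustrative assumptions based only on the output pasted above; they are not part of nvidia-smi.

```python
import re

# Matches the warning line as it appears in the pasted output (assumed format).
WARNING_RE = re.compile(r"WARNING: infoROM is corrupted at gpu (\S+)")
# Matches a GPU row of the table: index, name, persistence mode, bus ID.
GPU_ROW_RE = re.compile(
    r"^\|\s*(\d+)\s+(.+?)\s+(?:On|Off)\s*\|\s*([0-9A-Fa-f:.]+)\s"
)


def corrupted_gpus(smi_text):
    """Return (index, name, bus_id) for every GPU named in an infoROM warning."""
    rows = {}
    for line in smi_text.splitlines():
        match = GPU_ROW_RE.match(line)
        if match:
            index, name, bus_id = match.groups()
            rows[bus_id.lower()] = (int(index), name.strip())
    flagged = []
    for bus_id in WARNING_RE.findall(smi_text):
        # The index and name are None if the bus ID is not in the table.
        index, name = rows.get(bus_id.lower(), (None, None))
        flagged.append((index, name, bus_id))
    return flagged


# A shortened copy of the output from this post.
SAMPLE = """\
|   0  Tesla K40c          Off  | 0000:02:00.0     Off |                    0 |
| 23%   34C    P0    62W / 235W |     23MiB / 11519MiB |      0%      Default |
|   1  Tesla K40c          Off  | 0000:03:00.0     Off |                    0 |
| 23%   30C    P0    61W / 235W |     23MiB / 11519MiB |      0%      Default |
WARNING: infoROM is corrupted at gpu 0000:03:00.0
"""

print(corrupted_gpus(SAMPLE))  # [(1, 'Tesla K40c', '0000:03:00.0')]
```

Here the warning maps to GPU 1, the card at 0000:03:00.0.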

Does anyone know about this warning?
I would like to fix it if possible.

I’m not sure you’ll be able to fix it. Where did you get these K40s?

Were they shipped as part of an OEM-qualified server?

Thank you for your reply.
The K40s are AOC-GPU-NVK40C cards from SuperMicro.
The server is from SuperMicro too.
It is OEM-qualified.

Everything was normal when I used the K40s in a SYS-7048GR-TR.
The warning first appeared when I moved some of them to a SYS-4028GR-TR.
I put the K40s back into the SYS-7048GR-TR, but the warning still appears.

You may want to ask SuperMicro about it.

Noted with thanks. I will ask SuperMicro.

Since the OP didn’t provide a resolution, I’ll chime in.

In my case, I had modified a GTX Titan Black to use a Corsair Hydro Series HG10 N780 GPU Liquid Cooling Bracket. I should have known better than to do that on a motherboard with a known bad slot and/or PLX chip, however. At some point, after a hard shutdown of a hung system, nvidia-smi started printing the same “infoROM is corrupted” message on reboot.

I was able to re-flash the same BIOS image my card originally came with, using nvflash for Linux and an image from techPowerUp’s VGA Bios Collection, and the problem disappeared on the next boot. The Windows version of nvflash kept hanging the system when I tried to save an image as a first test, most likely because of the bad motherboard. The Linux version didn’t have that issue, so I took the risk, and the problem was solved.
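
If you try something similar, it may help to record what nvidia-smi reports for the infoROM before and after the flash. Below is a minimal Python sketch, assuming a driver whose nvidia-smi supports `--query-gpu` with the `inforom.img`, `inforom.oem`, `inforom.ecc` and `inforom.pwr` fields. The helper names `query_inforom`, `parse_inforom` and `changed_fields` are hypothetical, and how a corrupted infoROM shows up in these fields is not something this thread establishes.

```python
import csv
import io
import subprocess

# Query fields passed to nvidia-smi; support depends on the driver version.
FIELDS = ["index", "pci.bus_id", "inforom.img", "inforom.oem",
          "inforom.ecc", "inforom.pwr"]


def query_inforom():
    """Run nvidia-smi and return its CSV text (needs the NVIDIA driver)."""
    cmd = ["nvidia-smi", "--query-gpu=" + ",".join(FIELDS),
           "--format=csv,noheader"]
    return subprocess.run(cmd, capture_output=True, text=True,
                          check=True).stdout


def parse_inforom(csv_text):
    """Turn the CSV text into one dict per GPU, keyed by field name."""
    rows = []
    for record in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if record:
            rows.append(dict(zip(FIELDS, (value.strip() for value in record))))
    return rows


def changed_fields(before, after):
    """Map bus ID -> {field: (old, new)} for fields that differ."""
    old_by_bus = {row.get("pci.bus_id"): row for row in before}
    diffs = {}
    for row in after:
        old = old_by_bus.get(row.get("pci.bus_id"))
        if old is None:
            continue  # GPU was not present in the earlier snapshot
        delta = {name: (old.get(name), row.get(name))
                 for name in FIELDS if old.get(name) != row.get(name)}
        if delta:
            diffs[row["pci.bus_id"]] = delta
    return diffs


# Intended use on a machine with the driver installed:
#   before = parse_inforom(query_inforom())
#   ... re-flash and reboot ...
#   after = parse_inforom(query_inforom())
#   print(changed_fields(before, after))
```

Keeping the parsing separate from the nvidia-smi call means the comparison can also be run on output that was saved to a file earlier.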