Hi,
We have been looking for leads on this error for some time. Apologies if this is a known issue, but I have not found a solution so far.
We have a processing server running Debian 12, with no display attached, and an NVIDIA card that is properly recognized by the system. We use this server for GPU processing, not for display.
nvidia-detect
[root@bacterio2:~/NVIDIA]$ nvidia-detect
Detected NVIDIA GPUs:
d8:00.0 VGA compatible controller [0300]: NVIDIA Corporation GA102GL [RTX A5000] [10de:2231] (rev a1)
Checking card: NVIDIA Corporation GA102GL [RTX A5000] (rev a1)
Your card is supported by all driver versions.
Your card is also supported by the Tesla 470 drivers series.
It is recommended to install the
nvidia-driver
package.
We installed the NVIDIA driver from the Debian nvidia-driver package.
After a reboot, the kernel modules are loaded correctly:
lsmod
[root@bacterio2:~/NVIDIA]$ lsmod | grep nvidia
nvidia_uvm 1380352 0
nvidia_drm 73728 0
nvidia_modeset 1249280 1 nvidia_drm
drm_kms_helper 204800 5 drm_vram_helper,ast,nvidia_drm
video 65536 1 nvidia_modeset
nvidia 56410112 2 nvidia_uvm,nvidia_modeset
drm 614400 8 drm_kms_helper,drm_vram_helper,ast,nvidia,drm_ttm_helper,nvidia_drm,ttm
However, no card is detected:
nvidia-smi
[root@bacterio2:~/NVIDIA]$ nvidia-smi
No devices were found
The system log shows the following error:
load error
Mar 01 09:55:28 bacterio2 kernel: NVRM: GPU 0000:d8:00.0: RmInitAdapter failed! (0x23:0xffff:1413)
Mar 01 09:55:28 bacterio2 kernel: NVRM: GPU 0000:d8:00.0: rm_init_adapter failed, device minor number 0
Mar 01 09:55:28 bacterio2 kernel: NVRM: GPU 0000:d8:00.0: RmInitAdapter failed! (0x23:0xffff:1413)
Mar 01 09:55:28 bacterio2 kernel: NVRM: GPU 0000:d8:00.0: rm_init_adapter failed, device minor number 0
Mar 01 10:05:26 bacterio2 kernel: NVRM: GPU 0000:d8:00.0: RmInitAdapter failed! (0x23:0xffff:1413)
Mar 01 10:05:26 bacterio2 kernel: NVRM: GPU 0000:d8:00.0: rm_init_adapter failed, device minor number 0
Mar 01 10:05:26 bacterio2 kernel: NVRM: GPU 0000:d8:00.0: RmInitAdapter failed! (0x23:0xffff:1413)
Mar 01 10:05:26 bacterio2 kernel: NVRM: GPU 0000:d8:00.0: rm_init_adapter failed, device minor number 0
Mar 01 10:05:28 bacterio2 kernel: NVRM: GPU 0000:d8:00.0: RmInitAdapter failed! (0x23:0xffff:1413)
Mar 01 10:05:28 bacterio2 kernel: NVRM: GPU 0000:d8:00.0: rm_init_adapter failed, device minor number 0
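In case it helps anyone reproduce this, the repeated lines above can be condensed into one count per GPU and error code. This is only a minimal sketch: the function name `summarize_nvrm` is made up for illustration, and it assumes the kernel messages have exactly the format shown above.

```shell
# Hypothetical helper: reads kernel log text on stdin and prints
# "<count> <PCI address> <error code>" for each distinct RmInitAdapter failure.
summarize_nvrm() {
  grep 'RmInitAdapter failed' \
    | sed -E 's/.*GPU ([0-9a-f:.]+): RmInitAdapter failed! \(([^)]*)\).*/\1 \2/' \
    | sort \
    | uniq -c
}

# Example usage (kernel messages from the current boot):
#   journalctl -k -b | summarize_nvrm
```

On this server every failure reports the same code, (0x23:0xffff:1413), for the single GPU at 0000:d8:00.0.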
I ran the nvidia-bug-report.sh script from the command line over an SSH session.
nvidia-bug-report.log.gz (228.2 KB)