RmInitAdapter failed on Ubuntu 22.04 with Quadro GV100; card isn't detected by nvidia-smi

Hello all,
I’ve tried both nvidia-driver-535 and nvidia-driver-470 from the Ubuntu graphics-drivers PPA, and the card isn’t picked up by nvidia-smi. I followed the “standard” procedure of blacklisting nouveau, adding the PPA, and installing the driver. dmesg shows the following, which may or may not be relevant:

[    1.793598] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  470.223.02  Sat Oct  7 15:39:11 UTC 2023
[   96.250333] NVRM: GPU 0000:00:05.0: RmInitAdapter failed! (0x23:0xffff:1195)
[   96.250379] NVRM: GPU 0000:00:05.0: rm_init_adapter failed, device minor number 0
[   96.271440] NVRM: GPU 0000:00:05.0: RmInitAdapter failed! (0x23:0xffff:1195)
[   96.271479] NVRM: GPU 0000:00:05.0: rm_init_adapter failed, device minor number 0

nvidia-smi shows no devices found.

The same error occurs with both driver versions. I’ve attached an nvidia-bug-report log in the hope that I’m missing something obvious. Thanks for any help you can provide!

nvidia-bug-report.log (1.2 MB)

-kd

I suspect the error is related to GPU passthrough. Does the NVIDIA GPU work in the VM if you reboot the host?

Hi @generix ,

Our internal GPU cluster intermittently hits a similar issue. We have an L4 GPU that is passed through from our physical server to a VM created with KVM (virsh). The driver version is 535.104.05. GPU workloads run fine on the VM, but then we suddenly start getting “NVRM: GPU 0000:07:00.0: RmInitAdapter failed!” errors in the logs and, as you said, the GPU starts working again once we reboot the VM. We’re not sure what causes these frequent intermittent GPU failures. Can you explain in more detail why GPU passthrough may induce this error?
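
In case it helps anyone pin down when these intermittent failures start, here is a minimal Python sketch that pulls the “RmInitAdapter failed!” lines out of saved kernel log text. The function name `find_rm_init_failures` and the regular expression are my own assumptions, based only on the dmesg lines quoted in this thread; other driver versions may format the message differently, so treat this as a starting point rather than a definitive parser.

```python
import re

# Assumed pattern, modeled on the dmesg excerpt quoted earlier in this thread.
RM_INIT_FAILED = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+NVRM: GPU (?P<pci>[0-9a-fA-F:.]+): "
    r"RmInitAdapter failed! \((?P<code>[^)]*)\)"
)


def find_rm_init_failures(log_text):
    """Return (timestamp_seconds, pci_address, error_code) for each failure line."""
    failures = []
    for line in log_text.splitlines():
        match = RM_INIT_FAILED.search(line)
        if match:
            failures.append((float(match["ts"]), match["pci"], match["code"]))
    return failures


# Sample input copied from the dmesg output in the first post.
SAMPLE = """\
[    1.793598] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  470.223.02  Sat Oct  7 15:39:11 UTC 2023
[   96.250333] NVRM: GPU 0000:00:05.0: RmInitAdapter failed! (0x23:0xffff:1195)
[   96.250379] NVRM: GPU 0000:00:05.0: rm_init_adapter failed, device minor number 0
"""

if __name__ == "__main__":
    for ts, pci, code in find_rm_init_failures(SAMPLE):
        print(f"{ts:.6f}s  {pci}  {code}")
```

Feeding it the saved output of dmesg from the VM (instead of `SAMPLE`) gives the uptime, PCI address, and error code for each failure, which can then be lined up against whatever the host was doing at that moment.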