I’m passing an RTX 6000 Ada Generation card through to a VM as a PCIe device on Proxmox. The guest recognizes the card as an NVIDIA device. I’ve installed the latest proprietary Linux driver (530) and also tried 525, but I get the errors below in dmesg.
[ 3.808057] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 530.30.02 Wed Feb 22 04:11:39 UTC 2023
...
[ 3.849336] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 530.30.02 Wed Feb 22 03:45:40 UTC 2023
...
[ 9.202528] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x11:0x45:2529)
[ 9.203943] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[ 9.250008] [drm:nv_drm_load [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NvKmsKapiDevice
[ 9.250267] [drm:nv_drm_probe_devices [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to register device
[ 9.250317] BUG: kernel NULL pointer dereference, address: 0000000000000040
[ 9.250320] #PF: supervisor read access in kernel mode
[ 9.250324] #PF: error_code(0x0000) - not-present page
...
[ 9.250336] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 9.250337] RIP: 0010:_nv000460kms+0x11/0x50 [nvidia_modeset]
[ 9.250367] Code: f8 b9 08 00 00 00 89 45 f8 be 26 00 00 00 e8 d6 06 f6 ff c9 c3 0f 1f 40 00 f3 0f 1e fa 55 b8 01 00 00 00 48 89 e5 48 83 ec 10 <8b> 57 40 48 c7 45 f8 00 00 00 00 85 d2 75 08 c9 c3 66 0f 1f 44 00
[ 9.250369] RSP: 0018:ffffa7bdc4213aa8 EFLAGS: 00010282
[ 9.250371] RAX: 0000000000000001 RBX: ffff8d43c4f6b000 RCX: 0000000000000000
[ 9.250373] RDX: 0000000000000001 RSI: ffff8d43ca002c00 RDI: 0000000000000000
[ 9.250374] RBP: ffffa7bdc4213ab8 R08: 0000000000000000 R09: 0000000000000000
[ 9.250375] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8d43c4f6b000
[ 9.250377] R13: ffff8d440d6f2b40 R14: 0000000000000000 R15: ffff8d43ca002c10
[ 9.250380] FS: 00007fa3850d2740(0000) GS:ffff8d472fcc0000(0000) knlGS:0000000000000000
[ 9.250382] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 9.250383] CR2: 0000000000000040 CR3: 0000000112fea000 CR4: 0000000000750ee0
[ 9.250386] PKRU: 55555554
[ 9.250388] Call Trace:
[ 9.250390] <TASK>
[ 9.250393] nv_drm_master_set+0x25/0x50 [nvidia_drm]
[ 9.250399] drm_new_set_master+0xa9/0x130 [drm]
[ 9.250425] drm_master_open+0x93/0xc0 [drm]
[ 9.250446] drm_open+0xf8/0x270 [drm]
[ 9.250468] drm_stub_open+0xba/0x140 [drm]
[ 9.250492] chrdev_open+0xc7/0x250
[ 9.250496] ? cdev_device_add+0xa0/0xa0
[ 9.250499] do_dentry_open+0x16a/0x400
[ 9.250503] vfs_open+0x2d/0x40
[ 9.250506] do_open+0x223/0x490
[ 9.250508] path_openat+0x11d/0x2c0
[ 9.250511] do_filp_open+0xb2/0x160
[ 9.250514] ? __check_object_size+0x23/0x30
[ 9.250517] do_sys_openat2+0xb3/0x180
[ 9.250520] __x64_sys_openat+0x55/0xa0
[ 9.250522] do_syscall_64+0x5c/0x90
[ 9.250525] ? exit_to_user_mode_prepare+0x3b/0xd0
[ 9.250529] ? syscall_exit_to_user_mode+0x2a/0x50
[ 9.250532] ? do_syscall_64+0x69/0x90
[ 9.250534] ? exit_to_user_mode_prepare+0x3b/0xd0
[ 9.250536] ? syscall_exit_to_user_mode+0x2a/0x50
[ 9.250538] ? do_syscall_64+0x69/0x90
[ 9.250540] ? do_syscall_64+0x69/0x90
[ 9.250542] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 9.250545] RIP: 0033:0x7fa384f146eb
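To compare the failure across driver versions, the NVRM error code can be pulled out of a saved dmesg dump. This is only a minimal sketch that assumes the `RmInitAdapter failed!` line format shown in the log above; the function name `find_rm_init_failures` is hypothetical and is not part of any NVIDIA tool.

```python
import re

# Matches lines such as:
# [ 9.202528] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x11:0x45:2529)
# Assumption: the line format is the one seen in the dmesg excerpt above.
PATTERN = re.compile(
    r"NVRM: GPU (?P<bdf>[0-9a-fA-F:.]+): RmInitAdapter failed! \((?P<code>[^)]+)\)"
)


def find_rm_init_failures(log_text):
    """Return (pci_address, error_code) pairs, one per RmInitAdapter failure."""
    return [(m.group("bdf"), m.group("code")) for m in PATTERN.finditer(log_text)]


if __name__ == "__main__":
    sample = (
        "[ 9.202528] NVRM: GPU 0000:01:00.0: "
        "RmInitAdapter failed! (0x11:0x45:2529)"
    )
    for bdf, code in find_rm_init_failures(sample):
        print(bdf, code)
```

Running the same function over the dmesg output captured under 530 and under 525 would show whether both drivers report the same code for the same PCI address.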
Driver versions tried:
Package: nvidia-driver-530
Version: 530.30.02-0ubuntu1
and
Package: nvidia-driver-525
Version: 525.85.12-0ubuntu1
OS details:
Linux cpu-vm1 5.19.0-35-generic #36~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Fri Feb 17 15:17:25 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04.2 LTS"
For reference, an RTX 3090 is passed through to another VM on the same host and works fine.
I’ve run nvidia-bug-report.sh and attached the resulting log here. Please suggest a fix, or let me know if you need any other information.
nvidia-bug-report.log.gz (1.8 MB)