Dear generix,
Following your suggestion, I added the sound devices to the guest machine:
[root@rocky92test ~]# lspci | grep NVIDIA
00:07.0 VGA compatible controller: NVIDIA Corporation AD102GL [L6000 / RTX 6000 Ada Generation] (rev a1)
00:08.0 VGA compatible controller: NVIDIA Corporation AD102GL [L6000 / RTX 6000 Ada Generation] (rev a1)
00:0b.0 Audio device: NVIDIA Corporation AD102 High Definition Audio Controller (rev a1)
00:0c.0 Audio device: NVIDIA Corporation AD102 High Definition Audio Controller (rev a1)
dmesg shows that the driver recognized them too:
[root@rocky92test ~]# dmesg | grep -i nvidia
[ 2.909121] input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:0b.0/sound/card0/input6
[ 2.909264] input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:0b.0/sound/card0/input7
[ 2.909289] input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:0b.0/sound/card0/input8
[ 2.909313] input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:0b.0/sound/card0/input9
[ 2.973503] input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:0c.0/sound/card1/input10
[ 2.973620] input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:0c.0/sound/card1/input11
[ 2.973715] input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:0c.0/sound/card1/input12
[ 2.973817] input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:0c.0/sound/card1/input13
[ 4.549633] nvidia: loading out-of-tree module taints kernel.
[ 4.549665] nvidia: module license 'NVIDIA' taints kernel.
[ 4.566586] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[ 4.629218] nvidia-nvlink: Nvlink Core is being initialized, major device number 236
[ 4.641680] nvidia 0000:00:07.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
[ 4.704232] nvidia 0000:00:08.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
[ 4.754876] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 545.23.06 Sun Oct 15 17:43:11 UTC 2023
[ 4.831141] nvidia_uvm: module uses symbols from proprietary module nvidia, inheriting taint.
[ 4.909491] nvidia-uvm: Loaded the UVM driver, major device number 234.
[ 4.946512] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 545.23.06 Sun Oct 15 17:22:43 UTC 2023
[ 4.951867] [drm] [nvidia-drm] [GPU ID 0x00000007] Loading driver
[ 4.951884] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:00:07.0 on minor 1
[ 4.952760] [drm] [nvidia-drm] [GPU ID 0x00000008] Loading driver
[ 4.952776] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:00:08.0 on minor 2
But again, when I run nvidia-smi:
[root@rocky92test ~]# nvidia-smi
No devices were found
and in /var/log/messages I get:
Nov 6 11:52:47 rocky92test kernel: ACPI Warning: _SB.PCI0.S38._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20211217/nsarguments-61)
Nov 6 11:52:52 rocky92test kernel: NVRM: GPU 0000:00:07.0: RmInitAdapter failed! (0x11:0x45:2550)
Nov 6 11:52:52 rocky92test kernel: NVRM: GPU 0000:00:07.0: rm_init_adapter failed, device minor number 0
Nov 6 11:52:56 rocky92test kernel: NVRM: GPU 0000:00:07.0: RmInitAdapter failed! (0x11:0x45:2550)
Nov 6 11:52:56 rocky92test kernel: NVRM: GPU 0000:00:07.0: rm_init_adapter failed, device minor number 0
Nov 6 11:52:56 rocky92test kernel: ACPI Warning: _SB.PCI0.S40._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20211217/nsarguments-61)
Nov 6 11:53:01 rocky92test kernel: NVRM: GPU 0000:00:08.0: RmInitAdapter failed! (0x11:0x45:2550)
Nov 6 11:53:01 rocky92test kernel: NVRM: GPU 0000:00:08.0: rm_init_adapter failed, device minor number 1
Nov 6 11:53:06 rocky92test kernel: NVRM: GPU 0000:00:08.0: RmInitAdapter failed! (0x11:0x45:2550)
Nov 6 11:53:06 rocky92test kernel: NVRM: GPU 0000:00:08.0: rm_init_adapter failed, device minor number 1
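In case it helps to summarize these failures, here is a minimal Python sketch that counts the RmInitAdapter failures per PCI address and error code. It is not part of any NVIDIA tool: the function name count_rminit_failures and the regular expression are my own assumptions, based only on the format of the log lines above.

```python
import re
from collections import Counter

# Matches lines such as:
#   NVRM: GPU 0000:00:07.0: RmInitAdapter failed! (0x11:0x45:2550)
# The pattern is an assumption derived from the log excerpt above.
NVRM_FAIL = re.compile(
    r"NVRM: GPU (?P<bdf>[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]): "
    r"RmInitAdapter failed! \((?P<code>[^)]*)\)"
)


def count_rminit_failures(lines):
    """Count RmInitAdapter failures, keyed by (PCI address, error code)."""
    counts = Counter()
    for line in lines:
        match = NVRM_FAIL.search(line)
        if match:
            counts[(match.group("bdf"), match.group("code"))] += 1
    return counts


# Small sample taken from the log excerpt above.
SAMPLE = [
    "Nov 6 11:52:52 rocky92test kernel: NVRM: GPU 0000:00:07.0: RmInitAdapter failed! (0x11:0x45:2550)",
    "Nov 6 11:52:52 rocky92test kernel: NVRM: GPU 0000:00:07.0: rm_init_adapter failed, device minor number 0",
    "Nov 6 11:53:01 rocky92test kernel: NVRM: GPU 0000:00:08.0: RmInitAdapter failed! (0x11:0x45:2550)",
]

for (bdf, code), n in sorted(count_rminit_failures(SAMPLE).items()):
    print(f"{bdf}  {code}  x{n}")
```

To run it against the real log, the same function can be given an open file, for example count_rminit_failures(open("/var/log/messages", errors="replace")).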
Last but not least, while searching the forum I found a thread from another user who had a different problem in the same usage scenario as mine.
In his last message he states:
Just to update on this and close the topic. I’ve talked to Nvidia and was informed that vGPU approach is required whether it’s a pass-through mode or splitting GPU to multiple users. vGPU requires a valid license and installation of video driver on the host and on the guest. To my best knowledge there is no way to directly pass through a professional GPU without using vGPU.
Before I’ve figured out the solution with vGPU I’ve tried all kinds of Proxmox tricks like setting kernel boot parameters, changing GPU PCIe physical slot, etc. GPU was visible in the guest OS but the driver would not work with it. On the same guest VM I can pass RTX 3090 without issues.
RTX 6000 Ada works fine with the same linux driver on bare metal.
Can you confirm this is the case?
Thanks in advance
Claudio