NVIDIA GPU not accessible in virt-manager guest via PCI passthrough / no device handle for GPU

Hello guys
I have been trying for many days now to get an NVIDIA 1050 Ti up, running and accessible for CUDA/Python/TensorFlow 1.14 in a VMM guest machine via PCI passthrough.
The guest machine was created with QEMU/KVM through virt-manager and runs Debian Stretch, cuda10.0-p1 and TensorFlow 1.14. These all match and work together on Debian Stretch (no backports activated or installed, so no newer kernel), since I tested this before on a bare-metal machine (I know that is THE/one difference).
It looks as if all necessary virtio components work, as do all necessary NVIDIA components, and the CUDA installation did not show any errors or suspicious messages like ‘no lib32 compatible directory found’.
(The NVIDIA GPU is not the primary one in the host machine’s BIOS, if that is of any relevance.)
Why I think it should work:

| ~ @ StReTcH (debian)
| => lspci -Dnn | grep -i vga
0000:00:01.0 VGA compatible controller [0300]: Red Hat, Inc. QXL paravirtual graphic card [1b36:0100] (rev 04)
0000:05:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP107 [GeForce GTX 1050 Ti] [10de:1c82] (rev a1)


| ~ @ StReTcH (debian)
| => lsmod | grep virtio
virtio_rng 16384 0
rng_core 16384 1 virtio_rng
virtio_balloon 16384 0
virtio_console 28672 0
virtio_blk 20480 3
virtio_net 32768 0
virtio_pci 24576 0
virtio_ring 24576 6 virtio_blk,virtio_net,virtio_rng,virtio_balloon,virtio_console,virtio_pci
virtio 16384 6 virtio_blk,virtio_net,virtio_rng,virtio_balloon,virtio_console,virtio_pci


| ~ @ StReTcH (debian)
| => lsmod | grep nvidia
nvidia_drm 45056 0
nvidia_modeset 1044480 1 nvidia_drm
nvidia 16797696 1 nvidia_modeset
ipmi_msghandler 49152 1 nvidia
drm_kms_helper 155648 2 qxl,nvidia_drm
drm 360448 7 qxl,ttm,nvidia_drm,drm_kms_helper


| ~ @ StReTcH (debian)
| => nvidia-smi
Unable to determine the device handle for GPU 0000:05:00.0: Unknown Error


| ~ @ StReTcH (debian)
| =>

Does anyone have an idea? Any help on this is highly appreciated.
Thank you all in advance.

Kind regards,
Roger
nvidia-bug-report.log.gz (1.01 MB)

Aug  5 20:03:40 StReTcH kernel: [  361.880526] NVRM: RmInitAdapter failed! (0x23:0x56:471)
Aug  5 20:03:40 StReTcH kernel: [  361.880785] NVRM: rm_init_adapter failed for device bearing minor number 0
Aug  5 20:03:40 StReTcH kernel: [  362.004386] NVRM: RmInitAdapter failed! (0x23:0x56:471)
Aug  5 20:03:40 StReTcH kernel: [  362.004583] NVRM: rm_init_adapter failed for device bearing minor number 0
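
For anyone following along: lines like these come from the guest's kernel log, and they can be pulled out with a small filter. This is only a minimal sketch; the function name `nvrm_lines` is made up for illustration, and reading the kernel log may need root.

```shell
# Print only the NVIDIA kernel driver (NVRM) lines from a kernel log on stdin.
# nvrm_lines is a hypothetical helper name, not part of any NVIDIA tool.
nvrm_lines() {
    grep 'NVRM:'
}

# Typical use inside the guest:
#   dmesg | nvrm_lines
#   journalctl -k | nvrm_lines
```

If nothing is printed, the driver logged no NVRM messages in the log that was fed in.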

Did you hide the hypervisor? If already done, did you check if the gpu works on bare metal?
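
A rough way to see from inside a Linux guest whether the CPU still advertises a hypervisor is to look for the `hypervisor` flag in `/proc/cpuinfo`. The sketch below assumes an x86 guest; the function name `hypervisor_flag_visible` is invented for illustration. Note that this only reflects the CPUID hypervisor bit. As far as I know, the KVM signature that libvirt's `kvm hidden` setting removes is a separate thing and does not show up here.

```shell
# Report whether the CPUID "hypervisor" flag appears in a cpuinfo-style file.
# Defaults to /proc/cpuinfo; the optional argument only exists so the check
# can be tried against a saved copy.
hypervisor_flag_visible() {
    cpuinfo="${1:-/proc/cpuinfo}"
    if grep -m1 '^flags' "$cpuinfo" | grep -qw 'hypervisor'; then
        echo "hypervisor flag visible"
    else
        echo "hypervisor flag hidden"
    fi
}

# Typical use inside the guest:
#   hypervisor_flag_visible
```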

Hello generix

Thanks for answering. Here is my response:
I am sure the 1050 Ti is in working order: it works as described on a bare-metal installation of CUDA/TensorFlow and the Python tools.
(BUT I found it has a syncing problem over the DVI connector when used with a monitor rather than headless, if this is of any interest.)
I can run TF’s tutorial files successfully, and the TF examples are then computed via CUDA on the 1050 Ti.
Also, ‘nvidia-smi’ gives me useful output/info on the 1050 Ti in that case.

I am not sure about ‘hiding the hypervisor’.
What does it mean, and how do I check or configure it?
Do you have a (detailed?) HowTo on this that you can point me to?
So, up to now, not knowing any better, I would answer: NO.

Thanks in advance.
Regards,
Roger

Mark the text, right-click, select “Search Google for…”, and that gets me here:
https://forum.level1techs.com/t/hiding-hypervisor-from-vm-guest/132755

Hello generix

Do you think this is correct, and in the correct position, in the StReTcH.xml file:








?

Regards,
Roger

OK, I tried it without success: it broke my KVM/QEMU inst/virbr accessibility.
I have to investigate this and will get back to you.

Thanks to all for your help so far.

Regards,
Roger

It has to be added to the “cpu” section of the config. Here’s a more in-depth post about the needed changes:
https://superuser.com/questions/1387935/hiding-virtual-machine-status-from-guest-operating-system
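
Two settings commonly used for this in libvirt domain XML are `<kvm><hidden state='on'/></kvm>` under `<features>` and `<feature policy='disable' name='hypervisor'/>` under `<cpu>`. Whether both are needed for this card is an assumption on my part. After editing with `virsh edit`, a small check like the one below can confirm what actually ended up in the config. It is only a sketch: `check_hiding` is a made-up helper name, and the domain name `StReTcH` in the usage line is guessed from the XML file name mentioned above.

```shell
# Hypothetical helper: read a libvirt domain XML on stdin and report which of
# the two hiding settings it contains. The elements it looks for are:
#   <features> ... <kvm><hidden state='on'/></kvm> ... </features>
#   <cpu ...>  ... <feature policy='disable' name='hypervisor'/> ... </cpu>
check_hiding() {
    xml=$(cat)
    if printf '%s\n' "$xml" | grep -Eq "<hidden +state=['\"]on['\"]"; then
        echo "kvm hidden: on"
    else
        echo "kvm hidden: missing"
    fi
    if printf '%s\n' "$xml" | grep -E "<feature " \
        | grep -E "policy=['\"]disable['\"]" \
        | grep -Eq "name=['\"]hypervisor['\"]"; then
        echo "cpu hypervisor feature: disabled"
    else
        echo "cpu hypervisor feature: not disabled"
    fi
}

# Typical use on the host (domain name is an assumption):
#   virsh dumpxml StReTcH | check_hiding
```

The check is purely textual, so it only tells you the elements are present, not that the guest boots with them.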

Hello generix

I have now successfully set the hypervisor hiding options as described in the last link you referenced.
Unfortunately, I now get a kernel panic when starting the guest system (now: Debian 10 Buster), so the guest does not come up:

Nevertheless, the new settings were accepted when I edited BuSTeR.xml via virsh.
I also set ‘copy host CPU Config’ in the virt-manager interface for this guest; otherwise the changed XML is not accepted, as I found out.

Could you or someone else tell me what the (new) issue could be and how to solve it?

Thanks and kind regards,
Roger


The screenshot only shows the end of the panic. Can you capture the start?