nvidia-gpu not accessible in virt-manager guest by pci-passthrough/no dev-handle for gpu

roger.weihrauch · August 5, 2019, 6:20pm

Hello guys,
I have been trying for many days now to get an NVIDIA 1050 Ti up, running, and accessible to CUDA/Python/TensorFlow 1.14 in a virt-manager guest via PCI passthrough.
The guest was created with QEMU/KVM through virt-manager and runs Debian Stretch, cuda10.0-p1, and TensorFlow 1.14. These all match and work together on Debian Stretch (no backports activated/installed, so no newer kernel); I tested this combination earlier on a bare-metal machine (I know that is THE/one difference).
It looks as if all the necessary virtio components work, as do all the necessary NVIDIA components, and the CUDA installation did not show any errors or suspicious messages such as ‘no lib32 compatible directory found’.
(The NVIDIA GPU is not the primary one in the host machine’s BIOS, if that is of any relevance.)
Here is why I think it should work:

| ~ @ StReTcH (debian)
| => lspci -Dnn | grep -i vga
0000:00:01.0 VGA compatible controller [0300]: Red Hat, Inc. QXL paravirtual graphic card [1b36:0100] (rev 04)
0000:05:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP107 [GeForce GTX 1050 Ti] [10de:1c82] (rev a1)

| ~ @ StReTcH (debian)
| => lsmod | grep virtio
virtio_rng 16384 0
rng_core 16384 1 virtio_rng
virtio_balloon 16384 0
virtio_console 28672 0
virtio_blk 20480 3
virtio_net 32768 0
virtio_pci 24576 0
virtio_ring 24576 6 virtio_blk,virtio_net,virtio_rng,virtio_balloon,virtio_console,virtio_pci
virtio 16384 6 virtio_blk,virtio_net,virtio_rng,virtio_balloon,virtio_console,virtio_pci

| ~ @ StReTcH (debian)
| => lsmod | grep nvidia
nvidia_drm 45056 0
nvidia_modeset 1044480 1 nvidia_drm
nvidia 16797696 1 nvidia_modeset
ipmi_msghandler 49152 1 nvidia
drm_kms_helper 155648 2 qxl,nvidia_drm
drm 360448 7 qxl,ttm,nvidia_drm,drm_kms_helper

| ~ @ StReTcH (debian)
| => nvidia-smi
Unable to determine the device handle for GPU 0000:05:00.0: Unknown Error

| ~ @ StReTcH (debian)
| =>

Does anyone have an idea? Any help on this is highly appreciated.
Thank you all in advance.

Kind regards,
Roger
nvidia-bug-report.log.gz (1.01 MB)

generix · August 6, 2019, 7:55am

Aug  5 20:03:40 StReTcH kernel: [  361.880526] NVRM: RmInitAdapter failed! (0x23:0x56:471)
Aug  5 20:03:40 StReTcH kernel: [  361.880785] NVRM: rm_init_adapter failed for device bearing minor number 0
Aug  5 20:03:40 StReTcH kernel: [  362.004386] NVRM: RmInitAdapter failed! (0x23:0x56:471)
Aug  5 20:03:40 StReTcH kernel: [  362.004583] NVRM: rm_init_adapter failed for device bearing minor number 0

Did you hide the hypervisor? If already done, did you check if the gpu works on bare metal?
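
One rough way to see what "hidden" means from inside the guest: Linux lists a `hypervisor` CPU flag in /proc/cpuinfo when it runs under a hypervisor. The following is a minimal sketch, assuming it is run inside a Linux guest; the function name `guest_sees_hypervisor` is made up for illustration. It covers only this one CPU flag, and a driver may detect virtualization in other ways as well.

```python
from pathlib import Path


def guest_sees_hypervisor(cpuinfo_text: str) -> bool:
    """Return True if a 'flags' line in /proc/cpuinfo text lists 'hypervisor'."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        # Only look at the CPU feature-flag lines, not e.g. 'model name'.
        if key.strip() == "flags" and "hypervisor" in value.split():
            return True
    return False


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")  # Linux-specific; run this inside the guest
    if cpuinfo.exists():
        visible = guest_sees_hypervisor(cpuinfo.read_text())
        print("hypervisor flag visible:", visible)
```

If the flag is still reported after changing the guest configuration, the CPU-level hiding has probably not taken effect.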

roger.weihrauch · August 6, 2019, 4:12pm

Hello generix

Thanks for answering. Here is my response:
I am sure the 1050 Ti itself is fine: it works as described on a bare-metal installation of CUDA/TensorFlow and the Python tools.
(However, I found that it has a sync problem on the DVI connector when a monitor is attached, but not when it runs headless, if that is of any interest.)
I can run TF’s tutorial files on it successfully, and it computes the TF examples via CUDA on the 1050 Ti.
Also, ‘nvidia-smi’ gives me useful output about the 1050 Ti in that case.

I am not sure about ‘hiding the hypervisor’.
What does it mean, and how do I check or configure it?
Do you have a (detailed?) how-to on this that you can point me to?
So, for now, not knowing any better, my answer is: NO.

Thanks in advance.
Regards,
Roger

generix · August 6, 2019, 4:39pm

Mark, right-click, select “search google for…” gets me there:
https://forum.level1techs.com/t/hiding-hypervisor-from-vm-guest/132755

roger.weihrauch · August 6, 2019, 5:31pm

Hello generix

Do you think this is correct, and at the correct position in the StReTcH.xml file:

?

Regards,
Roger

roger.weihrauch · August 6, 2019, 6:40pm

OK, I tried it without success: it broke access to my KVM/QEMU inst/virbr.
I have to investigate this and will report back.

Thanks to all for your help 'til now.

Regards,
Roger

generix · August 6, 2019, 6:51pm

It has to be added to the “cpu” section of the config. Here’s a more in-depth post about the needed changes:
linux - Hiding Virtual machine status from guest operating system - Super User
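
As a way to double-check an edited guest definition, here is a minimal sketch that looks for two hypervisor-hiding elements in libvirt domain XML: `<hidden state='on'/>` under `<features><kvm>`, and `<feature policy='disable' name='hypervisor'/>` under `<cpu>`. The element names come from libvirt's domain XML format, not from the linked posts, and the thread does not establish which of the two a given driver version needs. The function name `hiding_settings` and the trimmed `EXAMPLE` domain are hypothetical.

```python
import xml.etree.ElementTree as ET


def hiding_settings(domain_xml: str) -> dict:
    """Report which hypervisor-hiding elements a libvirt domain XML contains."""
    root = ET.fromstring(domain_xml)

    # <features><kvm><hidden state='on'/></kvm></features>
    hidden = root.find("./features/kvm/hidden")
    kvm_hidden = hidden is not None and hidden.get("state") == "on"

    # <cpu><feature policy='disable' name='hypervisor'/></cpu>
    cpu_flag_disabled = any(
        f.get("name") == "hypervisor" and f.get("policy") == "disable"
        for f in root.findall("./cpu/feature")
    )
    return {"kvm_hidden": kvm_hidden, "cpu_hypervisor_disabled": cpu_flag_disabled}


# Hypothetical, heavily trimmed domain definition, for illustration only.
EXAMPLE = """
<domain type='kvm'>
  <name>example-guest</name>
  <features>
    <acpi/>
    <kvm>
      <hidden state='on'/>
    </kvm>
  </features>
  <cpu mode='host-passthrough'>
    <feature policy='disable' name='hypervisor'/>
  </cpu>
</domain>
"""

if __name__ == "__main__":
    print(hiding_settings(EXAMPLE))
```

To run it against a real guest, the XML text could be taken from the output of `virsh dumpxml` for that guest and passed to `hiding_settings`.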

roger.weihrauch · August 11, 2019, 7:33pm

Hello generix

I have now successfully set the hypervisor-hiding options as described in the last link you referenced.
Unfortunately, I now get a kernel panic when starting the guest system (now: Debian 10 Buster), so the guest does not come up:

Nevertheless, the new settings were accepted when I edited BuSTeR.xml via virsh.
I also set ‘copy host CPU Config’ in the virt-manager interface for this guest; otherwise the changed XML is not accepted, in my experience.

Could you or someone else tell me what the (new) issue could be, and how to solve it?

Thanks and kind regards,
Roger

roger.weihrauch · August 11, 2019, 7:41pm

generix · August 14, 2019, 9:35am

The screenshot only shows the end of the panic; can you capture the start?
