GPU in a VM pass-through setting

Hi All,

My system environment is as below:
Host system: Windows Server 2019
GPU: NVIDIA Titan RTX
Guest system: Hyper-V Ubuntu Linux 18.04.2 LTS VM.

I have set up GPU pass-through on my host system, dismounting the GPU from the host and assigning it to my Hyper-V Ubuntu Linux VM by following this documentation: https://docs.nvidia.com/grid/5.0/grid-vgpu-user-guide/index.html#using-gpu-pass-through-windows-server-hyper-v.

According to that documentation, I am required to install the NVIDIA vGPU software graphics driver.
However, at https://docs.nvidia.com/grid/latest/grid-vgpu-release-notes-microsoft-windows-server/index.html#hardware-configuration
the Titan RTX is not listed as a supported GPU. Am I able to install the NVIDIA vGPU software with the Titan RTX on the Hyper-V Ubuntu Linux VM?

I am working on installing CUDA on my Hyper-V Ubuntu Linux VM, and I encounter the error shown below when running nvidia-smi:

test-Virtual-Machine:~/NVIDIA_CUDA-10.1_Samples/bin/x86_64/linux/release$ nvidia-smi 
Unable to determine the device handle for GPU 8C8A:00:00.0: Unknown Error

I am able to install CUDA, but I encounter the following errors when running the CUDA sample binaries to verify the installation:

test-Virtual-Machine:~/NVIDIA_CUDA-10.1_Samples/bin/x86_64/linux/release$ ./deviceQuery
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 101
-> invalid device ordinal
Result = FAIL
test-Virtual-Machine:~/NVIDIA_CUDA-10.1_Samples/bin/x86_64/linux/release$ ./bandwidthTest
[CUDA Bandwidth Test] - Starting...
Running on...

cudaGetDeviceProperties returned 101
-> invalid device ordinal
CUDA error at bandwidthTest.cu:242 code=101(cudaErrorInvalidDevice) "cudaSetDevice(currentDevice)"

In my case, what should I do to enable my Hyper-V VM to use the GPU and run CUDA successfully?

Thanks and regards,
YiYang

Hi

The only GPUs that support vGPU are Tesla and the RTX 6000 & RTX 8000. The TITAN cannot be used with vGPU, so you will need to run the TITAN in Passthrough / DDA for your Ubuntu VM.

As the TITAN doesn’t support vGPU, you can use the standard driver from the NVIDIA drivers page: https://www.nvidia.co.uk/Download/index.aspx?lang=en-uk.
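
As a rough example of installing a downloaded .run driver package inside the Ubuntu VM (the filename is illustrative; kernel headers and build tools are needed so the installer can build its kernel module, and it should be run from a text console with the X server stopped):

# prerequisites for building the NVIDIA kernel module
sudo apt install -y build-essential linux-headers-$(uname -r)
# make the downloaded installer executable and run it
chmod +x NVIDIA-Linux-x86_64-430.34.run
sudo ./NVIDIA-Linux-x86_64-430.34.run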

Regards

MG

Hi Sir,

I have successfully set up the GPU pass-through by following the instructions here: https://docs.nvidia.com/grid/5.0/grid-vgpu-user-guide/index.html#using-gpu-pass-through-windows-server-hyper-v

I have tried installing the standard driver before.
What I did was install CUDA first by following this documentation: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#ubuntu-installation.
Then I downloaded the NVIDIA drivers from the page you suggested, but after installing the NVIDIA driver and rebooting my Hyper-V Ubuntu Linux VM, the VM cannot boot and is stuck at a black screen.
The NVIDIA drivers I downloaded and installed from that page for the Titan RTX were the ones for the Linux 64-bit operating system and the Linux 64-bit Ubuntu 18.04 operating system.

Thank You and Regards,
YiYang

Hi

You should install the NVIDIA driver in the OS first; that way, anything that relies on the GPU will have access to it.
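
A minimal sketch of that order on Ubuntu 18.04 (package names are examples, and the toolkit line assumes NVIDIA's CUDA apt repository has already been added):

sudo apt update
# install the driver first, then reboot
sudo apt install -y nvidia-driver-430
sudo reboot
# after the reboot, confirm the driver loads before installing CUDA
nvidia-smi
# toolkit-only metapackage, so it does not pull in another driver
sudo apt install -y cuda-toolkit-10-1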

Regarding the black screen: if you were using the Hyper-V console to connect to the VM, then after installing the NVIDIA driver you may now need a proper remoting protocol to connect to it. Depending on your requirements, something like VNC may work. Or, if you'd prefer RDP, connect to the VM via SSH and install XRDP to allow RDP connectivity.
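
A rough sketch of the XRDP route, run over SSH (assuming the VM already has the standard Ubuntu desktop installed):

sudo apt update
# RDP server for Linux
sudo apt install -y xrdp
# let the xrdp user read its TLS certificate key
sudo adduser xrdp ssl-cert
# start xrdp now and enable it at boot
sudo systemctl enable --now xrdp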

Regards

MG

Titan doesn’t support virtualization at all.

Hi MrGRID,

I have installed the NVIDIA driver in my Hyper-V Ubuntu Linux 18.04.2 LTS VM, and I am connecting to the VM using RDP.
The NVIDIA driver version I installed is 430.34 for the Linux 64-bit OS.
But I still encounter the same error when running the nvidia-smi command, as shown below:

test-Virtual-Machine:~/Desktop$ nvidia-smi 
Unable to determine the device handle for GPU 8C8A:00:00.0: Unknown Error

Does this mean I am not able to use the Titan RTX GPU in my Hyper-V Ubuntu Linux VM with the standard NVIDIA drivers?

Thanks and Regards,
YiYang

Hi sschaber,

Do you mean that if I am using a Hyper-V Ubuntu Linux VM, I must have the vGPU drivers and an NVIDIA GPU that supports virtualization in order for the GPU to work in the VM?

Thanks and regards,
YiYang

It means that the Titan (a consumer product) is not enabled for virtualization (including Passthrough), and therefore it does not work, or at least is not supported.

Hi sschaber,

Does it mean that only GPUs like the Tesla V100/P100 and RTX 6000/8000 can be passed through to the Linux VM and work properly with the standard NVIDIA driver?

Should I install the vGPU driver in the Linux VM, or is the standard GPU driver OK?

Hi

The GPUs listed here are supported for vGPU: Supported Products :: NVIDIA Virtual GPU Software Documentation

Passthrough GPUs are Quadro P2000 and above.

Don’t confuse vGPU with Passthrough, they are not the same.

Regards

MG

Hello,

My environment is;
Host: Windows Server 2019
GPU: NVIDIA RTX 2080Ti
Guest system: Hyper-V Ubuntu Linux 20.04

I have the same problem as YiYang.

root@ds-prod:/home/arute# nvidia-smi
Unable to determine the device handle for GPU 421F:00:00.0: Unknown Error
root@ds-prod:/home/arute# dpkg -l | grep nvidia-
ii  libnvidia-cfg1-455:amd64                   455.38-0ubuntu0.20.04.1               amd64        NVIDIA binary OpenGL/GLX configuration library
ii  libnvidia-common-455                       455.38-0ubuntu0.20.04.1               all          Shared files used by the NVIDIA libraries
ii  libnvidia-compute-455:amd64                455.38-0ubuntu0.20.04.1               amd64        NVIDIA libcompute package
ii  libnvidia-decode-455:amd64                 455.38-0ubuntu0.20.04.1               amd64        NVIDIA Video Decoding runtime libraries
ii  libnvidia-encode-455:amd64                 455.38-0ubuntu0.20.04.1               amd64        NVENC Video Encoding runtime library
ii  libnvidia-extra-455:amd64                  455.38-0ubuntu0.20.04.1               amd64        Extra libraries for the NVIDIA driver
ii  libnvidia-fbc1-455:amd64                   455.38-0ubuntu0.20.04.1               amd64        NVIDIA OpenGL-based Framebuffer Capture runtime library
ii  libnvidia-gl-455:amd64                     455.38-0ubuntu0.20.04.1               amd64        NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and Vulkan ICD
ii  libnvidia-ifr1-455:amd64                   455.38-0ubuntu0.20.04.1               amd64        NVIDIA OpenGL-based Inband Frame Readback runtime library
ii  nvidia-compute-utils-455                   455.38-0ubuntu0.20.04.1               amd64        NVIDIA compute utilities
ii  nvidia-dkms-455                            455.38-0ubuntu0.20.04.1               amd64        NVIDIA DKMS package
ii  nvidia-driver-455                          455.38-0ubuntu0.20.04.1               amd64        NVIDIA driver metapackage
ii  nvidia-kernel-common-455                   455.38-0ubuntu0.20.04.1               amd64        Shared files used with the kernel module
ii  nvidia-kernel-source-455                   455.38-0ubuntu0.20.04.1               amd64        NVIDIA kernel source package
ii  nvidia-prime                               0.8.14                                all          Tools to enable NVIDIA's Prime
ii  nvidia-settings                            440.82-0ubuntu0.20.04.1               amd64        Tool for configuring the NVIDIA graphics driver
ii  nvidia-utils-455                           455.38-0ubuntu0.20.04.1               amd64        NVIDIA driver support binaries
ii  screen-resolution-extra                    0.18build1                            all          Extension for the nvidia-settings control panel
ii  xserver-xorg-video-nvidia-455              455.38-0ubuntu0.20.04.1               amd64        NVIDIA binary Xorg driver

I have spent about 18 hours trying to solve this problem. NVIDIA X Server Settings shows only an empty screen. I have also attached some screenshot links that may help with diagnosing the problem:
https://ibb.co/DrbkF4Y
https://ibb.co/grKP4TS
https://ibb.co/DMV57fL

Can you please help me with this issue?

Thank you.

Hi

The 2080Ti is a Consumer (GeForce) GPU. You should be running Quadro or Tesla for this to work.

Regards

MG

Thanks for the reply @MrGRID,

Is there any way to virtualize the 2080 Ti with a different host, like KVM or VMware? As a company, we bought two 2080 Ti cards for AI.

What we want is 1 prod VM and 1 test VM, with both GPUs usable by those two VMs.

Regards

Hi

Unfortunately you should have purchased Quadro or Tesla GPUs if you need to virtualise them. GeForce are Consumer GPUs and do not support virtualisation.

Best advice: send them back to where you purchased them and buy a pair of RTX 5000 / 6000 or 8000 GPUs. These can be used either in Passthrough with a standard Quadro driver and no vGPU license, or with vGPU to run multiple instances per GPU (which does require a license).

Regards

MG

Thank you so much for the advice.

We'll change the GPUs and will probably buy 2x RTX 5000. In Supported Products :: NVIDIA Virtual GPU Software Documentation I can't see the RTX 5000, so I think that document is not up to date.

So with the RTX 5000 we can attach both GPUs to multiple VMs.

For example:

1 Prod VM
1 Test VM

The memory of both GPUs is fully available, sometimes to the Prod VM and sometimes to the Test VM.

Have I got it right?

I've bought a Quadro P2200 card, and it is also not working with Discrete Device Assignment, neither in Windows nor in Linux! I therefore cannot recommend buying an NVIDIA Quadro!

@sschaber

It seems that as of 30 March 2021, NVIDIA supports passthrough for consumer GPUs like the RTX 2080 and RTX 3080/3090.
Is this real and possible on Hyper-V with a Windows 10 guest OS, or only under the KVM hypervisor?

Hi Alessandro,

I don't test consumer products. The announcement only mentions KVM as a tested environment, so I would assume it could work on Hyper-V, but it would certainly be untested.

Regards
Simon

Thanks so much.

Hello @sschaber. I'm running the same setup as the author of this post, but with 8x Tesla T4 cards on a bare-metal AWS machine (g4dn.metal), so those are non-consumer cards. Windows Server 2019 with Hyper-V and an Ubuntu 20.04 guest. I'm getting the following error before VM startup:

Virtual Pci Express Port (Instance ID 02583E40-07CB-4C02-9CF4-B4D3A55868F9): Failed to Power on with Error 'The hypervisor could not perform the operation because the object or value was either already in use or being used for a purpose that would not permit completing the operation.'.

We have a commercial product coming up and rely on GPU passthrough working properly. Any help and pointers are highly appreciated.