deviceQuery hangs on GPU id 0

Hi everyone,
We have run into a problem using NVIDIA graphics cards, which we will explain in detail below.

We installed Debian 11 on servers that have NVIDIA graphics cards, and then installed OpenNebula on top of it as the hypervisor for virtualization.
Then, to use the graphics cards in the virtual machines created by OpenNebula, we followed the documentation and set up PCI passthrough of the GPUs into the VMs.
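For reference, this is roughly how we verify on the Debian host that each GPU function is detached from the host driver and bound to vfio-pci for passthrough. It is only a sketch; the PCI addresses are the ones from our host's lspci output below.

```shell
#!/bin/sh
# Sketch: check which kernel driver each GPU function is bound to on the host.
# For KVM passthrough, every function should report vfio-pci, not nvidia/nouveau.
for dev in 0000:01:00.0 0000:01:00.1 0000:21:00.0 0000:21:00.1; do
    drv=$(basename "$(readlink -f "/sys/bus/pci/devices/$dev/driver" 2>/dev/null)" 2>/dev/null)
    echo "$dev -> ${drv:-no driver}"
done
```

On a correctly configured host all four lines should show vfio-pci; anything else means a host driver is still holding the device.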

CPU information on the Debian host:

Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   43 bits physical, 48 bits virtual
CPU(s):                          48
On-line CPU(s) list:             0-47
Thread(s) per core:              2
Core(s) per socket:              24
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       AuthenticAMD
CPU family:                      23
Model:                           49
Model name:                      AMD Ryzen Threadripper 3960X 24-Core Processor
Stepping:                        0
Frequency boost:                 enabled
CPU MHz:                         2199.662
CPU max MHz:                     6635.1558
CPU min MHz:                     2200.0000
BogoMIPS:                        7600.17
Virtualization:                  AMD-V

CPU information in the OpenNebula VM:

Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   40 bits physical, 48 bits virtual
CPU(s):                          24
On-line CPU(s) list:             0-23
Thread(s) per core:              1
Core(s) per socket:              1
Socket(s):                       24
NUMA node(s):                    1
Vendor ID:                       AuthenticAMD
CPU family:                      23
Model:                           49
Model name:                      AMD Ryzen Threadripper 3960X 24-Core Processor
Stepping:                        0
CPU MHz:                         3797.790
BogoMIPS:                        7595.58
Virtualization:                  AMD-V
Hypervisor vendor:               KVM

Output of lspci on the Debian host:

01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GA102 [GeForce RTX 3090] [10de:2204] (rev a1)
01:00.1 Audio device [0403]: NVIDIA Corporation GA102 High Definition Audio Controller [10de:1aef] (rev a1)
21:00.0 VGA compatible controller [0300]: NVIDIA Corporation GA102 [GeForce RTX 3090] [10de:2204] (rev a1)
21:00.1 Audio device [0403]: NVIDIA Corporation GA102 High Definition Audio Controller [10de:1aef] (rev a1)

Output of lspci in the VM:

01:01.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:2204] (rev a1)
01:02.0 Audio device [0403]: NVIDIA Corporation Device [10de:1aef] (rev a1)
01:03.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:2204] (rev a1)
01:04.0 Audio device [0403]: NVIDIA Corporation Device [10de:1aef] (rev a1)

To use the GPUs in the virtual machine, we first installed the following driver packages on Ubuntu 20.04, and then installed the NVIDIA Docker runtime as well.

nvidia-headless-510 nvidia-utils-510 cuda-toolkit-11-6
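After installing, a quick sanity check can confirm in the VM that both the kernel module and the toolkit are visible. This is a sketch we would run, not output from our machine:

```shell
#!/bin/sh
# Sanity check after installing the driver and toolkit in the VM.
# Prints the loaded kernel-driver version, or a note if the module is missing.
cat /proc/driver/nvidia/version 2>/dev/null || echo "nvidia kernel module not loaded"
# Prints the CUDA toolkit version, or a note if nvcc is not on PATH.
nvcc --version 2>/dev/null || echo "cuda-toolkit (nvcc) not on PATH"
```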

Output of nvidia-smi in the VM:

root@localhost:~# nvidia-smi
Mon Dec 26 11:24:54 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.108.03   Driver Version: 510.108.03   CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:01.0 Off |                  N/A |
|  0%   31C    P8    12W / 350W |      1MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  On   | 00000000:01:03.0 Off |                  N/A |
|  0%   36C    P8    28W / 350W |      1MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Output of nvidia-smi -L in the VM:

root@localhost:~# nvidia-smi -L
GPU 0: NVIDIA GeForce RTX 3090 (UUID: GPU-9ba68e69-e2ce-5d2c-ad15-5884706fd049)
GPU 1: NVIDIA GeForce RTX 3090 (UUID: GPU-a71db27c-bec8-5f4b-cba3-b4c5a91cf19f)

Docker runtime: nvidia-docker2

We verified GPU access through Docker with the following command:

root@localhost:~# docker run --rm --gpus all nvidia/cuda:11.2.0-runtime-ubuntu20.04 nvidia-smi

Mon Dec 26 11:06:43 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.108.03   Driver Version: 510.108.03   CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:01.0 Off |                  N/A |
|  0%   32C    P8    12W / 350W |      1MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  On   | 00000000:01:03.0 Off |                  N/A |
|  0%   36C    P8    27W / 350W |      1MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Here is the problem:
When we access the GPU with id 0 from Docker, the deviceQuery command hangs. From then on, no access to either GPU works until we reboot the VM. After rebooting the VM and accessing the GPU with id 1 instead, deviceQuery works!

In more detail:

First, we run deviceQuery on the GPU with id 1:

root@localhost:~# docker run -it --entrypoint /bin/bash --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=1 nvcr.io/nvidia/tensorflow:21.09-tf2-py3
root@47d6d53c6cde:/workspace# deviceQuery 
deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "NVIDIA GeForce RTX 3090"
  CUDA Driver Version / Runtime Version          11.6 / 9.0
  CUDA Capability Major/Minor version number:    8.6
  Total amount of global memory:                 24268 MBytes (25447170048 bytes)
MapSMtoCores for SM 8.6 is undefined.  Default to use 64 Cores/SM
MapSMtoCores for SM 8.6 is undefined.  Default to use 64 Cores/SM
  (82) Multiprocessors, ( 64) CUDA Cores/MP:     5248 CUDA Cores
  GPU Max Clock rate:                            1695 MHz (1.70 GHz)
  Memory Clock rate:                             9751 Mhz
  Memory Bus Width:                              384-bit
  L2 Cache Size:                                 6291456 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1536
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 3
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.6, CUDA Runtime Version = 9.0, NumDevs = 1
Result = PASS

And when we run it on the GPU with id 0:

root@localhost:~# docker run -it --entrypoint /bin/bash --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=0 nvcr.io/nvidia/tensorflow:21.09-tf2-py3
root@77e5a4e8e1ad:/workspace# deviceQuery 
deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

At this point, until we reboot the VM, we cannot access the GPU with id 1 either!
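To debug further, the next thing we plan to capture, from a second shell in the VM while deviceQuery is stuck, is the kernel log. This is only a diagnostic sketch; the NVRM/Xid grep patterns are the usual NVIDIA driver-error markers, not lines from our own logs:

```shell
#!/bin/sh
# Diagnostic sketch: run from a second shell in the VM while deviceQuery hangs.
# NVIDIA kernel-driver faults usually appear as NVRM/Xid lines in dmesg.
dmesg 2>/dev/null | grep -iE 'nvrm|xid' || echo "no NVRM/Xid messages in dmesg"
# Asking the stuck GPU for its state shows whether the driver still responds.
nvidia-smi -q -i 0 2>&1 | head -n 20
```

Has anyone seen this before, or any idea why only the GPU at bus address 01:01.0 locks up?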