Error running CUDA on VM with GPU passthrough: torch.cuda.get_device_name() fails with Error 802, system not yet initialized

We have a hardware setup with multiple H100s (as listed by lspci -d 10de:). I have set up one GPU for passthrough to my QEMU/KVM VM. After installing the drivers on the guest, nvidia-smi in the guest shows the single GPU I attached:

name, pci.bus_id, vbios_version, driver_version
NVIDIA H100 80GB HBM3, 00000000:01:00.0, 96.00.61.00.01, 550.54.15

CUDA Version is 12.4, Driver Version: 550.54.15
MIG is disabled, Persistence-M is Off.
Guest is on Ubuntu 22.04.
However, I still get errors when I try to run the CUDA samples, either directly or via PyTorch.

>>> import os,torch
>>> torch.cuda.is_available()
.../python3.10/site-packages/torch/cuda/__init__.py:118: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 802: system not yet initialized (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
  return torch._C._cuda_getDeviceCount() > 0
False

Setting the following environment variables makes is_available() return True, but the next call still fails:

>>> os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID"
>>> os.environ["CUDA_VISIBLE_DEVICES"]="0"
>>> os.environ["PYTORCH_NVML_BASED_CUDA_CHECK"]="1"
>>> torch.cuda.is_available()
True
>>> torch.cuda.get_device_name(0)
...
  File ".../site-packages/torch/cuda/__init__.py", line 293, in _lazy_init
    torch._C._cuda_init()
RuntimeError: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 802: system not yet initialized
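
To rule out PyTorch itself, the driver can be probed directly. The following is a minimal sketch, assuming the guest exposes the driver library as libcuda.so.1; the helper name probe_cuda_driver is hypothetical. It calls cuInit(0) through ctypes and reports the raw driver result code with its symbolic name. In the driver API, code 802 corresponds to CUDA_ERROR_SYSTEM_NOT_READY, so seeing it here would suggest the failure sits below PyTorch.

```python
import ctypes


def probe_cuda_driver(lib_name="libcuda.so.1"):
    """Call cuInit(0) directly and return (code, name), or None if the
    driver library cannot be loaded. lib_name is an assumption about
    how the library is named on the guest."""
    try:
        libcuda = ctypes.CDLL(lib_name)
    except OSError:
        return None  # driver library not found or not loadable

    # CUresult cuInit(unsigned int flags)
    libcuda.cuInit.argtypes = [ctypes.c_uint]
    libcuda.cuInit.restype = ctypes.c_int
    code = libcuda.cuInit(0)

    # CUresult cuGetErrorName(CUresult error, const char **pStr)
    libcuda.cuGetErrorName.argtypes = [ctypes.c_int, ctypes.POINTER(ctypes.c_char_p)]
    libcuda.cuGetErrorName.restype = ctypes.c_int
    name = ctypes.c_char_p()
    libcuda.cuGetErrorName(code, ctypes.byref(name))

    return code, (name.value.decode() if name.value else "unknown")


if __name__ == "__main__":
    result = probe_cuda_driver()
    if result is None:
        print("libcuda.so.1 could not be loaded")
    else:
        print("cuInit returned %d (%s)" % result)
```

If this also reports 802, the PyTorch environment variables above are probably not relevant: they only change how is_available() is answered, not whether the driver can initialize.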

Thank You

I’m having the same issue with a very similar setup:
VM OS: Ubuntu 22.04
NVIDIA-SMI 550.90.07
Driver Version: 550.90.07
CUDA Version: 12.4
GPU: 1 H100 SXM

>>> import torch
>>> torch.cuda.is_available()
/home/ubuntu/venv/lib/python3.10/site-packages/torch/cuda/__init__.py:118: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 802: system not yet initialized (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
  return torch._C._cuda_getDeviceCount() > 0
False

I’ve seen others suggest installing Fabric Manager, but that should not be necessary with a single GPU. I’ve also tried restarting, and that did not solve it either.
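
For reference, the CUDA documentation describes error 802 as the system not being ready for CUDA work and advises verifying that all required driver daemons are running. Whether that applies to a single passed-through GPU is the open question here, but checking the service state is cheap. A small sketch, assuming a systemd-based guest and that the service uses the usual unit name nvidia-fabricmanager; the helper name service_state is hypothetical:

```python
import subprocess


def service_state(unit="nvidia-fabricmanager"):
    """Return the output of `systemctl is-active <unit>` (for example
    'active' or 'inactive'), or None if systemctl cannot be run.
    The unit name is an assumption about how the package was installed."""
    try:
        proc = subprocess.run(
            ["systemctl", "is-active", unit],
            capture_output=True,
            text=True,
            timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None  # no systemd on this machine, or the call hung
    return proc.stdout.strip() or "unknown"


if __name__ == "__main__":
    print("nvidia-fabricmanager:", service_state())
```

A result other than "active" only shows that the service is not running in the guest; it does not by itself confirm that the service is needed for this configuration.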

Any help would be greatly appreciated!