We have a hardware setup with multiple H100s (confirmed from lspci -d 10de:). I have set up one GPU for passthrough to my QEMU/KVM VM. After installing the drivers on the guest, I can see that it is attached:
From nvidia-smi in the guest, I can see the single GPU I attached.
CUDA Version is 12.4, Driver Version: 550.54.15
MIG is disabled, Persistence-M is Off.
Guest is on Ubuntu 22.04.
But I still get errors when trying to run some CUDA samples, either directly or via PyTorch.
>>> import os,torch
>>> torch.cuda.is_available()
.../python3.10/site-packages/torch/cuda/__init__.py:118: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 802: system not yet initialized (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
return torch._C._cuda_getDeviceCount() > 0
False
Setting the following environment variables makes is_available() return True, but the next call fails:
>>> os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID"
>>> os.environ["CUDA_VISIBLE_DEVICES"]="0"
>>> os.environ["PYTORCH_NVML_BASED_CUDA_CHECK"]="1"
>>> torch.cuda.is_available()
True
>>> torch.cuda.get_device_name(0)
...
File ".../site-packages/torch/cuda/__init__.py", line 293, in _lazy_init
torch._C._cuda_init()
RuntimeError: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 802: system not yet initialized
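To make the workaround above reproducible outside an interactive session, the environment variables are best set before torch is imported, since CUDA reads CUDA_VISIBLE_DEVICES when its context is first initialized. Note that PYTORCH_NVML_BASED_CUDA_CHECK only changes how PyTorch probes for devices: is_available() queries NVML instead of initializing CUDA, so it can report True while full initialization (get_device_name, tensor allocation) still fails with Error 802. A minimal sketch, guarded so it also runs where torch is not installed:

```python
import os

# Set CUDA-related variables BEFORE importing torch; CUDA reads
# CUDA_VISIBLE_DEVICES once, when its context is first initialized.
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
# This only changes the availability probe: is_available() asks NVML
# instead of initializing CUDA, which is why it can return True even
# though full CUDA initialization still fails with Error 802.
os.environ["PYTORCH_NVML_BASED_CUDA_CHECK"] = "1"

try:
    import torch
    available = torch.cuda.is_available()
except ImportError:
    available = None  # torch not installed in this environment
print(available)
```

This matches the transcript above: the NVML-based check passes, but it does not address the underlying initialization failure.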
I’m having the same issue, very similar setup:
VM OS: Ubuntu 22.04
NVIDIA-SMI 550.90.07
Driver Version: 550.90.07
CUDA Version: 12.4
GPU: 1 H100 SXM
>>> import torch
>>> torch.cuda.is_available()
/home/ubuntu/venv/lib/python3.10/site-packages/torch/cuda/__init__.py:118: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 802: system not yet initialized (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
return torch._C._cuda_getDeviceCount() > 0
False
I’ve seen others say to install Fabric Manager, but that should not be necessary with a single GPU. I’ve tried restarting, and that did not solve it either.
I have resolved this issue. My SXM H20 machine has 8 GPUs, 7 of which are passed through to the QEMU/KVM virtual machine.
Install the GPU driver and the matching version of Fabric Manager on the host machine, and make sure the Fabric Manager service is running. After installing the driver inside the virtual machine, CUDA works without any issues.
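On the host side, the "make sure Fabric Manager is running" step can be checked programmatically. A small sketch, assuming the standard nvidia-fabricmanager systemd unit shipped with the Fabric Manager package:

```python
import subprocess

def fabric_manager_active() -> bool:
    """Return True if the nvidia-fabricmanager systemd unit is active.

    Assumes the standard unit name from NVIDIA's Fabric Manager package;
    degrades gracefully on hosts without systemctl.
    """
    try:
        result = subprocess.run(
            ["systemctl", "is-active", "--quiet", "nvidia-fabricmanager"],
            check=False,
        )
        return result.returncode == 0
    except FileNotFoundError:
        # systemctl not present (non-systemd host)
        return False

print(fabric_manager_active())
```

Run this on the host before attaching GPUs to the guest; if it prints False, CUDA in the VM will typically fail with Error 802 on SXM systems.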
Note that when using PyTorch there is still a chance of hitting Error 802: system not yet initialized. Adding a torch.cuda.empty_cache() call right after importing torch makes PyTorch work normally.
Alternatively, you can pass NVLink through to the virtual machine and install Fabric Manager inside it, which also ensures smooth operation. — a Chinese engineer