Cuda in KVM using VFIO device passthrough

atz1 · May 1, 2018, 12:21am

I have a system with 4 Tesla V100-SXM-16GB GPUs in it, and I am attempting to pass these devices through to virtual machines run by KVM. I am managing the VMs with OpenNebula and I have followed the instructions at https://docs.opennebula.org/5.4/deployment/open_cloud_host_setup/pci_passthrough.html to pass the device through to my VM. I am able to see the device in nvidia-smi, watch its power/temperature levels, change the persistence mode and compute mode, etc.

I can query the device to get properties and capabilities, but when I try to run a program on it that utilizes the device (beyond querying), I receive an error message about the device being unavailable.
To test, I am using simpleAtopmicIntrinsics out of the CUDA Samples. Here is the output I receive:

SimpleAtomicIntrinsics starting...
GPU Device 0: "Tesla V100-SXM2-16GB": with compute capability 7.0

> GPU device has 80 Multi-Processors, SM 7.0 compute capabilities

Cuda error at simpleAtomicIntrinsics.cu:108 code=46(cudaErrorDevicesUnavailable) "cudaMalloc((void **) &dOData, memsize)"

I have tried this with multiple devices (in case there was an issue with vfio on the first device) and had the same result on each of them.

The host OS is CentOS 7.4.1708. I upgraded the kernel to 4.15.15-1 from the elrepo to ensure that I had support for vfio_virqfd.
I am running the NVIDIA 390.15 driver and using cuda 9.1 (cuda-9-1-9.1.85-1.x86_64 rpm).

Does anyone have ideas on what could be causing this or what I could try next?

Thank you for your help and ideas,
Andy

m4oc · April 2, 2021, 2:05pm

I have the exact same problem, I am attempting to pass these devices through to virtual machines run by KVM.

Drivers work fine in guests, but when I run CUDA Sample there is an error:

CUDA error at bandwidthTest.cu:686 code=46(cudaErrorDevicesUnavailable) “cudaEventCreate(&start)”

nvidia-smi (inside guest)

±----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
±----------------------------------------------------------------------------+

./bandwidthTest
[CUDA Bandwidth Test] - Starting…
Running on…

Device 0: Quadro RTX 6000
Quick Mode

CUDA error at bandwidthTest.cu:686 code=46(cudaErrorDevicesUnavailable) “cudaEventCreate(&start)”
[Exit 1]

Host: CentOS Linux release 7.9.2009 - 5.4.109-1.el7.elrepo.x86_64
Guest: CentOS 8 and Ubuntu 20.04

Topic		Replies	Views
ENOMEM when running CUDA sample on host GPU where another GPU is passed through via IOMMU/vfio-pci Linux	1	864	May 19, 2019
Error running cuda on VM with GPU passthrough. cuda.get_device_name() returns 802, not initialized CUDA Setup and Installation	6	1099	January 12, 2026
could not install nvidia(k620) driver in a guest linux vm CUDA Setup and Installation	2	1593	February 2, 2018
Host System Fails to Detect CUDA Device After GPU Passthrough Unbind, but nvidia-smi Works Linux	0	59	October 18, 2025
PCI passthrough KVM for CUDA usage CUDA Setup and Installation	6	6796	April 5, 2016
Unable to determine the device handle for GPU 0000:00:09.0 CUDA Setup and Installation	5	5023	December 22, 2017
Could not install Nvidia driver and CUDA 7.5 on Openstack VM (with Tesla K80/k40 passthrough) CUDA Setup and Installation	0	1575	August 30, 2016
cudaMalloc failed when GPU passthrough by qemu General Discussion	0	673	October 31, 2022
Could not install Nvidia driver and CUDA 7.5 on Openstack VM (with Tesla K80/k40 passthrough) Linux	0	1722	August 30, 2016
GPU device assignment to KVM guest — VFIO on host OK, QEMU fails (error getting device from group: Invalid argument) Linux kernel , gpu , kvm	1	60	April 19, 2026

Cuda in KVM using VFIO device passthrough

Related topics