CUDA in KVM using VFIO device passthrough

I have a system with 4 Tesla V100-SXM2-16GB GPUs, and I am attempting to pass these devices through to virtual machines run by KVM. I am managing the VMs with OpenNebula and have followed the instructions at https://docs.opennebula.org/5.4/deployment/open_cloud_host_setup/pci_passthrough.html to pass a device through to my VM. I am able to see the device in nvidia-smi, watch its power and temperature levels, change the persistence mode and compute mode, etc.

I can query the device to get its properties and capabilities, but when I run a program that actually uses the device (anything beyond querying), I receive an error saying the device is unavailable.
To test, I am using simpleAtomicIntrinsics from the CUDA Samples. Here is the output I receive:

SimpleAtomicIntrinsics starting...
GPU Device 0: "Tesla V100-SXM2-16GB": with compute capability 7.0

> GPU device has 80 Multi-Processors, SM 7.0 compute capabilities

Cuda error at simpleAtomicIntrinsics.cu:108 code=46(cudaErrorDevicesUnavailable) "cudaMalloc((void **) &dOData, memsize)"

I have tried this with multiple devices (in case there was an issue with vfio on the first device) and had the same result on each of them.
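For anyone comparing runs across several GPUs, here is a rough sketch that pulls the error code and the failing call out of a sample's output. It is not part of the CUDA Samples; the function name parse_cuda_sample_error and the regular expression are assumptions based only on the error line format shown above.

```python
import re

# Assumed format, taken from the sample output in this thread:
#   CUDA error at <file>:<line> code=<n>(<name>) "<failing call>"
# Both straight and curly quotes are accepted around the failing call.
ERROR_LINE = re.compile(
    r'cuda error at (?P<file>\S+):(?P<line>\d+)\s+'
    r'code=(?P<code>\d+)\((?P<name>\w+)\)\s+'
    r'["\u201c](?P<call>.*)["\u201d]',
    re.IGNORECASE,
)


def parse_cuda_sample_error(output):
    """Return a dict describing the first CUDA sample error in `output`, or None."""
    for text_line in output.splitlines():
        match = ERROR_LINE.search(text_line)
        if match:
            info = match.groupdict()
            info["line"] = int(info["line"])
            info["code"] = int(info["code"])
            return info
    return None


if __name__ == "__main__":
    sample_output = (
        'Cuda error at simpleAtomicIntrinsics.cu:108 '
        'code=46(cudaErrorDevicesUnavailable) '
        '"cudaMalloc((void **) &dOData, memsize)"'
    )
    print(parse_cuda_sample_error(sample_output))
```

Running each sample once per GPU and feeding the captured output through this function makes it easy to confirm that every device fails with the same code at the same call.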

The host OS is CentOS 7.4.1708. I upgraded the kernel to 4.15.15-1 from ELRepo to ensure that I had support for vfio_virqfd.
I am running the NVIDIA 390.15 driver and CUDA 9.1 (cuda-9-1-9.1.85-1.x86_64 rpm).
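With VFIO passthrough, each GPU should be bound to the vfio-pci driver on the host while the VM is running. A minimal sketch for checking that through sysfs follows. The helper name bound_driver and the PCI address 0000:3b:00.0 are placeholders, not values taken from this system; substitute the addresses that lspci reports on the host.

```python
import os


def bound_driver(pci_address, sysfs_root="/sys/bus/pci/devices"):
    """Return the name of the kernel driver bound to a PCI device, or None if unbound."""
    # In sysfs, <device>/driver is a symlink to the bound driver's directory.
    driver_link = os.path.join(sysfs_root, pci_address, "driver")
    if not os.path.islink(driver_link):
        return None
    return os.path.basename(os.readlink(driver_link))


if __name__ == "__main__":
    # Hypothetical host address; replace with the real one for each GPU.
    address = "0000:3b:00.0"
    print(address, "->", bound_driver(address))
```

If this reports anything other than vfio-pci for a passed-through GPU, the host binding is worth a closer look before debugging the guest.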

Does anyone have ideas on what could be causing this or what I could try next?

Thank you for your help and ideas,
Andy

I have exactly the same problem: I am attempting to pass GPUs through to virtual machines run by KVM.

The drivers work fine in the guests, but when I run a CUDA sample I get an error:

CUDA error at bandwidthTest.cu:686 code=46(cudaErrorDevicesUnavailable) "cudaEventCreate(&start)"

nvidia-smi (inside guest)

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro RTX 6000     Off  | 00000000:05:00.0 Off |                  Off |
| 34%   47C    P0    42W / 260W |      0MiB / 24220MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

./bandwidthTest
[CUDA Bandwidth Test] - Starting...
Running on...

Device 0: Quadro RTX 6000
Quick Mode

CUDA error at bandwidthTest.cu:686 code=46(cudaErrorDevicesUnavailable) "cudaEventCreate(&start)"
[Exit 1]

Host: CentOS Linux release 7.9.2009 - 5.4.109-1.el7.elrepo.x86_64
Guest: CentOS 8 and Ubuntu 20.04