CUDA in KVM using VFIO device passthrough

I have a system with 4 Tesla V100-SXM2-16GB GPUs in it, and I am attempting to pass these devices through to virtual machines run by KVM. I am managing the VMs with OpenNebula, and I have followed the instructions at https://docs.opennebula.org/5.4/deployment/open_cloud_host_setup/pci_passthrough.html to pass the device through to my VM. Inside the VM, I am able to see the device in nvidia-smi, watch its power/temperature levels, change the persistence mode and compute mode, etc.

I can query the device to get properties and capabilities, but when I try to run a program on it that utilizes the device (beyond querying), I receive an error message about the device being unavailable.
To test, I am using simpleAtomicIntrinsics from the CUDA Samples. Here is the output I receive:

SimpleAtomicIntrinsics starting...
GPU Device 0: "Tesla V100-SXM2-16GB": with compute capability 7.0

> GPU device has 80 Multi-Processors, SM 7.0 compute capabilities

Cuda error at simpleAtomicIntrinsics.cu:108 code=46(cudaErrorDevicesUnavailable) "cudaMalloc((void **) &dOData, memsize)"

I have tried this with multiple devices (in case there was an issue with vfio on the first device) and had the same result on each of them.
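
One way to double-check the VFIO side on the host is to read which kernel driver each NVIDIA PCI function is bound to. The following is only a minimal sketch under stated assumptions, not something taken from the OpenNebula guide: it assumes the standard sysfs layout (a `vendor` file and a `driver` symlink under /sys/bus/pci/devices/<address>), and the function name `nvidia_pci_drivers` and its `sysfs_root` parameter are hypothetical names introduced for illustration.

```python
import os

NVIDIA_VENDOR_ID = "0x10de"  # PCI vendor ID assigned to NVIDIA


def nvidia_pci_drivers(sysfs_root="/sys/bus/pci/devices"):
    """Map each NVIDIA PCI address to the name of its bound driver.

    The value is None when no driver is bound to the device.
    `sysfs_root` is a hypothetical parameter so the function can be
    pointed at a different directory (for example, in a test).
    """
    result = {}
    for addr in sorted(os.listdir(sysfs_root)):
        dev_dir = os.path.join(sysfs_root, addr)
        try:
            with open(os.path.join(dev_dir, "vendor")) as f:
                vendor = f.read().strip().lower()
        except OSError:
            continue  # entry without a readable vendor file; skip it
        if vendor != NVIDIA_VENDOR_ID:
            continue
        # The "driver" entry is a symlink to the driver's directory;
        # its basename is the driver name (e.g. "vfio-pci").
        driver_link = os.path.join(dev_dir, "driver")
        if os.path.islink(driver_link):
            result[addr] = os.path.basename(os.readlink(driver_link))
        else:
            result[addr] = None
    return result
```

Calling `nvidia_pci_drivers()` on the host and printing the returned dictionary would list every NVIDIA function, not only the GPUs. For devices that are passed through, the expected driver name is `vfio-pci`; any other value would suggest the binding step did not take effect for that device.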

The host OS is CentOS 7.4.1708. I upgraded the kernel to 4.15.15-1 from ELRepo to ensure that I had support for vfio_virqfd.
I am running the NVIDIA 390.15 driver and using CUDA 9.1 (cuda-9-1-9.1.85-1.x86_64 rpm).

Does anyone have ideas on what could be causing this or what I could try next?

Thank you for your help and ideas,
Andy