I have an ‘HPE NVIDIA Tesla V100 PCIe 32GB Computational Accelerator’ GPU installed in a server. I created an Ubuntu 18.04 virtual machine on ‘VMware ESXi 6.7’ and assigned the GPU to this VM, but I have not been able to get a working GPU driver installed in the VM.
Below are all the approaches I have tried:
- Downloaded the GPU driver (418.87) for the Tesla V100 from the NVIDIA website, which is a .deb file. The driver installs without errors, but running nvidia-smi gives the error ‘NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.’
- Downloaded the GPU driver (418.43) from the HPE website, which is a .run file. Installation fails with the message ‘ERROR: unable to load the ‘nvidia-drm’ kernel module.’
- Installed a GRID GPU driver: NVIDIA-Linux-x86_64-430.46-grid.run. Installation succeeds and nvidia-smi runs, but some GPU information, such as Fan, Temp, and power usage, is shown as N/A. When I then tried to run a CNN model, it failed with the message ‘check failed: error == cudaSuccess (46 vs. 0) all CUDA-capable devices are busy or unavailable’.
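
In case it helps with diagnosis, here is a minimal sketch of how the driver state inside the VM could be checked after each attempt. It assumes the usual /proc/modules layout on Linux (module name in the first column). The helper names `loaded_modules` and `summarize` are my own, not part of any NVIDIA tool. The `nouveau` check is included only because the open-source nouveau driver is known to conflict with the proprietary NVIDIA driver; I have not confirmed that this is the cause here.

```python
from pathlib import Path


def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules-style text."""
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            # The first whitespace-separated field is the module name.
            names.add(fields[0])
    return names


def summarize(modules):
    """Report which of the relevant kernel modules are currently loaded."""
    return {
        "nvidia_loaded": "nvidia" in modules,
        "nvidia_drm_loaded": "nvidia_drm" in modules,
        # Assumption: nouveau being loaded may block the NVIDIA module.
        "nouveau_loaded": "nouveau" in modules,
    }


if __name__ == "__main__":
    path = Path("/proc/modules")
    if path.exists():
        print(summarize(loaded_modules(path.read_text())))
    else:
        print("/proc/modules not found; is this a Linux system?")
```

If `nvidia_loaded` is false right after an installation that reported success, that would at least be consistent with the nvidia-smi error in the first attempt.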
Could you please help me with this issue?