Nvidia Driver Not Recognized in VMware VM on a vGPU (Tesla T4, Ubuntu 20.04)
Hi everyone,
I’m having trouble getting the Nvidia drivers working on an Ubuntu 20.04 VM running on VMware with a vGPU (Tesla T4).
I installed the latest drivers from the official Nvidia website (the CUDA Toolkit 12.5 Downloads page on NVIDIA Developer).
The installation seems successful, but nvidia-smi fails with the message:
“NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.”
In the hardware listing, the Nvidia GPU (Tesla T4) shows as “UNCLAIMED”, while the VMware SVGA controller is active and bound to its driver:
  *-display
       description: VGA compatible controller
       product: SVGA II Adapter
       vendor: VMware
       physical id: f
       bus info: pci@0000:00:0f.0
       version: 00
       width: 32 bits
       clock: 33MHz
       capabilities: vga_controller bus_master cap_list rom
       configuration: driver=vmwgfx latency=64
       resources: irq:16 ioport:1070(size=16) memory:e8000000-efffffff memory:fe000000-fe7fffff memory:c0000-dffff
  *-display UNCLAIMED
       description: VGA compatible controller
       product: TU104GL [Tesla T4]
       vendor: NVIDIA Corporation
       physical id: 1
       bus info: pci@0000:02:01.0
       version: a1
       width: 64 bits
       clock: 66MHz
       capabilities: msi vga_controller cap_list
       configuration: latency=0
       resources: memory:fc000000-fcffffff memory:d0000000-dfffffff memory:fa000000-fbffffff
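UNCLAIMED here means that no kernel driver is bound to the device. As a minimal sketch of how to confirm that directly, the snippet below reads the driver symlink the kernel exposes under /sys/bus/pci/devices. The function name bound_driver and its optional second argument (an alternate sysfs root, only useful for testing) are illustrative assumptions, not part of any Nvidia or VMware tooling; the PCI addresses are the ones from the listing above.

```shell
# Print the kernel driver bound to a PCI device, or "none" if nothing claims it.
#   $1: full PCI address, e.g. 0000:02:01.0
#   $2: sysfs root (hypothetical override for testing; defaults to /sys)
bound_driver() {
    bd_link="${2:-/sys}/bus/pci/devices/$1/driver"
    if [ -L "$bd_link" ]; then
        # The symlink target ends in the driver name, e.g. .../drivers/vmwgfx
        basename "$(readlink "$bd_link")"
    else
        echo "none"
    fi
}

# Example usage on the VM (addresses taken from the listing above):
#   bound_driver 0000:00:0f.0    # the VMware SVGA adapter
#   bound_driver 0000:02:01.0    # the Tesla T4; "none" while it is unclaimed
```

If the second call keeps printing "none" after a driver install and reboot, the module either failed to build, failed to load, or refused to bind to the device.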
I’ve tried installing various driver versions listed by ubuntu-drivers devices, including the latest recommended one (nvidia-driver-550).
ubuntu-drivers devices
== /sys/devices/pci0000:00/0000:00:0f.0 ==
modalias : pci:v000015ADd00000405sv000015ADsd00000405bc03sc00i00
vendor : VMware
model : SVGA II Adapter
driver : open-vm-tools-desktop - distro free
== /sys/devices/pci0000:00/0000:00:11.0/0000:02:01.0 ==
modalias : pci:v000010DEd00001EB8sv000010DEsd00001310bc03sc00i00
vendor : NVIDIA Corporation
model : TU104GL [Tesla T4]
driver : nvidia-driver-470 - distro non-free
driver : nvidia-driver-450-server - distro non-free
driver : nvidia-driver-470-server - distro non-free
driver : nvidia-driver-418-server - distro non-free
driver : nvidia-driver-550 - third-party non-free recommended
driver : nvidia-driver-535 - distro non-free
driver : nvidia-driver-535-server - distro non-free
driver : xserver-xorg-video-nouveau - distro free builtin
Between driver installations, I’ve thoroughly removed all existing Nvidia packages using the following commands:
sudo apt-get --purge remove "*cublas*" "cuda*" "nsight*"
sudo apt-get remove --purge '^nvidia-.*'
sudo apt-get remove --purge '^libnvidia-.*'
sudo apt-get remove --purge '^cuda-.*'
sudo apt-get --purge remove "*cuda*" "*cublas*" "*cufft*" "*cufile*" "*curand*" \
"*cusolver*" "*cusparse*" "*gds-tools*" "*npp*" "*nvjpeg*" "nsight*" "*nvvm*"
I also ran
dpkg -l | grep -i nvidia
and removed every package it listed.
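The manual clean-up step above could be sketched as a single pass. This is only an illustration: the helper name leftover_nvidia_packages is hypothetical, and it assumes the usual dpkg -l layout (a two-letter status in column 1, the package name in column 2). Like the grep above, it matches "nvidia" anywhere on the line, case-insensitively.

```shell
# Read `dpkg -l`-style output on stdin and print the names of packages
# whose line mentions "nvidia". Header lines are skipped because their
# first field is not a two-letter status such as "ii" or "rc".
leftover_nvidia_packages() {
    awk '$1 ~ /^[a-zA-Z][a-zA-Z]$/ && tolower($0) ~ /nvidia/ { print $2 }'
}

# Example usage (review the list before purging anything):
#   dpkg -l | leftover_nvidia_packages
#   dpkg -l | leftover_nvidia_packages | xargs -r sudo apt-get purge -y
```

Entries in the "rc" state (removed, configuration files left behind) are included on purpose, since a purge is what clears them.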
Additionally, I’ve tried:
Reinstalling the kernel headers (sudo apt-get install linux-headers-$(uname -r))
Upgrading GCC to version 10
Installing the drivers again
None of this has worked.
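Since the driver is a kernel module built against those headers, one sanity check after each reinstall is whether an nvidia module is actually loaded. The following is a rough sketch under stated assumptions: module_loaded is a hypothetical helper, and its second argument (an alternate modules file) exists only so the function can be exercised without a real /proc/modules.

```shell
# Succeed (exit 0) if a kernel module with exactly this name is loaded.
#   $1: module name, e.g. nvidia
#   $2: modules file (hypothetical override for testing; defaults to /proc/modules)
module_loaded() {
    # The first field of each /proc/modules line is the module name.
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

# Example usage on the VM:
#   module_loaded nvidia && echo "nvidia module is loaded" || echo "nvidia module is NOT loaded"
#   ls -d "/lib/modules/$(uname -r)/build"    # headers for the running kernel
#   sudo dmesg | grep -i -E 'nvrm|nvidia'     # kernel messages from a failed load, if any
```

The exact-name match is deliberate, so that a related module such as nvidia_uvm does not count as the main nvidia module.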
System Information:
OS: Ubuntu 20.04
VM software: VMware (hw version 19)
RAM: 64GB
vCPU: 8
GPU: Nvidia Tesla T4 (vGPU)
Additional Notes:
Despite these efforts, the Tesla T4 still shows as UNCLAIMED and nvidia-smi still fails.
I would appreciate any help getting the Nvidia drivers working correctly on my VM.