nvidia-smi not working - VMware ESXi, Ubuntu Server 20.04.4 with Tesla V100

I have a VM running Ubuntu Server 20.04.4 with NVIDIA driver version 515.86.01 installed.
Output of nvidia-smi:

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

sudo lshw -c display

       description: VGA compatible controller
       product: SVGA II Adapter
       vendor: VMware
       physical id: f
       bus info: pci@0000:00:0f.0
       version: 00
       width: 32 bits
       clock: 33MHz
       capabilities: vga_controller bus_master cap_list rom
       configuration: driver=vmwgfx latency=64
       resources: irq:16 ioport:1070(size=16) memory:e8000000-efffffff memory:fe000000-fe7fffff memory:c0000-dffff
  *-display UNCLAIMED
       description: 3D controller
       product: GV100GL [Tesla V100 PCIe 32GB]
       vendor: NVIDIA Corporation
       physical id: 0
       bus info: pci@0000:0b:00.0
       version: a1
       width: 64 bits
       clock: 33MHz
       capabilities: pm msi pciexpress cap_list
       configuration: latency=64
       resources: memory:fc000000-fcffffff
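
The `UNCLAIMED` tag on the Tesla V100 entry means that no kernel driver has bound to the device. As a rough sketch, assuming the `lshw` text layout shown above, entries in that state could be picked out like this (the function name `unclaimed_products` is made up for illustration):

```shell
#!/bin/sh
# Print the product line of every lshw display entry marked UNCLAIMED.
# Reads `lshw -c display` text on stdin.
# unclaimed_products is a hypothetical helper name, not part of lshw.
unclaimed_products() {
    awk '
        # A line starting with "*-" opens a new device entry.
        /^[[:space:]]*\*-/ { unclaimed = ($0 ~ /UNCLAIMED/) }
        # Inside an unclaimed entry, print the product name only.
        unclaimed && /^[[:space:]]*product:/ {
            sub(/^[[:space:]]*product:[[:space:]]*/, "")
            print
        }
    '
}

# Usage: sudo lshw -c display | unclaimed_products
```

Any product this prints is visible on the PCI bus but has no driver attached, which matches the nvidia-smi failure.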

sudo dkms status

nvidia-srv, 515.86.01, 5.4.0-135-generic, x86_64: installed

sudo modprobe nvidia -vv

modprobe: INFO: ../libkmod/libkmod.c:365 kmod_set_log_fn() custom logging function 0x5586ee9f1b90 registered
insmod /lib/modules/5.4.0-135-generic/updates/dkms/nvidia.ko
modprobe: INFO: ../libkmod/libkmod-module.c:892 kmod_module_insert_module() Failed to insert module '/lib/modules/5.4.0-135-generic/updates/dkms/nvidia.ko': No such device
modprobe: ERROR: could not insert 'nvidia': No such device
modprobe: INFO: ../libkmod/libkmod.c:332 kmod_unref() context 0x5586f086e440 released

Found in the NVIDIA bug report (attached):
nvidia-bug-report.log.gz (54.7 MB)

[  481.256869] nvidia: probe of 0000:0b:00.0 failed with error -1
[  481.256886] NVRM: The NVIDIA probe routine failed for 1 device(s).
[  481.256886] NVRM: None of the NVIDIA devices were initialized.
[  481.257103] nvidia-nvlink: Unregistered Nvlink Core, major device number 239
[  481.394419] nvidia-nvlink: Nvlink Core is being initialized, major device number 239
[  481.396377] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
               NVRM: BAR1 is 0M @ 0x0 (PCI:0000:0b:00.0)
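
The `BAR1 is 0M @ 0x0` line is the key symptom: the GPU was not given a usable memory region. A small sketch for spotting it in a kernel log, assuming the message format is exactly as shown above (the helper name `zero_sized_bars` is hypothetical):

```shell
#!/bin/sh
# List NVIDIA BARs that the kernel log reports as zero-sized.
# Reads kernel log text on stdin and prints "<pci address> <BAR name>".
# zero_sized_bars is a hypothetical helper name; the pattern assumes the
# NVRM message format seen in this thread.
zero_sized_bars() {
    sed -n 's/.*NVRM: \(BAR[0-9][0-9]*\) is 0M @ 0x0 (PCI:\([0-9a-fA-F:.]*\)).*/\2 \1/p'
}

# Usage: sudo dmesg | zero_sized_bars
```

Empty output means no line of that form was found in the log.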

Any suggestions, please?

The VM is set up incorrectly. Delete it, recreate it with EFI firmware instead of BIOS boot, then install Ubuntu.
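
To confirm from inside the guest which firmware mode it booted with, one option is to check for `/sys/firmware/efi`, which the Linux kernel exposes only on UEFI boots. A minimal sketch (the helper name `boot_mode` and its optional argument are made up here; the argument exists only so the check can be pointed at a different sysfs root):

```shell
#!/bin/sh
# Report whether the running Linux system was booted via UEFI or legacy BIOS.
# boot_mode is a hypothetical helper name.
# Optional argument: alternative sysfs root (defaults to /sys).
boot_mode() {
    sysfs_root="${1:-/sys}"
    if [ -d "$sysfs_root/firmware/efi" ]; then
        echo "UEFI"
    else
        echo "BIOS"
    fi
}

# Usage: boot_mode
```

After recreating the VM with EFI firmware, this should print `UEFI`. On the ESXi side the firmware type is a per-VM boot option, which, as far as I know, appears as `firmware = "efi"` in the .vmx file.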

Thanks, that works.
