NVIDIA-SMI has failed on Remote Server

Hello,

I have a fresh server on-site which I can remotely connect to via ssh…

I have two A16 cards in the server (each A16 board carries four GPUs, hence the eight devices below). If I run lspci -v | grep -i nvidia, I get:

2a:00.0 3D controller: NVIDIA Corporation GA107GL [A2 / A16] (rev a1)
        Subsystem: NVIDIA Corporation Device 14a9
        Kernel modules: nvidiafb, nouveau
2b:00.0 3D controller: NVIDIA Corporation GA107GL [A2 / A16] (rev a1)
        Subsystem: NVIDIA Corporation Device 14a9
        Kernel modules: nvidiafb, nouveau
2c:00.0 3D controller: NVIDIA Corporation GA107GL [A2 / A16] (rev a1)
        Subsystem: NVIDIA Corporation Device 14a9
        Kernel modules: nvidiafb, nouveau
2d:00.0 3D controller: NVIDIA Corporation GA107GL [A2 / A16] (rev a1)
        Subsystem: NVIDIA Corporation Device 14a9
        Kernel modules: nvidiafb, nouveau
b8:00.0 3D controller: NVIDIA Corporation GA107GL [A2 / A16] (rev a1)
        Subsystem: NVIDIA Corporation Device 14a9
        Kernel modules: nvidiafb, nouveau
b9:00.0 3D controller: NVIDIA Corporation GA107GL [A2 / A16] (rev a1)
        Subsystem: NVIDIA Corporation Device 14a9
        Kernel modules: nvidiafb, nouveau
ba:00.0 3D controller: NVIDIA Corporation GA107GL [A2 / A16] (rev a1)
        Subsystem: NVIDIA Corporation Device 14a9
        Kernel modules: nvidiafb, nouveau
bb:00.0 3D controller: NVIDIA Corporation GA107GL [A2 / A16] (rev a1)
        Subsystem: NVIDIA Corporation Device 14a9
        Kernel modules: nvidiafb, nouveau
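
Note that the grep filter above would hide a "Kernel driver in use" line naming any driver other than nvidia, so it can help to list the driver bound to each GPU explicitly. A minimal sketch, assuming bash, awk and pciutils are available; nvidia_bound_drivers is a hypothetical helper name, not part of any NVIDIA or Ubuntu tool:

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `lspci -k` style text on stdin and prints
# "<pci-address> <driver>" for every NVIDIA device, using "none" when the
# record has no "Kernel driver in use:" line.
nvidia_bound_drivers() {
    awk '
        # A line starting in column one (a hex digit) begins a new device record.
        /^[0-9a-fA-F]/ {
            if (addr != "") print addr, driver
            addr = ""; driver = "none"
            if ($0 ~ /NVIDIA/) addr = $1
        }
        # Indented detail line naming the driver currently bound to the device.
        /Kernel driver in use:/ { driver = $NF }
        END { if (addr != "") print addr, driver }
    '
}

# Possible use on the server itself (10de is the NVIDIA PCI vendor ID):
#   lspci -k -d 10de: | nvidia_bound_drivers
```

If every device shows "none" or "nouveau" after the driver install and a reboot, the nvidia kernel module is probably not being loaded.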

Running sudo mokutil --sb-state reported that Secure Boot is off…

Then, what I did was:

  1. sudo ubuntu-drivers install --gpgpu nvidia:535-server
  2. sudo apt install nvidia-utils-535-server

reboot…

nvidia-smi → Error: NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running

  3. sudo apt install --reinstall linux-headers-$(uname -r)

reboot…

nvidia-smi → Error: NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running
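
This error generally means nvidia-smi could not reach the nvidia kernel module, so one thing worth checking after each reboot is which GPU module is actually loaded. A rough sketch, assuming bash; gpu_module_state is a hypothetical helper that reads a file in /proc/modules format:

```shell
#!/usr/bin/env bash
# Hypothetical helper: reports which GPU kernel module is loaded.
# $1 is a file in /proc/modules format; it defaults to /proc/modules itself.
gpu_module_state() {
    local modules_file=${1:-/proc/modules}
    # Each line of /proc/modules starts with the module name followed by a space.
    if grep -q '^nvidia ' "$modules_file"; then
        echo "nvidia loaded"
    elif grep -q '^nouveau ' "$modules_file"; then
        echo "nouveau loaded"
    else
        echo "no GPU module loaded"
    fi
}

# Possible use on the server itself:
#   gpu_module_state
```

If this reports anything other than "nvidia loaded", running dkms status and sudo dmesg | grep -i -e nvidia -e nouveau may show whether the module was built for the running kernel and why it did not load.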

Attached is the bug report…
nvidia-bug-report.log.gz (234.9 KB)

I could not run the nvidia-settings command because it is not installed…

Could you please help me figure out what I need to do?

Solved… do not use the default drivers from the Ubuntu repositories; use NVIDIA's own Data Center drivers instead (datacenter-driver Downloads | NVIDIA Developer).

Just for reference, Data Center driver documentation: