[Ubuntu 18.04] NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

When I run nvidia-smi on my laptop, it prints: NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running. I have tried several fixes, but none of them worked.

Here are some excerpts from my nvidia-bug-report.log, which was generated by running nvidia-bug-report.sh.

uname: Linux xxxx 5.0.0-32-generic #34~18.04.2-Ubuntu SMP Thu Oct 10 10:36:02 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

command line flags:


*** /etc/issue
*** ls: -rw-r--r-- 1 root root 26 2019-08-05 18:43:50.000000000 +0800 /etc/issue
Ubuntu 18.04.3 LTS \n \l


*** /etc/debian_version
*** ls: -rw-r--r-- 1 root root 11 2017-06-26 06:18:00.000000000 +0800 /etc/debian_version
buster/sid

make[1]: Entering directory '/usr/src/linux-headers-5.0.0-32-generic'
test -e include/generated/autoconf.h -a -e include/config/auto.conf || (
echo >&2;
echo >&2 " ERROR: Kernel configuration is invalid.";
echo >&2 " include/generated/autoconf.h or include/config/auto.conf are missing.";
echo >&2 " Run 'make oldconfig && make prepare' on kernel src to fix it.";
echo >&2 ;
/bin/false)
mkdir -p /var/lib/dkms/nvidia/418.87.00/build/.tmp_versions ; rm -f /var/lib/dkms/nvidia/418.87.00/build/.tmp_versions/*
make -f ./scripts/Makefile.build obj=/var/lib/dkms/nvidia/418.87.00/build
ln -sf /var/lib/dkms/nvidia/418.87.00/build/nvidia/nv-kernel.o_binary /var/lib/dkms/nvidia/418.87.00/build/nvidia/nv-kernel.o
ln -sf /var/lib/dkms/nvidia/418.87.00/build/nvidia-modeset/nv-modeset-kernel.o_binary /var/lib/dkms/nvidia/418.87.00/build/nvidia-modeset/nv-modeset-kernel.o
(cat /dev/null; echo kernel//var/lib/dkms/nvidia/418.87.00/build/nvidia.ko; echo kernel//var/lib/dkms/nvidia/418.87.00/build/nvidia-uvm.ko; echo kernel//var/lib/dkms/nvidia/418.87.00/build/nvidia-modeset.ko; echo kernel//var/lib/dkms/nvidia/418.87.00/build/nvidia-drm.ko;) > /var/lib/dkms/nvidia/418.87.00/build/modules.order

Nov 10 08:59:24 andy-Aspire-E1-471G kernel: [ 530.650535] NVRM: No NVIDIA graphics adapter found!
Nov 10 08:59:24 andy-Aspire-E1-471G kernel: [ 530.669272] nvidia-nvlink: Unregistered the Nvlink Core, major device number 236
Nov 10 08:59:24 andy-Aspire-E1-471G kernel: [ 530.752715] nvidia-nvlink: Nvlink Core is being initialized, major device number 236
Nov 10 08:59:24 andy-Aspire-E1-471G kernel: [ 530.753106] NVRM: The NVIDIA GeForce GT 630M GPU installed in this system is
Nov 10 08:59:24 andy-Aspire-E1-471G kernel: [ 530.753106] NVRM: supported through the NVIDIA 390.xx Legacy drivers. Please
Nov 10 08:59:24 andy-Aspire-E1-471G kernel: [ 530.753106] NVRM: visit http://www.nvidia.com/object/unix.html for more
Nov 10 08:59:24 andy-Aspire-E1-471G kernel: [ 530.753106] NVRM: information. The 418.87.00 NVIDIA driver will ignore
Nov 10 08:59:24 andy-Aspire-E1-471G kernel: [ 530.753106] NVRM: this GPU. Continuing probe…

11月 11 20:07:33 andy-Aspire-E1-471G kernel: NVRM: No NVIDIA graphics adapter found!
11月 11 20:07:33 andy-Aspire-E1-471G kernel: nvidia-nvlink: Unregistered the Nvlink Core, major device number 237
11月 11 20:07:33 andy-Aspire-E1-471G systemd-udevd[499]: Process ‘/sbin/modprobe nvidia-uvm’ failed with exit code 1.
11月 11 20:07:33 andy-Aspire-E1-471G kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 237
11月 11 20:07:33 andy-Aspire-E1-471G kernel: NVRM: The NVIDIA GeForce GT 630M GPU installed in this system is
NVRM: supported through the NVIDIA 390.xx Legacy drivers. Please
NVRM: visit http://www.nvidia.com/object/unix.html for more
NVRM: information. The 418.87.00 NVIDIA driver will ignore
NVRM: this GPU. Continuing probe…

11月 11 20:07:42 andy-Aspire-E1-471G kernel: NVRM: No NVIDIA graphics adapter found!
11月 11 20:07:42 andy-Aspire-E1-471G systemd[1]: nvidia-persistenced.service: Service hold-off time over, scheduling restart.
11月 11 20:07:42 andy-Aspire-E1-471G systemd[1]: nvidia-persistenced.service: Scheduled restart job, restart counter is at 1.
11月 11 20:07:42 andy-Aspire-E1-471G nvidia-persistenced[5431]: Verbose syslog connection opened
11月 11 20:07:42 andy-Aspire-E1-471G nvidia-persistenced[5431]: Started (5431)
11月 11 20:07:42 andy-Aspire-E1-471G kernel: nvidia-nvlink: Unregistered the Nvlink Core, major device number 237
11月 11 20:07:42 andy-Aspire-E1-471G nvidia-persistenced[5431]: Failed to query NVIDIA devices. Please ensure that the NVIDIA device files (/dev/nvidia*) exist, and that user 0 has read and write permissions for those files.
11月 11 20:07:42 andy-Aspire-E1-471G nvidia-persistenced[5431]: PID file unlocked.
11月 11 20:07:42 andy-Aspire-E1-471G nvidia-persistenced[5430]: nvidia-persistenced failed to initialize. Check syslog for more details.
11月 11 20:07:42 andy-Aspire-E1-471G nvidia-persistenced[5431]: PID file closed.
11月 11 20:07:42 andy-Aspire-E1-471G systemd[1]: nvidia-persistenced.service: Control process exited, code=exited status=1
11月 11 20:07:42 andy-Aspire-E1-471G nvidia-persistenced[5431]: Shutdown (5431)
11月 11 20:07:42 andy-Aspire-E1-471G systemd[1]: nvidia-persistenced.service: Failed with result ‘exit-code’.
11月 11 20:07:42 andy-Aspire-E1-471G systemd-udevd[499]: Process ‘/usr/bin/nvidia-smi’ failed with exit code 9.
11月 11 20:07:42 andy-Aspire-E1-471G nvidia-persistenced[5442]: Verbose syslog connection opened
11月 11 20:07:42 andy-Aspire-E1-471G nvidia-persistenced[5442]: Started (5442)

*** /proc/asound/card1/codec#3
*** ls: -r--r--r-- 1 root root 0 2019-11-11 20:57:11.310962256 +0800 /proc/asound/card1/codec#3
Codec: Nvidia GPU 14 HDMI/DP
Address: 3
AFG Function Id: 0x1 (unsol 0)
Vendor Id: 0x10de0014
Subsystem Id: 0x10de0101
Revision Id: 0x100100
No Modem Function Group found
Default PCM:
rates [0x0]:
bits [0x0]:
formats [0x0]:
Default Amp-In caps: N/A
Default Amp-Out caps: N/A
State of AFG node 0x01:
Power states: D0 D1 D2 D3
Power: setting=D0, actual=D0
GPIO: io=0, o=0, i=0, unsolicited=0, wake=0
Node 0x04 [Audio Output] wcaps 0x72b1: 8-Channels Digital Stripe CP
Converter: stream=0, channel=0
Digital: Enabled
Digital category: 0x0
IEC Coding Type: 0x0
PCM:
rates [0x7f0]: 32000 44100 48000 88200 96000 176400 192000
bits [0xe]: 16 20 24
formats [0x5]: PCM AC3
Unsolicited: tag=00, enabled=0
Node 0x05 [Pin Complex] wcaps 0x407381: 8-Channels Digital CP
Control: name="IEC958 Playback Con Mask", index=3, device=0
Control: name="IEC958 Playback Pro Mask", index=3, device=0
Control: name="IEC958 Playback Default", index=3, device=0
Control: name="IEC958 Playback Switch", index=3, device=0
Pincap 0x09000094: OUT Detect HBR HDMI DP
Pin Default 0x18560010: [Jack] Digital Out at Int HDMI
Conn = Digital, Color = Unknown
DefAssociation = 0x1, Sequence = 0x0
Pin-ctls: 0x00:
Unsolicited: tag=01, enabled=1
Connection: 1
0x04

/usr/bin/nvidia-debugdump -D

Error: nvmlInit(): Driver Not Loaded

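You have a Fermi GPU (GeForce GT 630M), which is only supported by the 390 legacy driver. The 418.87.00 driver you installed ignores it, as the NVRM messages in your log state. Uninstall (purge) the current NVIDIA driver, then install nvidia-driver-390 from the Software & Updates application.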
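
For anyone who wants to check their own log for the same condition, here is a rough Python sketch that pulls the GPU name, the required legacy branch, and the installed driver version out of the NVRM notice. The function name `diagnose`, the regular expressions, and the sample text are illustrative assumptions based only on the wording of the log in this thread; other driver versions may word the message differently. The suggested package name follows the nvidia-driver-390 naming mentioned above and is only known to apply to that branch.

```python
import re

# Wording taken from the NVRM notice quoted in this thread (assumption:
# other driver releases may phrase it differently).
LEGACY_RE = re.compile(
    r"The NVIDIA (?P<gpu>.+?) GPU installed in this system is\s+"
    r"supported through the NVIDIA (?P<branch>\d+)\.xx Legacy drivers"
)
INSTALLED_RE = re.compile(
    r"The (?P<version>\d+(?:\.\d+)+) NVIDIA driver will ignore"
)

# Shortened copy of the kernel log lines from the question.
SAMPLE = """\
kernel: [ 530.753106] NVRM: The NVIDIA GeForce GT 630M GPU installed in this system is
kernel: [ 530.753106] NVRM: supported through the NVIDIA 390.xx Legacy drivers. Please
kernel: [ 530.753106] NVRM: visit http://www.nvidia.com/object/unix.html for more
kernel: [ 530.753106] NVRM: information. The 418.87.00 NVIDIA driver will ignore
kernel: [ 530.753106] NVRM: this GPU. Continuing probe...
"""


def diagnose(log_text):
    """Return details of a legacy-driver notice, or None if there is none."""
    # The notice is wrapped over several lines, so keep only the text after
    # "NVRM:" on each line and join the pieces into one string.
    parts = [
        line.split("NVRM:", 1)[1].strip()
        for line in log_text.splitlines()
        if "NVRM:" in line
    ]
    message = " ".join(parts)

    legacy = LEGACY_RE.search(message)
    if legacy is None:
        return None
    installed = INSTALLED_RE.search(message)
    return {
        "gpu": legacy.group("gpu"),
        "legacy_branch": legacy.group("branch"),
        "installed_driver": installed.group("version") if installed else None,
        # Hypothetical suggestion: mirrors the nvidia-driver-390 name above.
        "suggested_package": "nvidia-driver-" + legacy.group("branch"),
    }


if __name__ == "__main__":
    print(diagnose(SAMPLE))
```

You could feed it the output of dmesg or the relevant part of nvidia-bug-report.log instead of `SAMPLE`. It only reads text and does not touch the installed driver.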

Thanks @generix
Which version of CUDA should I use with the 390 legacy driver?

Fermi is only supported up to CUDA 8.
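
As a small illustration of that limit, the following Python sketch checks a toolkit version string against the highest CUDA major version an architecture supports. The table `MAX_CUDA_MAJOR` and the function `toolkit_ok` are hypothetical names for this example, and the table contains only the Fermi entry stated in this thread; entries for other architectures would need to come from NVIDIA's documentation.

```python
# Highest supported CUDA major version per GPU architecture.
# Only the Fermi entry comes from this thread; the table is deliberately
# left incomplete rather than filled with guesses.
MAX_CUDA_MAJOR = {"fermi": 8}


def toolkit_ok(architecture, cuda_version):
    """Return True if the given CUDA version is within the known limit."""
    limit = MAX_CUDA_MAJOR.get(architecture.lower())
    if limit is None:
        raise KeyError("no CUDA limit recorded for " + architecture)
    # Compare on the major version only, e.g. "8.0" -> 8.
    major = int(cuda_version.split(".")[0])
    return major <= limit


if __name__ == "__main__":
    for version in ("8.0", "9.0"):
        print(version, toolkit_ok("Fermi", version))
```

So for the GT 630M in this thread, a CUDA 8 toolkit passes the check and anything from CUDA 9 onward does not.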