I recently discovered that nvidia-smi no longer works on one of my lab’s computers, which runs Ubuntu 20.04 with an RTX 2080 Ti. It used to work fine with nvidia-driver-470 and nvidia-cuda-toolkit installed directly from Ubuntu’s repos. The last time I updated the NVIDIA driver was 3 months ago.
When I ran nvidia-smi, it produced the following output:
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
I tried uninstalling and reinstalling the NVIDIA packages like so:
sudo apt remove --autoremove nvidia*
sudo apt install nvidia-driver-470
sudo apt install nvidia-cuda-toolkit
sudo reboot
I tried repeating this process using nvidia-utils-470 instead of nvidia-driver-470, but it didn’t make a difference.
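After a reinstall and reboot like the one above, one way to confirm whether the kernel module actually loaded is to look for it in /proc/modules. This is only a sketch: the helper name module_loaded is made up for illustration, and it assumes the module is called nvidia, which is the usual name for the proprietary driver's module.

```shell
#!/bin/sh
# module_loaded NAME [FILE]
# Hypothetical helper: succeeds if NAME is listed as a loaded module.
# FILE defaults to /proc/modules, where the first field of each line is
# the module name; another file can be passed to check saved output.
module_loaded() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }' \
        "${2:-/proc/modules}"
}

# Example use (assumes the driver module is named "nvidia").
if module_loaded nvidia 2>/dev/null; then
    echo "nvidia kernel module is loaded"
else
    echo "nvidia kernel module is NOT loaded"
fi
```

The match is on the exact module name, so related modules such as nvidia_drm do not count as a hit for nvidia.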
When this didn’t fix the problem, I uninstalled again and tried installing the NVIDIA driver and CUDA toolkit directly from this link using the runfile:
This gave the output:
Installation failed. See log at /var/log/cuda-installer.log for details.
[INFO]: Driver not installed.
[INFO]: Checking compiler version...
[INFO]: gcc location: /usr/bin/gcc
[INFO]: gcc version: gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04)
[INFO]: Initializing menu
[INFO]: Setup complete
[INFO]: Components to install:
[INFO]: Driver
[INFO]: 495.29.05
[INFO]: Executing NVIDIA-Linux-x86_64-495.29.05.run --ui=none --no-questions --accept-license --disable-nouveau --no-cc-version-check --install-libglvnd 2>&1
[INFO]: Finished with code: 256
[ERROR]: Install of driver component failed.
[ERROR]: Install of 495.29.05 failed, quitting
I then uninstalled like so:
sudo rm /etc/apt/sources.list.d/cuda*
sudo apt remove --autoremove nvidia-cuda-toolkit
sudo apt remove --autoremove nvidia-*
sudo apt-get purge nvidia*
sudo apt-get autoremove
sudo apt-get autoclean
sudo rm -rf /usr/local/cuda*
I then tried reinstalling with an older runfile I still had for driver version 470.42.01, which produced the same error:
[INFO]: Driver not installed.
[INFO]: Checking compiler version...
[INFO]: gcc location: /usr/bin/gcc
[INFO]: gcc version: gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04)
[INFO]: Initializing menu
[INFO]: Setup complete
[INFO]: Components to install:
[INFO]: Driver
[INFO]: 470.42.01
[INFO]: Executing NVIDIA-Linux-x86_64-470.42.01.run --ui=none --no-questions --accept-license --disable-nouveau --no-cc-version-check --install-libglvnd 2>&1
[INFO]: Finished with code: 256
[ERROR]: Install of driver component failed.
[ERROR]: Install of 470.42.01 failed, quitting
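The cuda-installer output above only says that the driver component failed. The driver installer normally writes its own, more detailed log to /var/log/nvidia-installer.log, so that file is worth checking too. Below is a minimal sketch for pulling the error lines out of either log. The helper name installer_errors is hypothetical, and the pattern assumes error lines contain the word ERROR, as they do in the output above.

```shell
#!/bin/sh
# installer_errors FILE...
# Hypothetical helper: print every line containing "ERROR" from each log,
# prefixed with "file:line-number:". Missing or unreadable files are skipped.
installer_errors() {
    for log in "$@"; do
        [ -r "$log" ] || continue
        # Passing /dev/null as a second file makes grep print the file name.
        grep -n 'ERROR' "$log" /dev/null
    done
    return 0
}

# Log paths: the first is named in the installer output above; the second
# is the driver installer's usual default and is an assumption here.
installer_errors /var/log/cuda-installer.log /var/log/nvidia-installer.log
```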
I am unsure how to proceed in getting a working nvidia driver. Any help would be appreciated.
Note that the CUDA toolkit still seems to work. When I run nvcc -V, it produces the following output:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
Misc information
uname -m && cat /etc/*release:
x86_64
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=20.04
DISTRIB_CODENAME=focal
DISTRIB_DESCRIPTION="Ubuntu 20.04.3 LTS"
NAME="Ubuntu"
VERSION="20.04.3 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.3 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
sudo lshw -C display:
*-display UNCLAIMED
description: VGA compatible controller
product: TU102 [GeForce RTX 2080 Ti Rev. A]
vendor: NVIDIA Corporation
physical id: 0
bus info: pci@0000:01:00.0
version: a1
width: 64 bits
clock: 33MHz
capabilities: pm msi pciexpress vga_controller cap_list
configuration: latency=0
resources: memory:a4000000-a4ffffff memory:90000000-9fffffff memory:a0000000-a1ffffff ioport:3000(size=128) memory:a5000000-a507ffff
*-display
description: VGA compatible controller
product: UHD Graphics 630 (Desktop)
vendor: Intel Corporation
physical id: 2
bus info: pci@0000:00:02.0
version: 00
width: 64 bits
clock: 33MHz
capabilities: pciexpress msi pm vga_controller bus_master cap_list rom
configuration: driver=i915 latency=0
resources: irq:129 memory:a3000000-a3ffffff memory:80000000-8fffffff ioport:4000(size=64) memory:c0000-dffff
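In the lshw output above, the RTX 2080 Ti is marked UNCLAIMED, which means no kernel driver is bound to it, while the Intel GPU shows driver=i915. A small sketch for picking out unclaimed devices from lshw output follows. The function name unclaimed_products is made up for illustration, and it assumes the text layout shown above (a "*-" line starting each device, followed by a "product:" line).

```shell
#!/bin/sh
# unclaimed_products
# Hypothetical helper: read lshw text output on stdin and print the
# "product:" value of every device whose header line says UNCLAIMED.
unclaimed_products() {
    awk '
        # A line starting with "*-" opens a new device section.
        /^[ \t]*\*-/ { unclaimed = ($0 ~ /UNCLAIMED/) }
        # Inside an unclaimed section, print the product name only.
        unclaimed && /^[ \t]*product:/ {
            sub(/^[ \t]*product:[ \t]*/, "")
            print
        }
    '
}

# Typical use (not run here because it needs root):
#   sudo lshw -C display | unclaimed_products
```

An empty result would mean every display device has a driver attached.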