Problem starting Cuda Driver on Ubuntu 20.04

rho.pi · June 10, 2021, 6:36pm

Hi guys.

I am trying to install a T4 in an Ubuntu 20.04 server that has no monitor attached.

In doing so, I am running into problems.

When I run

find /usr/lib/modules -name nvidia.ko -exec modinfo {} \;

in the console, the output shows that version 460.84 is installed.

When I run

nvidia-smi

I get the following message:

NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

From what I understand, this means that the driver has been installed but is not running.
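
Whether the kernel module is actually loaded can be checked against /proc/modules. The following is only a minimal sketch, assuming the module is named exactly `nvidia`; the function name `nvidia_module_loaded` is made up for illustration.

```shell
#!/bin/sh
# Succeeds if a module named exactly "nvidia" appears in the given modules
# file (default: /proc/modules, one module per line, name in column 1).
nvidia_module_loaded() {
    awk '$1 == "nvidia" { found = 1 } END { exit !found }' \
        "${1:-/proc/modules}" 2>/dev/null
}

if nvidia_module_loaded; then
    echo "nvidia kernel module is loaded"
else
    echo "nvidia kernel module is not loaded"
fi
```

If the module turns out not to be loaded, `sudo modprobe nvidia` followed by `dmesg | grep -i nvidia` may show why loading fails.
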
When I type

nvcc -V

the output reports Cuda compilation tools, release 10.1, V10.1.143.

From what I understand, I somehow have to start the driver. How do I do this?

Edit: I did not realize that I have to write “\” as a double “\\” for it to show up in the post.

rho.pi · June 14, 2021, 2:49pm

Hi, in the meantime I followed the Linux Installation Guide, to no avail.

I began with the cleanup described at the end of the installation guide.

sudo apt-get --purge remove "*cublas*" "*cufft*" "*curand*" "*cusolver*" "*cusparse*" "*npp*" "*nvjpeg*" "cuda*" "nsight*"

sudo apt-get --purge remove "*nvidia*"

sudo apt-get autoremove

sudo reboot

After the reboot, I followed the installation instructions from chapter 2 onward:

find out which GPU I have

lspci | grep -i nvidia

find out which Linux version I am running

uname -m && cat /etc/*release

find out my gcc version

gcc --version

install the kernel headers and development packages for my version of Linux

sudo apt-get install linux-headers-$(uname -r)

get the .pin-file

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin

move that file

sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600

get the installer

wget https://developer.download.nvidia.com/compute/cuda/11.3.1/local_installers/cuda-repo-ubuntu2004-11-3-local_11.3.1-465.19.01-1_amd64.deb

run the installer

sudo dpkg -i cuda-repo-ubuntu2004-11-3-local_11.3.1-465.19.01-1_amd64.deb

install the gpg key

sudo apt-key add /var/cuda-repo-ubuntu2004-11-3-local/7fa2af80.pub

update

sudo apt-get update

install

sudo apt-get -y install cuda

During the install, there was a warning: “the home dir /nonexistent you specified can’t be accessed, no such file or directory”.

update PATH

export PATH=/usr/local/cuda-11.3/bin${PATH:+:${PATH}}
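
An `export` typed at the prompt only affects the current shell, so it does not survive the reboot that follows. One option is to put it into a shell startup file such as `~/.bashrc`. The following is a minimal sketch of a variant that does not add the directory twice; the function name `prepend_path` is made up for illustration.

```shell
#!/bin/sh
# Prepend a directory to PATH unless it is already listed there.
prepend_path() {
    case ":${PATH}:" in
        *":$1:"*) ;;                        # already present, nothing to do
        *) PATH="$1${PATH:+:${PATH}}" ;;    # same idiom as the export above
    esac
    export PATH
}

prepend_path /usr/local/cuda-11.3/bin
```
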

reboot

sudo reboot

verify installation

cat /proc/driver/nvidia/version

This returns:

NVRM version: NVIDIA UNIX x86_64 Kernel Module 465.19.01 Fri Mar 19 07:44:41 UTC 2021
GCC version: gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04)

install nvcc

sudo apt install nvidia-cuda-toolkit

reboot

sudo reboot

Following the installation guide, I attempt to compile the samples:

cd /NVIDIA_CUDA-11.3_Samples

Here I get an error message saying that this directory does not exist.

nvidia-smi

still fails with “NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.”
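
Since the `cuda` package comes from the NVIDIA repository and `nvidia-cuda-toolkit` from the Ubuntu archive, it may be worth checking whether packages from more than one driver branch ended up installed. The following is a minimal sketch, assuming the Ubuntu naming scheme in which driver packages end in the branch number (for example `nvidia-driver-465`); the function name `nvidia_driver_branches` is made up for illustration.

```shell
#!/bin/sh
# Read "package version" lines on stdin and print the distinct trailing
# branch numbers of packages whose name contains "nvidia".
nvidia_driver_branches() {
    awk '$1 ~ /nvidia.*-[0-9]+$/ { n = $1; sub(/.*-/, "", n); print n }' | sort -u
}

# Only query the package database where dpkg-query is available.
if command -v dpkg-query >/dev/null 2>&1; then
    dpkg-query -W -f='${Package} ${Version}\n' | nvidia_driver_branches
fi
```

More than one number in the output would point to mixed driver packages, which is one possible reason for nvidia-smi being unable to talk to the driver.
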

I would be delighted if you could help me.

rho.pi · June 15, 2021, 11:13am

Hi there.

Attached is my bug report, generated using

sudo nvidia-bug-report.sh

nvidia-bug-report.log.gz (14.5 MB)
