rho.pi
June 10, 2021, 6:36pm
1
Hi guys.
I am trying to install a T4 in an Ubuntu 20.04 server that has no monitor.
In doing so, I have run into problems.
When I type
find /usr/lib/modules -name nvidia.ko -exec modinfo {} \;
into the console, the output tells me that version 460.84 is installed.
When I write
nvidia-smi
the output is
NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running
From what I understand, this means that the driver is installed but not running.
When I type
nvcc -V
the output says that I have Cuda compilation tools, release 10.1, V10.1.143.
From what I understand, I somehow have to start the driver. How do I do this?
Edit: I did not realize that I have to write “\” as a double “\”.
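My understanding so far is that "starting" the driver means loading the nvidia kernel module. Below is a minimal sketch of the check I have in mind. The helper names module_listed and report_nvidia_module are my own (hypothetical), and I am assuming that lsmod, modprobe and dmesg are available on the server.

```shell
# Succeeds if the module named in $1 appears in lsmod-style output on stdin.
# Reading stdin keeps the check usable on saved logs as well as live output.
module_listed() {
    # NR > 1 skips the "Module Size Used by" header line.
    awk -v m="$1" 'NR > 1 && $1 == m { found = 1 } END { exit !found }'
}

# Hypothetical helper: report the state and suggest the next manual step.
report_nvidia_module() {
    if lsmod | module_listed nvidia; then
        echo "nvidia module is loaded"
    else
        echo "nvidia module is not loaded; try: sudo modprobe nvidia"
        echo "then look for NVRM lines in: sudo dmesg | grep -i nvrm"
    fi
}
```

Calling report_nvidia_module on the server should show whether the module is loaded at all. If modprobe fails, the kernel log is probably the place to look for the reason.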
rho.pi
June 14, 2021, 2:49pm
2
Hi, in the meantime I followed the Linux Installation Guide, to no avail.
I began with a cleanup, as described at the end of the installation guide.
sudo apt-get --purge remove "*cublas*" "*cufft*" "*curand*" "*cusolver*" "*cusparse*" "*npp*" "*nvjpeg*" "cuda*" "nsight*"
sudo apt-get --purge remove "*nvidia*"
sudo apt-get autoremove
sudo reboot
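To see whether the cleanup really removed everything, one could filter the package list after the reboot. This is only a sketch: the helper name leftover_gpu_packages is hypothetical, and matching on "nvidia" or "cuda" in the package name is my assumption about what counts as a leftover.

```shell
# Print names of packages from `dpkg -l`-style output on stdin that are still
# installed (ii) or removed with config files left behind (rc) and whose name
# contains "nvidia" or "cuda".
leftover_gpu_packages() {
    awk '($1 == "ii" || $1 == "rc") && $2 ~ /nvidia|cuda/ { print $2 }'
}

# Usage on a live system:
#   dpkg -l | leftover_gpu_packages
```

An empty result would suggest that the purge was complete. Any names printed are candidates for another purge before reinstalling.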
Following the reboot, I followed the install instructions from chapter 2 onward:
find out, what GPU I got
lspci | grep -i nvidia
find out, what Linux version I am running
uname -m && cat /etc/*release
find out my gcc version
gcc --version
install the kernel headers and development packages for my running kernel
sudo apt-get install linux-headers-$(uname -r)
get the .pin-file
move that file
sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
get the installer
run the installer
sudo dpkg -i cuda-repo-ubuntu2004-11-3-local_11.3.1-465.19.01-1_amd64.deb
install the gpg key
sudo apt-key add /var/cuda-repo-ubuntu2004-11-3-local/7fa2af80.pub
update
sudo apt-get update
install
sudo apt-get -y install cuda
during the install, there was a warning: “the home dir /nonexistent you specified can’t be accessed, no such file or directory”.
update PATH
export PATH=/usr/local/cuda-11.3/bin${PATH:+:${PATH}}
reboot
sudo reboot
verify installation
cat /proc/driver/nvidia/version
This returns:
NVRM version: NVIDIA UNIX x86_64 Kernel Module 465.19.01 Fri Mar 19 07:44:41 UTC 2021
GCC version: gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04)
install nvcc
sudo apt install nvidia-cuda-toolkit
reboot
sudo reboot
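One thing I noticed about the PATH step: an export only lasts for the current shell session, so it is gone after the reboot that follows it. A small sketch of what the expansion in that command does is below. The helper name prepend_path is hypothetical, and using ~/.profile to persist the setting is an assumption on my part, not something taken from the guide.

```shell
# Build a PATH value with directory $1 in front of the existing value $2.
# "${2:+:$2}" expands to ":$2" only when $2 is non-empty, so an empty PATH
# does not leave a stray trailing colon.
prepend_path() {
    printf '%s%s\n' "$1" "${2:+:$2}"
}

# Usage in the current session:
#   PATH=$(prepend_path /usr/local/cuda-11.3/bin "$PATH"); export PATH
# Possible way to keep it across reboots (file choice is an assumption):
#   echo 'export PATH=/usr/local/cuda-11.3/bin${PATH:+:${PATH}}' >> ~/.profile
```

This matters for nvcc in particular, because which nvcc is found depends on the order of the directories in PATH.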
Following the installation guide, I attempted to compile the samples:
cd /NVIDIA_CUDA-11.3_Samples
Here I get an error message saying that this directory does not exist.
nvidia-smi
fails with “NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.”
I would be delighted if you could help me.
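Since my first post showed version 460.84 on disk and /proc now reports 465.19.01, it might be worth checking whether the loaded module and the module file on disk still agree after the last package install. A minimal sketch follows. The helper names nvrm_version and versions_match are hypothetical, and I am assuming the /proc output keeps the format quoted above.

```shell
# Extract the driver version from /proc/driver/nvidia/version-style text on stdin.
nvrm_version() {
    sed -n 's/^NVRM version:.*Kernel Module *\([0-9][0-9.]*\).*/\1/p'
}

# Succeeds only when both version strings are non-empty and identical.
versions_match() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}

# Usage on a live system:
#   loaded=$(nvrm_version < /proc/driver/nvidia/version)
#   on_disk=$(modinfo -F version nvidia)
#   versions_match "$loaded" "$on_disk" || echo "mismatch: '$loaded' vs '$on_disk'"
```

If the two differ, or if /proc/driver/nvidia/version no longer exists, that would at least narrow down whether the later install replaced or unloaded the working module.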
rho.pi
June 15, 2021, 11:13am
3
Hi there.
Attached is my bug report, generated using
sudo nvidia-bug-report.sh
nvidia-bug-report.log.gz (14.5 MB)