Installing the latest NVIDIA drivers, CUDA, and cuDNN in Ubuntu 22.04 LTS

I have been at this for days. I have a Lenovo Legion Y520 with 32GB of RAM, a 1TB SSD, and an NVIDIA GeForce GTX 1050 Ti with 4GB of VRAM. I am trying to configure my computer for machine learning according to the following program.

I create a fresh install of Ubuntu 22.04.

I select Ubuntu Pro for security and allow the software updater to update the software.

Install Google Chrome –

wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb

sudo dpkg -i google-chrome-stable_current_amd64.deb

sudo apt --fix-broken install

sudo apt update && sudo apt upgrade

Install DEAD SNAKES repository -

sudo apt install software-properties-common

sudo add-apt-repository ppa:deadsnakes/ppa

sudo apt update && sudo apt upgrade

Install PYTHON 3.12.1 -

sudo apt install python3.12

sudo apt update && sudo apt upgrade

Install Git Repository -

sudo add-apt-repository ppa:git-core/ppa

sudo apt update && sudo apt upgrade

Install Git CLI version 2.43.0 -

sudo apt install git

sudo apt update && sudo apt upgrade

sudo git --version

Install Curl 7.81.0 -

sudo apt update && sudo apt upgrade

sudo apt install curl

sudo curl --version

Install Homebrew -

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

(echo; echo 'eval "$(/home/linuxbrew/.linuxbrew/bin/brew shellenv)"') >> /home/tsisaris/.bashrc

eval "$(/home/linuxbrew/.linuxbrew/bin/brew shellenv)"

sudo apt-get install build-essential

sudo apt update && sudo apt upgrade

Install DBUS-X11 -

sudo apt-get install dbus-x11

sudo apt update && sudo apt upgrade

At this point everything is working just fine

I then try to install the latest version of NVIDIA drivers for my particular GPU which is now a legacy card.

I have attempted to use the ubuntu drivers tool but it installs the wrong driver.

I go to NVIDIA's website, select the correct driver, and download it.

This happens to be NVIDIA-Linux-x86_64-535.146.02.run

I want to update my driver to the latest possible version and install the latest versions of CUDA, cuDNN, and PyTorch that will work with my machine so that I can begin to study and practise machine learning.

In my case it would seem that CUDA 11.8 and cuDNN 8.9.7 are the latest versions that will work with PyTorch 2.1.1 and with my video card.

This is where the problem comes in.

I have tried almost every permutation and order of the installation process, and they all fail to update the driver because nvidia-drm is in use.

I finally try this procedure…

Switch to tty3 by pressing Ctrl+Alt+F3 -

Unload nvidia-drm before proceeding -

Isolate multi-user.target -
sudo systemctl isolate multi-user.target

Note that nvidia-drm is currently in use -
lsmod | grep nvidia.drm

Unload nvidia-drm -
sudo modprobe -r nvidia-drm

Note that nvidia-drm is not in use anymore -
lsmod | grep nvidia.drm

Install Newest Nvidia GPU Drivers 535.146.02 -
cd ~/Downloads
sudo chmod +x NVIDIA-Linux-x86_64-535.146.02.run
sudo ./NVIDIA-Linux-x86_64-535.146.02.run

I answer all prompts during installation. It still seems like there is some kind of conflict.

I have to input the keyring key

When installation has finished, confirm that the new driver is installed
nvidia-smi

I get that the Driver Version is 535.146.02 and the CUDA version is 12.2? I haven’t even installed CUDA yet…

Start the GUI again -
sudo systemctl start graphical.target

I now want to install CUDA

Switch to tty3 by pressing Ctrl+Alt+F3 -

Unload nvidia-drm before proceeding -

Isolate multi-user.target -
sudo systemctl isolate multi-user.target

Note that nvidia-drm is currently in use -
lsmod | grep nvidia.drm

Unload nvidia-drm -
sudo modprobe -r nvidia-drm

Note that nvidia-drm is not in use anymore -
lsmod | grep nvidia.drm

Go to your download folder and run the cuda installation -
sudo dpkg -i cuda-repo-ubuntu2204-11-8-local_11.8.0-520.61.05-1_amd64.deb

Answer any prompts during installation -

When installation has finished, confirm that the CUDA Version has been updated -
nvidia-smi

I start to understand this less and less

When I run the nvidia-smi command I get that the Driver Version is 535.146.02 and the CUDA version is 12.2?

I have to install the NVIDIA CUDA toolkit so that I can run nvcc --version -
sudo apt install nvidia-cuda-toolkit

When I run the nvcc --version command I get this garbage…

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0

Where does CUDA 11.5 come from? I installed CUDA 11.8 and in nvidia-smi it says CUDA version 12.2

How can I go through all of this process without a hangup?

I am sure that there is a bunch of broken garbage in my Ubuntu system now. I want a complete step-by-step method, correcting my commands where necessary, to finish installing CUDA 11.8 and cuDNN 8.9.7 without a hiccup and without creating a bunch of broken garbage…

What am I missing?

Thank you in advance,

Shawn

Hi Shawn,

Edit: I think you have 11.5 because you installed two CUDA toolkits.

You installed 11.8 with the cuda-repo-ubuntu2204-11-8-local .deb, and then a few steps later you ran sudo apt install nvidia-cuda-toolkit.

I think that is where 11.5 came from, presumably from an Ubuntu repository. So you have both installed, unless the 11.5 install removed the 11.8 one. Having more than one toolkit installed is permitted, so a search should show where they are.
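One way to run that search is sketched below. It is only a rough check, and it assumes two things that are not confirmed for your machine: that NVIDIA's own packages put each toolkit in a versioned directory such as /usr/local/cuda-11.8, and that the Ubuntu nvidia-cuda-toolkit package puts its nvcc in /usr/bin. The function name list_cuda_toolkits is made up for this example.

```shell
# List versioned CUDA toolkit directories under a prefix (default: /usr/local).
# Assumption: NVIDIA's packages install to <prefix>/cuda-X.Y.
list_cuda_toolkits() {
    prefix="${1:-/usr/local}"
    for dir in "$prefix"/cuda-*; do
        # Skip the unexpanded glob when nothing matches.
        [ -d "$dir" ] || continue
        if [ -x "$dir/bin/nvcc" ]; then
            echo "$dir (has nvcc)"
        else
            echo "$dir (no nvcc found)"
        fi
    done
}

list_cuda_toolkits

# Show which nvcc the shell would actually run, if any.
command -v nvcc || echo "nvcc is not on PATH"
```

If the last command prints /usr/bin/nvcc, the 11.5 toolkit from the Ubuntu repository is probably the one being picked up, whatever else is installed under /usr/local.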

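Regarding the CUDA version 12.2 that nvidia-smi reports:

The CUDA version shown in nvidia-smi is the version of the CUDA toolkit that was used to compile the driver, and it indicates the highest CUDA version that this driver supports. It does not tell you which toolkit, if any, is installed.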
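To see the two numbers side by side, something like the following might help. It is a small sketch, not an official tool: the helper names nvcc_release and smi_cuda_version are invented here, and the patterns assume the output formats look like the nvcc text you posted and the usual "CUDA Version: X.Y" field in the nvidia-smi header.

```shell
# Extract the toolkit release (e.g. "11.5") from `nvcc --version` text on stdin.
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Extract the "CUDA Version" field from the `nvidia-smi` header on stdin.
# Assumption: the header contains a field of the form "CUDA Version: X.Y".
smi_cuda_version() {
    sed -n 's/.*CUDA Version: *\([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# Only query the tools that are actually present on this machine.
if command -v nvcc >/dev/null 2>&1; then
    echo "toolkit on PATH: $(nvcc --version | nvcc_release)"
fi
if command -v nvidia-smi >/dev/null 2>&1; then
    echo "driver supports up to CUDA: $(nvidia-smi | smi_cuda_version)"
fi
```

The first number describes the nvcc that is first on your PATH, and the second describes the driver, so the two can legitimately differ.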


How can we install NVIDIA drivers, CUDA packages, and cuDNN packages on an Azure NC A100 v4 VM running Ubuntu 22.04 with GPU capabilities? I have tried installing different versions on my VM, including nvidia-driver-535, nvidia-driver-550, nvidia-driver-535-server, etc., but each time I face this issue:
nvidia-smi
NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

I have read all the blogs about reinstalling kernels and disabling Secure Boot, and I have already taken care of all these steps. I look forward to getting some support and guidance from the NVIDIA team.

Did you resolve your issue? I am facing the same issue.


Did we find a solution for this? I am running into the same problem.