Installing new NVIDIA drivers, CUDA, and cuDNN on an NVIDIA GeForce GTX 1050 Ti?

I have been at this for days. I have a Lenovo Legion Y520. It has 32GB of RAM, a 1TB SSD, and an NVIDIA GeForce GTX 1050 Ti with 4GB VRAM. I am trying to configure my computer for machine learning according to the following program.

I create a fresh install of Ubuntu 22.04.
I select Ubuntu Pro for security and allow the software updater to update the software.

Install Google Chrome –
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
sudo dpkg -i google-chrome-stable_current_amd64.deb
sudo apt --fix-broken install
sudo apt update && sudo apt upgrade

Install DEAD SNAKES repository -
sudo apt install software-properties-common
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt update && sudo apt upgrade

Install PYTHON 3.12.1 -
sudo apt install python3.12
sudo apt update && sudo apt upgrade

Install Git Repository -
sudo add-apt-repository ppa:git-core/ppa
sudo apt update && sudo apt upgrade

Install Git CLI version 2.43.0 -
sudo apt install git
sudo apt update && sudo apt upgrade
sudo git --version

Install Curl 7.81.0 -
sudo apt update && sudo apt upgrade
sudo apt install curl
sudo curl --version

Install Homebrew -
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
(echo; echo 'eval "$(/home/linuxbrew/.linuxbrew/bin/brew shellenv)"') >> /home/tsisaris/.bashrc
eval "$(/home/linuxbrew/.linuxbrew/bin/brew shellenv)"
sudo apt-get install build-essential
sudo apt update && sudo apt upgrade

Install DBUS-X11 -
sudo apt-get install dbus-x11
sudo apt update && sudo apt upgrade

At this point everything is working just fine

I then try to install the latest version of NVIDIA drivers for my particular GPU which is now a legacy card.

I have attempted to use the ubuntu-drivers tool but it installs the wrong driver.

I go on NVIDIA’s website and select the correct driver and download it.

This happens to be NVIDIA-Linux-x86_64-535.146.02.run

I want to update my driver to the latest possible version and install the latest versions of CUDA, CUDNN, and Pytorch that will work with my machine so that I can begin to study and practise machine learning.

In my case it would seem that CUDA 11.8 and cuDNN 8.9.7 are the latest versions that will work with PyTorch 2.1.1 and with my video card.

This is where the problem comes in.

I have followed almost every permutation and order of the installation process, and they all fail to update the driver because nvidia-drm is in use.

I finally try this procedure…

Switch to tty3 by pressing Ctrl+Alt+F3 -

Unload nvidia-drm before proceeding -

Isolate multi-user.target -
sudo systemctl isolate multi-user.target

Note that nvidia-drm is currently in use -
lsmod | grep nvidia.drm

Unload nvidia-drm -
sudo modprobe -r nvidia-drm

Note that nvidia-drm is not in use anymore -
lsmod | grep nvidia.drm

Install Newest Nvidia GPU Drivers 535.146.02 -
cd ~/Downloads
sudo chmod +x NVIDIA-Linux-x86_64-535.146.02.run
sudo ./NVIDIA-Linux-x86_64-535.146.02.run

I answer all prompts during installation. It still seems like there is some kind of conflict.
I have to input the keyring key

When installation has finished, confirm that the new driver is installed
nvidia-smi

I get that the Driver Version is 535.146.02 and the CUDA version is 12.2? I haven’t even installed CUDA yet…

Start the GUI again -
sudo systemctl start graphical.target

I now want to install CUDA

Switch to tty3 by pressing Ctrl+Alt+F3 -

Unload nvidia-drm before proceeding -

Isolate multi-user.target -
sudo systemctl isolate multi-user.target

Note that nvidia-drm is currently in use -
lsmod | grep nvidia.drm

Unload nvidia-drm -
sudo modprobe -r nvidia-drm

Note that nvidia-drm is not in use anymore -
lsmod | grep nvidia.drm

Go to your download folder and run the cuda installation -
sudo dpkg -i cuda-repo-ubuntu2204-11-8-local_11.8.0-520.61.05-1_amd64.deb

Answer any prompts during installation -

When installation has finished, confirm that the CUDA Version has been updated -
nvidia-smi

I start to understand this less and less

When I run the nvidia-smi command I get that the Driver Version is 535.146.02 and the CUDA version is 12.2?

I have to install the NVIDIA CUDA toolkit so that I can run nvcc --version -
sudo apt install nvidia-cuda-toolkit

When I run the nvcc --version command I get this garbage…

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0

Where does CUDA 11.5 come from? I installed CUDA 11.8, and nvidia-smi says CUDA version 12.2.

How can I go through all of this process without a hangup?

I am sure that there is a bunch of broken garbage in my Ubuntu system now. I want a complete step by step method that corrects my code if necessary to finish installing CUDA 11.8 and CUDNN 8.9.7 without a hiccup and without creating a bunch of broken garbage…

What am I missing?

Thank you in advance,
Shawn

Hi there @tsisaris and welcome to the NVIDIA developer forums.

Since you spent so much time describing the issue I won’t just point you to our very good and helpful CUDA section of the forums, but I recommend keeping it bookmarked.

To answer some of your questions: the nvidia-smi output with the seemingly confusing CUDA version simply states that the GPU driver supports CUDA up to 12.2. The core CUDA functionality is already part of the driver. If you install the full CUDA package, it includes a fitting GPU driver, and if you install an NVIDIA GPU driver on its own, it contains support for the appropriate CUDA version.

What allows you to do Compute or Machine Learning tasks is the CUDA toolkit. That means after installing a driver for your system you could simply check which (max) toolkit version is supported for that driver and install that.

The CUDA toolkit packaged by the distribution (what sudo apt install nvidia-cuda-toolkit gives you) is not necessarily the latest supported one; that is why you saw the 11.5 vs. 11.8 mismatch.
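To make the distinction concrete, here is a minimal sketch that pulls both numbers out of the tool output and checks that the toolkit is not newer than what the driver supports. The function names (driver_cuda_version, toolkit_cuda_version, toolkit_supported) are made up for this example and are not NVIDIA tools; the version comparison relies on GNU sort -V, which is assumed to be available as it is on Ubuntu.

```shell
#!/usr/bin/env bash
# Sketch: compare the CUDA version the driver supports (nvidia-smi header)
# with the toolkit version actually on PATH (nvcc --version).
# All function names here are hypothetical helpers.

# Read nvidia-smi output on stdin and print the "CUDA Version" value, e.g. 12.2.
driver_cuda_version() {
    sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Read `nvcc --version` output on stdin and print the release, e.g. 11.5.
toolkit_cuda_version() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p' | head -n 1
}

# Succeed when toolkit version $1 is not newer than driver-supported version $2.
toolkit_supported() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Only query the real tools when both are present on this machine.
if command -v nvidia-smi >/dev/null 2>&1 && command -v nvcc >/dev/null 2>&1; then
    drv="$(nvidia-smi | driver_cuda_version)"
    tk="$(nvcc --version | toolkit_cuda_version)"
    echo "driver supports CUDA up to $drv; nvcc on PATH ($(command -v nvcc)) is $tk"
    if toolkit_supported "$tk" "$drv"; then
        echo "toolkit is within what the driver supports"
    else
        echo "toolkit is newer than the driver supports"
    fi
fi
```

With the numbers from this thread, a toolkit of 11.5 or 11.8 passes against a driver that reports 12.2. The path printed for nvcc also shows which installation is being picked up: /usr/bin/nvcc would point at the distribution package, while a path under /usr/local/cuda-11.8/bin would point at NVIDIA's own toolkit.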

Taking a step back, if you are sure you want to develop something using CUDA (not just run CUDA apps) the recommended way to install everything necessary is to do it based on the CUDA installation instructions. Those include steps that explain how the GPU driver gets installed and what pre-requisites are needed. You should also, when starting from a fresh Ubuntu, do the GPU/CUDA installation first, before changing package sources, kernel dependencies and DBUS-X11 configurations.
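As a rough sketch of what those installation instructions look like for Ubuntu 22.04 on x86_64, using NVIDIA's network repository: the keyring file name and the cuda-toolkit-11-8 package name below are what I believe the guide uses, but treat them as assumptions and check the current guide before running anything. The sketch also assumes a working driver is already installed and that CUDA 11.8 is the toolkit you want. DRY_RUN defaults to 1, so the commands are only printed.

```shell
#!/usr/bin/env bash
# Sketch of the network-repository route for Ubuntu 22.04 (x86_64).
# Assumptions: a working NVIDIA driver is already installed; CUDA 11.8 is wanted;
# the file and package names match the current CUDA installation guide.
# Nothing is executed unless you set DRY_RUN=0.

REPO_URL="https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64"
KEYRING_DEB="cuda-keyring_1.1-1_all.deb"
TOOLKIT_PKG="cuda-toolkit-11-8"   # toolkit only; leaves the installed driver alone

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

install_cuda_toolkit() {
    run wget "$REPO_URL/$KEYRING_DEB"           # repository signing key and source list
    run sudo dpkg -i "$KEYRING_DEB"
    run sudo apt-get update
    run sudo apt-get install -y "$TOOLKIT_PKG"  # installs under /usr/local/cuda-11.8
}

install_cuda_toolkit
```

After a real run, nvcc is not on PATH by default; the guide's post-installation step adds /usr/local/cuda-11.8/bin to PATH in ~/.bashrc. Installing the distribution's nvidia-cuda-toolkit on top of this would put a second, older nvcc in /usr/bin.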

Quite honestly, looking at your current status, if it were me, I would start from scratch with a fresh Ubuntu.

I hope that helped a bit!

I wrote this for another forum after people responded to my posting. I copied and pasted it here to better explain the issue and final desired result. Please let me know if I’m on the right track. Thank you.

I’m trying to reply to all of these for further help? Some of this was in what I posted? I was told to use the latest NVIDIA drivers for my card. I used the Ubuntu recommended drivers with the Ubuntu installation tool and they were not the ones that NVIDIA recommends… So I downloaded the current most up to date drivers directly from NVIDIA’s website as I said. This is really where the issue exists. I had to do a manual install of the most recent driver which seems to be 535.146.02 as of today as per NVIDIA’s website… I need people to be very explicit here. Are you saying that I should ignore NVIDIA’s website and just use whatever the Ubuntu driver installation tool happens to install or recommend?

Using the NVIDIA driver from any other source than the standard repositories

(- what standard repositories? NVIDIA or Ubuntu? - CODE PLEASE… -)

will likely fail upon a kernel or driver upgrade – avoid that by just getting the Nvidia drivers installed first (535.129.03 probably) (– wrong number according to NVIDIA -)

then use the .run script from Nvidia and reject the offer of Nvidia drivers.

(- How? CODE PLEASE… -)

Override the system locations for bin and lib files too – all that can go under cuda/lib and cuda/bin.

(- How? CODE PLEASE… -)

Please use Code tags on terminal or longer text output.
Easy to add code tags with Forum’s advanced editor and # icon.

(- I have no idea how to do this… -)

The nVidia search says this is correct driver: 535.146.02

(- Yes. I said this. This was the issue. I couldn’t just install that driver. It kept giving me errors… I spent hours using Bing Chat to try new things, and when I got an error I would cut and paste it into Bing Chat to be analyzed. I ended up trying nearly all of this. Bing Chat would give me the code and direct me to forums and websites to verify it. Lo and behold, it was the same advice that was being given on the forums… -)

Ubuntu should give you the same:

#What is installed
dkms status

list drivers available, same list as system settings, software updates, additional drivers or last tab

ubuntu-drivers devices

or

ubuntu-drivers devices | grep recommended

sudo apt-get remove --purge nvidia-*
sudo ubuntu-drivers devices
sudo ubuntu-drivers autoinstall

man mkinitramfs
sudo update-initramfs -u
or
sudo update-initramfs -k all -c

(- Nearly all of this code I got from Bing Chat and the forums and tried it multiple times in multiple ways to no avail… -)

(- If python 3.11.7 is truly the best most stable version of python out then thank you. This was the kind of advice that I am looking for… -)

(- In a similar manner, I need to know which combination of CUDA, cuDNN, and PyTorch is optimal for my machine. I read that I could do CUDA 11.8, cuDNN 8.9.7, and PyTorch 2.1.1. I also read that CUDA 12.2, cuDNN 8.9.7, and an older PyTorch 1.10.0 would be much better with this machine… How do I verify this? -)

(- In fact, how do I answer this type of question in general? I want to install Docker, nvidia-cuda-toolkit, the NVIDIA Container Toolkit, conda, Jupyter notebooks, and whatever other tools or toolkits will be helpful. I want all of them to be compatible with each other but also the most efficient, fastest, most stable version of each that is 100% compatible with all of the other programs and toolkits… -)

(- Given that I am functionally illiterate in most of the mumbo jumbo, I need step by step in baby steps that even a caveman using fist sized buttons could implement with minimal effort and error and maximal success… -)

(- CODE PLEASE… -) (- CODE PLEASE… -) (- CODE PLEASE… -)

Thank you for your help

Why is it that I get this -

sudo apt-cache policy nvidia-driver-535
[sudo] password for tsisaris:
nvidia-driver-535:
  Installed: 535.129.03-0ubuntu0.22.04.1
  Candidate: 535.129.03-0ubuntu0.22.04.1
  Version table:
 *** 535.129.03-0ubuntu0.22.04.1 500
        500 Index of /ubuntu jammy-updates/restricted amd64 Packages
        500 Index of /ubuntu jammy-security/restricted amd64 Packages
        100 /var/lib/dpkg/status

and I get this

nvidia-smi
Sun Jan 7 16:39:31 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.146.02             Driver Version: 535.146.02   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce GTX 1050 Ti     Off | 00000000:01:00.0 Off |                  N/A |
| N/A   33C    P8              N/A / ERR! |      4MiB /  4096MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1725      G   /usr/lib/xorg/Xorg                            4MiB |
+---------------------------------------------------------------------------------------+
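For anyone hitting the same mismatch, one likely reading given the steps earlier in this thread: apt only reports what its package database has on record (535.129.03), while nvidia-smi reports the driver that is actually loaded, which the .run installer replaced with 535.146.02. The sketch below compares the two. It assumes the package is named nvidia-driver-535 as in the output above and that the loaded module exposes its version at /sys/module/nvidia/version; upstream_version and same_driver are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: compare the driver version apt has on record with the one actually loaded.
# Assumptions: package name nvidia-driver-535; /sys/module/nvidia/version is readable
# while the nvidia kernel module is loaded. Helper names are hypothetical.

# Strip the Ubuntu package revision: 535.129.03-0ubuntu0.22.04.1 -> 535.129.03
upstream_version() {
    printf '%s\n' "${1%%-*}"
}

# Succeed when the packaged version ($1) matches the loaded version ($2).
same_driver() {
    [ "$(upstream_version "$1")" = "$2" ]
}

if [ -r /sys/module/nvidia/version ] && command -v dpkg-query >/dev/null 2>&1; then
    loaded="$(cat /sys/module/nvidia/version)"
    packaged="$(dpkg-query -W -f='${Version}' nvidia-driver-535 2>/dev/null)"
    echo "loaded kernel module: $loaded"
    echo "apt package record:   ${packaged:-none}"
    if ! same_driver "$packaged" "$loaded"; then
        echo "mismatch: the loaded driver differs from the apt package record"
    fi
    # The .run installer ships its own uninstaller; finding it is a hint, not proof.
    if [ -x /usr/bin/nvidia-uninstall ]; then
        echo "found /usr/bin/nvidia-uninstall (suggests a .run install)"
    fi
fi
```

Two installs of the same driver managed by two different tools is the kind of state that tends to break on the next kernel or driver update, which is why the thread keeps coming back to picking one source and sticking with it.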

I want to know how to start over with a fresh installation of Ubuntu 22.04 LTS and add these things in an exact order where I get no conflicts.

I still need to know which are the best versions of cuda, cudnn, and pytorch to use together with this machine.
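One way to check a given combination empirically rather than by reading version tables is to ask an installed PyTorch what it was built against. This is only a sketch: check_torch is a made-up helper name, and it assumes python3 is on PATH. The pip wheels from the PyTorch package index (for example the cu118 index at https://download.pytorch.org/whl/cu118) bundle their own CUDA runtime and cuDNN libraries, so for running PyTorch the system mainly needs a driver that is new enough for that CUDA version.

```shell
#!/usr/bin/env bash
# Sketch: report what the installed PyTorch build was compiled against
# and whether it can see the GPU. check_torch is a hypothetical helper name.

check_torch() {
    if ! command -v python3 >/dev/null 2>&1; then
        echo "python3 not found on PATH"
        return 0
    fi
    python3 - <<'EOF'
try:
    import torch
except ImportError:
    print("torch is not installed in this Python environment")
else:
    print("torch", torch.__version__)
    print("built for CUDA", torch.version.cuda)       # None on CPU-only builds
    print("cuDNN", torch.backends.cudnn.version())    # None when cuDNN is unavailable
    print("GPU visible:", torch.cuda.is_available())
EOF
}

check_torch
```

If the last line reports that the GPU is visible, that particular driver and PyTorch build work together on this card, whatever nvcc --version says about a separately installed toolkit.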