Installing CUDA on Ubuntu 20.04 with Older GPUs

Hello,

I have a number of Ubuntu 20.04.6 LTS hosts with older GPUs (Tesla P100-PCIE-16GB, GeForce GTX 1080 Ti, and Quadro RTX 8000) that don’t appear to be compatible with the new nvidia-open drivers. The cuda and cuda-12-6 packages in apt all seem to have dependencies on the open driver branch now, though, breaking the usual process I’ve been using for setting up cuda on them.

Is there a way to get the tools in cuda-12-6 (especially nvidia-smi and nvcc) and their dependencies installed on these systems that doesn’t automatically uninstall the cuda-drivers packages required by my GPUs or supplant them with incompatible open drivers?

Thanks!

There are several issues here.

  1. The open drivers support only a limited set of cards and, I believe, don’t support Pascal (GTX 10xx) or earlier, so they are out for the 1080s etc.
  2. For each of the card types involved you need to find the latest driver and CUDA version that supports it. For example, I am running CUDA 12.6.2 on a Pascal card, but with the proprietary driver, at the same 560.35.03 version number you are seeing suggested for the open one. Depending on the full mix of cards you might find one version that supports them all (neat) or you might have to choose different ones (messy).
  3. Read the right sections of the guides!
  4. It has been my experience (albeit using the Fedora equivalent methods) that the easiest way to cope with these issues is to go to the NVIDIA Driver Installation Guide (r560 documentation) and look at section 4.2, NVIDIA Proprietary GPU Kernel Modules Installation.
  5. sort out the repo and if necessary keys etc
  6. install the proprietary driver
  7. check nvidia-smi at the command prompt is happy. Don’t be confused by the “version” it reports: that’s the latest CUDA version the driver will support, not an installed toolkit version. Then go to the CUDA Installation Guide for Linux and look at section 3, Package Manager Installation.

and then you should be able to run:
sudo apt install cuda-toolkit
In my experience, installing cuda-toolkit rather than cuda leaves the drivers alone and installs everything else.
apt upgrade will keep you up to date.
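
If you want to confirm that before committing, apt can simulate the install and show what it would remove. This is only a sketch: list_removals is a helper name I made up, and it assumes the usual "Remv <package>" lines that apt-get prints in simulate mode.

```shell
# List packages that a simulated apt transaction would remove.
# Reads `apt-get --simulate` output on stdin; removal lines start with "Remv".
list_removals() {
    awk '$1 == "Remv" { print $2 }'
}

# Usage (simulation only, nothing is changed on the system):
#   apt-get install --simulate cuda-toolkit | list_removals
# An empty result suggests the installed driver packages would be left alone.
```

If cuda-drivers or any nvidia driver package shows up in that list, stop and look at the dependencies before running the real install.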

Hope this helps

FWIW, at this point it looks like there are usable cuda-drivers packages available for Ubuntu 20.04 in apt, so I’m able to get everything I need running on my older systems.

Here’s the full upgrade process that has worked on all of my servers, in case anyone else is stuck with the same issue.

Install the latest nvidia keyring and its prerequisites:
apt install linux-headers-$(uname -r)
distribution=$(. /etc/os-release; echo $ID$VERSION_ID | sed -e 's/\.//g')
wget https://developer.download.nvidia.com/compute/cuda/repos/$distribution/x86_64/cuda-keyring_1.1-1_all.deb
dpkg -i cuda-keyring_1.1-1_all.deb
apt update
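
For reference, the distribution line just joins ID and VERSION_ID from /etc/os-release and strips the dots, which gives the directory name used in the repo URL. A small sketch of the same logic as a function (repo_distribution is a hypothetical name, not something the NVIDIA packages provide):

```shell
# Build the repo directory name from os-release style values.
# Arguments: ID and VERSION_ID as found in /etc/os-release.
repo_distribution() {
    printf '%s%s\n' "$1" "$2" | sed -e 's/\.//g'
}

# Usage on a real host:
#   . /etc/os-release && repo_distribution "$ID" "$VERSION_ID"
```

On Ubuntu 20.04 this should come out as ubuntu2004; if it prints anything with a dot in it, the wget URL will not match.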

Upgrade the system, let apt clean out the old drivers and cuda packages, and install a clean cuda-drivers package:
apt update && apt dist-upgrade
reboot
apt update && apt dist-upgrade
apt autoremove -y
apt install cuda-drivers
reboot
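
After the reboot it may be worth checking which flavor of kernel module actually got loaded. The sketch below classifies the license string reported by modinfo. It assumes the proprietary module reports "NVIDIA" and the open module reports a dual MIT/GPL license, which is what I would expect but have not checked against every driver release; driver_flavor is a made-up helper name.

```shell
# Classify the NVIDIA kernel module from its license string.
# Assumption: proprietary module -> "NVIDIA", open module -> "Dual MIT/GPL".
driver_flavor() {
    case "$1" in
        NVIDIA)      echo proprietary ;;
        *MIT*|*GPL*) echo open ;;
        *)           echo unknown ;;
    esac
}

# Usage on a host with the driver installed:
#   driver_flavor "$(modinfo -F license nvidia)"
#   nvidia-smi --query-gpu=name,driver_version --format=csv
```

On the Pascal cards you want this to say proprietary, and nvidia-smi should list every GPU in the box.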

Create the libnvidia-ml.so symlink, without which slurm will keep complaining about nvml not being found on the system:
cd /lib/x86_64-linux-gnu/
ln -s libnvidia-ml.so.1 /lib/x86_64-linux-gnu/libnvidia-ml.so
ldconfig

That last issue was causing the majority of my problems. Not sure why the current packages link only libnvidia-ml.so.1.
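
If you are scripting this across several servers, the symlink step can be made safe to rerun. A minimal sketch, assuming the same library directory as above; ensure_nvml_link is a hypothetical helper name:

```shell
# Create libnvidia-ml.so -> libnvidia-ml.so.1 in the given directory,
# but only if the versioned library exists and the link is not already there.
ensure_nvml_link() {
    dir="$1"
    if [ ! -e "$dir/libnvidia-ml.so.1" ]; then
        echo "libnvidia-ml.so.1 not found in $dir" >&2
        return 1
    fi
    if [ ! -e "$dir/libnvidia-ml.so" ] && [ ! -L "$dir/libnvidia-ml.so" ]; then
        ln -s libnvidia-ml.so.1 "$dir/libnvidia-ml.so"
    fi
}

# Usage (as root):
#   ensure_nvml_link /lib/x86_64-linux-gnu && ldconfig
```

It uses a relative link target, same as the manual command, so the link keeps pointing at whatever libnvidia-ml.so.1 resolves to after a driver upgrade.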
