Installing additional CUDA versions

florian.pertschy · October 3, 2023, 1:09pm

I want to install several software packages on our Rocky Linux 8 cluster that rely on pytorch, dgl-cuda and cuda (more specifically versions 11.6-11.8), but the only cuda versions I currently have installed are 11.2, 12.1 and 12.2.

I read that having multiple cuda versions installed usually shouldn’t be a problem, but I want to ensure that 12.1 remains the “main” one in use and that installing a new one doesn’t brick the system. When I tried to install it with the runfile, it showed:

Existing package manager installation of the driver found. It is strongly recommended that you remove this before continuing.

And when I tried the rpm instead I got this:

[admin@cluster newcuda]$ sudo dnf -y module install nvidia-driver:latest-dkms
Warning: failed loading '/etc/yum.repos.d/oneAPI.repo', skipping.
Rocky Linux 8 - AppStream                        17 MB/s |  11 MB     00:00    
Rocky Linux 8 - BaseOS                           14 MB/s | 7.1 MB     00:00    
Rocky Linux 8 - PowerTools - Source             1.8 MB/s | 655 kB     00:00    
Rocky Linux 8 - Extras                           53 kB/s |  14 kB     00:00    
Rocky Linux 8 - PowerTools                      5.7 MB/s | 2.8 MB     00:00    
Rocky Linux 8 - PowerTools - Source             557 kB/s | 197 kB     00:00    
cuda-rhel8-x86_64                                16 MB/s | 2.7 MB     00:00    
cuda-rhel8-11-1-local                            26 MB/s |  70 kB     00:00    
cuda-rhel8-11-2-local                            30 MB/s |  72 kB     00:00    
cuda-rhel8-11-7-local                            44 MB/s |  87 kB     00:00    
cuda-rhel8-12-1-local                            36 MB/s |  94 kB     00:00    
ELRepo.org Community Enterprise Linux Repositor 399 kB/s | 243 kB     00:00    
Extra Packages for Enterprise Linux 8 - x86_64   12 MB/s |  16 MB     00:01    
Extra Packages for Enterprise Linux 8 - Next -  1.4 MB/s | 368 kB     00:00    
NVIDIA HPC SDK                                   19 MB/s | 3.1 MB     00:00    
NOTE: Skipping kernel installation since no kernel module package kmod-nvidia-530.30.02-4.18.0-477.27.1 for kernel version 4.18.0-477.27.1.el8_8 and NVIDIA driver 535.86.10 could be found
Error: 
 Problem: problem with installed package kmod-nvidia-535.86.10-4.18.0-477.21.1-3:535.86.10-3.el8_8.x86_64
  - package kmod-nvidia-535.86.10-4.18.0-477.21.1-3:535.86.10-3.el8_8.x86_64 conflicts with kmod-nvidia-latest-dkms provided by kmod-nvidia-latest-dkms-3:535.104.12-1.el8.x86_64
  - cannot install the best candidate for the job
(try to add '--allowerasing' to command line to replace conflicting packages or '--skip-broken' to skip uninstallable packages or '--nobest' to use not only best candidate packages)

I fear this might cause problems, because it wants to erase the kernel drivers for 12.1 (?)

Do you have a suggestion on how I can proceed? The software I was trying to install is called RFdiffusion, and its error message said this:

 File "/software/anaconda/envs/SE3nv/lib/python3.9/site-packages/torch/cuda/nvtx.py", line 9, in _fail
    raise RuntimeError("NVTX functions not installed. Are you sure you have a CUDA build?")
RuntimeError: NVTX functions not installed. Are you sure you have a CUDA build?

Robert_Crovella · October 3, 2023, 2:18pm

If I were doing this, I would use the runfile installer to install the older versions (feel free to use the RPM/package manager method for your latest version if you wish).

A full CUDA install, whether by package manager or runfile installer, can/will install both the CUDA toolkit and the GPU driver. The driver install is the sticking point: it can present conflicts between the runfile and package manager methods. The CUDA toolkit portion generally won’t conflict. This is more-or-less evident in the warning message you excerpted:

Existing package manager installation of the driver found.

So:

1. Install the latest version of CUDA (and the GPU driver) using either the package manager or the runfile installer method.
2. Install older versions of CUDA (toolkit) using runfile installers. Deselect the option to install the driver during this step.
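
For a non-interactive install, the second step might look roughly like the sketch below. It only assembles and prints the command rather than running it. The runfile name is an assumption (use whatever file you actually downloaded), and toolkit_only_command is a made-up helper name; the --silent, --toolkit and --toolkitpath options are the runfile installer's own. It may also be worth checking where the /usr/local/cuda symlink points afterwards.

```python
import shlex


def toolkit_only_command(runfile, version):
    """Build a runfile invocation that installs the toolkit but not the driver.

    `runfile` is the path to the downloaded installer (assumed name below).
    """
    prefix = f"/usr/local/cuda-{version}"
    return [
        "sh", runfile,
        "--silent",                 # no prompts; only the listed components are installed
        "--toolkit",                # toolkit only, so the existing driver is left alone
        f"--toolkitpath={prefix}",  # versioned prefix, next to the other toolkits
    ]


# Assumed file name for CUDA 11.8; adjust to match your download.
cmd = toolkit_only_command("cuda_11.8.0_520.61.05_linux.run", "11.8")
print("sudo " + shlex.join(cmd))
```

Printing first makes it easy to review the command before running it as root.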

I don’t really know how to do this using purely package manager methods. There may be a way, I just don’t know it. If you already have a suitable GPU driver installed, you can use the package manager to install only the CUDA toolkit portion (not the GPU driver) by running dnf install cuda-toolkit instead of dnf install cuda. You can also install older versions of the toolkit using other meta packages (e.g. dnf install cuda-toolkit-10-2). However, I personally don’t know how to install multiple CUDA toolkits this way. It may just work, but I suspect it does not without extra magic.
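
To make the meta package naming concrete, here is a small sketch that maps a version number to the versioned toolkit package name, following the cuda-toolkit-10-2 pattern above. toolkit_package is a hypothetical helper, and the versions in the loop are simply the ones asked about. It prints the dnf commands and does not run them.

```python
def toolkit_package(version):
    """Map a CUDA version such as '11.8' to its versioned toolkit meta package."""
    major, minor = version.split(".")
    return f"cuda-toolkit-{major}-{minor}"


# Versions from the question; these are toolkit-only packages, no driver.
for version in ("11.6", "11.7", "11.8"):
    print(f"sudo dnf install {toolkit_package(version)}")
```

Whether several of these coexist cleanly on one system is exactly the part I have not verified.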

There is an install guide available. I suggest reading it. The post install steps are one example of something I have not covered here, but are usually necessary for best functionality.
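
As an illustration of those post-install steps, the sketch below builds an environment in which one particular toolkit comes first on PATH and LD_LIBRARY_PATH, without touching the system-wide default. It assumes the usual /usr/local/cuda-X.Y layout with bin and lib64 subdirectories; cuda_env is a made-up helper name.

```python
import os


def cuda_env(version, base_env=None, root="/usr/local"):
    """Return a copy of the environment with one CUDA toolkit put first.

    Assumes the toolkit lives in <root>/cuda-<version> (the default layout).
    """
    env = dict(os.environ if base_env is None else base_env)
    home = f"{root}/cuda-{version}"

    def prepend(var, path):
        # Put `path` in front of any existing value of `var`.
        old = env.get(var, "")
        env[var] = path + (os.pathsep + old if old else "")

    prepend("PATH", f"{home}/bin")
    prepend("LD_LIBRARY_PATH", f"{home}/lib64")
    return env


env = cuda_env("11.8", base_env={"PATH": "/usr/bin"})
print(env["PATH"])
print(env["LD_LIBRARY_PATH"])
```

The returned dictionary can be passed as env= to subprocess.run, so only the jobs that need the older toolkit see it and 12.1 stays the default for everything else.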
