Compatibility issue with CUDA, NVIDIA drivers, TensorRT and other tools

I installed the recommended GPU driver for my system (Ubuntu 22.04, x86_64, with an NVIDIA RTX 3070 8GB), which is version 535 at the time of writing. The installation was done automatically using the “Additional Drivers” pane of the “Software & Updates” Ubuntu app.

Apparently this installed CUDA 12.2 as reported by nvidia-smi:

| NVIDIA-SMI 535.86.05              Driver Version: 535.86.05    CUDA Version: 12.2     |
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  NVIDIA GeForce RTX 3070 ...    Off | 00000000:01:00.0 Off |                  N/A |
| N/A   37C    P0             ERR! /  80W |     10MiB /  8192MiB |      0%      Default |
|                                         |                      |                  N/A |
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|    0   N/A  N/A      3078      G   /usr/lib/xorg/Xorg                            4MiB |

However, I read in another post that the CUDA version displayed here only indicates the CUDA version used for compiling the driver, or something like that. Thus, CUDA may not be installed on my system. There is no CUDA folder in /usr/local/, so I think that I may be right.
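A quick way to check whether a toolkit (and not just the driver) is present is to look for the nvcc compiler. This is only a sketch: it assumes toolkits sit under the default /usr/local/cuda* prefix, and find_cuda_toolkits is a made-up helper name, not an NVIDIA tool.

```shell
#!/usr/bin/env bash
# Sketch: list directories that look like CUDA toolkit installs.
# Assumption: toolkits live under /usr/local/cuda* (the default prefix).

find_cuda_toolkits() {
    # $1: directory to search (defaults to /usr/local)
    local root="${1:-/usr/local}"
    local dir
    for dir in "$root"/cuda*; do
        # A toolkit directory ships the nvcc compiler in bin/.
        if [ -x "$dir/bin/nvcc" ]; then
            echo "$dir"
        fi
    done
}

toolkits="$(find_cuda_toolkits /usr/local)"
if [ -n "$toolkits" ]; then
    echo "CUDA toolkit(s) found:"
    echo "$toolkits"
else
    echo "No CUDA toolkit found under /usr/local"
fi
```

If this prints nothing but the fallback message while nvidia-smi still reports a CUDA version, only the driver is installed.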

I use PyTorch, which bundles the required CUDA/cuDNN dependencies, so it is fine not to have CUDA installed system-wide for this. However, I want to use TensorRT and other tools that require CUDA to be installed on my system.

Strangely, TensorRT and most other tools are not compatible with the latest CUDA version available, 12.2. So I have to install CUDA 11.8, which seems to be the most compatible version at this time.

However, when I install CUDA 11.8 using the official installation guide, this changes the GPU driver installed on my machine. After reboot, the driver version is 520 instead of 535. This causes many issues on my system, including screen glitches and errors. I tried to reinstall GPU driver version 535, but that fails with black screens and more bugs.

How can I solve this issue?

This kind of issue is not new, and I think many people struggle with installing NVIDIA drivers and tools. It would be great to improve the developer experience in this area.

First of all, after installing the NVIDIA driver you need to install the CUDA toolkit. You can choose the operating system, architecture, distribution, etc. on the NVIDIA website. As you’re using CUDA version 12.2, download and install the toolkit from the official NVIDIA developer website after installing the NVIDIA drivers. After that you should see the cuda folder in /usr/local/. Don’t forget to export the PATH in your ~/.bashrc after installation. For instance, to set PATH and LD_LIBRARY_PATH, you can add these lines at the end of your ~/.bashrc:

export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
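If several toolkit versions end up installed side by side, it can help to pin one explicitly and to avoid stacking duplicate entries every time ~/.bashrc is sourced. The following is a minimal sketch, assuming a 64-bit install with libraries in lib64; CUDA_HOME and prepend_path are names chosen here for illustration and are not required by NVIDIA's tooling.

```shell
#!/usr/bin/env bash
# Sketch: prepend a directory to a colon-separated list only if it is missing.
# prepend_path is a hypothetical helper, not part of any NVIDIA tooling.
prepend_path() {
    # $1: directory to add, $2: current list; prints the resulting list
    case ":$2:" in
        *":$1:"*) echo "$2" ;;          # already present, keep list unchanged
        *)        echo "$1${2:+:$2}" ;; # prepend, adding ':' only if list is non-empty
    esac
}

# Assumption: CUDA_HOME points at the toolkit to use, e.g. /usr/local/cuda-11.8.
# It falls back to the /usr/local/cuda symlink when unset.
CUDA_HOME="${CUDA_HOME:-/usr/local/cuda}"

PATH="$(prepend_path "$CUDA_HOME/bin" "$PATH")"
LD_LIBRARY_PATH="$(prepend_path "$CUDA_HOME/lib64" "${LD_LIBRARY_PATH:-}")"
export PATH LD_LIBRARY_PATH
```

After opening a new shell, running nvcc --version should report the toolkit version that CUDA_HOME points to.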

CUDA 12.2 toolkit:

If you’re using TensorRT, please follow the TensorRT documentation from NVIDIA to avoid dependency issues.

TensorRT documentation:

Let’s say you want to install TensorRT 8.6.1. The TensorRT support matrix on the NVIDIA developer website lists the supported platforms, features, and hardware capabilities of the NVIDIA TensorRT 8.6.1 APIs, parsers, and layers. For TensorRT 8.6.1, you need cuDNN 8.9.0.

cuDNN Archive:

I don’t have issues following the documentation for installing CUDA or TensorRT, please read my post.

The issues I face are:

  • CUDA 12.2 is not compatible with anything right now, not even TensorRT, yet it is the most recent version and the (somewhat) recommended CUDA version.
  • Installing CUDA 11.8, which seems to be the most compatible right now, modifies the installed NVIDIA driver on my system (from 535 to 520), which causes other issues.

I need to install CUDA 11.8 while keeping NVIDIA driver 535, which seems to be possible since the minimum required driver version is 450.80.02 as per the documentation.
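The driver requirement can be checked with a version-aware comparison. This is a rough sketch that assumes GNU sort (for the -V option) is available; version_ge is a hypothetical helper name, and 450.80.02 is the minimum quoted above.

```shell
#!/usr/bin/env bash
# Sketch: succeed if version $1 is greater than or equal to version $2.
# Relies on GNU sort -V (version sort); version_ge is a made-up helper.
version_ge() {
    # The smaller of the two versions sorts first; $1 >= $2 iff that is $2.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

min_driver="450.80.02"  # minimum driver version quoted for CUDA 11.8

# Only query the driver when nvidia-smi is actually available.
if command -v nvidia-smi >/dev/null 2>&1; then
    installed="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
    if version_ge "$installed" "$min_driver"; then
        echo "Driver $installed meets the minimum ($min_driver)"
    else
        echo "Driver $installed is older than the minimum ($min_driver)"
    fi
fi
```

With driver 535.86.05 the check passes, so there is no need to let the CUDA 11.8 installer replace the driver.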

If you install via the runfile installer, keep your 535 driver install by deselecting the option to install the driver.

If you install via the package manager methods, keep your 535 driver by installing the cuda-toolkit meta package rather than cuda.

More info is available in the Linux CUDA install guide.

This seems to solve the issue, though I have to specify the exact cuda-toolkit version, e.g. sudo apt install cuda-toolkit-11-8 for CUDA 11.8.
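For anyone landing here later, a small sketch of the toolkit-only route. It assumes NVIDIA's apt repository is already configured and that versioned toolkit packages are named cuda-toolkit-<major>-<minor>; toolkit_package is a made-up helper. The script only prints the command, so nothing gets installed by accident.

```shell
#!/usr/bin/env bash
# Sketch: build the toolkit-only package name for a given CUDA version.
# Assumption: packages are named cuda-toolkit-<major>-<minor> in NVIDIA's repo.
toolkit_package() {
    # $1: CUDA version such as 11.8; dots become dashes in the package name
    printf 'cuda-toolkit-%s\n' "$(printf '%s' "$1" | tr '.' '-')"
}

pkg="$(toolkit_package 11.8)"

# Print the command instead of running it; review it, then run it by hand.
echo "sudo apt install $pkg"

# Runfile alternative: deselect the driver in the interactive menu, or pass
# the installer's --silent --toolkit options to install only the toolkit.
```

After installing, nvidia-smi should still report driver 535, and the toolkit should appear under /usr/local/.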

The default behavior of installing a driver during the installation of CUDA is confusing. I don’t think this is clearly stated in the documentation. There is a “Driver Installation” section, but it is not clear that the Package Installation method will override that installation. Also, this section seems to imply that the driver will not be installed by the CUDA installation methods.
