I installed the recommended GPU driver for my system (Ubuntu 22.04, x86_64, with an NVIDIA RTX 3070 8GB), which is 535 at the time of writing. The installation was done automatically using the “Additional Drivers” pane of the “Software & Updates” Ubuntu app.
Apparently this installed CUDA 12.2, as reported by nvidia-smi:
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.86.05              Driver Version: 535.86.05    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3070 ...    Off | 00000000:01:00.0 Off |                  N/A |
| N/A   37C    P0             ERR! /  80W |     10MiB /  8192MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      3078      G   /usr/lib/xorg/Xorg                            4MiB |
+---------------------------------------------------------------------------------------+
However, I read in another post that the CUDA version displayed by nvidia-smi only reflects the CUDA version associated with the driver (as far as I understand, the highest version the driver supports), not a toolkit that is actually installed. Thus, CUDA may not be installed on my system. There is no CUDA folder in /usr/local/, so I think this is the case.
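For reference, here is a minimal Python sketch of the check described above: it looks for `nvcc` on the PATH and for `cuda*` directories under /usr/local/. The function name `find_cuda_toolkit` and its `prefix` parameter are hypothetical names chosen for illustration, and the check assumes a default install location, so a toolkit installed elsewhere would not be detected.

```python
import glob
import os
import shutil


def find_cuda_toolkit(prefix="/usr/local"):
    """Return what can be found of a system-wide CUDA toolkit.

    Assumption: the toolkit lives in <prefix>/cuda or <prefix>/cuda-<version>,
    which is the usual default but not guaranteed.
    """
    # nvcc is shipped with the CUDA toolkit, not with the GPU driver.
    nvcc_path = shutil.which("nvcc")
    # Collect directories such as /usr/local/cuda or /usr/local/cuda-11.8.
    toolkit_dirs = sorted(
        path
        for path in glob.glob(os.path.join(prefix, "cuda*"))
        if os.path.isdir(path)
    )
    return {"nvcc": nvcc_path, "toolkit_dirs": toolkit_dirs}


if __name__ == "__main__":
    info = find_cuda_toolkit()
    if info["nvcc"] is None and not info["toolkit_dirs"]:
        print("No system-wide CUDA toolkit found in the default locations.")
    else:
        print("nvcc:", info["nvcc"])
        print("toolkit directories:", info["toolkit_dirs"])
```

If both results are empty, that is consistent with only the driver being installed, although it does not rule out a toolkit in a non-default location.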
I use PyTorch, which bundles the required CUDA/cuDNN dependencies, so it is fine not to have CUDA installed system-wide for this. However, I want to use TensorRT and other tools that require CUDA to be installed on my system.
Strangely, TensorRT and most other tools are not compatible with the latest CUDA version available, 12.2. So I have to install CUDA 11.8, which seems to be the most widely compatible version at this time.
However, when I install CUDA 11.8 following the official installation guide, this changes the GPU driver installed on my machine. After a reboot, the driver version is 520 instead of 535. This causes many issues on my system, including screen glitches and errors. I tried to reinstall GPU driver version 535, but that fails with black screens and more bugs.
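To make the version relationship concrete, here is a small Python sketch that parses the nvidia-smi header line shown earlier and compares the driver version against a minimum. The minimum used here (520.61.05 for CUDA 11.8 on Linux) is an assumption taken from my reading of NVIDIA's CUDA release notes and should be verified there; the function names are hypothetical and only for illustration.

```python
import re

# Assumption: minimum Linux driver for CUDA 11.8 as listed in NVIDIA's
# CUDA release notes. Verify this value before relying on it.
MIN_DRIVER_FOR_CUDA_11_8 = "520.61.05"


def parse_smi_header(line):
    """Extract (driver_version, cuda_version) from the nvidia-smi header line."""
    match = re.search(
        r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*([\d.]+)", line
    )
    if match is None:
        raise ValueError("not an nvidia-smi header line")
    return match.group(1), match.group(2)


def version_tuple(version):
    """Turn '535.86.05' into (535, 86, 5) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def driver_is_new_enough(driver, minimum=MIN_DRIVER_FOR_CUDA_11_8):
    """Return True if the driver version is at least the given minimum."""
    return version_tuple(driver) >= version_tuple(minimum)


if __name__ == "__main__":
    header = "| NVIDIA-SMI 535.86.05 Driver Version: 535.86.05 CUDA Version: 12.2 |"
    driver, cuda = parse_smi_header(header)
    print(driver, cuda)                  # 535.86.05 12.2
    print(driver_is_new_enough(driver))  # True
```

Under that assumption, the 535 driver is already newer than what CUDA 11.8 needs, so in principle the toolkit should not require the driver to be replaced with 520.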
How can I solve this issue?
This kind of issue is not new, and I think many people struggle with installing NVIDIA drivers and tools. It would be great to improve the developer experience in this area.