Mismatch in CUDA driver and runtime versions

sine.arc · August 30, 2024, 5:37am

I have numba and CUDA installed on Ubuntu Linux.
When I run numba -s, I see the following mismatch between the CUDA driver and runtime versions, which seems strange to me. How can I fix it?

$ numba -s | grep CUDA
CUDA Information
CUDA Device Initialized : True
CUDA Driver Version : 12.2
CUDA Runtime Version : 12.4
CUDA NVIDIA Bindings Available : False
CUDA NVIDIA Bindings In Use : False
CUDA Minor Version Compatibility Available : False
CUDA Minor Version Compatibility Needed : True
CUDA Minor Version Compatibility In Use : False
CUDA Detect Output:
Found 2 CUDA devices

Some more information:

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Mar_28_02:18:24_PDT_2024
Cuda compilation tools, release 12.4, V12.4.131
Build cuda_12.4.r12.4/compiler.34097967_0

$ lspci | grep -i nvidia
01:00.0 3D controller: NVIDIA Corporation GA100GL [A30 PCIe] (rev a1)
05:00.0 VGA compatible controller: NVIDIA Corporation Device 2702 (rev a1)
05:00.1 Audio device: NVIDIA Corporation Device 22bb (rev a1)

I can run CUDA on the GPU, for example using the taichi library, but I get the following error when I run a script based on numba:
numba.cuda.cudadrv.driver.CudaAPIError: [222] Call to cuLinkAddData results in CUDA_ERROR_UNSUPPORTED_PTX_VERSION

I wonder if this is caused by the mismatched CUDA driver and runtime versions. Any suggestions would be greatly appreciated.

Archana

Robert_Crovella · August 30, 2024, 4:12pm

Yes. To run things that depend on the CUDA 12.4 runtime, your driver must also support CUDA 12.4 or higher. The fix I recommend is to update the GPU driver. You can find drivers using the wizard here.
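
The rule above (the CUDA version supported by the driver must be at least the runtime version) can be checked with a small comparison. This is a minimal sketch, assuming a `sort` that supports `-V` (version sort, as in GNU coreutils); the function name `cuda_version_ge` is made up for illustration, and the two version strings are the values reported by `numba -s` earlier in this thread.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if version $1 >= version $2.
# Assumes `sort -V` (version sort) is available on the system.
cuda_version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

driver_cuda="12.2"   # "CUDA Driver Version" from numba -s
runtime_cuda="12.4"  # "CUDA Runtime Version" from numba -s

if cuda_version_ge "$driver_cuda" "$runtime_cuda"; then
    echo "driver supports runtime $runtime_cuda"
else
    echo "driver (CUDA $driver_cuda) is older than runtime (CUDA $runtime_cuda)"
fi
```

Version sort is used rather than a plain string comparison so that a value such as 12.10 is treated as newer than 12.4.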

If your previous driver install was via a package manager method (e.g. apt, dnf, etc.), then the easiest path is to use the same method to install an updated driver. In this case you may just wish to use a CUDA installer package for CUDA 12.4 or newer.

If your previous driver install was via the runfile installer, then the easiest path is to install the new driver via the runfile installer. Again, you could use a CUDA 12.4 runfile installer for this, if you wished.

The CUDA linux install guide has additional info.

sine.arc · September 3, 2024, 1:09am

Thanks Robert. As you suggested, I downloaded and installed version 12.4 using the following commands:

$ wget https://developer.download.nvidia.com/compute/cuda/12.4.0/local_installers/cuda_12.4.0_550.54.14_linux.run
$ sudo sh cuda_12.4.0_550.54.14_linux.run

After rebooting, I still see the same driver version, CUDA 12.2, in the output of numba -s and nvidia-smi. Perhaps I am missing something?

I have CUDA 12.4 at the following path:
/usr/local/cuda-12.4

/usr/local/cuda-12.4/bin is included in my PATH environment variable.

I searched my system for any older version, but I could not find CUDA 12.2 anywhere.
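
A likely reason no 12.2 install turns up: the "CUDA Version" shown by nvidia-smi is the highest CUDA version the installed driver supports, not a toolkit on disk, so it changes only when the driver itself is updated. As a rough sketch, the driver version and its supported CUDA version can be pulled out of the nvidia-smi header as follows. The function name `smi_header_field` is made up for illustration, and the parsing assumes the header keeps the "Label: value" layout.

```shell
#!/bin/sh
# Hypothetical helper: read nvidia-smi output on stdin and print the value
# that follows the label given in $1 (e.g. "Driver Version").
# Assumes the header line has the form "... Label: 123.45 ...".
smi_header_field() {
    sed -n "s/.*$1: *\([0-9][0-9.]*\).*/\1/p" | head -n 1
}

if command -v nvidia-smi >/dev/null 2>&1; then
    smi_out=$(nvidia-smi 2>/dev/null) || smi_out=""
    echo "driver version:      $(printf '%s\n' "$smi_out" | smi_header_field 'Driver Version')"
    echo "max CUDA supported:  $(printf '%s\n' "$smi_out" | smi_header_field 'CUDA Version')"
else
    echo "nvidia-smi not found on PATH"
fi
```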

Archana

Robert_Crovella · September 3, 2024, 1:23am

You did something wrong. It’s difficult to diagnose without inspecting every step and the output.

During execution of the runfile, you should have been offered an option (in a text menu) to install the 550.54.14 driver - did you select that option?

What was the actual output of nvidia-smi after rebooting?

The driver install process associated with the runfile installer should have created a /var/log/nvidia-installer.log file. Do that file's contents show any issues?
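
One way to scan such a log quickly is to print only its error and warning lines. This is a minimal sketch: the function name `show_log_problems` is hypothetical, and the patterns assume the log marks problems with `ERROR:`, `WARNING:`, or `[ERROR]` prefixes, so other formats would need a different pattern.

```shell
#!/bin/sh
# Hypothetical helper: print ERROR/WARNING lines (with line numbers) from a log.
# Returns 1 if the log file is missing or unreadable.
show_log_problems() {
    log="$1"
    if [ ! -r "$log" ]; then
        echo "no readable log at $log"
        return 1
    fi
    grep -n -E '^(ERROR|WARNING):|\[ERROR\]' "$log" \
        || echo "no ERROR/WARNING lines in $log"
}

# A missing log is itself a hint that the driver installer never ran.
show_log_problems /var/log/nvidia-installer.log || true
```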

If you’re not able to fix things, another option is to reload a fresh copy of your Linux OS, then run the runfile installer using the same steps.

sine.arc · September 3, 2024, 1:36am

During execution of the runfile, you should have been offered an option (in a text menu) to install the 550.54.14 driver - did you select that option?
No, I did not get any such option. It just finished quietly in a few seconds, and /var/log/nvidia-installer.log was not created. I will think about reloading the Linux OS.
$ nvidia-smi
Tue Sep 3 10:33:22 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.06             Driver Version: 535.183.06   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A30                     On  | 00000000:01:00.0 Off |                    0 |
| N/A   32C    P0              31W / 165W |     13MiB / 24576MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce RTX 4080 …      On  | 00000000:05:00.0 Off |                  N/A |
|  0%   28C    P8               2W / 320W |     24MiB / 16376MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1524      G   /usr/lib/xorg/Xorg                            4MiB |
|    1   N/A  N/A      1524      G   /usr/lib/xorg/Xorg                            9MiB |
|    1   N/A  N/A      1701      G   /usr/bin/gnome-shell                          6MiB |
+---------------------------------------------------------------------------------------+

Robert_Crovella · September 3, 2024, 1:42am

The installer failed then. I probably won’t be able to diagnose further. Maybe you are out of disk space. A proper run of the runfile installer will offer you a menu and will take at least 30 seconds to complete its tasks, if not longer. Just the decompression/extraction step before the menu will probably take ~30 seconds.
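
To rule out the disk-space possibility, free space can be checked before rerunning the installer. This is a minimal sketch: the helper name `free_kib` is made up, and the directories listed are assumptions about where the installer extracts and writes files, not something confirmed in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print available space, in KiB, on the filesystem holding $1.
# Uses POSIX output format (-P) so each filesystem occupies exactly one line.
free_kib() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Assumed locations: temporary extraction, toolkit install, and log directory.
for dir in /tmp /usr/local /var/log; do
    if [ -d "$dir" ]; then
        echo "$dir: $(free_kib "$dir") KiB available"
    fi
done
```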

sine.arc · September 3, 2024, 2:07am

Thanks Robert! I tried downloading and installing an update:
$ wget https://developer.download.nvidia.com/compute/cuda/12.4.1/local_installers/cuda_12.4.1_550.54.15_linux.run
$ sudo sh cuda_12.4.1_550.54.15_linux.run

It showed the text menu and created a log file, as you pointed out. I cannot understand, though, why the installation failed.

$ cat /var/log/cuda-installer.log
[INFO]: Driver installation detected by command: apt list --installed | grep -e nvidia-driver-[0-9][0-9][0-9] -e nvidia-[0-9][0-9][0-9]
[INFO]: Cleaning up window
[INFO]: Complete
[INFO]: Checking compiler version…
[INFO]: gcc location: /usr/bin/gcc

[INFO]: gcc version: gcc version 11.4.0 (Ubuntu 11.4.0-1ubuntu1~22.04)

[INFO]: Initializing menu
[INFO]: nvidia-fs.setKOVersion(2.19.7)
[INFO]: Setup complete
[INFO]: Installing: Driver
[INFO]: Installing: 550.54.15
[INFO]: Executing NVIDIA-Linux-x86_64-550.54.15.run --ui=none --no-questions --accept-license --disable-nouveau --no-cc-version-check --install-libglvnd 2>&1
[INFO]: Finished with code: 256
[ERROR]: Install of driver component failed. Consult the driver log at /var/log/nvidia-installer.log for more details.
[ERROR]: Install of 550.54.15 failed, quitting

$ cat /var/log/nvidia-installer.log
nvidia-installer log file ‘/var/log/nvidia-installer.log’
creation time: Tue Sep 3 11:01:31 2024
installer version: 550.54.15

PATH: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin

nvidia-installer command line:
./nvidia-installer
--ui=none
--no-questions
--accept-license
--disable-nouveau
--no-cc-version-check
--install-libglvnd

Using built-in stream user interface
-> Detected 32 CPUs online; setting concurrency level to 32.
-> Scanning the initramfs with lsinitramfs…
-> Executing: /usr/bin/lsinitramfs -l /boot/initrd.img-6.5.0-35-generic
WARNING: An NVIDIA kernel module ‘nvidia-drm’ appears to be already loaded in your kernel. This may be because it is in use (for example, by an X server, a CUDA program, or the NVIDIA Persistence Daemon), but this may also happen if your kernel was configured without support for module unloading. Some of the sanity checks that nvidia-installer performs to detect potential installation problems are not possible while an NVIDIA kernel module is running.
-> Would you like to continue installation and skip the sanity checks? If not, please abort the installation, then close any programs which may be using the NVIDIA GPU(s), and attempt installation again. (Answer: Abort installation)
ERROR: Installation has failed. Please see the file ‘/var/log/nvidia-installer.log’ for details. You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com
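
The warning in this log says an NVIDIA kernel module (nvidia-drm) was already loaded when the installer ran, and the log records the answer "Abort installation". As a rough sketch of how one might check which NVIDIA modules are loaded before retrying, the snippet below reads /proc/modules. The function name `loaded_nvidia_modules` is made up for illustration, and this is a diagnostic aid, not a fix confirmed in this thread.

```shell
#!/bin/sh
# Hypothetical helper: list kernel modules whose name starts with "nvidia".
# Expects text in /proc/modules format on stdin (module name is the first field).
loaded_nvidia_modules() {
    awk '$1 ~ /^nvidia/ { print $1 }'
}

if [ -r /proc/modules ]; then
    mods=$(loaded_nvidia_modules < /proc/modules)
    if [ -n "$mods" ]; then
        echo "NVIDIA modules currently loaded:"
        echo "$mods"
    else
        echo "no NVIDIA kernel modules loaded"
    fi
fi
```

The nvidia-smi output earlier in the thread lists Xorg and gnome-shell as processes using both GPUs, which is consistent with the modules being in use by the X server.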
