I have Numba and CUDA installed on Ubuntu Linux.
When I run numba -s, I see the following mismatch between the CUDA driver and runtime versions, which seems strange. How can I fix it?
$ numba -s | grep CUDA
CUDA Information
CUDA Device Initialized : True
CUDA Driver Version : 12.2
CUDA Runtime Version : 12.4
CUDA NVIDIA Bindings Available : False
CUDA NVIDIA Bindings In Use : False
CUDA Minor Version Compatibility Available : False
CUDA Minor Version Compatibility Needed : True
CUDA Minor Version Compatibility In Use : False
CUDA Detect Output:
Found 2 CUDA devices
Some more information:
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Mar_28_02:18:24_PDT_2024
Cuda compilation tools, release 12.4, V12.4.131
Build cuda_12.4.r12.4/compiler.34097967_0
I can run CUDA on the GPU, for example using the Taichi library, but I get the following error when I run a script based on Numba:
numba.cuda.cudadrv.driver.CudaAPIError: [222] Call to cuLinkAddData results in CUDA_ERROR_UNSUPPORTED_PTX_VERSION
I wonder whether this is caused by the mismatched CUDA driver and runtime versions. Any suggestions would be greatly appreciated.
Yes. To run anything that depends on the CUDA 12.4 runtime, your driver must also support CUDA 12.4 or higher. The fix I recommend is to update the GPU driver. You can find drivers using the wizard here.
If your previous driver install was via a package manager (e.g. apt, dnf, etc.), then the easiest path is to use the same method to install an updated driver. In this case you may just wish to use a CUDA installer package for CUDA 12.4 or newer.
If your previous driver install was via the runfile installer, then the easiest path is to install the new driver via the runfile installer. Again, you could use a CUDA 12.4 runfile installer for this, if you wished.
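The rule above (driver version must be at least the runtime version) can be checked against the numba -s text directly. This is only a minimal sketch: the helper names parse_cuda_versions and driver_supports_runtime are hypothetical, the parser assumes the "CUDA Driver Version : X.Y" line format shown in this thread, and it ignores minor version compatibility, which numba -s reports here as not in use.

```python
import re


def parse_cuda_versions(sysinfo_text):
    """Return (driver, runtime) as (major, minor) tuples parsed from `numba -s` text."""
    versions = {}
    for key in ("Driver", "Runtime"):
        match = re.search(rf"CUDA {key} Version\s*:\s*(\d+)\.(\d+)", sysinfo_text)
        if match is None:
            raise ValueError(f"CUDA {key} Version line not found")
        versions[key] = (int(match.group(1)), int(match.group(2)))
    return versions["Driver"], versions["Runtime"]


def driver_supports_runtime(driver, runtime):
    """Apply the rule from this thread: the driver must be at least as new as the runtime."""
    return driver >= runtime


# The two version lines reported in this thread.
SAMPLE = """\
CUDA Driver Version : 12.2
CUDA Runtime Version : 12.4
"""

driver, runtime = parse_cuda_versions(SAMPLE)
print(driver, runtime, driver_supports_runtime(driver, runtime))
# prints: (12, 2) (12, 4) False
```

On a real machine the text could come from subprocess.run(["numba", "-s"], capture_output=True, text=True).stdout instead of the SAMPLE string.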
You did something wrong. It’s difficult to diagnose without inspecting every step and the output.
During execution of the runfile, you should have been offered an option (in a text menu) to install the 550.54.14 driver - did you select that option?
What was the actual output of nvidia-smi after rebooting?
The driver install process associated with the runfile installer should have created a /var/log/nvidia-installer.log file. Does that file contents show any issues?
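To confirm after a reboot whether the new driver actually took effect, the nvidia-smi header line can be parsed and compared against the expected driver. This is a rough sketch, not a definitive check: parse_smi_header and meets_minimum are hypothetical helper names, and the 550.54.14 threshold is simply the driver version this thread associates with the CUDA 12.4 runfile.

```python
import re

# Assumption: the driver version named in this thread for the CUDA 12.4 runfile.
MIN_DRIVER_FOR_CUDA_12_4 = (550, 54, 14)


def parse_smi_header(text):
    """Return (driver_version, cuda_version) as integer tuples from nvidia-smi text."""
    match = re.search(r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*([\d.]+)", text)
    if match is None:
        raise ValueError("nvidia-smi header line not found")
    driver = tuple(int(part) for part in match.group(1).split("."))
    cuda = tuple(int(part) for part in match.group(2).split("."))
    return driver, cuda


def meets_minimum(driver, minimum=MIN_DRIVER_FOR_CUDA_12_4):
    """True if the installed driver is at least the assumed minimum."""
    return driver >= minimum


# A header line in the format nvidia-smi prints (whitespace collapsed).
SAMPLE_HEADER = "| NVIDIA-SMI 535.183.06 Driver Version: 535.183.06 CUDA Version: 12.2 |"

driver, cuda = parse_smi_header(SAMPLE_HEADER)
print(driver, cuda, meets_minimum(driver))
# prints: (535, 183, 6) (12, 2) False
```

If the driver tuple is unchanged after the install and reboot, the installer most likely did not complete.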
If you’re not able to fix things, another option is to load a fresh copy of your Linux OS, then run the runfile installer using the same steps.
During execution of the runfile, you should have been offered an option (in a text menu) to install the 550.54.14 driver - did you select that option?
No, I did not get any such option. The installer just finished quietly within a few seconds, and /var/log/nvidia-installer.log was not created. I will think about reloading the Linux OS.
$ nvidia-smi
Tue Sep 3 10:33:22 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.06             Driver Version: 535.183.06   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A30                     On  | 00000000:01:00.0 Off |                    0 |
| N/A   32C    P0              31W / 165W |     13MiB / 24576MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce RTX 4080 …      On  | 00000000:05:00.0 Off |                  N/A |
|  0%   28C    P8               2W / 320W |     24MiB / 16376MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1524      G   /usr/lib/xorg/Xorg                            4MiB |
|    1   N/A  N/A      1524      G   /usr/lib/xorg/Xorg                            9MiB |
|    1   N/A  N/A      1701      G   /usr/bin/gnome-shell                          6MiB |
+---------------------------------------------------------------------------------------+
Then the installer failed. I probably won’t be able to diagnose further. Maybe you are out of disk space. A proper run of the runfile installer will offer you a menu and will take at least 30 seconds to complete its tasks, if not longer. The decompression/extraction step before the menu alone will probably take ~30 seconds.
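The disk-space theory is quick to rule in or out. A minimal sketch using only the standard library follows; free_gib and paths_low_on_space are hypothetical helper names, and both the paths to check and the threshold are assumptions to adjust for your system (the runfile's actual space requirement is not stated in this thread).

```python
import shutil


def free_gib(path):
    """Free space on the filesystem containing `path`, in GiB."""
    return shutil.disk_usage(path).free / 2**30


def paths_low_on_space(paths, min_free_gib):
    """Return the paths whose filesystem has less than `min_free_gib` GiB free."""
    return [path for path in paths if free_gib(path) < min_free_gib]


# Assumption: the root filesystem is the one that matters; add /tmp or others as needed.
print(f"/ has {free_gib('/'):.1f} GiB free")
```

For example, paths_low_on_space(["/", "/tmp"], 10) would list whichever of those locations has under 10 GiB free, where 10 is an arbitrary illustrative threshold.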
Using built-in stream user interface
→ Detected 32 CPUs online; setting concurrency level to 32.
→ Scanning the initramfs with lsinitramfs…
→ Executing: /usr/bin/lsinitramfs -l /boot/initrd.img-6.5.0-35-generic
WARNING: An NVIDIA kernel module ‘nvidia-drm’ appears to be already loaded in your kernel. This may be because it is in use (for example, by an X server, a CUDA program, or the NVIDIA Persistence Daemon), but this may also happen if your kernel was configured without support for module unloading. Some of the sanity checks that nvidia-installer performs to detect potential installation problems are not possible while an NVIDIA kernel module is running.
→ Would you like to continue installation and skip the sanity checks? If not, please abort the installation, then close any programs which may be using the NVIDIA GPU(s), and attempt installation again. (Answer: Abort installation)
ERROR: Installation has failed. Please see the file ‘/var/log/nvidia-installer.log’ for details. You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com
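The warning in this log says an NVIDIA kernel module (nvidia-drm) was still loaded when the installer ran. One way to see which NVIDIA modules are currently loaded is to scan /proc/modules, where module names appear with underscores (nvidia_drm). The sketch below is illustrative only: loaded_nvidia_modules is a hypothetical helper, and the sample text uses made-up sizes and reference counts.

```python
def loaded_nvidia_modules(proc_modules_text):
    """Return the names of loaded kernel modules whose name starts with 'nvidia'."""
    names = []
    for line in proc_modules_text.splitlines():
        fields = line.split()
        # The first whitespace-separated field of each line is the module name.
        if fields and fields[0].startswith("nvidia"):
            names.append(fields[0])
    return names


# Illustrative sample in /proc/modules format; the numbers are invented.
SAMPLE_MODULES = """\
nvidia_drm 77824 2 - Live 0x0000000000000000
nvidia_modeset 1306624 3 nvidia_drm, Live 0x0000000000000000
nvidia 56717312 89 nvidia_modeset, Live 0x0000000000000000
snd_hda_intel 53248 0 - Live 0x0000000000000000
"""

print(loaded_nvidia_modules(SAMPLE_MODULES))
# prints: ['nvidia_drm', 'nvidia_modeset', 'nvidia']
```

On a real system the input would be open("/proc/modules").read(). A non-empty result means something such as the X server is still holding the GPU, which matches the installer's advice to close programs using the NVIDIA GPU(s) before trying again.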
This mismatch is confusing because CUDA compatibility only runs in one direction: a newer driver can run code built against an older runtime, but here the runtime (12.4) is newer than the driver (12.2). It is also worth checking how your environment variables are set and which CUDA installation your tools are picking up. I built env-doctor specifically for this; it shows you exactly which CUDA versions your system sees versus what your drivers support.