CUDA driver and runtime mismatch

mahmood.nt · April 11, 2020, 11:46am

I get this error while trying to run GROMACS:

NOTE: Detection of GPUs failed. The API reported:
      CUDA driver version is insufficient for CUDA runtime version
      GROMACS cannot run tasks on a GPU.

I am working on a shared computer without root access. This is the CUDA version in the system paths:

$ ls /usr/local/cuda -l
lrwxrwxrwx. 1 root root 9 Oct 17  2018 /usr/local/cuda -> cuda-10.0

But I have my own version in my home directory:

$ echo $LD_LIBRARY_PATH
/storage/users/mahmood/cuda-10.1.168/lib64
$ which nvcc
~/cuda-10.1.168/bin/nvcc
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Apr_24_19:10:27_PDT_2019
Cuda compilation tools, release 10.1, V10.1.168

The output of nvidia-smi looks like

$ which nvidia-smi
/bin/nvidia-smi
$ nvidia-smi
Sat Apr 11 16:11:30 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.48                 Driver Version: 410.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:02:00.0 Off |                  N/A |
|  0%   36C    P8    14W / 250W |      0MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 980 Ti  Off  | 00000000:81:00.0 Off |                  N/A |
| 75%   82C    P2   164W / 250W |    819MiB /  6083MiB |     98%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

Also, deviceQuery works properly

$ ~/NVIDIA_CUDA-10.1_Samples/1_Utilities/deviceQuery/deviceQuery
/storage/users/mahmood/NVIDIA_CUDA-10.1_Samples/1_Utilities/deviceQuery/deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 1080 Ti"
  CUDA Driver Version / Runtime Version          10.0 / 10.0
  CUDA Capability Major/Minor version number:    6.1
  Total amount of global memory:                 11178 MBytes (11721506816 bytes)
  (28) Multiprocessors, (128) CUDA Cores/MP:     3584 CUDA Cores
  GPU Max Clock rate:                            1683 MHz (1.68 GHz)
  Memory Clock rate:                             5505 Mhz

And CUDA_VISIBLE_DEVICES is fine

$ echo $CUDA_VISIBLE_DEVICES
0

The configure command was

$ cmake .. -DCMAKE_INSTALL_PREFIX=/storage/users/mahmood/cactus/gromacs/gromacs-2019.4-1080ti/single -DGMX_GPU=on -DGMX_CUDA_TARGET_SM=61
...
-- Looking for NVIDIA GPUs present in the system
-- Number of NVIDIA GPUs detected: 2
-- Found CUDA: /storage/users/mahmood/cuda-10.1.168 (found suitable version "10.1", minimum required is "7.0")

So everything looks normal. Why is the binary unable to use device 0?

Any ideas for further debugging?

mahmood.nt · April 11, 2020, 12:53pm

So, in the gromacs log I see

CUDA compiler:      /storage/users/mahmood/cuda-10.1.168/bin/nvcc nvcc: NVIDIA (R) Cuda compiler driver;Copyright (c) 2005-2019 NVIDIA Corporation;Built on Wed_Apr_24_19:10:27_PDT_2019;Cuda compilation tools, release 10.1, V10.1.168
CUDA compiler flags:-gencode;arch=compute_61,code=sm_61;-use_fast_math;;; ;-mavx2;-mfma;-std=c++11;-O3;-DNDEBUG;-funroll-all-loops;-fexcess-precision=fast;
CUDA driver:        10.0
CUDA runtime:       N/A

I wonder why these values are shown.

It seems that some global setting reports 10.0 for the driver, but I cannot find it. Any ideas?

njuffa · April 11, 2020, 9:37pm

Every CUDA version requires a minimum driver version. You appear to have installed CUDA 10.1, so any executable built with it requires that same minimum driver version. Per the list at

https://stackoverflow.com/questions/30820513/what-is-the-correct-version-of-cuda-for-my-nvidia-driver

the minimum driver version on Linux for CUDA 10.1 is:

CUDA 10.1: 418.39

Your currently installed driver is 410.48, so that won’t work. Either upgrade the driver to at least the minimum version required by CUDA 10.1 (or a higher version of the driver), or revert to an earlier version of CUDA (it seems CUDA 10.0 should work, per the list).
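
To make the comparison above concrete, here is a minimal sketch in Python that pulls the driver version out of the nvidia-smi banner line and checks it against a minimum-driver table. The function names and the table are illustrative, not part of any NVIDIA tool. The 10.1 entry is the value quoted above; the 10.0 entry (410.48) is my assumption based on NVIDIA's release notes, so verify it against the list before relying on it.

```python
import re

# Minimum Linux driver version per CUDA toolkit release.
# "10.1" is the value quoted in this thread; "10.0" is an assumption
# taken from NVIDIA's release notes and should be double-checked.
MIN_LINUX_DRIVER = {
    "10.0": (410, 48),
    "10.1": (418, 39),
}


def parse_driver_version(smi_banner):
    """Extract the driver version from an nvidia-smi banner line as a tuple of ints."""
    match = re.search(r"Driver Version:\s*([\d.]+)", smi_banner)
    if match is None:
        raise ValueError("no 'Driver Version' field found")
    return tuple(int(part) for part in match.group(1).split("."))


def driver_supports(smi_banner, cuda_release):
    """Return True if the installed driver meets the minimum for cuda_release."""
    return parse_driver_version(smi_banner) >= MIN_LINUX_DRIVER[cuda_release]


# Banner line copied from the nvidia-smi output posted above.
banner = "| NVIDIA-SMI 410.48                 Driver Version: 410.48                    |"

print(driver_supports(banner, "10.1"))  # False: 410.48 is older than 418.39
print(driver_supports(banner, "10.0"))  # True: 410.48 meets the assumed 10.0 minimum
```

Comparing the versions as integer tuples rather than as strings or floats avoids mistakes such as treating 410.5 as newer than 410.48.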
