I get this error when trying to run GROMACS:
NOTE: Detection of GPUs failed. The API reported:
CUDA driver version is insufficient for CUDA runtime version
GROMACS cannot run tasks on a GPU.
I am working on a shared machine without root access. The system-wide CUDA path points to version 10.0:
$ ls /usr/local/cuda -l
lrwxrwxrwx. 1 root root 9 Oct 17 2018 /usr/local/cuda -> cuda-10.0
But I have installed my own version in my home directory:
$ echo $LD_LIBRARY_PATH
/storage/users/mahmood/cuda-10.1.168/lib64
$ which nvcc
~/cuda-10.1.168/bin/nvcc
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Apr_24_19:10:27_PDT_2019
Cuda compilation tools, release 10.1, V10.1.168
The output of nvidia-smi is:
$ which nvidia-smi
/bin/nvidia-smi
$ nvidia-smi
Sat Apr 11 16:11:30 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.48                 Driver Version: 410.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:02:00.0 Off |                  N/A |
|  0%   36C    P8    14W / 250W |      0MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 980 Ti  Off  | 00000000:81:00.0 Off |                  N/A |
| 75%   82C    P2   164W / 250W |    819MiB /  6083MiB |     98%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
Also, deviceQuery runs without errors:
$ ~/NVIDIA_CUDA-10.1_Samples/1_Utilities/deviceQuery/deviceQuery
/storage/users/mahmood/NVIDIA_CUDA-10.1_Samples/1_Utilities/deviceQuery/deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce GTX 1080 Ti"
CUDA Driver Version / Runtime Version 10.0 / 10.0
CUDA Capability Major/Minor version number: 6.1
Total amount of global memory: 11178 MBytes (11721506816 bytes)
(28) Multiprocessors, (128) CUDA Cores/MP: 3584 CUDA Cores
GPU Max Clock rate: 1683 MHz (1.68 GHz)
Memory Clock rate: 5505 Mhz
CUDA_VISIBLE_DEVICES is set to device 0:
$ echo $CUDA_VISIBLE_DEVICES
0
The configure command and the relevant part of its output were:
$ cmake .. -DCMAKE_INSTALL_PREFIX=/storage/users/mahmood/cactus/gromacs/gromacs-2019.4-1080ti/single -DGMX_GPU=on -DGMX_CUDA_TARGET_SM=61
...
-- Looking for NVIDIA GPUs present in the system
-- Number of NVIDIA GPUs detected: 2
-- Found CUDA: /storage/users/mahmood/cuda-10.1.168 (found suitable version "10.1", minimum required is "7.0")
So everything looks normal to me. Why is the binary unable to use device 0?
Any ideas for further debugging?
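Since the error message is about the driver version versus the runtime version, one check that might be worth making explicit is a direct comparison of those two numbers. Below is a minimal shell sketch of that comparison. The function name driver_supports_runtime is hypothetical (it is not part of CUDA or GROMACS), it assumes plain major.minor version strings, and the sample values are copied by hand from the outputs above: deviceQuery reports driver version 10.0 and nvcc reports release 10.1.

```shell
#!/bin/sh
# Hypothetical helper (not part of CUDA or GROMACS): succeeds when the CUDA
# version supported by the driver is at least the runtime version.
# Assumption: both arguments are plain "major.minor" strings, e.g. 10.0.
driver_supports_runtime() {
    drv_major=${1%%.*}; drv_minor=${1#*.}
    rt_major=${2%%.*};  rt_minor=${2#*.}
    if [ "$drv_major" -ne "$rt_major" ]; then
        # Major versions differ, so they decide the result.
        [ "$drv_major" -gt "$rt_major" ]
    else
        # Same major version, so compare the minor versions.
        [ "$drv_minor" -ge "$rt_minor" ]
    fi
}

# Sample values taken from the outputs in this post:
# driver side 10.0 (deviceQuery), toolkit side 10.1 (nvcc --version).
if driver_supports_runtime 10.0 10.1; then
    echo "driver is new enough for this runtime"
else
    echo "driver is older than the runtime"
fi
```

This only compares the numbers I can see. It does not tell me which CUDA runtime the gmx binary actually loads at run time, so that would still need to be checked separately.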