CUDA with LAMMPS: Cuda driver error 4 in call at file 'geryon/nvd_device.h' in line 135

Hi there,

I’m trying to use CUDA 10.2 with LAMMPS for GPU-accelerated computing, but when I submit a job I get the following error:

LAMMPS (7 Aug 2019)
ERROR: Unable to initialize accelerator for use (…/gpu_extra.h:45)
Last command: package gpu 1
Cuda driver error 4 in call at file ‘geryon/nvd_device.h’ in line 135.
Cuda driver error 4 in call at file ‘geryon/nvd_device.h’ in line 135.
Cuda driver error 4 in call at file ‘geryon/nvd_device.h’ in line 135.
Cuda driver error 4 in call at file ‘geryon/nvd_device.h’ in line 135.
Cuda driver error 4 in call at file ‘geryon/nvd_device.h’ in line 135.

Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.


mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

Process name: [[40980,1],3]
Exit code: 1

I installed the NVIDIA driver first (version 440.44) and then CUDA 10.2. Since a driver was already installed, I did not install the one bundled with the CUDA installer, which I think is 440.32 or so, i.e. different from the version I installed.

After that, I compiled LAMMPS 7Aug19 (a recent stable release) against the installed CUDA 10.2, and the compilation was successful. I can run LAMMPS without the GPU, but when I use the GPU I get the error above. The OpenMPI version used here is 4.0.2.

I’ve also tried running this under Nsight Compute 2019.4, but I got the same error.

Should I try an older version of LAMMPS, or install the driver bundled with the CUDA installer?

Here is my GPU and OS info:

OS: Ubuntu 18.04
GPU: ASUS RTX 2080 Ti Turbo
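
For reference, the number in the repeated "Cuda driver error 4" lines above is a CUresult code from the CUDA driver API. The following is a minimal sketch, not part of LAMMPS or CUDA, that pulls those codes out of a log and maps them to names. The helper name `decode_driver_errors` is hypothetical, and the table is only a small subset of the CUresult enum in cuda.h, with values assumed to match CUDA 10.x.

```python
import re

# Small subset of the CUresult enum from cuda.h (assumed CUDA 10.x values).
# Codes that are not listed here are reported as "UNKNOWN".
CU_RESULT_NAMES = {
    0: "CUDA_SUCCESS",
    1: "CUDA_ERROR_INVALID_VALUE",
    2: "CUDA_ERROR_OUT_OF_MEMORY",
    3: "CUDA_ERROR_NOT_INITIALIZED",
    4: "CUDA_ERROR_DEINITIALIZED",
    100: "CUDA_ERROR_NO_DEVICE",
    101: "CUDA_ERROR_INVALID_DEVICE",
}

# Matches the message format printed in the log above.
ERROR_LINE = re.compile(r"Cuda driver error (\d+) in call at file")


def decode_driver_errors(log_text):
    """Return sorted, de-duplicated (code, name) pairs found in log_text."""
    codes = {int(m.group(1)) for m in ERROR_LINE.finditer(log_text)}
    return sorted((c, CU_RESULT_NAMES.get(c, "UNKNOWN")) for c in codes)


if __name__ == "__main__":
    sample = "Cuda driver error 4 in call at file 'geryon/nvd_device.h' in line 135.\n" * 5
    for code, name in decode_driver_errors(sample):
        print(code, name)
```

The name alone does not identify the cause, but it narrows down where to look: the driver API call fails while the GPU package is being initialized, even though the runtime samples pass.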

Verify your CUDA install using the instructions given in the CUDA Linux installation guide before trying to use LAMMPS.

Thank you for your reply. I completed all the mandatory post-installation steps and verified the CUDA installation. I then compiled LAMMPS again with ‘make’ (maybe I should use ‘cmake’?), but the error is the same. I also tried Nsight Compute again, this time the version that ships with CUDA 10.2, which is Nsight Compute 2019.5.

I’ll try some of the LAMMPS GPU examples to check whether my input is correct.

Here are the results of the CUDA installation verification:

run deviceQuery:

./deviceQuery Starting…

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: “GeForce RTX 2080 Ti”
CUDA Driver Version / Runtime Version 10.2 / 10.2
CUDA Capability Major/Minor version number: 7.5
Total amount of global memory: 11016 MBytes (11551440896 bytes)
(68) Multiprocessors, ( 64) CUDA Cores/MP: 4352 CUDA Cores
GPU Max Clock rate: 1545 MHz (1.54 GHz)
Memory Clock rate: 7000 Mhz
Memory Bus Width: 352-bit
L2 Cache Size: 5767168 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 1024
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 3 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Compute Preemption: Yes
Supports Cooperative Kernel Launch: Yes
Supports MultiDevice Co-op Kernel Launch: Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 103 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.2, CUDA Runtime Version = 10.2, NumDevs = 1
Result = PASS

run bandwidthTest:

[CUDA Bandwidth Test] - Starting…
Running on…

Device 0: GeForce RTX 2080 Ti
Quick Mode

Host to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(GB/s)
32000000 12.2

Device to Host Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(GB/s)
32000000 13.1

Device to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(GB/s)
32000000 516.8

Result = PASS

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

Your CUDA install looks fine. You may want to ask for help on a LAMMPS forum.

Thank you for your help!

I compiled LAMMPS again with cmake. Now it’s working properly.
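
The thread does not say why the CMake build worked where the make build did not. One possibility, stated here only as an assumption, is that the make build compiled the GPU library for a different architecture than the card's compute capability (7.5 in the deviceQuery output above). The sketch below reads that capability from deviceQuery output and assembles a CMake configure command using the LAMMPS options `PKG_GPU`, `GPU_API` and `GPU_ARCH`. The function names are hypothetical, and the `../cmake` path assumes the command is run from a build directory inside the LAMMPS source tree.

```python
import re

# Matches the capability line printed by the CUDA deviceQuery sample.
CAPABILITY_LINE = re.compile(
    r"CUDA Capability Major/Minor version number:\s*(\d+)\.(\d+)"
)


def gpu_arch_from_device_query(output):
    """Turn the capability line (e.g. 7.5) into an arch string (e.g. sm_75)."""
    match = CAPABILITY_LINE.search(output)
    if match is None:
        raise ValueError("no compute capability line found in deviceQuery output")
    return f"sm_{match.group(1)}{match.group(2)}"


def lammps_cmake_command(arch):
    """Build the argument list for configuring LAMMPS with the GPU package.

    Assumes it will be run from a build directory next to the cmake/ folder.
    """
    return [
        "cmake", "../cmake",
        "-D", "PKG_GPU=on",
        "-D", "GPU_API=cuda",
        "-D", f"GPU_ARCH={arch}",
    ]


if __name__ == "__main__":
    line = "CUDA Capability Major/Minor version number: 7.5"
    print(" ".join(lammps_cmake_command(gpu_arch_from_device_query(line))))
```

Printing the command rather than running it keeps the sketch free of side effects; the output can be pasted into a shell once the options have been checked against the LAMMPS build documentation for the release in use.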