pgaccelinfo error code=999

DavidGutzwiller · March 18, 2020, 10:23pm

One of my colleagues is able to build with PGI 19.4 on his local workstation but is encountering a crash at runtime. I was able to reproduce the same error with pgaccelinfo:

nint0112:~/BUILD21/> /common/pgi/linux86-64/19.4/bin/pgaccelinfo -v
CUDA Driver Version: 10020
NVRM version: NVIDIA UNIX x86_64 Kernel Module 440.64 Fri Feb 21 01:17:26 UTC 2020
could not initialize CUDA runtime, error code=999
No accelerators found.
Check the permissions on your CUDA device

Interestingly, nvidia-smi does not indicate any problems.

It looks like the local NVIDIA drivers are quite new: version 440.64 with CUDA 10.2. I don’t think this should be a problem for a PGI 19.4 executable. Is this correct? I saw some other postings that mentioned file permission issues, but I don’t see any problems in this regard.

int0112:~/BUILD21> ls -lah /dev/nvidia0
crw-rw-rw- 1 root root 195, 0 Mär 18 08:22 /dev/nvidia0
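The listing above covers only /dev/nvidia0. As a rough sketch, the same check can be run over every node matching /dev/nvidia*; the glob pattern is an assumption about how the driver names its device nodes on this machine, and the helper names `describe_node` and `check_nodes` are made up for illustration.

```python
import glob
import os
import stat


def describe_node(path):
    """Return (mode string, True if the current user can read and write)."""
    st = os.stat(path)
    mode = stat.filemode(st.st_mode)  # e.g. 'crw-rw-rw-' for a character device
    usable = os.access(path, os.R_OK | os.W_OK)
    return mode, usable


def check_nodes(pattern="/dev/nvidia*"):
    """Collect permissions for every non-directory path matching the pattern."""
    results = {}
    for path in sorted(glob.glob(pattern)):
        if os.path.isdir(path):
            continue  # skip subdirectories, which some driver versions create
        results[path] = describe_node(path)
    return results


if __name__ == "__main__":
    nodes = check_nodes()
    if not nodes:
        print("no paths matched /dev/nvidia*")
    for path, (mode, usable) in nodes.items():
        print(f"{mode} {path} {'ok' if usable else 'NOT readable/writable'}")
```

This only reports permissions as seen by the current user; a missing node or a node that is present but not usable would both be worth mentioning to whoever administers the machine.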

Are you aware of any other workarounds for this issue?

Thanks,
David

MatColgrove · March 19, 2020, 4:23pm

Hi David,

I don’t think there’s an incompatibility between 19.4, which supports up to CUDA 10.1, and a CUDA 10.2 driver. At least I didn’t see any issues on a system with a CUDA 10.2 driver, albeit a slightly older version, 440.33. Granted, there have been driver issues in the past, so it could be a specific problem with 440.64. I’ll see if my IT folks can install 440.64 on a system for me to test.

Though, this looks similar to issues I’ve seen in the past where libcuda.so isn’t installed properly in the system’s lib directory (or maybe has the wrong permissions) and pgaccelinfo is picking up the OpenCL driver. Are you able to run a simple CUDA code?
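If no CUDA compiler is handy, one way to approximate the "simple CUDA code" test is to load the driver library directly and call `cuInit`. The following is only a sketch: it assumes a Linux system where the driver library is reachable as `libcuda.so.1`, and the helper names `describe` and `try_cuinit` are hypothetical. The three result codes listed are the values defined in cuda.h; 999 is the driver's generic "unknown error".

```python
import ctypes

# A few CUresult values from cuda.h; any other code is reported numerically.
CUDA_ERROR_NAMES = {
    0: "CUDA_SUCCESS",
    100: "CUDA_ERROR_NO_DEVICE",
    999: "CUDA_ERROR_UNKNOWN",
}


def describe(code):
    """Map a CUresult integer to a readable name where one is known."""
    return CUDA_ERROR_NAMES.get(code, f"CUresult {code} (see cuda.h)")


def try_cuinit(libname="libcuda.so.1"):
    """Load the driver library and call cuInit(0).

    Returns (code, message); code is None if the library or symbol
    could not be loaded at all.
    """
    try:
        lib = ctypes.CDLL(libname)
        cu_init = lib.cuInit
    except (OSError, AttributeError) as exc:
        return None, f"could not load cuInit from {libname}: {exc}"
    cu_init.argtypes = [ctypes.c_uint]
    cu_init.restype = ctypes.c_int
    code = cu_init(0)  # the flags argument must be 0
    return code, describe(code)


if __name__ == "__main__":
    code, message = try_cuinit()
    print(f"cuInit result: {message}")
```

A load failure would point toward the library installation or its permissions, while a loaded library that still returns 999 would suggest the problem lies elsewhere in the driver setup.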

Note that PGI 20.1 does support CUDA 10.2, so you might try updating the compiler version as well if it does turn out to be a CUDA 10.1 vs 10.2 compatibility issue.

-Mat

DavidGutzwiller · March 19, 2020, 5:21pm

Hi Mat,

Thanks for the response. I tested a PGI 19.4 build of our solver on a separate node also running CUDA 10.2 and it worked, so indeed there does not seem to be a fundamental compatibility issue.

I’ll check on libcuda and see if it has changed recently. Unfortunately I don’t have root access on this system, so making changes will be painful. The developer reports that he was able to run his code a few days ago, but it is not clear how he had his system configured at the time.

To be continued…

-David
