total/free CUDA memory: 0/0 using OpenACC with PGI 17.5

Hi,

I am having issues with PGI version 17.5, while the same code works fine with version 16.10. I am wondering whether there have been changes to the compiler that might cause this. The problem is reproducible on two different systems.

The crash occurs the first time OpenACC is used (see below). If I disable the first OpenACC statement, it crashes at the next one, and the backtrace through the __pgi_uacc functions is the same.

total/free CUDA memory: 0/0
Application 3779914 is crashing. ATP analysis proceeding...

ATP Stack walkback for Rank 0 starting:
....
  __pgi_uacc_initialize@init.c:701
  __pgi_uacc_enumerate@init.c:538
  __pgi_uacc_cuda_init@cuda_init.c:369
  __pgi_uacc_cuda_error_handler@cuda_error.c:64

The code uses MPI, OpenACC, CUFFT, and some CUDA kernels for the important parts. Unified memory is not used. The compiler and linker flags are:

                "FFLAGS= -O2 -acc -ta=tesla:cc60,cuda8.0 -Minfo=accel -Mcuda=cuda8.0  " \
                "LFLAGS=  " \
                "LIBS= -acc -ta=tesla:cc60,cuda8.0 -Minfo=accel -Mcuda=cuda8.0  -lcufft " \

Thanks for your help,
Richard

Hi Richard,

There have been many improvements to the compilers between 16.10 and 17.5, but unfortunately I don’t know what’s causing this problem.

The crash seems to occur when the runtime first initializes the device, but I can’t think of any change in the compiler that would cause this.

Can you post a reproducing example, or send one to PGI Customer Service (trs@pgroup.com)?

Thanks,
Mat

Hi Mat,

Thanks for the answer. I will get in touch with them.

In the meantime, I had another look at the code. When I disable all the CUDA code and also remove -Mcuda=cuda8.0, it works. Removing only the CUDA code does not help, so the issue seems to come from the -Mcuda=cuda8.0 flag itself.
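
The comparison described above can be sketched as a small script that builds the same source twice, once with and once without -Mcuda=cuda8.0. This is only an illustration: the driver name pgfortran, the file name repro.f90, and the DRY_RUN switch are assumptions and not part of the actual build system, and the source is assumed to contain no CUDA code so that it compiles either way. The OpenACC flags are the ones listed earlier.

```shell
#!/bin/sh
# Sketch: build one source with and without -Mcuda to isolate the flag.
# Assumptions (hypothetical, not from the real build system):
#   FC      - compiler driver, defaults to pgfortran
#   SRC     - source file with OpenACC only and no CUDA code, defaults to repro.f90
#   DRY_RUN - when 1 (default), only print the commands, do not run them

FC="${FC:-pgfortran}"
SRC="${SRC:-repro.f90}"
DRY_RUN="${DRY_RUN:-1}"

# OpenACC flags as given in the post; the CUDA flag is kept separate.
ACC_FLAGS="-O2 -acc -ta=tesla:cc60,cuda8.0 -Minfo=accel"
CUDA_FLAG="-Mcuda=cuda8.0"

# Print the compile command for one variant.
# $1 = variant name, $2 = extra flags (may be empty)
build_cmd() {
    if [ -n "$2" ]; then
        echo "$FC $ACC_FLAGS $2 -o repro_$1 $SRC"
    else
        echo "$FC $ACC_FLAGS -o repro_$1 $SRC"
    fi
}

# Show the command and, unless DRY_RUN=1, compile and run the variant.
run_variant() {
    cmd=$(build_cmd "$1" "$2")
    echo "$cmd"
    if [ "$DRY_RUN" != "1" ]; then
        $cmd && "./repro_$1"
    fi
}

run_variant with_mcuda "$CUDA_FLAG"
run_variant without_mcuda ""
```

With DRY_RUN=0, a crash in only the with_mcuda binary would point at the flag rather than at the code.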

Best regards,
Richard