Strange error

Hi, all:

I’m developing a Fortran GPU code that mixes CUDA Fortran and OpenACC on Windows 10 with the PGI 19.10 community edition. All my data arrays are allocated with CUDA Fortran allocate statements, and the allocatable arrays are defined in a separate module. The code compiles and runs with correct results. But today, after a rebuild, the code suddenly cannot run and gives me the following error:

0: ALLOCATE: copyin Symbol Memcpy FAILED:13(invalid device symbol)

I’m relatively new to CUDA Fortran and OpenACC. I don’t know what this error means or how to fix it.
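My setup is roughly like this (a minimal sketch only; the module and array names are made up, not my actual code):

module device_data
  use cudafor
  complex(8), device, allocatable :: a_d(:)   ! device array in a separate module
end module

program main
  use device_data
  implicit none
  allocate(a_d(1024))   ! CUDA Fortran allocate of a module device array
  ! ... OpenACC regions and CUDA Fortran kernels operate on a_d ...
  deallocate(a_d)
end program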

My GPU is a Tesla K40. The compiler options I use are:
F90FLAGS =-g -acc -ta=tesla:cuda10.1 -Minfo=accel
CUDAFLAGS = -Mcuda=cc35,rdc -Mcudalib=cublas

Any help and advice will be deeply appreciated.

John

Hi John,

This is a somewhat generic error meaning that something is wrong with one of the device variables or routine names.

You said that this was working before you rebuilt. Have there been any changes on the system between the two builds? A new CUDA driver? An updated compiler version? A different device?

I ask because you have a slight mismatch in your flags:

F90FLAGS =-g -acc -ta=tesla:cuda10.1 -Minfo=accel
CUDAFLAGS = -Mcuda=cc35,rdc -Mcudalib=cublas

The OpenACC flags do not include “cc35” and the CUDA Fortran flags don’t include CUDA 10.1.

Not setting cc35 probably doesn’t matter, since the compiler will default to the compute capability of the device on the build system. However, with no CUDA version set, the compiler also defaults to the CUDA version of the installed driver. So if the CUDA driver was updated to, say, 10.2, the CUDA Fortran files would be compiled with a different CUDA version than the OpenACC files. This mismatch could cause this error.
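One way to keep the two from drifting apart is to make both the compute capability and the CUDA version explicit in both sets of flags, for example (assuming you want to stay on CUDA 10.1 and cc35 for the K40):

F90FLAGS = -g -acc -ta=tesla:cc35,cuda10.1 -Minfo=accel
CUDAFLAGS = -Mcuda=cc35,cuda10.1,rdc -Mcudalib=cublas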

If this isn’t the problem, we’ll need more information and if possible, a reproducing example so we can investigate.

-Mat

Hi, Mat:

Thanks for pointing out the mismatch between the OpenACC and CUDA Fortran flags. I fixed it.

I fixed the invalid device symbol problem by restoring my system to a restore point from a day earlier. I was building the MAGMA library with CMake, and somewhere during the configure and build process the system got changed (most likely the CUDA 10.1 toolkit that ships with PGI 19.10, since I used it to build MAGMA).
The MAGMA build failed, though.

Can I ask another question? My application uses the LAPACK subroutine ZGEEV. I cannot find any GPU/device version of ZGEEV: cuSOLVER does not have it, and MAGMA has an implementation, but it takes host arrays as input. Do you know which library has a device implementation of ZGEEV?

Thanks,

John

Hi John,

Sorry, I don’t know of any libraries that allow device data to be passed to ZGEEV. cuSolver only implements a few LAPACK routines. MAGMA is the most comprehensive, but as you point out, it manages its own data. There was CULA, but it hasn’t been maintained for many years, so I can’t recommend using it.
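In the meantime, the usual workaround is to copy the matrix back to the host, call a host LAPACK ZGEEV there, and copy whatever you need back to the device. A minimal sketch in CUDA Fortran (the matrix and its contents are hypothetical; link against a host LAPACK, e.g. -llapack -lblas):

program zgeev_host_workaround
  use cudafor
  implicit none
  integer, parameter :: n = 2
  complex(8), device, allocatable :: A_d(:,:)
  complex(8) :: A(n,n), w(n), vl(1,1), vr(n,n), work(4*n)
  real(8) :: rwork(2*n)
  integer :: info
  allocate(A_d(n,n))
  ! Example host matrix, stored column-major: [[1,-i],[i,2]]
  A = reshape([(1.d0,0.d0), (0.d0,1.d0), (0.d0,-1.d0), (2.d0,0.d0)], [n,n])
  A_d = A            ! host-to-device copy via assignment
  ! ... device kernels / OpenACC regions operate on A_d ...
  A = A_d            ! device-to-host copy via assignment
  ! Host LAPACK call: eigenvalues in w, right eigenvectors in vr
  call zgeev('N', 'V', n, A, n, w, vl, 1, vr, n, work, size(work), rwork, info)
  print *, 'info =', info
  deallocate(A_d)
end program

The extra transfers obviously cost bandwidth, so this only pays off when the eigensolve is a small fraction of the total work.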

-Mat