nvlink error with Fortran, OpenACC, PGI 15.10

I’m continuing with the development I referenced in another forum post. With the help of the forums and others I’ve managed to resolve that issue.

Now I’m running into an nvlink error. I have tried to make the reproduction as simple as practical. You can find the code for the reproduction here. Simply git clone it and run make in that directory on a Linux workstation with PGI 15.10, and you should get the same error. The error I get is:

pgf95 -module build -Ibuild -acc -Minfo=acc -g t1.f90 build/bl_types.o build/bl_constants.o build/vddot.o build/idamax.o build/dscal.o build/daxpy.o build/dgefa.o build/dgesl.o build/bdf.o -o t1.exe
t1.f90:
nvlink error   : Undefined reference to 'cudaMalloc' in 'build/bdf.o'
nvlink error   : Undefined reference to 'cudaFree' in 'build/bdf.o'
pgacclnk: child process exit status 2: /opt/pgi/linux86-64/15.10/bin/pgnvd
Makefile:13: recipe for target 't1.exe' failed
make: *** [t1.exe] Error 2

Beyond the fact that there’s an error occurring in bdf.f90, I cannot really parse exactly what’s causing this problem. I’m using OpenACC, so I have no real explicit control over any cudaMalloc or other such calls generated by the compiler. I’m hoping someone here can assist me in locating what about bdf.f90 is causing this nvlink error. Thank you for your assistance.

I got it to link with some difficulty.

My guess is there are some seq routines in bdf.f90 with automatic arrays.
We didn’t support this until just recently, and there are still restrictions.

  1. When you compile and link with pgfortran, you should use the -ta=tesla,cc35 option. We have to use the device-side cudaMalloc and cudaFree to create the automatic arrays (alloca doesn’t really exist in device code). These device-side calls were added with dynamic parallelism, and I believe they require compute capability 3.5 (cc35) or higher.

  2. Link with -Mcuda. This will pull in the correct libraries.
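
For readers unfamiliar with the pattern described above, here is a minimal sketch of a seq routine that declares an automatic array. It is not taken from bdf.f90; the module, routine, and variable names (scratch_demo, scaled_sum, work) are hypothetical and chosen only for illustration. The local array work(n) has a size known only at run time, which is the kind of declaration that, per the explanation above, leads to device-side cudaMalloc and cudaFree calls.

```fortran
module scratch_demo
  implicit none
  integer, parameter :: dp_t = kind(1.0d0)
contains
  ! Hypothetical routine, not from bdf.f90.
  subroutine scaled_sum(n, x, total)
    !$acc routine seq
    integer, intent(in) :: n
    real(kind=dp_t), intent(in)  :: x(n)
    real(kind=dp_t), intent(out) :: total
    ! Automatic array: a local whose size depends on the dummy argument n.
    real(kind=dp_t) :: work(n)
    integer :: i

    do i = 1, n
      work(i) = 2.0_dp_t * x(i)
    end do

    total = 0.0_dp_t
    do i = 1, n
      total = total + work(i)
    end do
  end subroutine scaled_sum
end module scratch_demo

program demo
  use scratch_demo
  implicit none
  integer, parameter :: m = 4, n = 8
  real(kind=dp_t) :: a(n, m), s(m)
  integer :: j

  a = 1.0_dp_t

  ! Each iteration calls the seq routine on one column of a.
  !$acc parallel loop copyin(a) copyout(s)
  do j = 1, m
    call scaled_sum(n, a(:, j), s(j))
  end do

  print *, s
end program demo
```

Without -acc the directives are treated as comments and this builds as ordinary Fortran; with OpenACC enabled, the two points above (-ta=tesla,cc35 and -Mcuda) would presumably apply to a routine shaped like scaled_sum.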

Thanks so much Brent!

I was able to get this to compile with the following flags:

FFLAGS  = -module build -Ibuild -acc -Minfo=acc -Mcuda=cuda7.0 -ta=tesla,cc35

If I leave the -g flag in, I get an error that does appear to be related to automatic (or, in this particular case, assumed-shape) arrays:

PGF90-S-0000-Internal compiler error. size_of:bad dtype     102 (bdf.f90: 1025)

The function that triggers this error has the following input array:

   real(kind=dp_t), intent(in   ) :: arr(:)
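
To keep the two terms apart, here is a small sketch showing both kinds of declaration side by side. It is a hypothetical function, not the one at bdf.f90 line 1025, and the names shapes_demo, centered_norm, and tmp are made up for illustration. An assumed-shape array is a dummy argument that takes its extent from the caller, while an automatic array is a local variable sized at entry to the routine.

```fortran
module shapes_demo
  implicit none
  integer, parameter :: dp_t = kind(1.0d0)
contains
  ! Hypothetical function, not from bdf.f90. Assumes size(arr) > 0.
  function centered_norm(arr) result(r)
    ! Assumed-shape dummy argument: extent comes from the actual argument.
    real(kind=dp_t), intent(in) :: arr(:)
    real(kind=dp_t) :: r
    ! Automatic array: a local sized from arr when the function is entered.
    real(kind=dp_t) :: tmp(size(arr))

    tmp = arr - sum(arr) / size(arr)
    r = sqrt(sum(tmp * tmp))
  end function centered_norm
end module shapes_demo

program demo
  use shapes_demo
  implicit none
  print *, centered_norm([1.0_dp_t, 2.0_dp_t, 3.0_dp_t])
end program demo
```

Which of the two declarations is behind the size_of internal compiler error is not something this sketch can establish; it only illustrates the terminology.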