I’m working on accelerating the nuclear reaction integrator in the open source low Mach astrophysics code Maestro.
In the code we include the fixed-form Fortran source of LINPACK routines like daxpy*. I’m trying to mark it up with OpenACC as a simple sequential routine. However, I’m getting opaque errors that I’m hoping the forums can help me with. You can see the marked up code here. The error I’m getting is:
pgf95 -module t/Linux.PGI.omp.acc/m -It/Linux.PGI.omp.acc/m -mp=nonuma -Minfo=mp -acc -Minfo=acc -O2 -I/home/ajacobs/Codebase/MAESTRO/Microphysics/EOS/helmeos -c -o t/Linux.PGI.omp.acc/o/daxpy.o /home/ajacobs/Codebase/MAESTRO/Util/BLAS/daxpy.f
PGF90-S-0155-Compiler failed to translate accelerator region (see -Minfo messages): Matching ref index not found (/home/ajacobs/Codebase/MAESTRO/Util/BLAS/daxpy.f: 1)
daxpy:
1, Generating acc routine seq
37, Loop is parallelizable
51, Loop is parallelizable
56, Loop is parallelizable
0 inform, 0 warnings, 1 severes, 0 fatal for daxpy
This was compiled using PGI 15.10 via the OpenACCToolkit. It’s on a Linux workstation with a Maxwell GPU (GeForce GTX 960). I use this for development, the ultimate target is OLCF’s Titan supercomputer. Thanks for any assistance! This is my first time posting in the forums, so please let me know if I left out any essential info.
*I know there are GPU-aware libraries for such routines, but it’s not clear to me that we can compile them for the device and call from device. Using these libraries also complicates the need for our code to be architecture-agnostic, maximally portable, and maintained as a single codebase. Maybe we should use them, but we’d like to avoid it if possible.