Problem with fixed-form Fortran OpenACC, daxpy

AMJacobs · December 9, 2015, 7:54pm

I’m working on accelerating the nuclear reaction integrator in the open source low Mach astrophysics code Maestro.

In the code we include the fixed-form Fortran source of LINPACK routines like daxpy*. I’m trying to mark it up with OpenACC as a simple sequential routine. However, I’m getting opaque errors that I’m hoping the forums can help me with. You can see the marked up code here. The error I’m getting is:

pgf95   -module t/Linux.PGI.omp.acc/m -It/Linux.PGI.omp.acc/m  -mp=nonuma -Minfo=mp -acc -Minfo=acc -O2  -I/home/ajacobs/Codebase/MAESTRO/Microphysics/EOS/helmeos  -c -o t/Linux.PGI.omp.acc/o/daxpy.o /home/ajacobs/Codebase/MAESTRO/Util/BLAS/daxpy.f
PGF90-S-0155-Compiler failed to translate accelerator region (see -Minfo messages): Matching ref index not found (/home/ajacobs/Codebase/MAESTRO/Util/BLAS/daxpy.f: 1)
daxpy:
      1, Generating acc routine seq
     37, Loop is parallelizable
     51, Loop is parallelizable
     56, Loop is parallelizable
  0 inform,   0 warnings,   1 severes, 0 fatal for daxpy

This was compiled using PGI 15.10 via the OpenACC Toolkit, on a Linux workstation with a Maxwell GPU (GeForce GTX 960). I use this machine for development; the ultimate target is OLCF’s Titan supercomputer. Thanks for any assistance! This is my first time posting in the forums, so please let me know if I left out any essential info.

*I know there are GPU-aware libraries for such routines, but it’s not clear to me that we can compile them for the device and call them from device code. Using these libraries also conflicts with our need for the code to be architecture-agnostic, maximally portable, and maintained as a single codebase. Maybe we should use them, but we’d like to avoid it if possible.

brentl · December 9, 2015, 9:58pm

Looks like a real bug in our device code generator. I can get it to compile by commenting out these two lines:

! IF (INCX.LT.0) IX = (-N+1)*INCX + 1
! IF (INCY.LT.0) IY = (-N+1)*INCY + 1

which doesn’t make any sense other than there’s a bug.

If I find a work-around I’ll let you know.

There is a device-side cublas, as you note. The calling conventions change (cublasDaxpy instead of daxpy), and the functions take a handle as the first argument. These functions may launch other kernels, so if you have “lots” of work to do in these functions, you might consider that. If you want the work done by every thread, then compiling the existing library code using acc routine seq is probably the way to go.
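To illustrate the "work done by every thread" option, here is a minimal free-form sketch: a routine marked `!$acc routine seq` is called once per iteration of a parallel loop, so each thread runs its own small sequential update. The names `axpy_mod`, `axpy_seq`, and `nsys` are hypothetical and do not come from Maestro or LINPACK, and unit strides are assumed to keep the example short.

```fortran
! Hypothetical sketch: a sequential device routine called from a parallel loop.
module axpy_mod
  implicit none
contains
  subroutine axpy_seq(n, a, x, y)
    !$acc routine seq
    integer, intent(in) :: n
    double precision, intent(in) :: a, x(n)
    double precision, intent(inout) :: y(n)
    integer :: i
    ! Plain sequential loop; each calling thread executes all n iterations.
    do i = 1, n
       y(i) = y(i) + a*x(i)
    end do
  end subroutine axpy_seq
end module axpy_mod

program per_thread_axpy
  use axpy_mod
  implicit none
  integer, parameter :: nsys = 8, n = 4   ! assumed sizes, for illustration only
  double precision :: x(n, nsys), y(n, nsys)
  integer :: j

  x = 1.0d0
  y = 2.0d0

  ! One independent axpy per column; the loop over j is parallelized.
  !$acc parallel loop copyin(x) copy(y)
  do j = 1, nsys
     ! Pass the first element of column j (sequence association),
     ! which avoids creating an array-section temporary.
     call axpy_seq(n, 3.0d0, x(1, j), y(1, j))
  end do

  ! Every element of y should now be 2 + 3*1 = 5.
  print *, minval(y), maxval(y)
end program per_thread_axpy
```

Without `-acc` the directives are treated as comments and the program runs on the host, which makes it easy to check the result before building for the device.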

AMJacobs · December 10, 2015, 9:24pm

Thank you for your reply and insight.

Changing the offending code to the following, for some reason, appears to fix the issue:

      if (incx.lt.0) then
         ix = 1
         ix = ix+(-n+1)*incx
      endif

So this is a workaround for the moment.
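For anyone hitting the same error, here is a rough free-form sketch of how the workaround might look inside a daxpy-style routine. It is not the Maestro source: the name `daxpy_seq` is hypothetical, the unrolled unit-stride branch of the reference routine is omitted, and applying the same rewrite to `iy` is my assumption (only the `ix` case is shown above).

```fortran
! Hypothetical sketch of a daxpy-style routine with the workaround applied.
subroutine daxpy_seq(n, da, dx, incx, dy, incy)
  !$acc routine seq
  implicit none
  integer, intent(in) :: n, incx, incy
  double precision, intent(in) :: da
  double precision, intent(in) :: dx(*)
  double precision, intent(inout) :: dy(*)
  integer :: i, ix, iy

  if (n <= 0) return
  if (da == 0.0d0) return

  ix = 1
  iy = 1
  ! Workaround form from this thread, replacing the one-line
  ! IF (INCX.LT.0) IX = (-N+1)*INCX + 1
  if (incx < 0) then
     ix = 1
     ix = ix + (-n+1)*incx
  end if
  ! Assumption: the same rewrite is applied to the INCY line.
  if (incy < 0) then
     iy = 1
     iy = iy + (-n+1)*incy
  end if

  ! General-stride update: dy = dy + da*dx
  do i = 1, n
     dy(iy) = dy(iy) + da*dx(ix)
     ix = ix + incx
     iy = iy + incy
  end do
end subroutine daxpy_seq

program check_daxpy_seq
  implicit none
  double precision :: x(4), y(4)
  x = (/ 1.0d0, 2.0d0, 3.0d0, 4.0d0 /)
  y = 0.0d0
  ! incx = -1 walks x backwards, so y becomes 2*x reversed: 8, 6, 4, 2
  call daxpy_seq(4, 2.0d0, x, -1, y, 1)
  print *, y
end program check_daxpy_seq
```

The small host-side driver only checks that the rewritten index logic matches the original expression for a negative stride; whether this form avoids the compiler error on other PGI versions is untested here.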
