acc routine and Fortran

All,

I’m hoping someone here can help me with this. Long ago, I used the PGI Accelerator directives, but limitations with those led me to CUDA Fortran. But now I’m trying to venture back into the brave new world of OpenACC. Well, OpenACC 2.0 because my simplest accelerator kernel has subroutine calls within. Thus, I need !$acc routine. My main question, though, is how exactly do you use it?

I’ve tried searching the web for ‘acc routine’ and I see quite a few examples in C, but I’ve only ever seen one for Fortran, at this page. That example has a subroutine statement with a brace at the end:

subroutine foo(v, i, n) {

which isn’t valid Fortran, and I can’t see where “j” is declared, so I’m not too confident in it. Still, it’s an example.

So, my code looks something like:

module soradmod
...
contains
subroutine sorad(...)
...
   call deledd(...)
   call deledd(...)
...
end subroutine sorad

subroutine deledd(...)
...
end subroutine deledd

end module soradmod

Now, it’s much more complex, and in truth there are subroutine calls to subroutines external to soradmod, but for now, let’s deal with deledd.

So, after adding some !$acc kernels, a few !$acc loop private to deal with some -Minfo messages, I get:

pgfortran -fast -r4 -Mextend -Mpreprocess -Ktrap=fp -Kieee -Minfo=all -tp=sandybridge-64 -acc -ta=nvidia:5.5,cc35 -DNITERS=6 -DGPU_PRECISION=8 -c src/sorad.acc.F90
PGF90-S-0155-Accelerator region ignored; see -Minfo messages  (src/sorad.acc.F90: 327)
sorad:
    327, Accelerator region ignored
    341, Loop not vectorized/parallelized: too deeply nested
    362, Loop not vectorized: data dependency
    387, Loop unrolled 4 times (completely unrolled)
    396, Memory zero idiom, loop replaced by call to __c_mzero4
    397, Memory zero idiom, loop replaced by call to __c_mzero4
    398, Memory zero idiom, loop replaced by call to __c_mzero4
    399, Memory zero idiom, loop replaced by call to __c_mzero4
    400, Memory zero idiom, loop replaced by call to __c_mzero4
    402, Memory zero idiom, loop replaced by call to __c_mzero4
    403, Memory zero idiom, loop replaced by call to __c_mzero4
    405, Memory zero idiom, loop replaced by call to __c_mzero4
    406, Memory zero idiom, loop replaced by call to __c_mzero4
    407, Memory zero idiom, loop replaced by call to __c_mzero4
    413, Loop not fused: different loop trip count
         Loop not vectorized: may not be beneficial
    423, Loop not fused: function call before adjacent loop
         Loop not vectorized: may not be beneficial
         Loop unrolled 8 times (completely unrolled)
    505, Loop not fused: different controlling conditions
    518, Generated 4 alternate versions of the loop
         Generated vector sse code for the loop
         Generated 8 prefetch instructions for the loop
    519, Loop unrolled 4 times (completely unrolled)
    531, Loop not vectorized/parallelized: too deeply nested
    538, Accelerator restriction: function/procedure calls are not supported
         Loop not vectorized/parallelized: contains call
    558, Accelerator restriction: unsupported call to 'deledd'
...

And, of course, it sees the deledd call. So, I then try, a la the link above:

subroutine deledd(...)
!$acc routine
...
end subroutine deledd

and:

pgfortran -fast -r4 -Mextend -Mpreprocess -Ktrap=fp -Kieee -Minfo=all -tp=sandybridge-64 -acc -ta=nvidia:5.5,cc35 -DNITERS=6 -DGPU_PRECISION=8 -c src/sorad.acc.F90
PGF90-S-0070-Incorrect sequence of statements  (src/sorad.acc.F90: 1669)
  0 inform,   0 warnings,   1 severes, 0 fatal for deledd

Hmm. I also try:

!$acc routine vector
!$acc routine worker
!$acc routine gang

but each one gives me the same error. I’ve tried putting the !$acc statements above the subroutine declaration, no go. I’ve tried:

!$acc routine(deledd)

in various places, no go.

Any help? I’m hoping that if I can figure this out, I can then work out how to use routines that live in different files. (Heck, I can’t even get -Mextract/-Minline to work, so !$acc routine across different files is daunting!)

Thanks,
Matt

Hi Matt,

Thanks for pointing out the typo with the “{” on ParallelForAll. I’ll let Jeff know. Though, “j” doesn’t need to be declared due to implicit typing.

As for routine, first make sure you have PGI 14.1 or later. OpenACC “routine” directive support for subroutines was added then. Function support was added in 14.2. From what I can tell, it appears that you’re using the directive correctly but may just be using 13.10.

Here’s a very simple example.

% cat testr.f90
module testme
    integer, parameter :: N = 16
contains
subroutine testit
    real*4 :: a0(N), b0(N), b1(N)
    integer :: acc(6), exp(6)
    integer :: i
    do i = 1, N
       a0(i) = real(i) * 10.0
       b0(i) = -1.0
       b1(i) = -2.0
    enddo

    do i=1,N
       call doit(b1,a0,i)
    enddo

    !$acc parallel
    !$acc loop
    do i = 1, N
       call doit( b0, a0, i )
    enddo
    !$acc end parallel
    do i = 1, N
       print *, b0(i), b1(i)
    enddo

end

subroutine doit( b, a, i)
!$acc routine vector
    real*4 :: b(*), a(*)
    integer :: i
    b(i) = a(i)*a(i)
end
end module testme

program main
    use openacc
    use testme
    call testit()
end
sbe02:/local/home/colgrove% pgf90 testr.f90 -V14.1 -acc -Minfo=accel; a.out
testit:
     18, Accelerator kernel generated
         20, !$acc loop gang ! blockidx%x
     18, Generating copy(a0(:))
         Generating copy(b0(:))
         Generating NVIDIA code
doit:
     30, Generating acc routine vector
         Generating NVIDIA code
    100.0000        100.0000
    400.0000        400.0000
    900.0000        900.0000
    1600.000        1600.000
    2500.000        2500.000
    3600.000        3600.000
    4900.000        4900.000
    6400.000        6400.000
    8100.000        8100.000
    10000.00        10000.00
    12100.00        12100.00
    14400.00        14400.00
    16900.00        16900.00
    19600.00        19600.00
    22500.00        22500.00
    25600.00        25600.00

> Thanks for pointing out the typo with the “{” on ParallelForAll. I’ll let Jeff know. Though, “j” doesn’t need to be declared due to implicit typing.

Oh yeah, implicit typing…I always forget you can do that. Mainly because I was taught small puppies are sad when you don’t use “implicit none” in Fortran or “default(none)” in OpenMP.

But, as we’ll see soon, this matters!

> As for routine, first make sure you have PGI 14.1 or later. OpenACC “routine” directive support for subroutines was added then. Function support was added in 14.2. From what I can tell, it appears that you’re using the directive correctly but may just be using 13.10.

Oh, I’m using PGI 14.1, but just to be sure, here’s without !$acc routine:

$ pgfortran -fast -r4 -Mextend -Mpreprocess -Ktrap=fp -Kieee -Minfo=all -tp=sandybridge-64 -V14.1 -acc -ta=nvidia:5.5,cc35 -DNITERS=6 -DGPU_PRECISION=8 -c src/sorad.acc.F90
PGF90-S-0155-Accelerator region ignored; see -Minfo messages  (src/sorad.acc.F90: 327)
sorad:
    327, Accelerator region ignored
...
    538, Accelerator restriction: function/procedure calls are not supported
         Loop not vectorized/parallelized: contains call
    558, Accelerator restriction: unsupported call to 'deledd'
...

and now with:

$ pgfortran -fast -r4 -Mextend -Mpreprocess -Ktrap=fp -Kieee -Minfo=all -tp=sandybridge-64 -V14.1 -acc -ta=nvidia:5.5,cc35 -DNITERS=6 -DGPU_PRECISION=8 -c src/sorad.acc.F90
PGF90-S-0070-Incorrect sequence of statements  (src/sorad.acc.F90: 1669)
  0 inform,   0 warnings,   1 severes, 0 fatal for deledd
make[1]: *** [sorad.acc.o] Error 2

Or if I boil it down to what your example uses:

$ pgfortran -Minfo=all -V14.1 -acc -c src/sorad.acc.F90 
PGF90-S-0070-Incorrect sequence of statements  (src/sorad.acc.F90: 1669)
  0 inform,   0 warnings,   1 severes, 0 fatal for deledd

PGI 14.2 also leads to the same error. (Though I’ve had to pretty much stop using PGI 14.2 due to its apparent fragility when it comes to linking as seen here.)

That said, here’s something that I noticed. Let’s make a couple versions of your testr.f90 code with one added line (disregarding spaces):

$ diff -u testr.f90 testr_in.f90
--- testr.f90	2014-03-11 07:04:45.393678000 -0400
+++ testr_in.f90	2014-03-11 07:18:52.267942000 -0400
@@ -29,6 +29,9 @@
 
 subroutine doit( b, a, i) 
 !$acc routine vector 
+
+    implicit none
+
     real*4 :: b(*), a(*) 
     integer :: i 
     b(i) = a(i)*a(i) 
$ pgfortran -V14.1 -acc -Minfo=accel testr_in.f90
PGF90-S-0070-Incorrect sequence of statements  (testr_in.f90: 33)
  0 inform,   0 warnings,   1 severes, 0 fatal for doit
$ diff -u testr.f90 testr_in2.f90
--- testr.f90	2014-03-11 07:04:45.393678000 -0400
+++ testr_in2.f90	2014-03-11 07:21:12.545263000 -0400
@@ -28,7 +28,11 @@
 end 
 
 subroutine doit( b, a, i) 
+
+    implicit none
+
 !$acc routine vector 
+
     real*4 :: b(*), a(*) 
     integer :: i 
     b(i) = a(i)*a(i) 
$ pgfortran -V14.1 -acc -Minfo=accel testr_in2.f90
testit:
     18, Accelerator kernel generated
         20, !$acc loop gang ! blockidx%x
     18, Generating copy(a0(:))
         Generating copy(b0(:))
         Generating NVIDIA code
doit:
     30, Generating acc routine vector
         Generating NVIDIA code
$ ./a.out
    100.0000        100.0000    
    400.0000        400.0000    
    900.0000        900.0000    
    1600.000        1600.000    
    2500.000        2500.000    
    3600.000        3600.000    
    4900.000        4900.000    
    6400.000        6400.000    
    8100.000        8100.000    
    10000.00        10000.00    
    12100.00        12100.00    
    14400.00        14400.00    
    16900.00        16900.00    
    19600.00        19600.00    
    22500.00        22500.00    
    25600.00        25600.00

So, it looks like !$acc routine must come after implicit none and not before.
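
The ordering that compiled in the test above can be sketched as a small standalone file. This is only an illustration of the placement PGI 14.1 accepted in that test, not a statement of what the OpenACC standard requires. The module name ordering_demo, the array size, and the choice of the seq clause (instead of vector) are assumptions made for the sketch. Without -acc the directives are treated as comments, so the file should also build with any Fortran compiler.

```fortran
module ordering_demo
    implicit none
contains
    subroutine doit(b, a, i)
        implicit none
        ! Directive placed AFTER "implicit none" and before the declarations:
        ! the placement that PGI 14.1 accepted in the test above.
        ! "seq" is an assumption of this sketch; the thread used "vector".
        !$acc routine seq
        real, intent(inout) :: b(*)
        real, intent(in)    :: a(*)
        integer, intent(in) :: i
        b(i) = a(i) * a(i)
    end subroutine doit
end module ordering_demo

program main
    use ordering_demo
    implicit none
    integer, parameter :: n = 4   ! hypothetical problem size
    real :: a(n), b(n)
    integer :: i

    do i = 1, n
        a(i) = real(i)
    end do
    b = 0.0

    ! Each iteration calls the device routine on one element.
    !$acc parallel loop copyin(a) copy(b)
    do i = 1, n
        call doit(b, a, i)
    end do

    print *, b
end program main
```

Moving the !$acc routine line above the inner "implicit none" should reproduce the "Incorrect sequence of statements" error on the compiler versions discussed here.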

I’ve pored over the OpenACC 2.0a API standard and I don’t see anything about order of “implicit none”, but there are a lot of “implicit” in the document. Is there something I’m violating?

Matt

> I’ve pored over the OpenACC 2.0a API standard and I don’t see anything about order of “implicit none”, but there are a lot of “implicit” in the document. Is there something I’m violating?

I hadn’t encountered this before (routine is new for me too) but will ask our compiler folks about it. The OpenACC standard just states that the directive needs to be in the specification part, and since “implicit none” is itself part of the specification part, the standard seems to allow “routine” to come before it.

What I don’t know is whether the authors of the standard didn’t account for “use” and “implicit” statements and meant to say before the definition part, or whether PGI is being too strict.

In any event, we do need to have better documentation for “routine” as more folks begin to use it.

- Mat

Hiya,

I think I may have just found a related issue. Whether it will help TheMatt, I don’t know!

The following will only work if the comments (!XXX) are removed:

!XXXmodule gpu_subs
!XXXcontains
subroutine simple
!$ACC ROUTINE
  stuff
end subroutine simple
!XXXend module gpu_subs

subroutine complicated
!XXXuse gpu_subs
stuff
!$ACC LOOP INDEPENDENT
do i = small, big
    call simple
end do
stuff
end subroutine complicated

The use, or non-use, of “implicit none” and “use xyz” within simple has no effect on the success of the compilation.

I can understand why this might be expected, but thought it may help someone.

Cheers,

Karl

Hi Karl,

Good to hear from you!

To use the OpenACC “routine” directive in Fortran, the caller must see an F90 explicit interface, which is why this won’t work unless “simple” is in a module. Without an interface, the compiler doesn’t know that the “routine” directive has been applied to the routine being called.
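
As a rough sketch of the module approach, here is Karl’s outline filled in with placeholder work. The dummy arguments, the arithmetic, the seq clause, and the driver program are all assumptions added for illustration; only the names gpu_subs, simple, and complicated come from the post above.

```fortran
module gpu_subs
    implicit none
contains
    subroutine simple(x)
        ! Because "simple" lives in a module, any caller that uses the
        ! module sees its interface, including this directive.
        !$acc routine seq
        real, intent(inout) :: x
        x = x + 1.0            ! placeholder for "stuff"
    end subroutine simple
end module gpu_subs

subroutine complicated(a, n)
    use gpu_subs               ! brings in the interface of "simple"
    implicit none
    integer, intent(in) :: n
    real, intent(inout) :: a(n)
    integer :: i

    !$acc parallel loop copy(a)
    do i = 1, n
        call simple(a(i))
    end do
end subroutine complicated

program main
    implicit none
    real :: a(8)               ! hypothetical test data
    a = 0.0
    call complicated(a, 8)
    print *, a
end program main
```

The same pattern should carry over to a module kept in a separate source file, as long as the module is compiled first and the caller has the use statement.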

Instead of a module, you could create an explicit interface block inside of “complicated”, provided the interface also contains the “routine” directive.
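
The interface-block alternative might look something like the sketch below. It has not been verified against any particular PGI release; the dummy arguments, the seq clause, and the driver program are assumptions for illustration. The directive appears twice: once in the routine itself and once in the interface body that the caller sees.

```fortran
subroutine simple(x)
    implicit none
    ! Directive in the routine itself, so a device version is generated.
    !$acc routine seq
    real, intent(inout) :: x
    x = 2.0 * x                ! placeholder for "stuff"
end subroutine simple

subroutine complicated(a, n)
    implicit none
    integer, intent(in) :: n
    real, intent(inout) :: a(n)
    integer :: i

    ! Explicit interface for the external routine. Repeating the
    ! directive here tells the caller that a device version exists.
    interface
        subroutine simple(x)
            implicit none
            !$acc routine seq
            real, intent(inout) :: x
        end subroutine simple
    end interface

    !$acc parallel loop copy(a)
    do i = 1, n
        call simple(a(i))
    end do
end subroutine complicated

program main
    implicit none
    real :: a(8)               ! hypothetical test data
    a = 1.0
    call complicated(a, 8)
    print *, a
end program main
```

The clause in the interface body should match the one in the routine (both seq here), since the caller decides how to invoke the routine based on what the interface declares.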

- Mat

Hi Mat,

It’s been a while :-) Too many projects, too little time…

I did try this with no joy. I also tried adding !$ACC ROUTINE in the interface but that didn’t help. I’ll revisit the issue later and get back to you with more details.

Cheers,

Karl