All,
I’m hoping someone here can help me with this. Long ago, I used the PGI Accelerator directives, but their limitations led me to CUDA Fortran. Now I’m trying to venture back into the brave new world of OpenACC. Well, OpenACC 2.0, because even my simplest accelerator kernel contains subroutine calls. Thus, I need !$acc routine. My main question, though, is: how exactly do you use it in Fortran?
I’ve searched the web for ‘acc routine’ and found quite a few examples in C, but only one for Fortran, at this page. That example has a subroutine statement that ends with a brace:
subroutine foo(v, i, n) {
and it isn’t even valid Fortran (anyone see where “j” is declared?), so I’m not too confident in it. Still, it’s an example.
So, my code looks something like:
module soradmod
   ...
contains
   subroutine sorad(...)
      ...
      call deledd(...)
      call deledd(...)
      ...
   end subroutine sorad
   subroutine deledd(...)
      ...
   end subroutine deledd
end module soradmod
The real code is much more complex, and in truth sorad also calls subroutines outside soradmod, but for now let’s deal with deledd.
So, after adding some !$acc kernels regions and a few !$acc loop private clauses to deal with some -Minfo messages, I get:
pgfortran -fast -r4 -Mextend -Mpreprocess -Ktrap=fp -Kieee -Minfo=all -tp=sandybridge-64 -acc -ta=nvidia:5.5,cc35 -DNITERS=6 -DGPU_PRECISION=8 -c src/sorad.acc.F90
PGF90-S-0155-Accelerator region ignored; see -Minfo messages (src/sorad.acc.F90: 327)
sorad:
327, Accelerator region ignored
341, Loop not vectorized/parallelized: too deeply nested
362, Loop not vectorized: data dependency
387, Loop unrolled 4 times (completely unrolled)
396, Memory zero idiom, loop replaced by call to __c_mzero4
397, Memory zero idiom, loop replaced by call to __c_mzero4
398, Memory zero idiom, loop replaced by call to __c_mzero4
399, Memory zero idiom, loop replaced by call to __c_mzero4
400, Memory zero idiom, loop replaced by call to __c_mzero4
402, Memory zero idiom, loop replaced by call to __c_mzero4
403, Memory zero idiom, loop replaced by call to __c_mzero4
405, Memory zero idiom, loop replaced by call to __c_mzero4
406, Memory zero idiom, loop replaced by call to __c_mzero4
407, Memory zero idiom, loop replaced by call to __c_mzero4
413, Loop not fused: different loop trip count
Loop not vectorized: may not be beneficial
423, Loop not fused: function call before adjacent loop
Loop not vectorized: may not be beneficial
Loop unrolled 8 times (completely unrolled)
505, Loop not fused: different controlling conditions
518, Generated 4 alternate versions of the loop
Generated vector sse code for the loop
Generated 8 prefetch instructions for the loop
519, Loop unrolled 4 times (completely unrolled)
531, Loop not vectorized/parallelized: too deeply nested
538, Accelerator restriction: function/procedure calls are not supported
Loop not vectorized/parallelized: contains call
558, Accelerator restriction: unsupported call to 'deledd'
...
And, of course, it objects to the deledd call. So I then try, à la the link above:
subroutine deledd(...)
   !$acc routine
   ...
end subroutine deledd
and:
pgfortran -fast -r4 -Mextend -Mpreprocess -Ktrap=fp -Kieee -Minfo=all -tp=sandybridge-64 -acc -ta=nvidia:5.5,cc35 -DNITERS=6 -DGPU_PRECISION=8 -c src/sorad.acc.F90
PGF90-S-0070-Incorrect sequence of statements (src/sorad.acc.F90: 1669)
0 inform, 0 warnings, 1 severes, 0 fatal for deledd
Hmm. I also try:
!$acc routine vector
!$acc routine worker
!$acc routine gang
but each one gives me the same error. I’ve also tried putting the !$acc directive above the subroutine statement; no go. I’ve tried:
!$acc routine(deledd)
in various places; still no go.
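To make the question concrete, here is a minimal, self-contained sketch of what I think the structure should look like. It goes by my reading of the OpenACC 2.0 spec, which says the unnamed form of the directive belongs in the specification part of the subroutine it applies to. Only soradmod, sorad and deledd come from my code; the dummy arguments (n, a, b, x, y), the arithmetic, the data clauses and the seq clause are all placeholders I made up for illustration, and the real deledd is far more involved. I have not confirmed that this builds with the PGI version and flags shown above.

```fortran
! Minimal sketch only. soradmod, sorad and deledd come from my code;
! every argument name and the arithmetic are made-up placeholders.
module soradmod
   implicit none
contains

   subroutine sorad(n, a, b)
      integer, intent(in)  :: n
      real,    intent(in)  :: a(n)
      real,    intent(out) :: b(n)
      integer :: i

      !$acc kernels copyin(a) copyout(b)
      !$acc loop independent
      do i = 1, n
         call deledd(a(i), b(i))
      end do
      !$acc end kernels
   end subroutine sorad

   subroutine deledd(x, y)
      ! Unnamed form of the directive, placed in the specification part.
      ! seq is an assumption: this stand-in has no loops of its own.
      !$acc routine seq
      real, intent(in)  :: x
      real, intent(out) :: y

      y = 2.0 * x
   end subroutine deledd

end module soradmod

program try_routine
   use soradmod, only: sorad
   implicit none
   integer, parameter :: n = 8
   real :: a(n), b(n)
   integer :: i

   do i = 1, n
      a(i) = real(i)
   end do

   call sorad(n, a, b)

   ! Sanity check on the host: every element should simply be doubled.
   if (any(abs(b - 2.0 * a) > 1.0e-6)) then
      print *, 'mismatch'
      stop 1
   end if
   print *, 'ok'
end program try_routine
```

Without -acc the directives are just comments, so this should also build and run as ordinary host Fortran, which at least gives a baseline to compare against.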
Any help? I’m hoping that once I figure this out, I can work out how to use routines that live in different files. (Heck, I can’t even get -Mextract/-Minline to work, so !$acc routine across different files is daunting!)
Thanks,
Matt