Thanks for pointing out the typo with the “{” on ParallelForAll. I’ll let Jeff know. Though, “j” doesn’t need to be declared due to implicit typing.
Oh yeah, implicit typing…I always forget you can do that. Mainly because I was taught small puppies are sad when you don’t use “implicit none” in Fortran or “default(none)” in OpenMP.
But, as we’ll see soon, this matters!
As for routine, first make sure you have PGI 14.1 or later. OpenACC “routine” directive support for subroutines was added then. Function support was added in 14.2. From what I can tell, you’re using the directive correctly but may just be using 13.10.
Oh, I’m using PGI 14.1, but just to be sure, here’s the build without !$acc routine:
$ pgfortran -fast -r4 -Mextend -Mpreprocess -Ktrap=fp -Kieee -Minfo=all -tp=sandybridge-64 -V14.1 -acc -ta=nvidia:5.5,cc35 -DNITERS=6 -DGPU_PRECISION=8 -c src/sorad.acc.F90
PGF90-S-0155-Accelerator region ignored; see -Minfo messages (src/sorad.acc.F90: 327)
sorad:
327, Accelerator region ignored
...
538, Accelerator restriction: function/procedure calls are not supported
Loop not vectorized/parallelized: contains call
558, Accelerator restriction: unsupported call to 'deledd'
...
And now with !$acc routine:
$ pgfortran -fast -r4 -Mextend -Mpreprocess -Ktrap=fp -Kieee -Minfo=all -tp=sandybridge-64 -V14.1 -acc -ta=nvidia:5.5,cc35 -DNITERS=6 -DGPU_PRECISION=8 -c src/sorad.acc.F90
PGF90-S-0070-Incorrect sequence of statements (src/sorad.acc.F90: 1669)
0 inform, 0 warnings, 1 severes, 0 fatal for deledd
make[1]: *** [sorad.acc.o] Error 2
Or if I boil it down to what your example uses:
$ pgfortran -Minfo=all -V14.1 -acc -c src/sorad.acc.F90
PGF90-S-0070-Incorrect sequence of statements (src/sorad.acc.F90: 1669)
0 inform, 0 warnings, 1 severes, 0 fatal for deledd
PGI 14.2 also leads to the same error. (Though I’ve had to pretty much stop using PGI 14.2 due to its apparent fragility when it comes to linking as seen here.)
That said, here’s something that I noticed. Let’s make a couple of versions of your testr.f90 code, each with one added line (disregarding whitespace):
$ diff -u testr.f90 testr_in.f90
--- testr.f90 2014-03-11 07:04:45.393678000 -0400
+++ testr_in.f90 2014-03-11 07:18:52.267942000 -0400
@@ -29,6 +29,9 @@
subroutine doit( b, a, i)
!$acc routine vector
+
+ implicit none
+
real*4 :: b(*), a(*)
integer :: i
b(i) = a(i)*a(i)
$ pgfortran -V14.1 -acc -Minfo=accel testr_in.f90
PGF90-S-0070-Incorrect sequence of statements (testr_in.f90: 33)
0 inform, 0 warnings, 1 severes, 0 fatal for doit
$ diff -u testr.f90 testr_in2.f90
--- testr.f90 2014-03-11 07:04:45.393678000 -0400
+++ testr_in2.f90 2014-03-11 07:21:12.545263000 -0400
@@ -28,7 +28,11 @@
end
subroutine doit( b, a, i)
+
+ implicit none
+
!$acc routine vector
+
real*4 :: b(*), a(*)
integer :: i
b(i) = a(i)*a(i)
$ pgfortran -V14.1 -acc -Minfo=accel testr_in2.f90
testit:
18, Accelerator kernel generated
20, !$acc loop gang ! blockidx%x
18, Generating copy(a0(:))
Generating copy(b0(:))
Generating NVIDIA code
doit:
30, Generating acc routine vector
Generating NVIDIA code
$ ./a.out
100.0000 100.0000
400.0000 400.0000
900.0000 900.0000
1600.000 1600.000
2500.000 2500.000
3600.000 3600.000
4900.000 4900.000
6400.000 6400.000
8100.000 8100.000
10000.00 10000.00
12100.00 12100.00
14400.00 14400.00
16900.00 16900.00
19600.00 19600.00
22500.00 22500.00
25600.00 25600.00
So, at least with PGI 14.1, it looks like !$acc routine must come after implicit none and not before.
I’ve pored over the OpenACC 2.0a API standard and I don’t see anything about the ordering of “implicit none” relative to the directive, though the word “implicit” appears in a lot of other contexts in the document. Is there something I’m violating?
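To make the question concrete, here is a minimal sketch of the ordering that compiled for me above. It is not the original testr.f90: the driver program, the name order_demo, the array size, and the interface block are all assumptions I added so the example stands on its own.

```fortran
! Sketch only: illustrative names and sizes, not the original testr.f90.
subroutine doit(b, a, i)
  implicit none            ! IMPLICIT statement first ...
  !$acc routine vector     ! ... then the directive (the order that compiled with 14.1)
  real*4 :: b(*), a(*)
  integer :: i
  b(i) = a(i)*a(i)
end subroutine doit

program order_demo
  implicit none
  ! Explicit interface so the caller also sees the routine directive
  ! (an assumption; the original driver is not shown in this thread).
  interface
    subroutine doit(b, a, i)
      !$acc routine vector
      real*4 :: b(*), a(*)
      integer :: i
    end subroutine doit
  end interface
  integer, parameter :: n = 4
  real*4 :: a(n), b(n)
  integer :: i

  ! Fill the input on the host.
  do i = 1, n
    a(i) = 10.0*i
  end do

  ! Each gang iteration calls the device routine for one element.
  !$acc parallel loop gang copyin(a) copyout(b)
  do i = 1, n
    call doit(b, a, i)
  end do

  print *, b
end program order_demo
```

One possible explanation, and this is only a guess on my part: standard Fortran already requires IMPLICIT statements to come before the other specification statements, so if the compiler treats the directive as part of the specification part, an “implicit none” after it would be out of sequence. Without OpenACC enabled the directives are just comments and the sketch runs serially.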
Matt