how to do a "!$acc kernels" call before a subrouti

Dear forum,

While this f90/openacc code compiles and runs fine:

program x
...
!$acc data copyin(p)
!$acc enter data copyin(p%rho)
        call mysub(p)
!$acc exit data copyout(p%rho)
!$acc end data
print *,p%rho(1)
end program x



subroutine mysub(p)
!$acc kernels &
!$acc    present( p,  p%rho )
        do i=1,n
                p%rho(i) = 1.0
        enddo
!$acc end kernels
end subroutine mysub



where p is declared as:
type fluid
    real, allocatable :: rho(:)
end type fluid
type(fluid) :: p

-------------------------

We would like to do the same using:

!$acc data copyin(p)
!$acc enter data copyin(p%rho)
!$acc kernels &                       <-----------
!$acc    present( p,  p%rho )    <-----------
        call mysub(p)
!$acc end kernels                    <-----------
!$acc exit data copyout(p%rho)
!$acc end data
print *,p%rho(1)



subroutine mysub(p)
!$acc routine gang
...
!commentout !$acc kernels &
!commentout !$acc    present( p,  p%rho )
        do i=1,n
                p%rho(i) = 1.0
        enddo
!commentout !$acc end kernels
end subroutine mysub

Both codes are compiled with -acc -ta=nvidia:cc35, but the second version gives wrong results.
Is there a way to do this?

jg.

can share the code if needed.

Please do. It’s always helpful to work with a full example.

Though, I’d call a “routine gang” from a “parallel” region instead of a “kernels” region. Also, I’d add a loop directive in the routine to tell the compiler where you want to apply the gang parallelism.
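A minimal sketch of that suggestion, not the code from the repository: the module name fluid_mod, the extra argument n, and the array size are assumptions made only so the example is self-contained. The data directives are kept as in the original post; only the compute construct and the loop directive change.

```fortran
! Sketch only: fluid_mod, the argument n, and the size 1024 are assumed
! for illustration; fluid, p, rho, and mysub follow the original post.
module fluid_mod
  implicit none
  type fluid
     real, allocatable :: rho(:)
  end type fluid
contains
  subroutine mysub(p, n)
    !$acc routine gang
    type(fluid), intent(inout) :: p
    integer, intent(in) :: n
    integer :: i
    ! The loop directive says which loop the gangs should divide up.
    !$acc loop gang
    do i = 1, n
       p%rho(i) = 1.0
    end do
  end subroutine mysub
end module fluid_mod

program x
  use fluid_mod
  implicit none
  integer, parameter :: n = 1024   ! assumed problem size
  type(fluid) :: p

  allocate(p%rho(n))
  p%rho = 0.0

  !$acc data copyin(p)
  !$acc enter data copyin(p%rho)
  ! Call the gang routine from a parallel region rather than kernels.
  !$acc parallel present(p, p%rho)
  call mysub(p, n)
  !$acc end parallel
  !$acc exit data copyout(p%rho)
  !$acc end data

  print *, p%rho(1)
end program x
```

Placing mysub in a module makes its routine directive visible at the call site, so the compiler knows the call targets a device routine with gang-level parallelism.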

- Mat

Hi Mat,

Thank you for helping.

Here is the code:
git clone https://github.com/eth-cscs/openacc.git
then
cat 20150815/PGI/readme.pgi

The same source can be compiled in two versions: make creates a working executable, and make X=fail creates an executable that gives wrong results. The outputs are in:
20150815/PGI/o_PGI.fail
20150815/PGI/o_PGI.success

jg.

Thanks jg.

As I suspected, it appears to be a problem when calling a “routine gang” from a “kernels” region. If you instead call the routine from a “parallel” region, I show the program getting correct answers.

I have filed TPR#21879 and sent it on to engineering for further investigation.

Best Regards,
Mat

To bump a very old topic, for anyone encountering a similar issue after the fact, this problem is fixed with PGI release 19.5 and above.