reflected with dummy allocatable array argument

Hi,

I would like to use directives in a code where some allocatable arrays, which may or may not be allocated, are passed as arguments to a subroutine. This is possible in Fortran 2003. With the accelerator model I would like to use the reflected directive to avoid data transfers when the arrays are allocated. Here is a test code:

module computation
  implicit none
  
contains

subroutine gpu_routine(nvec,nlev,a,option)
  real, allocatable, intent(inout) :: a(:,:)
  integer, intent(in) :: nvec,nlev,option
  integer :: i,k
  !$acc reflected(a)  

  if (option==1) then
     !$acc region do kernel 
     do i=1,nvec       
        do k=2,nlev 
           a(i,k)=a(i,k)*a(i,k-1)
        end do
     end do
     !$acc end region
  end if

end subroutine gpu_routine

end module computation
  
program main
  USE computation, only: gpu_routine
  implicit none
  real, allocatable :: a(:,:)
  !$acc mirror(a)
  integer, parameter :: n1=10000, nlev=60
  integer, parameter :: option=1

   
  if (option==1) then
     allocate(a(n1,nlev))
     !init a
     a=0.1
     !$acc update device(a) 
  end if

   
  call gpu_routine(n1,nlev,a,option)
 
  if (option==1) then
     !$acc update host(a)
     print*, sum(a)
  else
     print*, 'option=', option   
  end if

end program main

If I compile with
pgf90 -r8 -O3 -o test test.f90
the result is:
./test
1111.111111109653

If I now compile with -ta, I get a wrong result:
pgf90 -r8 -ta=nvidia -O3 -o test test.f90
./test
58994.21009943201
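
As a sanity check on which of the two results is correct, the expected value can be worked out without the GPU. Since every element starts at 0.1 and the loop sets a(i,k)=a(i,k)*a(i,k-1), column k ends up holding 0.1**k, so the total is n1 times a geometric series. The following is a minimal stand-alone sketch (the program name and the dp kind parameter are illustrative, not part of the test code above):

```fortran
program expected_sum
  implicit none
  integer, parameter :: dp = kind(1.0d0)       ! double precision, as with -r8
  integer, parameter :: n1 = 10000, nlev = 60  ! same sizes as the test code
  real(dp) :: col, total
  integer :: k

  ! After the kernel loop, a(i,k) = 0.1**k for every row i,
  ! so one row sums to 0.1 + 0.1**2 + ... + 0.1**nlev.
  col = 0.0_dp
  do k = 1, nlev
     col = col + 0.1_dp**k
  end do
  total = real(n1, dp) * col
  print *, 'loop sum    :', total

  ! Closed form of the same geometric series, for comparison.
  print *, 'closed form :', &
       real(n1, dp) * (0.1_dp * (1.0_dp - 0.1_dp**nlev) / 0.9_dp)
end program expected_sum
```

Both lines should print a value close to 1111.11, which agrees with the CPU result and not with the -ta=nvidia one.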

One other comment: if I change the option parameter to
integer, parameter :: option=0
nothing should happen. However, when compiling with -ta=nvidia I get the following error at run time:
./test
option= 0
call to cuMemFree returned error 1: Invalid value
CUDA driver version: 4000
although a should not be allocated in this case.
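
For reference, the Fortran 2003 behaviour this case relies on can be shown without any accelerator directives. The sketch below is only an illustration: the module alloc_demo and the helper describe are hypothetical names, and it says nothing about how mirror or reflected handle the unallocated case.

```fortran
module alloc_demo
  implicit none
contains
  ! Hypothetical helper: reports whether the allocatable dummy
  ! argument arrived allocated or not.
  subroutine describe(a)
    real, allocatable, intent(inout) :: a(:,:)
    if (allocated(a)) then
       print *, 'allocated, size =', size(a)
    else
       print *, 'not allocated'
    end if
  end subroutine describe
end module alloc_demo

program demo
  use alloc_demo, only: describe
  implicit none
  real, allocatable :: a(:,:)

  call describe(a)      ! legal in Fortran 2003: a is passed unallocated
  allocate(a(4,3))
  a = 0.1
  call describe(a)      ! now the dummy sees an allocated 4x3 array
  deallocate(a)
end program demo
```

This only applies when the dummy argument itself is declared allocatable. With a non-allocatable assumed-shape dummy such as a(:,:), the actual argument is expected to be allocated at the call.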

Is this a bug, or is this just not supported with the reflected directive?


Thanks,

Xavier

note: pgi version 11.6

Hi Xavier,

There is a compiler bug here, but it’s not due to the reflected or mirror directives. The bug is that “a” isn’t being copied back from the GPU, and it occurs if “a” has the allocatable attribute when declared in “gpu_routine”. Removing “allocatable” will work around the problem (i.e. “real, intent(inout) :: a(:,:)”). I submitted this issue to our engineers (TPR#17938), who were able to fix the problem. The fix should be available in July’s 11.7 release.

We do apologize for the error and appreciate you bringing it to our attention.

  • Mat

Hi Mat,

Thanks for your reply and for submitting the issue.

Concerning the second part of my question, if I change the option parameter to 0 in the test example:

integer, parameter :: option=0

(instead of integer, parameter :: option=1 )

In this case nothing should happen, but when compiling with -ta=nvidia I get the following error at run time:
./test
option= 0
call to cuMemFree returned error 1: Invalid value
CUDA driver version: 4000
although a should not be allocated in this case.

I have also tried with
subroutine gpu_routine(nvec,nlev,a,option)
real, intent(inout) :: a(:,:)

but the program still gives me an error message.

Hi Xavier,

I sent in TPR#17983 for the “cuMemFree” error. Looks to me that we’re doing some garbage collection but aren’t accounting for cases where the mirrored variable isn’t actually allocated.

> I have also tried with
> subroutine gpu_routine(nvec,nlev,a,option)
> real, intent(inout) :: a(:,:)
>
> but the program still gives me an error message.

Hmm. I just tried again without a problem. Not sure why it would still fail for you.

% cat test1.f90
module computation
  implicit none
 
contains

subroutine gpu_routine(nvec,nlev,a,option)
  real, intent(inout) :: a(:,:)
  integer, intent(in) :: nvec,nlev,option
  integer :: i,k
  !$acc reflected(a) 

  if (option==1) then
     !$acc region do kernel
     do i=1,nvec       
        do k=2,nlev
           a(i,k)=a(i,k)*a(i,k-1)
        end do
     end do
     !$acc end region
  end if

end subroutine gpu_routine

end module computation
 
program main
  USE computation, only: gpu_routine
  implicit none
  real, allocatable :: a(:,:)
  !$acc mirror(a)
  integer, parameter :: n1=10000, nlev=60
  integer, parameter :: option=1

  if (option==1) then
     allocate(a(n1,nlev))
     !init a
     a=0.1
     !$acc update device(a)
  end if

   
  call gpu_routine(n1,nlev,a,option)
 
  if (option==1) then
     !$acc update host(a)
     print*, sum(a)
  else
     print*, 'option=', option   
  end if

end program main 
% pgf90 -V11.6 test1.cuf -ta=nvidia -o test1_v116_cpu.out -r8
% pgf90 -V11.6 test1.cuf -ta=nvidia -o test1_v116_gpu.out -r8 -ta=nvidia
% test1_v116_cpu.out
    1111.111111109653     
% test1_v116_gpu.out
    1111.111111109653

Hi Mat,

> I sent in TPR#17983 for the “cuMemFree” error. Looks to me that we’re doing some garbage collection but aren’t accounting for cases where the mirrored variable isn’t actually allocated.

Great, thanks.



> Hmm. I just tried again without a problem. Not sure why it would still fail for you.

Sorry, I wasn’t clear enough. I just meant that I also get the same “cuMemFree” error in the case where nothing should happen (i.e. “option=0”). The code does indeed work fine with “option=1” when using

real, intent(inout) :: a(:,:)

Xavier