MAXLOC and MAXVAL intrinsics with Accelerator GPU

Hi,
I have problems using the MAXLOC and MAXVAL functions within accelerator regions. The following f90 code snippet illustrates what happens:

program test
integer :: iii(1),N
real :: x(10), vmax(1), zz
N=10
!$acc region copy(x)
do i=1,N
  x(i) = -N/2.+i
end do
iii = maxloc(x)
zz    = x(iii(1))
vmax = maxval(x)
write(*,*) '          |   MAXLOC  | X(MAXLOC) |   MAXVAL  |    X(1)   |    X(N)   |'
write(*,'(A,I10,2X,4(F10.2,2X))') ' IN REGION:',iii(1), zz, vmax(1), x(1), x(N)
!$acc end region
write(*,'(A,I10,2X,4(F10.2,2X))') 'OUT REGION:',iii(1), zz, vmax(1), x(1), x(N)
end program test

When I compile it with the following commands,

module load compilers/pgi/11.10
module load cuda/4.0
pgf90 -Minfo=all -ta=nvidia:cc20,cuda4.0 test3.f90 -o test.gpu

I obtain the following results:

           |   MAXLOC  | X(MAXLOC) |   MAXVAL  |    X(1)   |    X(N)   |
 IN REGION:         0        0.00        0.00        0.00        0.00
OUT REGION:         0        0.00        5.00       -4.00        5.00

whereas when I do not use the GPU, i.e. when I compile the code this way,

pgf90 -Minfo=all test3.f90 -o test

I obtain the correct results:

           |   MAXLOC  | X(MAXLOC) |   MAXVAL  |    X(1)   |    X(N)   |
 IN REGION:        10        5.00        5.00       -4.00        5.00
OUT REGION:        10        5.00        5.00       -4.00        5.00

Thank you very much for your help!
Domenico

Hi Domenico.

The good news is that maxval is fine. However, maxloc isn’t supported yet within an accelerator region. This feature request is being tracked as TPR#17664. I’ll add your code to this ticket and see if we can get the priority bumped up a bit.

Thanks,
Mat

Hi Mat,

Thank you very much for the help and the information. I have one more small question: is it normal that if I print vmax(1) inside the accelerator region I obtain 0, while the same print outside the accelerator region gives the correct result?

Sincerely,
Domenico

Hi Domenico,

Sorry, I didn’t notice that you had write statements within your accelerator region. Since write statements can’t be executed on the device, they are actually being executed on the host side. Since vmax’s value doesn’t get copied back from the device until the end of the region, you are actually printing out the host copy, which is zero.

This is what’s happening with maxval as well. maxval is being executed on the host with the host’s copy of x.
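One way to sidestep both issues is to offload only the fill loop and do the reductions and the write on the host after the region has ended, once x has been copied back. The following is a minimal sketch of that idea, not a tested fix: the program name test_host_reduce is made up for illustration, and it assumes it is acceptable to run maxloc and maxval on the host.

```fortran
program test_host_reduce
  implicit none
  integer, parameter :: n = 10
  integer :: i, iii(1)
  real    :: x(n), zz, vmax

  ! Offload only the loop; copy(x) brings x back to the host
  ! at the end of the region.
  !$acc region copy(x)
  do i = 1, n
     x(i) = -n/2. + i
  end do
  !$acc end region

  ! Host side: x now holds the values computed in the region,
  ! so the intrinsics and the write all see the same data.
  iii  = maxloc(x)
  zz   = x(iii(1))
  vmax = maxval(x)

  write(*,'(A,I10,2X,4(F10.2,2X))') 'OUT REGION:', iii(1), zz, vmax, x(1), x(n)
end program test_host_reduce
```

Compiled without -ta the directives are treated as comments, and the program should report MAXLOC 10 and MAXVAL 5.00, the same values as the CPU run above.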

Mat