Hi there,
This will almost certainly expose some misunderstanding I have of OpenACC, but I don’t know why this code runs differently with and without the $acc statements:
program acc_error_test
  implicit none
  real(SELECTED_REAL_KIND( P=12,R=60)) :: temp1(20,260)
  integer :: b,bb,n
  integer :: g
  integer :: i,i2,j1,j2
  integer :: MtrBind(260), MtrBpar(0:260)  ! upper bound 260: the loops below index MtrBpar(260)
  !---------------------------------------------------------------------------------------!
  MtrBpar = 0
  do i = 1, 260
    if (i .le. 4) then
      MtrBind(i) = 1
      MtrBpar(i) = MtrBpar(i - 1) + 2
    else
      MtrBind(i) = i
      MtrBpar(i) = MtrBpar(i - 1)
    end if
  end do
  b = 1
  i2 = 20
  !$acc parallel copyout(temp1),pcopyin(MtrBpar, MtrBind)
  temp1 = 0.0
  j2 = 0
  do n = 1, 260
    bb = MtrBind(n)
    j1 = j2 + 1
    j2 = j2 + MtrBpar(bb) - MtrBpar(bb-1)
    if (bb == b) then
      temp1(1:i2,j1:j2) = -1.0
    endif
  enddo
  !$acc end parallel
  print *, temp1(1:i2,1)
end program acc_error_test
With the $acc statements, the output is
-1.000000000000000 0.000000000000000 0.000000000000000
0.000000000000000 0.000000000000000 0.000000000000000
0.000000000000000 0.000000000000000 0.000000000000000
0.000000000000000 0.000000000000000 0.000000000000000
0.000000000000000 0.000000000000000 0.000000000000000
0.000000000000000 0.000000000000000 0.000000000000000
0.000000000000000 0.000000000000000
Without the $acc statements, the output is
-1.000000000000000 -1.000000000000000 -1.000000000000000
-1.000000000000000 -1.000000000000000 -1.000000000000000
-1.000000000000000 -1.000000000000000 -1.000000000000000
-1.000000000000000 -1.000000000000000 -1.000000000000000
-1.000000000000000 -1.000000000000000 -1.000000000000000
-1.000000000000000 -1.000000000000000 -1.000000000000000
-1.000000000000000 -1.000000000000000
The original code has some loops within the parallel region which I’d like to accelerate, but I managed to isolate this behaviour that’s giving me problems.
I’m compiling with:
pgf90 -acc -Minfo=accel -Mlarge_arrays -mcmodel=medium -fast -o acc_error_test acc_error_test.f90
Is this behaviour expected? Presumably temp1 is being distributed across a gang (or multiple gangs?) and only one version of it is being returned. How would I otherwise copy back to the host an array that is set this way in a parallel region?
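One possible restructuring, offered only as a sketch and not as a confirmed diagnosis: statements inside a bare parallel region with no loop directive are executed redundantly by every gang, and the running j1/j2 offsets form a loop-carried dependence, so the n loop cannot simply be marked as a parallel loop as written. The version below assumes it is acceptable to compute the column offsets on the host first and then run the assignment as an explicit parallel loop. The helper arrays jstart and jend and the program name are hypothetical, and MtrBpar is declared as 0:260 here on the assumption that index 260 is meant to be in range.

```fortran
program acc_error_test_sketch
  implicit none
  integer, parameter :: dp = selected_real_kind(p=12, r=60)
  real(dp) :: temp1(20,260)
  integer :: MtrBind(260), MtrBpar(0:260)   ! assumption: upper bound 260 so MtrBpar(260) is valid
  integer :: jstart(260), jend(260)         ! hypothetical helpers: column range for each n
  integer :: b, bb, n, i, j, i2

  ! Same setup as the original test case
  MtrBpar = 0
  do i = 1, 260
    if (i <= 4) then
      MtrBind(i) = 1
      MtrBpar(i) = MtrBpar(i - 1) + 2
    else
      MtrBind(i) = i
      MtrBpar(i) = MtrBpar(i - 1)
    end if
  end do
  b = 1
  i2 = 20

  ! The running offset depends on the previous iteration, so compute it serially on the host
  j = 0
  do n = 1, 260
    bb = MtrBind(n)
    jstart(n) = j + 1
    j = j + MtrBpar(bb) - MtrBpar(bb - 1)
    jend(n) = j
  end do

  ! Initialise on the host; copy() sends the zeros to the device and brings the result back
  temp1 = 0.0_dp

  ! Each n writes its own non-overlapping block of columns, so the iterations are independent
  !$acc parallel loop copy(temp1) copyin(MtrBind, jstart, jend)
  do n = 1, 260
    if (MtrBind(n) == b) then
      do j = jstart(n), jend(n)
        do i = 1, i2
          temp1(i, j) = -1.0_dp
        end do
      end do
    end if
  end do

  print *, temp1(1:i2, 1)
end program acc_error_test_sketch
```

Built without -acc, the directive is treated as a comment and the program runs serially, which gives a baseline to compare the accelerated output against.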
Thanks,
Rob