A common pattern in a code that I'm adding OpenACC directives to is fetching a pointer to a block of memory inside a loop over blocks. When I have multiple parallel regions, I seem to get a race condition in the pointer assignment between them. My understanding is that parallel regions execute sequentially, so I am a bit confused about what is happening. I have a simple reproducer:
program testPointer
  implicit none
  real, allocatable, dimension(:,:,:,:,:), target :: unk
  real, pointer, dimension(:,:,:,:) :: dataPtr, dataPtr2
  integer, parameter :: nBlocks=100, nvar=12, nxb=16, nyb=16, nzb=1
  integer :: lb

  allocate(unk(nvar,nxb,nyb,nzb,nBlocks))

  !$acc parallel loop private(dataPtr)
  do lb = 1, nBlocks
     dataPtr => unk(:,:,:,:,lb)
     if (lb == 50) print *, 'dataPtr 1', loc(dataPtr), loc(unk(:,:,:,:,50))
     nullify(dataPtr)
  enddo

  !$acc parallel loop private(dataPtr, dataPtr2)
  do lb = 1, nBlocks
     dataPtr => unk(:,:,:,:,lb)
     if (lb == 50) print *, 'dataPtr 2', loc(dataPtr), loc(unk(:,:,:,:,50))
     nullify(dataPtr)
     dataPtr2 => unk(:,:,:,:,lb)
     if (lb == 50) print *, 'dataPtr2 ', loc(dataPtr2), loc(unk(:,:,:,:,50))
     nullify(dataPtr2)
  enddo

  if (allocated(unk)) deallocate(unk)
end program testPointer
So far I've only tested with the multicore target:
nvfortran -acc=multicore test.f90
export ACC_NUM_CORES=4
dataPtr 1 139849942941728 139849942941728
dataPtr 2 139849942585376 139849942941728
dataPtr2 139849942941728 139849942941728
I would expect the printed addresses to all be the same, and they are if I remove the directive from one of the loops. Is there something that I'm not understanding about how dataPtr should behave between these two parallel regions?
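The same expectation can also be checked without printing addresses. The following is only a sketch under a few assumptions: it uses the standard two-argument `associated` intrinsic in place of the `loc` extension, the program name `checkPointer` and the counter `nMismatch` are hypothetical names introduced for illustration, and it has not been confirmed that `associated` behaves identically inside a parallel region on every target. Compiled without `-acc`, the directive is treated as a comment and the loop runs serially, which gives a baseline to compare against.

```fortran
program checkPointer
  implicit none
  real, allocatable, dimension(:,:,:,:,:), target :: unk
  real, pointer, dimension(:,:,:,:) :: dataPtr
  integer, parameter :: nBlocks=100, nvar=12, nxb=16, nyb=16, nzb=1
  integer :: lb
  integer :: nMismatch   ! hypothetical counter, not part of the original reproducer

  allocate(unk(nvar,nxb,nyb,nzb,nBlocks))
  unk = 0.0
  nMismatch = 0

  ! Same loop shape as the reproducer; the reduction collects the number
  ! of iterations in which the pointer does not refer to its own block.
  !$acc parallel loop private(dataPtr) reduction(+:nMismatch)
  do lb = 1, nBlocks
     dataPtr => unk(:,:,:,:,lb)
     ! associated(ptr, target) is true only if ptr refers to exactly this section
     if (.not. associated(dataPtr, unk(:,:,:,:,lb))) nMismatch = nMismatch + 1
     nullify(dataPtr)
  enddo

  print *, 'mismatched blocks:', nMismatch
  deallocate(unk)
end program checkPointer
```

Repeating the loop a second time, as in the reproducer, and comparing the count with and without `-acc=multicore` would show whether the mismatch is confined to the one printed iteration or affects other blocks as well.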
Thanks!