OpenACC pointer across multiple parallel regions with multicore

I’m adding OpenACC directives to a code in which a common pattern is to fetch a pointer to a block of memory inside a loop over blocks. When I have multiple parallel regions, it seems that I’m creating a race condition in the pointer assignment between them. My understanding is that parallel regions execute sequentially, so I am a bit confused about what is happening. Here is a simple reproducer:

program testPointer
   implicit none
   real, allocatable, dimension(:,:,:,:,:), target  :: unk
   real, pointer, dimension(:,:,:,:) :: dataPtr, dataPtr2
   integer, parameter :: nBlocks=100, nvar=12, nxb=16, nyb=16, nzb=1
   integer :: lb

   ! Shape assumed from the declared parameters; the block index is last.
   allocate(unk(nvar, nxb, nyb, nzb, nBlocks))

   !$acc parallel loop private(dataPtr)
   do lb = 1, nBlocks
      dataPtr  => unk(:,:,:,:,lb)
      if (lb == 50) print *, 'dataPtr 1', loc(dataPtr), loc(unk(:,:,:,:,50))
   end do

   !$acc parallel loop private(dataPtr, dataPtr2)
   do lb = 1, nBlocks
      dataPtr  => unk(:,:,:,:,lb)
      if (lb == 50) print *, 'dataPtr 2', loc(dataPtr), loc(unk(:,:,:,:,50))

      dataPtr2 => unk(:,:,:,:,lb)
      if (lb == 50) print *, 'dataPtr2 ', loc(dataPtr2), loc(unk(:,:,:,:,50))
   end do

   if (allocated(unk)) deallocate(unk)
end program testPointer

So far I’ve only tested on multicore:

nvfortran -acc=multicore test.f90
export ACC_NUM_CORES=4
./a.out
dataPtr 1          139849942941728          139849942941728
dataPtr 2          139849942585376          139849942941728
dataPtr2           139849942941728          139849942941728

I would expect the printed addresses to all be the same, and they are if I remove the directive from one of the loops. Is there something that I’m not understanding about how dataPtr should behave between these two parallel regions?
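
To check every block rather than only block 50, the same expectation can be written as a count of mismatches. This is a minimal sketch, not a verified reproducer: the program name checkPointer, the counters nBad1 and nBad2, and the shape passed to allocate are assumptions made for illustration, and it uses the standard associated intrinsic in place of loc. I have not confirmed that this variant triggers the same behavior; it only encodes what I would expect to hold, namely that both counts are zero.

```fortran
program checkPointer
   implicit none
   real, allocatable, dimension(:,:,:,:,:), target :: unk
   real, pointer, dimension(:,:,:,:) :: dataPtr
   integer, parameter :: nBlocks=100, nvar=12, nxb=16, nyb=16, nzb=1
   integer :: lb, nBad1, nBad2

   ! Assumed shape: the block index is last, as in the reproducer.
   allocate(unk(nvar, nxb, nyb, nzb, nBlocks))
   unk = 0.0

   ! First parallel region: count blocks whose private pointer does not
   ! refer to the section it was just assigned to.
   nBad1 = 0
   !$acc parallel loop private(dataPtr) reduction(+:nBad1)
   do lb = 1, nBlocks
      dataPtr => unk(:,:,:,:,lb)
      if (.not. associated(dataPtr, unk(:,:,:,:,lb))) nBad1 = nBad1 + 1
   end do

   ! Second parallel region: same check, run after the first region.
   nBad2 = 0
   !$acc parallel loop private(dataPtr) reduction(+:nBad2)
   do lb = 1, nBlocks
      dataPtr => unk(:,:,:,:,lb)
      if (.not. associated(dataPtr, unk(:,:,:,:,lb))) nBad2 = nBad2 + 1
   end do

   print *, 'mismatched blocks, region 1:', nBad1
   print *, 'mismatched blocks, region 2:', nBad2

   deallocate(unk)
end program checkPointer
```

It can be built with the same nvfortran -acc=multicore command as the reproducer; a nonzero count in either region would point to the same pointer problem.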


Hi adam.c.reyes,

I investigated your code, and I think you’ve identified a real bug with the “-acc=multicore” flag in this situation. I reported it to our internal engineering team, and it is now tracked as TPR #35364.

Interestingly, it only occurs when compiling with “-acc=multicore”. If I compile with “-acc=gpu”, I get the expected results.

Once engineering has looked at it, I’ll get back to you, either with a fix or with an explanation of where we both misunderstood something in this situation.


