OpenACC pointer across multiple parallel regions with multicore

A common pattern in the code I’m adding OpenACC directives to is fetching a pointer to a block of memory inside a loop over blocks. When I have multiple parallel regions, I seem to be creating a race condition in the pointer assignment between them. My understanding is that parallel regions execute sequentially, so I’m a bit confused about what is happening. Here is a simple reproducer:

program testPointer
   implicit none
   real, allocatable, dimension(:,:,:,:,:), target  :: unk
   real, pointer, dimension(:,:,:,:) :: dataPtr, dataPtr2
   integer, parameter :: nBlocks=100, nvar=12, nxb=16, nyb=16, nzb=1
   integer :: lb

   allocate(unk(nvar,nxb,nyb,nzb,nBlocks))
   !$acc parallel loop private(dataPtr)
   do lb = 1, nBlocks
      dataPtr  => unk(:,:,:,:,lb)
      if (lb == 50) print *, 'dataPtr 1', loc(dataPtr), loc(unk(:,:,:,:,50))
      nullify(dataPtr)
   enddo

   !$acc parallel loop private(dataPtr, dataPtr2)
   do lb = 1, nBlocks
      dataPtr  => unk(:,:,:,:,lb)
      if (lb == 50) print *, 'dataPtr 2', loc(dataPtr), loc(unk(:,:,:,:,50))
      nullify(dataPtr)

      dataPtr2 => unk(:,:,:,:,lb)
      if (lb == 50) print *, 'dataPtr2 ', loc(dataPtr2), loc(unk(:,:,:,:,50))
      nullify(dataPtr2)
   enddo
   if (allocated(unk)) deallocate(unk)
end program testPointer

So far I’ve only tested with the multicore target:

nvfortran -acc=multicore test.f90
export ACC_NUM_CORES=4 
dataPtr 1          139849942941728          139849942941728
dataPtr 2          139849942585376          139849942941728
dataPtr2           139849942941728          139849942941728

I would expect all of the printed addresses to be the same, and they are if I remove the directive from one of the loops. Is there something I’m not understanding about how dataPtr should behave between these two parallel regions?

Thanks!

Hi adam.c.reyes,

I investigated your code, and I think you’ve identified a real bug with the “-acc=multicore” flag in this situation. I’ve reported it to our internal engineering team, and we’ll refer to it from now on as TPR #35364.

Interestingly, from experimenting with it, the problem only occurs when compiling with “-acc=multicore”. If I compile with “-acc=gpu”, I get the expected results.

Once engineering has looked at it, I’ll get back to you, either with a fix or with an explanation of what we both misunderstood about the situation.

Cheers,

Seth.
