I’m trying to add OpenMP offload to a code with many scratch arrays, but making them private doesn’t work properly.
Here is a reproduction of what I mean:
PROGRAM scratchTest
USE OMP_LIB
IMPLICIT NONE
INTEGER, PARAMETER :: NN=4096
INTEGER, PARAMETER :: NM=4096
INTEGER, PARAMETER :: SCRATCHSIZE=10
REAL,ALLOCATABLE,DIMENSION(:,:) :: scratch
REAL,ALLOCATABLE,DIMENSION(:,:) :: A
DOUBLE PRECISION :: st, et
INTEGER::I,J,K
ALLOCATE(A(NM,NN))
ALLOCATE(scratch(SCRATCHSIZE,SCRATCHSIZE))
A = 0.0D0
scratch = 0.0D0
st = omp_get_wtime()
!$OMP target enter data map(to:A)
!$OMP target teams distribute parallel do private(scratch) collapse(2)
DO j=1,NN
DO i=1,NM
scratch = 0.0   ! private copies start out undefined, so clear them first
DO k=1,SCRATCHSIZE
scratch(k,k)=j+i
END DO
A(i,j)=SUM(scratch)
END DO
END DO
!$OMP END target teams distribute parallel do
!$OMP target exit data map(from:A)
et = omp_get_wtime()
PRINT *, " total time: ", (et - st), " s"
PRINT *,A(10,10)
END PROGRAM scratchTest
(This code doesn’t do anything useful. It just illustrates the problem.)
If I declare scratch with a fixed size at compile time, it works fine, but as an allocatable it crashes like this:
========= Invalid global write of size 8 bytes
========= at nvkernel_MAIN__F1L27_2_+0xaf0 in /home/ec2-user/private_problem/scratch_array.f90:31
========= by thread (56,0,0) in block (7,0,0)
========= Address 0x0 is out of bounds
========= and is 8,696,889,344 bytes before the nearest allocation at 0x206600000 of size 8,388,864 bytes
========= Saved host backtrace up to driver entry point at kernel launch time
========= Host Frame: [0x33137f]
========= in /lib64/libcuda.so.1
========= Host Frame:launchInternal in platform_cuda/hxCuda.c:3402 [0x4ac4c]
========= in /opt/nvidia/hpc_sdk/Linux_x86_64/24.5/compilers/lib/libnvomp.so
…(etc)
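For comparison, here is a sketch of the fixed-size variant described above, which does run. It is wrapped in a module (`scratch_fixed` and `fill_fixed` are hypothetical names chosen for illustration) so it can be called and checked in isolation. The only real change from the reproducer is that `scratch` has a compile-time extent instead of being `ALLOCATABLE`.

```fortran
MODULE scratch_fixed
  IMPLICIT NONE
  INTEGER, PARAMETER :: SCRATCHSIZE = 10
CONTAINS
  ! Hypothetical helper: fills A(i,j) = SCRATCHSIZE*(i+j) using a
  ! fixed-size scratch array that is PRIVATE in the target region.
  SUBROUTINE fill_fixed(A, nm, nn)
    INTEGER, INTENT(IN) :: nm, nn
    REAL, INTENT(INOUT) :: A(nm, nn)
    ! Compile-time extent instead of ALLOCATABLE; each thread works on
    ! its own private copy of this array.
    REAL :: scratch(SCRATCHSIZE, SCRATCHSIZE)
    INTEGER :: i, j, k
    !$OMP target teams distribute parallel do private(scratch) collapse(2) map(tofrom:A)
    DO j = 1, nn
      DO i = 1, nm
        scratch = 0.0          ! private copies are undefined on entry
        DO k = 1, SCRATCHSIZE
          scratch(k,k) = REAL(j + i)
        END DO
        A(i,j) = SUM(scratch)
      END DO
    END DO
    !$OMP end target teams distribute parallel do
  END SUBROUTINE fill_fixed
END MODULE scratch_fixed
```

Calling `fill_fixed(A, NM, NN)` on the `A` from the reproducer should give `A(i,j) = 10*(i+j)`, since only the diagonal of each cleared scratch copy is set.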
Is it not possible to make an allocatable array private in an OpenMP offload region with nvfortran, or am I missing something?
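If private allocatables turn out not to be supported in target regions by this compiler version (an assumption, not something the crash alone proves), one common workaround is to privatize by hand: give the allocatable scratch array an extra dimension indexed by the outer loop variable and map it like any other array. The sketch below assumes that approach; `scratch_manual`, `fill_manual`, and the `sz` argument are hypothetical names.

```fortran
MODULE scratch_manual
  IMPLICIT NONE
CONTAINS
  ! Hypothetical helper: same result as the reproducer, but instead of a
  ! PRIVATE allocatable, each outer iteration j owns the slice scratch(:,:,j).
  SUBROUTINE fill_manual(A, nm, nn, sz)
    INTEGER, INTENT(IN) :: nm, nn, sz
    REAL, INTENT(INOUT) :: A(nm, nn)
    REAL, ALLOCATABLE :: scratch(:,:,:)
    INTEGER :: i, j, k
    ALLOCATE(scratch(sz, sz, nn))   ! sz*sz*nn reals, one slice per j
    ! No collapse(2): all i iterations for a given j reuse slice j
    ! sequentially, so collapsing would create a data race on it.
    ! map(alloc:...) reserves device storage without copying host data.
    !$OMP target teams distribute parallel do map(tofrom:A) map(alloc:scratch)
    DO j = 1, nn
      DO i = 1, nm
        scratch(:,:,j) = 0.0
        DO k = 1, sz
          scratch(k,k,j) = REAL(j + i)
        END DO
        A(i,j) = SUM(scratch(:,:,j))
      END DO
    END DO
    !$OMP end target teams distribute parallel do
    DEALLOCATE(scratch)
  END SUBROUTINE fill_manual
END MODULE scratch_manual
```

The trade-off is memory and parallelism: with `NN=4096` and `SCRATCHSIZE=10` the scratch array holds 409,600 reals (about 1.6 MB), and only the `j` loop is distributed across threads. Indexing the slice by a thread or team number instead would save memory but is more involved.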