I am trying to use the OpenACC atomic directive, but it does not work for me.
I first tried Fortran:
[ufh62jk@jean-zay3: openacc-kernels]$ cat toto.F90
INTEGER :: ILOC, I2, N
N = 1000
!$acc parallel loop copy (ILOC)
DO I2 = 1, N
IF (I2 == 10) THEN
!$acc atomic write
ILOC = I2
!$acc end atomic
ENDIF
ENDDO
!$acc end parallel loop
END
[ufh62jk@jean-zay3: openacc-kernels]$ pgf90 -acc=gpu toto.F90
NVFORTRAN-S-0155-Invalid atomic region. (toto.F90: 12)
0 inform, 0 warnings, 1 severes, 0 fatal for MAIN
Then C:
[ufh62jk@jean-zay3: openacc-kernels]$ cat toto.c
int main ()
{
int iloc, i2, n;
n = 1000;
#pragma acc parallel loop copy (iloc)
for (i2 = 0; i2 < n; i2++)
{
if (i2 == 10)
{
#pragma acc atomic write
iloc = i2;
}
}
}
[ufh62jk@jean-zay3: openacc-kernels]$ pgcc -acc=gpu toto.c
nvvmCompileProgram error 9: NVVM_ERROR_COMPILATION.
Error: /tmp/pgaccg4CosEbq2AKu.gpu (69, 19): parse stored value and pointer type do not match
ptxas /tmp/pgaccM4CoYu33w_f0.ptx, line 1; fatal : Missing .version directive at start of file '/tmp/pgaccM4CoYu33w_f0.ptx'
ptxas fatal : Ptx assembly aborted due to errors
NVC++-F-0155-Compiler failed to translate accelerator region (see -Minfo messages): Device compiler exited with error status code (toto.c: 10)
NVC++/x86-64 Linux 21.5-0: compilation aborted
I must be doing something wrong, but I do not see what.
My OS and compiler versions:
[ufh62jk@jean-zay3: openacc-kernels]$ lsb_release -a
LSB Version: :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
Distributor ID: RedHatEnterprise
Description: Red Hat Enterprise Linux release 8.1 (Ootpa)
Release: 8.1
Codename: Ootpa
[ufh62jk@jean-zay3: openacc-kernels]$ pgf90 --version
pgf90 (aka nvfortran) 21.5-0 LLVM 64-bit target on x86-64 Linux -tp skylake
PGI Compilers and Tools
Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
Since “ILOC” isn’t used after the loop, the dead-code elimination optimization removes the assignment to it. The atomic region is then empty and therefore invalid. Try adding a print statement after the loop so that the variable is used.
Alternatively, compile with “-O1” so that dead-code elimination is not performed.
Hope this helps,
Mat
% cat toto.f90
INTEGER :: ILOC, I2, N
N = 1000
!$acc parallel loop copy (ILOC)
DO I2 = 1, N
IF (I2 == 10) THEN
!$acc atomic write
ILOC = I2
!$acc end atomic
ENDIF
ENDDO
!$acc end parallel loop
print *, ILOC
END
% nvfortran -acc toto.f90 -Minfo=accel
MAIN:
6, Generating copy(iloc) [if not already present]
Generating Tesla code
7, !$acc loop gang ! blockidx%x
7, Scalar last value needed after loop for iloc at line 18
% a.out
10
I tried to follow your advice, but unfortunately it does not work with 21.5:
[ufh62jk@jean-zay4: tmp]$ cat toto.f90
INTEGER :: ILOC, I2, N
N = 1000
!$acc parallel loop copy (ILOC)
DO I2 = 1, N
IF (I2 == 10) THEN
!$acc atomic write
ILOC = I2
!$acc end atomic
ENDIF
ENDDO
!$acc end parallel loop
print *, ILOC
end
[ufh62jk@jean-zay4: tmp]$ nvfortran -acc toto.f90 -Minfo=accel
MAIN:
6, Generating copy(iloc) [if not already present]
Generating Tesla code
7, !$acc loop gang ! blockidx%x
7, Scalar last value needed after loop for iloc at line 18
nvvmCompileProgram error 9: NVVM_ERROR_COMPILATION.
Error: /tmp/pgaccOBhg4V77OpZX.gpu (52, 19): parse stored value and pointer type do not match
ptxas /tmp/pgaccOBhg4LNEOX25.ptx, line 1; fatal : Missing .version directive at start of file '/tmp/pgaccOBhg4LNEOX25.ptx'
ptxas fatal : Ptx assembly aborted due to errors
NVFORTRAN-F-0155-Compiler failed to translate accelerator region (see -Minfo messages): Device compiler exited with error status code (toto.f90: 6)
NVFORTRAN/x86-64 Linux 21.5-0: compilation aborted
[ufh62jk@jean-zay4: tmp]$ nvfortran --version
nvfortran 21.5-0 LLVM 64-bit target on x86-64 Linux -tp skylake
NVIDIA Compilers and Tools
Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
Apologies. I was using our pre-release 21.7, where this has been fixed. We recently re-engineered all of our atomics, which introduced a few issues. As a short-term workaround with 21.5, you can add the internal compiler flag “-Mx,231,0x1” to revert to the older atomic implementation. Once 21.7 is available, please remove this flag.