Problem with atomic write

Hello,

I am trying to use the atomic pragma, but it does not work for me.

I first tried Fortran:


[ufh62jk@jean-zay3: openacc-kernels]$ cat toto.F90

INTEGER :: ILOC, I2, N

N = 1000


!$acc parallel loop copy (ILOC)
DO I2 = 1, N

  IF (I2 == 10) THEN
!$acc atomic write
    ILOC = I2
!$acc end atomic 
  ENDIF

ENDDO
!$acc end parallel loop

END
[ufh62jk@jean-zay3: openacc-kernels]$ pgf90  -acc=gpu toto.F90
NVFORTRAN-S-0155-Invalid atomic region.  (toto.F90: 12)
  0 inform,   0 warnings,   1 severes, 0 fatal for MAIN

Then C:

[ufh62jk@jean-zay3: openacc-kernels]$ cat toto.c

int main ()
{

  int iloc, i2, n;

  n = 1000;

#pragma acc parallel loop copy (iloc)
  for (i2 = 0; i2 < n; i2++)
    {
      if (i2 == 10)
        {
#pragma acc atomic write
          iloc = i2;
        }
    }

}
[ufh62jk@jean-zay3: openacc-kernels]$ pgcc  -acc=gpu toto.c
nvvmCompileProgram error 9: NVVM_ERROR_COMPILATION.
Error: /tmp/pgaccg4CosEbq2AKu.gpu (69, 19): parse stored value and pointer type do not match
ptxas /tmp/pgaccM4CoYu33w_f0.ptx, line 1; fatal   : Missing .version directive at start of file '/tmp/pgaccM4CoYu33w_f0.ptx'
ptxas fatal   : Ptx assembly aborted due to errors
NVC++-F-0155-Compiler failed to translate accelerator region (see -Minfo messages): Device compiler exited with error status code (toto.c: 10)
NVC++/x86-64 Linux 21.5-0: compilation aborted

I must be doing something wrong, but I do not see what.

My OS and compiler version:

[ufh62jk@jean-zay3: openacc-kernels]$ lsb_release -a
LSB Version:    :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
Distributor ID: RedHatEnterprise
Description:    Red Hat Enterprise Linux release 8.1 (Ootpa)
Release:        8.1
Codename:       Ootpa
[ufh62jk@jean-zay3: openacc-kernels]$ pgf90 --version

pgf90 (aka nvfortran) 21.5-0 LLVM 64-bit target on x86-64 Linux -tp skylake 
PGI Compilers and Tools
Copyright (c) 2021, NVIDIA CORPORATION.  All rights reserved.

Regards,

Philippe

Hi Philippe,

Since “ILOC” isn’t used after the loop, dead-code elimination removes the assignment to it. The atomic region is then empty and therefore invalid. Try adding a print statement after the loop so that the variable is used.

Alternatively, compile with “-O1” so that dead-code elimination is not performed.

Hope this helps,
Mat

% cat toto.f90
INTEGER :: ILOC, I2, N

N = 1000


!$acc parallel loop copy (ILOC)
DO I2 = 1, N

  IF (I2 == 10) THEN
!$acc atomic write
    ILOC = I2
!$acc end atomic
  ENDIF

ENDDO
!$acc end parallel loop

print *, ILOC

END
% nvfortran -acc toto.f90 -Minfo=accel
MAIN:
      6, Generating copy(iloc) [if not already present]
         Generating Tesla code
          7, !$acc loop gang ! blockidx%x
      7, Scalar last value needed after loop for iloc at line 18
% a.out
           10

Hello Mat,

I tried to follow your advice, but unfortunately it does not work with 21.5:

[ufh62jk@jean-zay4: tmp]$ cat toto.f90 
INTEGER :: ILOC, I2, N

N = 1000


!$acc parallel loop copy (ILOC)
DO I2 = 1, N

  IF (I2 == 10) THEN
!$acc atomic write
    ILOC = I2
!$acc end atomic
  ENDIF

ENDDO
!$acc end parallel loop

print *, ILOC

end
[ufh62jk@jean-zay4: tmp]$ nvfortran -acc toto.f90 -Minfo=accel
MAIN:
      6, Generating copy(iloc) [if not already present]
         Generating Tesla code
          7, !$acc loop gang ! blockidx%x
      7, Scalar last value needed after loop for iloc at line 18
nvvmCompileProgram error 9: NVVM_ERROR_COMPILATION.
Error: /tmp/pgaccOBhg4V77OpZX.gpu (52, 19): parse stored value and pointer type do not match
ptxas /tmp/pgaccOBhg4LNEOX25.ptx, line 1; fatal   : Missing .version directive at start of file '/tmp/pgaccOBhg4LNEOX25.ptx'
ptxas fatal   : Ptx assembly aborted due to errors
NVFORTRAN-F-0155-Compiler failed to translate accelerator region (see -Minfo messages): Device compiler exited with error status code (toto.f90: 6)
NVFORTRAN/x86-64 Linux 21.5-0: compilation aborted
[ufh62jk@jean-zay4: tmp]$ nvfortran --version

nvfortran 21.5-0 LLVM 64-bit target on x86-64 Linux -tp skylake 
NVIDIA Compilers and Tools
Copyright (c) 2021, NVIDIA CORPORATION.  All rights reserved.

It works with 20.11, but not with 21.3.

Regards,

Philippe

Apologies. I was using our pre-release 21.7, where this has been fixed. We recently re-engineered all of our atomics, which introduced a few issues. As a short-term workaround with 21.5, you can add the internal compiler flag “-Mx,231,0x1” to revert to the older atomic implementation. Once 21.7 is available, please remove this flag.

% nvfortran toto.f90 -acc -Minfo=accel -V21.5 -Mx,231,0x01
MAIN:
      6, Generating copy(iloc) [if not already present]
         Generating Tesla code
          7, !$acc loop gang, vector(128) ! blockidx%x threadidx%x

-Mat

Very good, I will use the option you proposed.

Philippe