Nvfortran atomicAdd incorrect work

medvsn · October 31, 2023, 9:51am

I use nvfortran 23.1-0 64-bit target on x86-64 Linux -tp skylake-avx512

try to use AtomicAdd to caluclate sum of arrays’ elements. Set print from kernel and it seems like don’t work properly.

            1    1.000000        0.000000    
            2    2.000000        0.000000    
            3    3.000000        0.000000    
            4    4.000000        0.000000    
            5    5.000000        0.000000    
Sum of array(5) =   0.0
            1    1.000000        0.000000    
Sum of array(1) =   0.0

the source code is the following:

module a
  use cudafor
  implicit none 

  contains

  attributes(global) &
  subroutine sum_kernel_atomic(a, n, res)
    implicit none
    
    real :: a(*)          !< input devica array
    integer, value :: n   !< input len of array
    real :: res           !< output sum of array
    integer :: idx
    idx = threadIdx%x
    if (idx <= n) then 
      !
      res = atomicadd(res, a(idx))
      ! print the values 
      write(*,*) idx, a(idx), res 
    end if 
    return 
  end subroutine sum_kernel_atomic 

end module a

program atomic_test
  use a 
  use cudafor 
  implicit none
  integer :: n, m
  real , allocatable :: arr(:)
  real , allocatable, device :: darr(:)

  real :: res 
  real, device :: dres 

  n = 5
  allocate (arr(n))
  arr(:) = [(m, m=1,n)]
  allocate (darr(n))
  darr(:) = arr(:)


  dres = 0.0
  call sum_kernel_atomic<<<1,n>>>(darr, n, dres) 
  res = dres 
  write(*, '("Sum of array(",i1,") = ",f5.1)') n, res 

  m = 1
  dres = 0.0
  call sum_kernel_atomic<<<1,1>>>(darr, m, dres) 
  res = dres 
  write(*, '("Sum of array(",i1,") = ",f5.1)') m, res 

  deallocate(arr)
  deallocate(darr)
end program atomic_test

MatColgrove · October 31, 2023, 5:04pm

Hi medvsn,

“atomicAdd” returns the value of the variable before it was updated. Hence by using:

res = atomicadd(res, a(idx))

the code is overwriting “res” with the old value. You’ll want to use a different variable to capture the return value.

-Mat

medvsn · November 1, 2023, 10:45am

I replace original line res = atomicadd(res, a(idx)) to the following two operations:

tmp = res 
tmp = atomicadd(res, a(idx))

and get target result.

          idx     a(idx)             res             tmp
            1    1.000000        15.00000        0.000000    
            2    2.000000        15.00000        1.000000    
            3    3.000000        15.00000        3.000000    
            4    4.000000        15.00000        6.000000    
            5    5.000000        15.00000        10.00000    
Sum of array(5) =  15.0
            1    1.000000        1.000000        0.000000    
Sum of array(1) =   1.0

I think it’s a bit weird, it means, that res = tmp + a(idx), where tmp must be output. or not?

P.S.: Firstly, I tried to use res = atomicadd(tmp, a(idx)) , but I got zeros in res values.

MatColgrove · November 1, 2023, 11:10pm

Sorry, I’m not clear on what you’re asking.

“tmp” in this context is the value of “res” prior to the add, so looks correct.

“res” is correct as well at 15. All the threads in a warp (i.e. groups of 32 threads) execute the same instruction at the same time, so by the time the code gets to the print, all threads have executed the atomicAdd and value is the same. If there were a larger amount of threads (i.e. > 32), then you’re likely to see different values appear. Though the value printed will be non-deterministic and depend on the order the threads executed the atomicAdd. In general it’s not good practice to try an use the intermediary values from a global variable used in an atomic.

If you do need to print the value of res after the atomicAdd, you’d want to do something like:

tmp = atomicadd(res, a(idx))
tmp = tmp + a(idx)
write(*,*) idx, a(idx), tmp

medvsn · November 4, 2023, 5:35pm

Thank you.
To clarify, if I need to calculate sum of array and store it to variable res. To organize it I use temporary variable tmp for old value and use function atomicAdd in the following way:

tmp = atomicAdd(res, a(idx))  ! tmp -- old value of res
                              ! res -- new value after operation inside the function atomicaAdd

Topic		Replies	Views
atomicAdd not behaving as expected, atomicAdd_system not defined CUDA Programming and Performance	3	1687	September 5, 2022
atomicAdd(float,float) - atomicMul(float,float) ... CUDA Programming and Performance	13	57081	July 29, 2010
can you give me sample code for atomicAdd()? CUDA Programming and Performance	9	48973	June 5, 2009
atomicAdd() during loop not work well but at end work well CUDA Programming and Performance	3	1259	May 20, 2010
How to use atomicadd. Legacy PGI Compilers	2	7219	October 3, 2010
To use atomic add Legacy PGI Compilers	13	11971	June 30, 2012
AtomicAdd result incorrect CUDA Programming and Performance	3	1713	December 29, 2018
incorrect results from atomicAdd (maybe the method is incorrect) CUDA Programming and Performance	1	3827	May 2, 2010
Incorrect result obtained while computing atomic operation on variables stored in system memory and device memory CUDA Programming and Performance	2	469	October 9, 2019
What I am doing wrong with atomicAdd() CUDA Programming and Performance	5	2438	November 1, 2010

Nvfortran atomicAdd incorrect work

Related topics