I use nvfortran 23.1-0 64-bit target on x86-64 Linux -tp skylake-avx512

try to use AtomicAdd to caluclate sum of arrays’ elements. Set print from kernel and it seems like don’t work properly.

1 1.000000 0.000000
2 2.000000 0.000000
3 3.000000 0.000000
4 4.000000 0.000000
5 5.000000 0.000000
Sum of array(5) = 0.0
1 1.000000 0.000000
Sum of array(1) = 0.0

the source code is the following:

module a
use cudafor
implicit none
contains
attributes(global) &
subroutine sum_kernel_atomic(a, n, res)
implicit none
real :: a(*) !< input devica array
integer, value :: n !< input len of array
real :: res !< output sum of array
integer :: idx
idx = threadIdx%x
if (idx <= n) then
!
res = atomicadd(res, a(idx))
! print the values
write(*,*) idx, a(idx), res
end if
return
end subroutine sum_kernel_atomic
end module a
program atomic_test
use a
use cudafor
implicit none
integer :: n, m
real , allocatable :: arr(:)
real , allocatable, device :: darr(:)
real :: res
real, device :: dres
n = 5
allocate (arr(n))
arr(:) = [(m, m=1,n)]
allocate (darr(n))
darr(:) = arr(:)
dres = 0.0
call sum_kernel_atomic<<<1,n>>>(darr, n, dres)
res = dres
write(*, '("Sum of array(",i1,") = ",f5.1)') n, res
m = 1
dres = 0.0
call sum_kernel_atomic<<<1,1>>>(darr, m, dres)
res = dres
write(*, '("Sum of array(",i1,") = ",f5.1)') m, res
deallocate(arr)
deallocate(darr)
end program atomic_test

“tmp” in this context is the value of “res” prior to the add, so looks correct.

“res” is correct as well at 15. All the threads in a warp (i.e. groups of 32 threads) execute the same instruction at the same time, so by the time the code gets to the print, all threads have executed the atomicAdd and value is the same. If there were a larger amount of threads (i.e. > 32), then you’re likely to see different values appear. Though the value printed will be non-deterministic and depend on the order the threads executed the atomicAdd. In general it’s not good practice to try an use the intermediary values from a global variable used in an atomic.

If you do need to print the value of res after the atomicAdd, you’d want to do something like:

Thank you.
To clarify, if I need to calculate sum of array and store it to variable res. To organize it I use temporary variable tmp for old value and use function atomicAdd in the following way:

tmp = atomicAdd(res, a(idx)) ! tmp -- old value of res
! res -- new value after operation inside the function atomicaAdd