Using atomics to sum into a large vector

jay333 · August 13, 2021, 9:41am

I have multiple threads accumulating values into a large vector. Many threads write into different locations in the vector, but there is also some overlap in the write destinations. For example:

thread 1:
[ 2 3 2 0 0 0 0 0 0… ]

thread 2:
[ 0 0 1 1 2 0 0 0 0… ]

thread 3:
[ 0 0 0 1 0 0 3 0 3… ]

etc.

Note that the zeroes wouldn’t actually be stored. Each thread stores each nonzero value and the index into the result vector where the value will go. The resulting sum vector would be:
[ 2 3 3 2 2 0 3 0 3…]
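A minimal sketch of the sparse layout described above. It assumes each thread's nonzeros are kept as (index, value) pairs; the `Entry` struct and `reference_sum` function are hypothetical names, not part of the original post. The sequential loop defines exactly which sum any parallel version has to reproduce, which makes it handy for checking GPU results.

```cuda
#include <vector>

// Hypothetical layout: one nonzero contribution to the result vector.
struct Entry {
    int   index;  // position in the result vector
    float value;  // amount to add at that position
};

// Sequential reference: add every thread's entries into a dense vector of length n.
// Assumes every index lies in [0, n).
std::vector<float> reference_sum(const std::vector<std::vector<Entry>>& per_thread, int n) {
    std::vector<float> result(n, 0.0f);  // the zeroes are only materialized here
    for (const auto& entries : per_thread)
        for (const Entry& e : entries)
            result[e.index] += e.value;
    return result;
}
```

With the three threads from the example encoded as `{{0,2},{1,3},{2,2}}`, `{{2,1},{3,1},{4,2}}` and `{{3,1},{6,3},{8,3}}`, `reference_sum(..., 9)` yields `2 3 3 2 2 0 3 0 3`.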

I’m considering using atomicAdd to accumulate values into a global vector. This way, I don’t have to store intermediate sums, as I would with a parallel reduction. Can I still get decent parallelism with atomicAdd, given that most threads write to different memory locations? Or are there too many drawbacks to using atomics?
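A minimal sketch of the atomicAdd approach, assuming the nonzeros from all producer threads have been flattened into two arrays (`indices`, `values`) and that one CUDA thread handles one entry. `scatter_add` and `gpu_scatter_sum` are hypothetical names, and CUDA error checking is omitted for brevity.

```cuda
#include <vector>
#include <cuda_runtime.h>

// One CUDA thread per (index, value) entry. atomicAdd makes concurrent updates
// to the same index safe; updates to different indices do not contend.
// Assumes every index lies in [0, n).
__global__ void scatter_add(const int* indices, const float* values,
                            int num_entries, float* result) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < num_entries)
        atomicAdd(&result[indices[i]], values[i]);
}

// Host wrapper: copies the flattened entries to the GPU, zeroes the output,
// launches the kernel and copies the dense sum back.
std::vector<float> gpu_scatter_sum(const std::vector<int>& indices,
                                   const std::vector<float>& values, int n) {
    int m = static_cast<int>(indices.size());
    int* d_idx = nullptr;
    float* d_val = nullptr;
    float* d_out = nullptr;
    cudaMalloc(&d_idx, m * sizeof(int));
    cudaMalloc(&d_val, m * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemcpy(d_idx, indices.data(), m * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_val, values.data(), m * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(d_out, 0, n * sizeof(float));  // all-zero bytes represent 0.0f

    const int block = 256;
    const int grid = (m + block - 1) / block;
    if (m > 0)  // a launch with zero blocks would be an invalid configuration
        scatter_add<<<grid, block>>>(d_idx, d_val, m, d_out);

    std::vector<float> out(n);
    cudaMemcpy(out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_idx);
    cudaFree(d_val);
    cudaFree(d_out);
    return out;
}
```

Because floating-point addition is not associative and the order of atomic updates is unspecified, sums of non-integer values may differ in the last bits from run to run. Atomics to distinct addresses do not contend with each other, so throughput mainly degrades where many entries share an index.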

cbuchner1 · August 13, 2021, 10:55am

I see no fundamental problem with your approach, provided the degree of overlap is moderate or low.

If writing directly to global memory turns out to be slower than expected, consider accumulating intermediate sums in shared memory using shared-memory atomics.
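A sketch of the shared-memory variant (often called privatization), under the loud assumption that the whole result vector fits in one block's shared memory; with the default 48 KB per-block limit that is at most 12,288 floats. Each block first accumulates its entries into a private copy with shared-memory atomics, then flushes the nonzero partial sums to global memory with one atomicAdd per slot. For a truly large vector, the index range would have to be tiled across blocks, which this sketch does not do. `scatter_add_shared` and `gpu_shared_scatter_sum` are hypothetical names, and error checking is omitted.

```cuda
#include <algorithm>
#include <vector>
#include <cuda_runtime.h>

// ASSUMPTION: n floats fit in dynamic shared memory; every index lies in [0, n).
__global__ void scatter_add_shared(const int* indices, const float* values,
                                   int num_entries, int n, float* result) {
    extern __shared__ float partial[];  // this block's private partial sums
    for (int j = threadIdx.x; j < n; j += blockDim.x)
        partial[j] = 0.0f;
    __syncthreads();

    // Grid-stride loop over entries; these atomics hit shared memory.
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < num_entries;
         i += gridDim.x * blockDim.x)
        atomicAdd(&partial[indices[i]], values[i]);
    __syncthreads();

    // Flush: at most one global atomic per slot per block, not one per entry.
    for (int j = threadIdx.x; j < n; j += blockDim.x)
        if (partial[j] != 0.0f)
            atomicAdd(&result[j], partial[j]);
}

std::vector<float> gpu_shared_scatter_sum(const std::vector<int>& indices,
                                          const std::vector<float>& values, int n) {
    int m = static_cast<int>(indices.size());
    int* d_idx = nullptr;
    float* d_val = nullptr;
    float* d_out = nullptr;
    cudaMalloc(&d_idx, m * sizeof(int));
    cudaMalloc(&d_val, m * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemcpy(d_idx, indices.data(), m * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_val, values.data(), m * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(d_out, 0, n * sizeof(float));

    const int block = 256;
    // Cap the grid so each block amortizes its flush over many entries.
    const int grid = std::min((m + block - 1) / block, 64);
    if (m > 0)
        scatter_add_shared<<<grid, block, n * sizeof(float)>>>(d_idx, d_val, m, n, d_out);

    std::vector<float> out(n);
    cudaMemcpy(out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_idx);
    cudaFree(d_val);
    cudaFree(d_out);
    return out;
}
```

The trade-off is that every block pays for zeroing and flushing all n slots, so this only pays off when each block processes many entries that collide within its private copy.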
