atomicAdd with float2: no API support, workarounds?

Wanted to bump this thread, as the need for an atomicAdd on an int2 or float2 has come up on a number of projects.

Is there a more efficient method for an atomicAdd on a 64-bit quantity than splitting it into two 32-bit atomicAdd operations on the .x and .y components?
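
(To be concrete, by “splitting” I mean something like this sketch; the helper name is just for illustration:)

__device__ void atomicAddFloat2Split(float2 *addr, float2 val)
{
    // Each component is updated atomically on its own, but the pair as a
    // whole is NOT updated as a single atomic operation.
    atomicAdd(&addr->x, val.x);
    atomicAdd(&addr->y, val.y);
}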

Doing two 32-bit atomic adds in sequence is not the same as atomically updating a float2 quantity. In some contexts the difference may not matter, but I can certainly imagine programming contexts where it does.

If you require a 64-bit atomic update, then a custom atomic built on atomicCAS is probably the only solution:

cuda - How can I implement a custom atomic function involving several variables? - Stack Overflow

It’s certainly not “efficient”.
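
For illustration, here is a minimal sketch of that kind of custom atomic for a float2, along the lines of the linked answer (the pack/unpack/atomicAddFloat2 helper names are mine, not an official API): the pair is packed into one 64-bit word and updated with an atomicCAS retry loop.

__device__ __forceinline__ unsigned long long pack(float2 v)
{
    // Bit-copy the two floats into the low and high halves of a 64-bit word.
    unsigned long long lo = __float_as_uint(v.x);
    unsigned long long hi = __float_as_uint(v.y);
    return (hi << 32) | lo;
}

__device__ __forceinline__ float2 unpack(unsigned long long u)
{
    return make_float2(__uint_as_float((unsigned int)(u & 0xffffffffULL)),
                       __uint_as_float((unsigned int)(u >> 32)));
}

// Atomically add 'val' to the float2 at 'addr' as one 64-bit
// read-modify-write. 'addr' must be 8-byte aligned (a plain float2 is).
__device__ float2 atomicAddFloat2(float2 *addr, float2 val)
{
    unsigned long long *p = reinterpret_cast<unsigned long long *>(addr);
    unsigned long long old = *p, assumed;
    do {
        assumed = old;
        float2 cur = unpack(assumed);
        // Retry until no other thread has modified the location in between.
        old = atomicCAS(p, assumed,
                        pack(make_float2(cur.x + val.x, cur.y + val.y)));
    } while (old != assumed);
    return unpack(old);   // value in memory before this thread's add
}

You would call it from a kernel as, for example, atomicAddFloat2(&out[i], make_float2(re, im)). Whether this is any faster than two independent 32-bit atomicAdds depends on contention, so benchmark it on your own hardware.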

If two independent 32-bit atomic adds will suffice for your use case, I doubt you’re going to find anything “more efficient”. At one point this thread got wrapped around the idea that coalescing is an issue. It might be, but I don’t find that subject documented anywhere. I think a conservative assumption is that there are atomic SFUs in the cache hardware, and that their behavior with respect to synchronicity is undescribed. This recent Stack Overflow posting may be of interest:

caching - Can consecutive CUDA atomic operations on global memory benefit from L2 cache? - Stack Overflow

I did develop a way to do atomicAdd() on a complex type which is indeed faster than doing two separate atomicAdds on two different 32-bit locations. It is only about 30% faster, but it does work correctly.

PM me if anybody is interested. Only tested on a GTX 780 Ti so far, but I assume it should work on Kepler and beyond.

Hello CudaaduC,
Can you tell me how you do atomicAdd() on a complex type?