Can float4 be used for atomicAdd efficiently?

Hi! I am wondering, can I write atomicAdd with float4 in one line? Like:

#define FETCH_FLOAT4(pointer) (reinterpret_cast<float4*>(&(pointer))[0])

atomicAdd(&FETCH_FLOAT4(c[threadIdx.x*8]), FETCH_FLOAT4(panelB[0][0]));

But I get an error like this:

error: no instance of overloaded function "atomicAdd" matches the argument list
            argument types are: (float4 *, float4)

Must I write them separately like:

atomicAdd(&c[threadIdx.x*8+0], panelB[0][0]);
atomicAdd(&c[threadIdx.x*8+1], panelB[0][1]);
atomicAdd(&c[threadIdx.x*8+2], panelB[0][2]);


I guess this cannot take full advantage of float4? Maybe atomicAdd simply does not support float4? Would you suggest that the developers add this in the future?

Thank you!!!

By the way, I found this older thread: "use float AtomicAdd to write to a float4".
Twelve years have passed, and I am not sure whether atomicAdd can accept float4 now… Haha…

Atomics can work on up to a 64-bit (properly aligned) quantity at a time. So you cannot do an atomic add on an entire float4, because that is a 128-bit quantity. Furthermore, the atomic engine doesn’t know anything about elementwise addition for a vector type.

The only types supported by atomicAdd are those listed in the programming guide. And if you wanted to perform multiple additions across various elements of a vector type, each of those additions would need to be handled by its own separate atomicAdd instruction.
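As a rough sketch of the "one atomicAdd per element" approach described above, the four scalar atomics can be wrapped in a small helper so the call site stays on one line. The names atomicAddFloat4, accumulateKernel and runAccumulate are made up for this illustration and are not part of CUDA. Note that each component is updated atomically on its own, but the float4 as a whole is not.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical helper (not a CUDA API): adds each component of `val` to the
// float4 at `addr` with four independent scalar atomics. Another thread may
// observe a partially updated float4 between the individual additions.
__device__ void atomicAddFloat4(float4* addr, float4 val) {
    float* p = reinterpret_cast<float*>(addr);
    atomicAdd(p + 0, val.x);
    atomicAdd(p + 1, val.y);
    atomicAdd(p + 2, val.z);
    atomicAdd(p + 3, val.w);
}

// Every thread adds (1, 2, 3, 4) to the same float4 in global memory.
__global__ void accumulateKernel(float4* out) {
    atomicAddFloat4(out, make_float4(1.0f, 2.0f, 3.0f, 4.0f));
}

// Host-side driver (illustrative): launches blocks * threads threads and
// returns the accumulated float4.
float4 runAccumulate(int blocks, int threads) {
    float4* d_out = nullptr;
    float4 h_out = make_float4(0.0f, 0.0f, 0.0f, 0.0f);
    cudaError_t err = cudaMalloc(&d_out, sizeof(float4));
    assert(err == cudaSuccess);
    cudaMemcpy(d_out, &h_out, sizeof(float4), cudaMemcpyHostToDevice);
    accumulateKernel<<<blocks, threads>>>(d_out);
    // The blocking copy below also waits for the kernel to finish.
    cudaMemcpy(&h_out, d_out, sizeof(float4), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return h_out;
}
```

With this helper, the call from the question could be written as atomicAddFloat4(&FETCH_FLOAT4(c[threadIdx.x*8]), FETCH_FLOAT4(panelB[0][0])), assuming both addresses are 16-byte aligned as the reinterpret_cast to float4* already requires.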

You could possibly construct your own custom Atomic to perform an atomic addition on two of the quantities at a time. I think that is unlikely to provide better performance than 2 native atomics, however.
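One way such a custom atomic might look is a compare-and-swap loop on the 64-bit word that holds two adjacent floats. This is only a sketch under assumptions: the name atomicAddFloat2CAS and the driver code are invented for illustration, the address must be 8-byte aligned, and, as noted above, it is unlikely to beat two native atomicAdd calls.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical custom atomic (not a CUDA API): adds val.x and val.y to the
// two floats at `addr` as one 64-bit update, using a 64-bit atomicCAS loop.
// `addr` must be 8-byte aligned.
__device__ void atomicAddFloat2CAS(float2* addr, float2 val) {
    unsigned long long* p = reinterpret_cast<unsigned long long*>(addr);
    unsigned long long old = *p;
    unsigned long long assumed;
    do {
        assumed = old;
        // Unpack the two floats; .x sits in the low 32 bits (little-endian).
        float x = __uint_as_float(static_cast<unsigned int>(assumed & 0xffffffffull));
        float y = __uint_as_float(static_cast<unsigned int>(assumed >> 32));
        // Pack the updated pair back into a single 64-bit word.
        unsigned long long desired =
            (static_cast<unsigned long long>(__float_as_uint(y + val.y)) << 32) |
            static_cast<unsigned long long>(__float_as_uint(x + val.x));
        // Succeeds only if nobody changed the word since we read it.
        old = atomicCAS(p, assumed, desired);
    } while (assumed != old);
}

// Every thread adds (1, 2) to the same float2 in global memory.
__global__ void accumulateFloat2Kernel(float2* out) {
    atomicAddFloat2CAS(out, make_float2(1.0f, 2.0f));
}

// Host-side driver (illustrative): returns the accumulated float2.
float2 runAccumulateFloat2(int blocks, int threads) {
    float2* d_out = nullptr;
    float2 h_out = make_float2(0.0f, 0.0f);
    cudaError_t err = cudaMalloc(&d_out, sizeof(float2));
    assert(err == cudaSuccess);
    cudaMemcpy(d_out, &h_out, sizeof(float2), cudaMemcpyHostToDevice);
    accumulateFloat2Kernel<<<blocks, threads>>>(d_out);
    // The blocking copy below also waits for the kernel to finish.
    cudaMemcpy(&h_out, d_out, sizeof(float2), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return h_out;
}
```

Under heavy contention the loop retries whenever another thread wins the race, which is one reason this tends to be slower than the hardware float atomics. A float4 would need two such calls, one for each 64-bit half.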


Well, very interesting! I never expected that a “custom Atomic” could exist. I will try this later. And thank you very much for your answer!!!
