What is the best practice way to atomically add two cuda::std::complex numbers?
How would you do it in C++, i.e. with std::complex<T>?
(hint: I don’t think it is possible)
AFAICT cuda::atomic provides no overloads for cuda::std::complex<T>.
In short, I would probably treat it as an array of float2 or double2 and do atomics on the individual components.
I should also mention that doing atomics on the individual components creates a gap in behavior compared to what we normally expect from atomics. Normally, if I intermix reads with atomics, I expect each read to return a coherent, sensible value: something that was actually written there. With the individual-component method, a read that lands between the update of the real part and the update of the imaginary part can see a mix of old and new, so we cannot intermix reads with atomics and expect coherent values across both parts. This “individual components” idea therefore requires that you arrange synchronization of some sort, so that any reads are performed only after all atomics (or a given set of atomics) are done.
But leaving that all aside, for e.g. a reduction, we can be assured that the final value in the complex location will correspond to the correct result, subject to the limits of the type numerics and order of operations.
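As a rough illustration of the component-wise approach for a reduction, here is a minimal sketch. The names `atomic_add_complex`, `sum_kernel` and `sum_on_device` are made up for this example, and the `reinterpret_cast` assumes that cuda::std::complex<float> keeps the array-compatible layout that std::complex guarantees (real part first, then imaginary part).

```cuda
#include <cuda/std/complex>
#include <cuda_runtime.h>

using cfloat = cuda::std::complex<float>;

// Hypothetical helper: one atomicAdd per component.
// Assumes the complex value is laid out as two consecutive floats (real, imag).
__device__ void atomic_add_complex(cfloat* addr, cfloat v) {
    float* parts = reinterpret_cast<float*>(addr);
    atomicAdd(parts + 0, v.real());
    atomicAdd(parts + 1, v.imag());
}

__global__ void sum_kernel(const cfloat* in, int n, cfloat* out) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) atomic_add_complex(out, in[i]);
}

// Hypothetical host helper: sums n copies of (1, 2) on the device.
cfloat sum_on_device(int n) {
    cfloat* in = nullptr;
    cfloat* out = nullptr;
    cudaMallocManaged(&in, n * sizeof(cfloat));
    cudaMallocManaged(&out, sizeof(cfloat));
    for (int i = 0; i < n; ++i) in[i] = cfloat(1.0f, 2.0f);
    *out = cfloat(0.0f, 0.0f);

    sum_kernel<<<(n + 255) / 256, 256>>>(in, n, out);
    cudaDeviceSynchronize();  // read the result only after all atomics are done

    cfloat result = *out;
    cudaFree(in);
    cudaFree(out);
    return result;
}
```

The result is only read after `cudaDeviceSynchronize()`, which is the kind of synchronization described above; reading `*out` while the kernel is still running could return a real part and an imaginary part from different points in time.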
As Robert said, adding the components of complex numbers individually should be fine.
As general tips for these kinds of challenges:
For more complex (no pun intended) tasks, you can always do atomic CAS (compare-and-swap) operations. E.g. multiplying lots of complex numbers.
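To make the CAS suggestion concrete, here is a sketch of an atomic complex multiply for the single-precision case, where the whole value fits in 64 bits. `atomic_mul_complex`, `mul_kernel` and `product_of_i` are hypothetical names for this example. A 16-byte double-precision complex does not fit in a 64-bit CAS, so this sketch does not cover it.

```cuda
#include <cstring>
#include <cuComplex.h>
#include <cuda_runtime.h>

// Hypothetical helper: *addr = *addr * v as one atomic update, via a 64-bit CAS loop.
// Works because cuFloatComplex (float2) is 8 bytes and 8-byte aligned.
__device__ void atomic_mul_complex(cuFloatComplex* addr, cuFloatComplex v) {
    unsigned long long* raw = reinterpret_cast<unsigned long long*>(addr);
    unsigned long long observed = *raw;
    unsigned long long assumed;
    do {
        assumed = observed;
        cuFloatComplex cur;
        memcpy(&cur, &assumed, sizeof(cur));        // bits -> complex
        cuFloatComplex next = cuCmulf(cur, v);
        unsigned long long desired;
        memcpy(&desired, &next, sizeof(desired));   // complex -> bits
        // Swap only if nobody changed the value since we read it.
        observed = atomicCAS(raw, assumed, desired);
    } while (observed != assumed);  // compare bit patterns, so NaNs cannot loop forever
}

__global__ void mul_kernel(cuFloatComplex* acc, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) atomic_mul_complex(acc, make_cuFloatComplex(0.0f, 1.0f));  // multiply by i
}

// Hypothetical host helper: computes i^n starting from 1.
cuFloatComplex product_of_i(int n) {
    cuFloatComplex* acc = nullptr;
    cudaMallocManaged(&acc, sizeof(cuFloatComplex));
    *acc = make_cuFloatComplex(1.0f, 0.0f);
    mul_kernel<<<(n + 255) / 256, 256>>>(acc, n);
    cudaDeviceSynchronize();
    cuFloatComplex result = *acc;
    cudaFree(acc);
    return result;
}
```

Unlike the per-component approach, each update here replaces real and imaginary parts together, so a concurrent 64-bit read would not see a half-updated value. The price is that contended CAS loops can retry many times.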
Also, instead of atomic accesses, if you do not need intermediate results, you can have each thread write out the numbers to operate on (e.g. all the numbers to add or to multiply), each to its own memory location. Then in a second step, e.g. a second kernel launch or the same kernel after a synchronizing operation, you combine the numbers that were written out.
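A sketch of that write-out-then-combine idea, using two kernel launches on the same stream as the synchronization point. All names (`write_terms`, `combine_terms`, `two_phase_sum`, `kThreads`) and the placeholder value (1, -1) are assumptions for illustration only; a real combine step would likely use more than one block.

```cuda
#include <cuComplex.h>
#include <cuda_runtime.h>

constexpr int kThreads = 256;  // block size, must be a power of two for the tree reduction

// Step 1: every thread writes its contribution to its own slot, no atomics needed.
__global__ void write_terms(cuFloatComplex* terms, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) terms[i] = make_cuFloatComplex(1.0f, -1.0f);  // placeholder contribution
}

// Step 2: a single block combines all slots written in step 1.
__global__ void combine_terms(const cuFloatComplex* terms, int n, cuFloatComplex* out) {
    __shared__ cuFloatComplex partial[kThreads];
    cuFloatComplex acc = make_cuFloatComplex(0.0f, 0.0f);
    for (int i = threadIdx.x; i < n; i += blockDim.x) acc = cuCaddf(acc, terms[i]);
    partial[threadIdx.x] = acc;
    __syncthreads();
    // Tree reduction in shared memory.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (threadIdx.x < s)
            partial[threadIdx.x] = cuCaddf(partial[threadIdx.x], partial[threadIdx.x + s]);
        __syncthreads();
    }
    if (threadIdx.x == 0) *out = partial[0];
}

// Hypothetical host helper: the second launch starts only after the first has finished.
cuFloatComplex two_phase_sum(int n) {
    cuFloatComplex* terms = nullptr;
    cuFloatComplex* out = nullptr;
    cudaMalloc(&terms, n * sizeof(cuFloatComplex));
    cudaMallocManaged(&out, sizeof(cuFloatComplex));
    write_terms<<<(n + kThreads - 1) / kThreads, kThreads>>>(terms, n);
    combine_terms<<<1, kThreads>>>(terms, n, out);
    cudaDeviceSynchronize();
    cuFloatComplex result = *out;
    cudaFree(terms);
    cudaFree(out);
    return result;
}
```

This trades extra memory (one slot per contribution) for the absence of atomics and a fixed order of operations within the combine step.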
Thank you very much for your answers.
I ended up using cufftComplex (or cuComplex) instead of cuda::std::complex because it feels easier to access their real and imaginary parts (since they are just float2).
I then did atomicAdds on x and y.
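For anyone landing here later, a minimal sketch of what that can look like with cuFloatComplex. The names `accumulate` and `accumulate_on_device` are hypothetical, and the per-thread value (i, 1) is just a placeholder; this is not necessarily the exact code used above.

```cuda
#include <cuComplex.h>
#include <cuda_runtime.h>

// Each thread adds its own contribution to a shared accumulator,
// with one atomicAdd on .x (real part) and one on .y (imaginary part).
__global__ void accumulate(cuFloatComplex* acc, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        cuFloatComplex v = make_cuFloatComplex(static_cast<float>(i), 1.0f);  // placeholder
        atomicAdd(&acc->x, v.x);
        atomicAdd(&acc->y, v.y);
    }
}

// Hypothetical host helper: returns the sum of (i, 1) for i in [0, n).
cuFloatComplex accumulate_on_device(int n) {
    cuFloatComplex* acc = nullptr;
    cudaMallocManaged(&acc, sizeof(cuFloatComplex));
    *acc = make_cuFloatComplex(0.0f, 0.0f);
    accumulate<<<(n + 255) / 256, 256>>>(acc, n);
    cudaDeviceSynchronize();  // only read once every atomicAdd has completed
    cuFloatComplex result = *acc;
    cudaFree(acc);
    return result;
}
```

As noted earlier in the thread, the two atomicAdds are independent, so the accumulator should only be read after the kernel (or a suitable synchronization point) has completed.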