Complex addition as an atomic operation

Hi,
I need to add two complex double numbers atomically (all my threads compute a complex double value and add it to the same memory location). Unfortunately, it does not appear that CUDA provides an atomic routine for this (I am running on a Tesla C1060, but I don't think even Fermi has anything for it). Any suggestions on how I could get this done? Is there a way I could manually make an operation atomic?

Thank you
akjha

Just do atomic additions of real and imaginary parts separately.

Or store the result from each thread (or each block, after a reduction in shared memory) into an array and launch a separate reduction kernel. This has the added advantage of producing the same results on repeated invocations, which might help debugging.
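A minimal sketch of the first suggestion: keep the accumulator as two separate doubles and update each part with its own atomic add. The pair is not updated atomically as a whole, but since each component's sum is independent, the final result is still correct. The variable names here are just for illustration, and the code assumes an atomicAdd overload for double is available (native only on newer GPUs, or via the software version discussed further down in this thread).

```cuda
// Accumulator stored as two separate doubles so each component
// can be updated with its own atomic add.
__device__ double d_sum_re;
__device__ double d_sum_im;

__device__ void accumulateComplex(double re, double im)
{
    // Assumes atomicAdd for double exists (hardware support on newer
    // architectures, or a software replacement on older ones).
    atomicAdd(&d_sum_re, re);
    atomicAdd(&d_sum_im, im);
}
```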


Isn’t that the problem though? Fermi supports single precision floating point atomic add, but not double precision.


Ah ok. It’s not as fast as single precision atomicAdd(), but it can be done in software.
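For reference, the usual software approach is a compare-and-swap loop over the 64-bit bit pattern of the double, essentially the version given in the CUDA C Programming Guide. atomicCAS on unsigned long long requires compute capability 1.2 or higher, which the C1060 (1.3) has. A sketch:

```cuda
// Software double-precision atomic add built on a 64-bit compare-and-swap.
__device__ double atomicAddDouble(double* address, double val)
{
    unsigned long long int* address_as_ull = (unsigned long long int*)address;
    unsigned long long int old = *address_as_ull, assumed;
    do {
        assumed = old;
        // Reinterpret the bits as a double, add, and try to swap the result in.
        // If another thread changed the value in the meantime, retry.
        old = atomicCAS(address_as_ull, assumed,
                        __double_as_longlong(val + __longlong_as_double(assumed)));
    } while (assumed != old);
    return __longlong_as_double(old);
}
```

Under heavy contention the loop may retry many times, which is why it is noticeably slower than a hardware atomicAdd().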


Thank you for the replies. I tried searching for how to do it in software, and am citing the link below in case anyone else needs to do it.

http://code.google.com/p/cusp-library/sour…10b7846e8540e7e

However, it was not very useful in my case, since each thread took slightly longer to run. Because I launch this set of threads many times, the overall program execution time increased considerably.


Oh, I didn’t pay enough attention to the fact that all threads add to the same memory location. In that case, you should definitely do a per-block reduction in shared memory before adding the block’s result to the global variable in order to reduce contention.
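A rough sketch of that idea, assuming a power-of-two block size and the atomicAddDouble helper from the previous post (both of which are my assumptions, not part of your code): each block reduces its threads' complex values in shared memory and issues only one atomic update per component per block.

```cuda
#define BLOCK_SIZE 256  // assumed power-of-two block size

__global__ void accumulateKernel(const double* re_in, const double* im_in,
                                 double* sum_re, double* sum_im, int n)
{
    __shared__ double s_re[BLOCK_SIZE];
    __shared__ double s_im[BLOCK_SIZE];

    int tid = threadIdx.x;
    int i   = blockIdx.x * blockDim.x + tid;

    // Each thread loads (or, in your case, computes) its complex value.
    s_re[tid] = (i < n) ? re_in[i] : 0.0;
    s_im[tid] = (i < n) ? im_in[i] : 0.0;
    __syncthreads();

    // Tree reduction in shared memory.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) {
            s_re[tid] += s_re[tid + s];
            s_im[tid] += s_im[tid + s];
        }
        __syncthreads();
    }

    // One atomic update per block instead of one per thread,
    // using the software atomicAddDouble from the previous post.
    if (tid == 0) {
        atomicAddDouble(sum_re, s_re[0]);
        atomicAddDouble(sum_im, s_im[0]);
    }
}
```

With one atomic per block rather than one per thread, the retry loop in the software atomic should hit far less contention.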
