Hi,
I need to add two double-precision complex numbers atomically (all my threads compute a complex double value and add it to the same memory location). Unfortunately, CUDA does not appear to provide an atomic routine for that (I am running on a Tesla C1060, but I don't think even Fermi has anything for this). Any suggestions on how I could get this done? Is there a way to make an operation atomic manually?
Just do atomic additions of real and imaginary parts separately.
Or store the result from each thread (or each block, after a reduction in shared memory) into an array and launch a separate reduction kernel. This has the added advantage of producing the same results on repeated invocations, which might help with debugging.
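To make the first suggestion concrete, here is a minimal sketch, assuming the values are stored as `cuDoubleComplex` in global memory. Hardware without a native double-precision `atomicAdd` (such as the C1060) can emulate one with a compare-and-swap loop on the 64-bit bit pattern, which is the pattern shown in the CUDA C Programming Guide. The names `atomicAddDouble` and `atomicAddComplex` are made up for this example. Note that the real and imaginary parts are updated by two independent atomics, so the final sum is correct, but a thread reading the value mid-update could see a new real part with an old imaginary part.

```cuda
#include <cuComplex.h>

// Hypothetical helper: double-precision atomic add built on 64-bit atomicCAS
// (global memory only on compute capability 1.x devices).
__device__ double atomicAddDouble(double *address, double val)
{
    unsigned long long int *address_as_ull = (unsigned long long int *)address;
    unsigned long long int old = *address_as_ull, assumed;
    do {
        assumed = old;
        // Try to swap in (assumed + val); atomicCAS returns the value it found.
        old = atomicCAS(address_as_ull, assumed,
                        __double_as_longlong(val + __longlong_as_double(assumed)));
    } while (assumed != old);  // integer compare, so a NaN cannot cause a hang
    return __longlong_as_double(old);
}

// Hypothetical helper: accumulate a complex value, one component at a time.
// cuDoubleComplex is a double2: .x is the real part, .y the imaginary part.
__device__ void atomicAddComplex(cuDoubleComplex *address, cuDoubleComplex val)
{
    atomicAddDouble(&address->x, val.x);
    atomicAddDouble(&address->y, val.y);
}
```

Each thread would then call something like `atomicAddComplex(total, myValue)`. With every thread hitting the same address, the CAS loop will retry a lot, so expect this to be slow under heavy contention.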
However, it was not very useful in my case, since each thread started taking slightly longer to run. Because I launch this set of threads many times, the total execution time of the program increased considerably.
Oh, I didn’t pay enough attention to the fact that all threads add to the same memory location. In that case, you should definitely do a per-block reduction in shared memory before adding the block’s result to the global variable in order to reduce contention.
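A rough sketch of what that per-block reduction could look like, under a few assumptions: the per-thread values are read from an input array `in` (a stand-in for whatever each thread actually computes), the kernel is launched with exactly `BLOCK_SIZE` threads per block, `BLOCK_SIZE` is a power of two, and `*total` is zeroed before the launch. The kernel name `accumulate` and the helper `atomicAddDouble` are invented for this example.

```cuda
#include <cuComplex.h>

#define BLOCK_SIZE 256  // assumed: power of two, equal to blockDim.x at launch

// Hypothetical helper: double-precision atomic add via a 64-bit atomicCAS loop.
__device__ double atomicAddDouble(double *address, double val)
{
    unsigned long long int *address_as_ull = (unsigned long long int *)address;
    unsigned long long int old = *address_as_ull, assumed;
    do {
        assumed = old;
        old = atomicCAS(address_as_ull, assumed,
                        __double_as_longlong(val + __longlong_as_double(assumed)));
    } while (assumed != old);
    return __longlong_as_double(old);
}

__global__ void accumulate(const cuDoubleComplex *in, int n, cuDoubleComplex *total)
{
    __shared__ double re[BLOCK_SIZE];
    __shared__ double im[BLOCK_SIZE];

    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;

    // Placeholder for the real per-thread computation; out-of-range threads add zero.
    cuDoubleComplex v = (i < n) ? in[i] : make_cuDoubleComplex(0.0, 0.0);
    re[tid] = v.x;
    im[tid] = v.y;
    __syncthreads();

    // Tree reduction in shared memory: halve the number of active threads each step.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) {
            re[tid] += re[tid + s];
            im[tid] += im[tid + s];
        }
        __syncthreads();
    }

    // Only one thread per block touches the global accumulator.
    if (tid == 0) {
        atomicAddDouble(&total->x, re[0]);
        atomicAddDouble(&total->y, im[0]);
    }
}
```

This cuts the number of atomic operations on the global variable from one pair per thread to one pair per block. If reproducible results matter, thread 0 could instead write `re[0]` and `im[0]` to a per-block slot in an output array and a second small kernel (or the host) could sum those slots in a fixed order.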