atomicAdd not behaving as expected, atomicAdd_system not defined

Benutzer5182 · August 22, 2022, 9:47am

Hello,

I have many blocks working on a small array of data. Each block computes its results in local memory and then writes them to the small global array using atomicAdd to avoid race conditions. Every time I run my code, I get different outputs. So I tested the atomicAdd function using a small kernel:

__global__ void test(float *arr) {
    atomicAdd(&arr[0], threadIdx.x);
}

test<<<1000, 1024>>>(x_train);

So I launch 1000 blocks with 1024 threads each. There is always the same number of threads, with IDs from 0 to 1023. Shouldn’t the output always be the same?

If I try this:

__global__ void test(float *arr) {
    atomicAdd(&arr[0], 1);
}

The output is 1024000, which is correct. What is wrong here?

The second thing is that I’ve tried using __atomicAdd or atomicAdd_system exactly as in the documentation, but I receive “error: identifier “atomicAdd_system” is undefined”. I am using a Quadro RTX 6000.

Thanks for any help!

Robert_Crovella · August 22, 2022, 1:44pm

You’re exceeding what can be represented in a float variable with full accuracy.

The sum of a single block using threadIdx.x is 1023 x 1024 / 2 = 523776. If you do that over 1000 such blocks, it would be 523776000. But a float variable only has 24 significand bits (23 stored explicitly plus one implicit bit). The result is that above 2^24 = 16777216 (about 16 million), not every integer can be represented, so the sum can no longer be exact in all cases. 523776000 is larger than 16 million, whereas 1024000 is smaller than 16 million.

Try switching your arr variable to type double.
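As a minimal sketch of that suggestion (the names test_double, d_sum, and expected_sum are made up for illustration and are not from the original code), the test kernel could accumulate into a double instead. This assumes a GPU of compute capability 6.0 or higher, since that is what atomicAdd on double requires, and a matching -arch flag when compiling.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: exact integer sum of threadIdx.x over all blocks.
long long expected_sum(int blocks, int threads) {
    return static_cast<long long>(blocks) * threads * (threads - 1) / 2;
}

// Same test as in the question, but accumulating into a double.
// atomicAdd(double*, double) needs compute capability 6.0 or higher.
__global__ void test_double(double *arr) {
    atomicAdd(&arr[0], static_cast<double>(threadIdx.x));
}

int main() {
    double *d_sum = nullptr;
    double h_sum = 0.0;

    cudaMalloc(&d_sum, sizeof(double));
    cudaMemset(d_sum, 0, sizeof(double));  // all-zero bytes represent 0.0

    test_double<<<1000, 1024>>>(d_sum);

    // A blocking cudaMemcpy waits for the kernel to finish first.
    cudaError_t err = cudaMemcpy(&h_sum, d_sum, sizeof(double),
                                 cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("GPU sum:  %.1f\n", h_sum);
    printf("Expected: %lld\n", expected_sum(1000, 1024));  // 523776000

    cudaFree(d_sum);
    return 0;
}
```

Every partial sum here is an integer far below 2^53, which a double represents exactly, so the result should not depend on the order in which threads run.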

In addition to being “incorrect”, one reason why the result varies from run to run is that CUDA provides no specified order of thread execution, and floating-point addition is not associative. Once the addition becomes limited by the resolution of the float significand, the order of operations determines exactly which “incorrect” result you will get.
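The order dependence can be illustrated on the host alone. The following is only a sketch with made-up function names (add_one_twice, add_two_once), assuming ordinary IEEE 754 single precision with round-to-nearest-even: starting at 2^24, adding 1 twice in separate steps gives a different answer than adding 2 once, even though the mathematical total is the same.

```cuda
#include <cstdio>

// Add 1.0f twice, one step at a time (like two separate atomicAdd calls).
float add_one_twice(float start) {
    float s = start;
    s = s + 1.0f;  // 16777217 is not representable; rounds back to 16777216
    s = s + 1.0f;  // same rounding again
    return s;
}

// Add the same total, but combine the two small values first.
float add_two_once(float start) {
    float two = 1.0f + 1.0f;
    return start + two;  // 16777218 is representable exactly
}

int main() {
    const float limit = 16777216.0f;  // 2^24

    printf("%.1f\n", add_one_twice(limit));  // prints 16777216.0
    printf("%.1f\n", add_two_once(limit));   // prints 16777218.0
    return 0;
}
```

On the GPU the grouping of the additions is decided by whichever thread reaches the atomic first, which is why the float total can change between runs.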

The second thing is that I’ve tried using __atomicAdd or atomicAdd_system exactly as in the documentation, but I receive “error: identifier “atomicAdd_system” is undefined”. I am using a Quadro RTX 6000.

Compile for the correct arch matching the GPU. So for RTX 6000 that would be something like -arch=sm_75. You only have access to atomic operations if you are compiling for an architecture that supports the requested atomic.
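A small sketch of what that could look like, assuming a hypothetical file name atomic_system.cu and a made-up kernel count_threads. The system-scope atomics are only declared for compute capability 6.0 and higher, so if no -arch flag is given and the toolkit's default target is older than that, the identifier may be reported as undefined.

```cuda
// Build (assumed file name): nvcc -arch=sm_75 atomic_system.cu -o atomic_system
#include <cstdio>
#include <cuda_runtime.h>

const int kBlocks = 1000;
const int kThreads = 1024;

// Each thread adds 1 with a system-scope atomic.
// atomicAdd_system requires compute capability 6.0 or higher.
__global__ void count_threads(int *counter) {
    atomicAdd_system(counter, 1);
}

int main() {
    int *counter = nullptr;

    // Managed memory is reachable from both host and device.
    cudaMallocManaged(&counter, sizeof(int));
    *counter = 0;

    count_threads<<<kBlocks, kThreads>>>(counter);

    // Wait for the kernel before reading the counter on the host.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("%d\n", *counter);  // expected: 1024000

    cudaFree(counter);
    return 0;
}
```

An integer counter is used here so the result is exact; the same -arch requirement applies to the float and double overloads.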

Benutzer5182 · August 22, 2022, 8:12pm

This makes sense, I’m stupid.
Thank you very much for your help!

