How can I make sure atomicAdd() was successful?

raminbarca · March 12, 2017, 8:25pm

Hi,

I want to use atomicAdd() in my kernel, but how do I know if it was successful? Is it possible for it to be unsuccessful at all, or does it keep retrying until it succeeds? (It seems it fails sometimes, or I’m doing something wrong.)

The documentation mentions that it returns the old value in memory (I can’t understand why, though). So I can’t find that out directly from the return value. If I check the value in memory after the atomicAdd() call, then I can’t be sure if the new value is because this thread’s call to atomicAdd() was successful or some other threads has written a new value to the memory location.

I’m using the following function to manipulate the blockIdx in my kernel (i.e., use the value returned by this function instead of blockIdx) to model a work queue. But I’m observing duplicate results in the output, which means that the scenario I explained is occurring (if my understanding is correct).

__device__ void get_next_block(int *executed)
{
	int b_old, b_new;
	do {
		b_old = atomicAdd(executed, 1);
		b_new = *executed;
	} while(b_new == b_old);
}

I also tried doing the addition outside the while loop (only once) and it didn’t fix the problem.

Any help on how I should do this is appreciated. I’m using a K40c card and CUDA driver 8.0.

Robert_Crovella · March 12, 2017, 9:12pm

atomicAdd will always be successful, assuming you have not violated basic programming principles (e.g. assuming your executed pointer points to a valid memory location, for example).

In your example, atomicAdd will always add 1 to the location pointed to by executed, every time it is called.

Based on your description, I’m reasonably sure that it should be sufficient to just do:

__device__ int get next_block(int *executed)
{
  return atomicAdd(executed, 1);
}

That will give each user a unique return value, again assuming basic programming principles, such that executed points to a valid global memory location, and it has been properly initialized, etc.

My guess is that you are getting confused by the behavior of e.g. atomicCAS, which does have a conditional nature associated with it. atomicAdd has no such conditional behavior. It always adds to the location pointed to.

raminbarca · March 12, 2017, 9:24pm

So what else could I possibly be doing wrong? I allocate memory using cudaMalloc and keep the address in the pointer that I pass to the function. I then write value -1 to the address using cudaMemcpy.

Basically what I’m trying to do is to launch a smaller grid than the total number of blocks that need to be executed (say the original grid is 1024 blocks, but I want to launch 128 blocks), and then have a loop in my kernel that executes until the value pointed to by executed pointer exceeds the total number of blocks (1024 in the example). Thread 0 at each block calls get_next_block and then I call __syncthreads() so that all the threads can see the new block ID. When I do this for binomialOptions in CUDA samples, I fail the test and when I print the generated numbers, some are repeated multiple times. I thought unsuccessful atomicAdd() could explain what is happening, but if it always secceeds then I don’t know what I’m doing wrong.

Robert_Crovella · March 12, 2017, 10:11pm

I would write 0 to the location, not -1. The return of the “old” value means that the first requester of the function will get the initialized value.

However that doesn’t seem likely as an explanation for your observation(s).

Your general description seems reasonable to me. I don’t think I would be able to spot the problem without a complete test case. If you want to provide one, it should be as short as you can make while still demonstrating the problem. It’s quite possible that in the creation of the short test case, you may discover the problem yourself.

For me to help, your test case would need to be something I can copy, paste, compile, and run, and see the problem without having to add anything or change anything.

raminbarca · March 12, 2017, 11:20pm

Thanks for your help, I solved the problem. The issue was that instead of using the return value of the atomicAdd(), I was reading the result from memory again. Because all the additions had been done, all the blocks after each iteration of the loop read the same value, but when I use the return value and write it in a location in shared memory for all the threads in a block to read, it passes the test.

Topic		Replies	Views
atomicAdd() during loop not work well but at end work well CUDA Programming and Performance	3	1186	May 20, 2010
AtomicAdd result incorrect CUDA Programming and Performance	3	1595	December 29, 2018
Atomic functions problem CUDA Programming and Performance	8	1829	May 30, 2009
atomicAdd occasionally fails on doubles CUDA Programming and Performance	3	6831	October 12, 2011
AtomicAdd() functions CUDA Programming and Performance	1	753	December 9, 2016
Problems after atomicAdd CUDA Programming and Performance	0	9245	June 28, 2011
Get different results for every running with atomicAdd() CUDA Programming and Performance	2	371	October 3, 2022
AtomicAdd algorithm CUDA Programming and Performance	7	3765	August 25, 2009
Bug: atomicInc() always inc's by one CUDA Programming and Performance	3	1301	August 31, 2011
incorrect results from atomicAdd (maybe the method is incorrect) CUDA Programming and Performance	1	3765	May 2, 2010

How can I make sure atomicAdd() was successful?

Related topics