Atomic operation and variable access

Martini · January 13, 2021, 4:25pm

Is it fair to say that if I have a kernel in which I perform an atomic operation (say, an atomicAdd) on foo in global memory and later in the same kernel I read the value of foo, then I’m looking for trouble? It can be the other way around too: reading foo and then atomically incrementing it.

I can imagine a situation in which a thread from one block reads foo through a pointer while a thread from a different block, at exactly the same time, is atomically operating on foo. While this might happen rarely, when it does I imagine the behavior is undefined.

Am I getting this right?

Robert_Crovella · January 13, 2021, 5:14pm

The behavior isn’t purely UB, in my opinion. However there is no particular guarantee of ordering unless you go to some lengths to impose it.

The behavior is that a read will get a value that was there before the atomic op, or else after the atomic op. Which one is undefined unless you go to some length to impose ordering (both execution barriers as well as visibility guarantees). (**)

We should distinguish the above from the behavior when a single thread is doing all the activity, which doesn’t seem to be what you are asking about.

Let’s be specific. Suppose location x has the value 100. Let’s suppose that one thread (A) is doing an atomic add to x of 1, and another thread (B) is reading x. Let’s also assume there is no other activity of any kind with respect to x.

B should expect to read either 100 or 101. No other values should be expected or possible.
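As a rough illustration of the A/B scenario above, here is a minimal sketch. The names `race_demo` and `run_race_demo` are made up for this example, and it assumes a single CUDA-capable device and a two-block launch, with block 0 playing thread A and block 1 playing thread B. There is deliberately no synchronization between the two.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: block 0 acts as thread A, block 1 as thread B.
__global__ void race_demo(int *x, int *observed)
{
    if (threadIdx.x != 0) return;
    if (blockIdx.x == 0) {
        atomicAdd(x, 1);                    // thread A: atomic read-modify-write
    } else {
        *observed = *(volatile int *)x;     // thread B: unsynchronized read
    }
}

// Hypothetical host helper: returns the value B observed, or -1 on a CUDA error.
int run_race_demo()
{
    int *x = nullptr, *observed = nullptr;
    int h_x = 100, h_obs = -1;
    if (cudaMalloc(&x, sizeof(int)) != cudaSuccess) return -1;
    if (cudaMalloc(&observed, sizeof(int)) != cudaSuccess) { cudaFree(x); return -1; }
    cudaMemcpy(x, &h_x, sizeof(int), cudaMemcpyHostToDevice);
    race_demo<<<2, 1>>>(x, observed);
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(&h_obs, observed, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(x);
    cudaFree(observed);
    return err == cudaSuccess ? h_obs : -1;
}
```

Calling `run_race_demo()` from a host `main` should return 100 or 101. Which of the two you get can differ between runs and between GPUs, since nothing orders the read against the atomic.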

(**) Using “undefined” here strikes me as obvious. Two unsynchronized threads may or may not see each other’s writes. I don’t think this concept is unique or specific to CUDA. But if you need to refer to that as undefined, so be it.
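If B must see the result of A's atomic, one simple way to get both the execution ordering and the visibility guarantee is to split the work across two kernel launches in the same stream. This is only one option among several, not necessarily the best fit for a given design. The sketch below uses the hypothetical names `add_one`, `read_x` and `run_ordered_demo`.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernels: the atomic update and the read live in separate launches.
__global__ void add_one(int *x) { atomicAdd(x, 1); }
__global__ void read_x(const int *x, int *observed) { *observed = *x; }

// Hypothetical host helper: returns the value read by the second kernel,
// or -1 on a CUDA error.
int run_ordered_demo()
{
    int *x = nullptr, *observed = nullptr;
    int h_x = 100, h_obs = -1;
    if (cudaMalloc(&x, sizeof(int)) != cudaSuccess) return -1;
    if (cudaMalloc(&observed, sizeof(int)) != cudaSuccess) { cudaFree(x); return -1; }
    cudaMemcpy(x, &h_x, sizeof(int), cudaMemcpyHostToDevice);
    add_one<<<1, 1>>>(x);
    // Same (default) stream: read_x does not start until add_one has completed,
    // and add_one's writes to global memory are visible to it.
    read_x<<<1, 1>>>(x, observed);
    cudaError_t err = cudaMemcpy(&h_obs, observed, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(x);
    cudaFree(observed);
    return err == cudaSuccess ? h_obs : -1;
}
```

Here the read should always return 101, because the kernel boundary orders it after the atomic. Ordering within a single kernel is also possible but takes more care (memory fences plus some form of execution barrier).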

Martini · January 13, 2021, 6:01pm

Robert - thanks for your input, I see it the same way, with one caveat.
Is there a distinction to be made between UB and a race condition? If I understand your point of view correctly, you are saying that the problem I described would lead to a race condition.

However, can it be UB, that is, worse than a race condition?
Here’s what I mean; perhaps you can help me see things straight.

Let’s say that the atomic op is on a 64-bit-wide variable. Could it be that the first 32 bits are updated one clock cycle before the last 32 bits, while the read proceeds by first reading the last 32 bits and then the first 32 bits?

What you suggested in your post is that you can read a dog or a cat. I’m asking whether there can be a bad case where the variable I’m getting is part dog, part cat.

Thanks for your input. Much appreciated.

Robert_Crovella · January 13, 2021, 7:19pm

In the general (non-atomic) case, I would certainly suggest that you avoid the scenario where different threads are writing to locations that overlap but are of different sizes. That is a very complex scenario to unpack.

However, in this case I would trust the word atomic to mean exactly what it implies. For a 64-bit atomic, that is an uninterruptible read-modify-write operation on a properly aligned (naturally aligned) 64-bit quantity.

At the moment, if one of the operations is an atomic, I’m at a loss to explain how you might observe something other than a dog or a cat. I don’t think it is possible.
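For anyone who wants to poke at the dog-or-cat question, here is a small sketch; the names `wide_demo` and `run_wide_demo` are made up for illustration. It starts a naturally aligned 64-bit value at 0x00000000FFFFFFFF so that an atomic add of 1 carries into the upper half. A torn read would show up as 0x0000000000000000 or 0x00000001FFFFFFFF.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: block 0 does the 64-bit atomic, block 1 does a plain 64-bit read.
__global__ void wide_demo(unsigned long long *x, unsigned long long *observed)
{
    if (threadIdx.x != 0) return;
    if (blockIdx.x == 0) {
        atomicAdd(x, 1ULL);                               // changes both 32-bit halves
    } else {
        *observed = *(volatile unsigned long long *)x;    // single aligned 64-bit load
    }
}

// Hypothetical host helper: true if the reader saw a whole "before" or whole "after" value.
bool run_wide_demo()
{
    const unsigned long long before = 0x00000000FFFFFFFFULL;
    const unsigned long long after  = 0x0000000100000000ULL;
    unsigned long long *x = nullptr, *observed = nullptr;
    unsigned long long h_obs = 0;
    if (cudaMalloc(&x, sizeof(before)) != cudaSuccess) return false;
    if (cudaMalloc(&observed, sizeof(before)) != cudaSuccess) { cudaFree(x); return false; }
    cudaMemcpy(x, &before, sizeof(before), cudaMemcpyHostToDevice);
    wide_demo<<<2, 1>>>(x, observed);
    cudaError_t err = cudaMemcpy(&h_obs, observed, sizeof(h_obs), cudaMemcpyDeviceToHost);
    cudaFree(x);
    cudaFree(observed);
    return err == cudaSuccess && (h_obs == before || h_obs == after);
}
```

A passing run is not a proof, of course. One launch samples a single interleaving, so this only shows what to look for, not that tearing can never happen.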
