Behaviour of overlapping writes into shared memory from a single warp (software atomic operations)

laughingrice · May 18, 2010, 6:41pm

I’ve been working with the software atomic operations in shared memory that appear in the histogram (256-bin) example.
It says that when multiple threads in a warp write an int (32 bits) to the same location in shared memory, only one write is executed and the others are dropped.
I was wondering about the limits of this behavior, that is:

What happens with larger or smaller write sizes, i.e.
(a) two threads writing bytes to different positions in the same 32-bit word
(b) two threads writing an int4, for example, or a structure
What happens with a partial overlap, i.e., one thread writing to bytes 0-3 and a second thread writing to bytes 2-5?

Is the behavior defined, or can’t I count on what happens?

Thanks
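
For readers who want to see the starting point of the question, here is a minimal sketch of the 32-bit case: every lane of one warp stores its own lane index into the same shared word, and a single value survives. The kernel name `sameWordStore` and the host helper `runSameWordStore` are made up for illustration and do not come from the histogram example. Which lane wins is not specified, so the only thing the sketch checks is that the result is a valid lane index.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: all 32 lanes of a single warp store to the same shared word.
__global__ void sameWordStore(unsigned int *out)
{
    __shared__ unsigned int word;
    word = threadIdx.x;          // 32 colliding 32-bit stores; one of them lands
    __syncthreads();             // make the surviving value visible to lane 0
    if (threadIdx.x == 0)
        *out = word;
}

// Hypothetical host helper: returns the surviving lane index,
// or 0xFFFFFFFF if a CUDA call fails.
unsigned int runSameWordStore()
{
    unsigned int *d_out = nullptr;
    unsigned int h_out = 0xFFFFFFFFu;
    if (cudaMalloc(&d_out, sizeof(unsigned int)) != cudaSuccess)
        return h_out;
    sameWordStore<<<1, 32>>>(d_out);   // one block, exactly one warp
    if (cudaMemcpy(&h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost) != cudaSuccess)
        h_out = 0xFFFFFFFFu;
    cudaFree(d_out);
    return h_out;
}

int main()
{
    unsigned int winner = runSameWordStore();
    if (winner < 32u)
        printf("surviving lane index: %u\n", winner);
    else
        printf("CUDA error or unexpected value\n");
    return 0;
}
```

The winning lane may differ between devices and runs, so code should not rely on any particular lane being the one that succeeds.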

tmurray · May 18, 2010, 6:43pm

Undefined.

SPWorley · May 18, 2010, 9:56pm

I did some tests here that found that shared memory writes are atomic at byte granularity, meaning that if you’re writing to byte X, other threads writing to bytes X-1 and X+1 won’t affect your write. Multiple threads writing to the same byte will of course race, and “one will succeed” with no ordering promises.

There are two posts in that thread: one has some code showing that the byte-wise writes don’t interfere, and a follow-up measures speeds, showing that byte writes were (very surprisingly) about 10% faster than word writes.

So for both of your questions, your byte-wise addressing should probably work OK.
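
The kind of test described above could look roughly like the following. This is a sketch written for this thread, not the code from the linked posts; the names `byteStore` and `runByteStore` and the `0xA0` test pattern are assumptions. Each lane stores one byte, so every group of four neighbouring lanes writes into the same 32-bit word, and the host then checks that no byte was clobbered by a neighbour.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each of the 32 lanes stores a distinct byte.
// Lanes 4k..4k+3 all write into the same 32-bit shared word.
__global__ void byteStore(unsigned char *out)
{
    __shared__ unsigned char bytes[32];
    bytes[threadIdx.x] = (unsigned char)(0xA0u + threadIdx.x);
    __syncthreads();
    // Read back the mirror lane's byte so the value really comes from shared memory.
    out[threadIdx.x] = bytes[31 - threadIdx.x];
}

// Hypothetical host helper: true if every byte holds the value its writer stored.
bool runByteStore()
{
    unsigned char *d_out = nullptr;
    unsigned char h_out[32] = {0};
    if (cudaMalloc(&d_out, sizeof(h_out)) != cudaSuccess)
        return false;
    byteStore<<<1, 32>>>(d_out);       // one block, exactly one warp
    bool ok = cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d_out);
    for (int i = 0; ok && i < 32; ++i)
        ok = (h_out[i] == 0xA0 + (31 - i));   // out[i] holds lane (31 - i)'s byte
    return ok;
}

int main()
{
    printf("byte stores intact: %s\n", runByteStore() ? "yes" : "no");
    return 0;
}
```

This only exercises case (a), distinct bytes within one word. It says nothing about int4 or structure stores, or about partially overlapping writes.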

tmurray · May 18, 2010, 10:07pm

Okay, 1 is defined. Byte stores work. Overlapping anything doesn’t work.
