Dealing with relaxed memory consistency model

Abhi · February 12, 2010, 10:04pm

Hi,
I was wondering if there are software solutions for dealing with CUDA’s relaxed memory consistency model. So basically I don’t want to use any kind of barrier synchronization as it might slow down my computation. Is it possible to have a forced ordering to make sure that the reads and writes to shared memory are in a specified order.

i)
shared int temp=0;
int result;
temp = 1;
result = temp;

ii)
shared int temp = 0;
int step = 0;
int result;
while(true){
if (step==0){
temp = 1;
}
if (step==1){
result = temp;
break;
}
step++;
}

The i) and ii) above are basically trying to do the same thing: first write into shared memory location for variable “temp” and then read from same location. Due to CUDA’s relaxed consistency model i) can lead to any value of “result” (so it could be either 0 or 1). But my question is does ii) make sure that result will always have value 1 in other words read will always be done after preceding write has been done.

I doubt it because my understanding of the way reads and writes are done (in case of relaxed memory consistency model) is by using a buffer for writes. So write requests are added to the buffer and serviced in FIFO order. But that is not the case for read requests which are serviced as soon as one is received. I just want to know if that is the same case for CUDA as well? And if that is true then what I am trying to do in ii) won’t work!!

Thanks,
Abhi

SPWorley · February 12, 2010, 10:25pm

Nope, neither of your codes will guarantee to work. In particular another thread in another part of your code could be setting temp to any value at all at any time.

You’re fighting the wrong battle.

__syncthreads() is your friend here. It’s not expensive! It’s certainly faster than your losing battle in method (ii), as well as being guaranteed correct.

tmurray · February 12, 2010, 10:33pm

The first one will actually always return 1.

SPWorley · February 12, 2010, 10:38pm

Not necessarily… there could be a race with later instructions which also update temp…

__shared__ int temp=0;

int result;

temp = 1;

result = temp;  // result may not be 1

temp=threadIdx.x;

tmurray · February 12, 2010, 10:55pm

Even in that case, I’m not sure you will ever see something besides 1 because of when warps switch.

Regardless, you should not listen to me and certainly not rely upon this behavior for real apps. There are plenty of other cases when you have to worry about these things.

Abhi · February 13, 2010, 3:56am

Thanks a lot guys I appreciate your help… I make sure that only one thread make these changes but still then reading and writing to shared memory (temp variable) could be in any order…I had doubts if either of i) or ii) would work…but now i am sure they won’t and so I am off to using __syncthreads()

–Abhishek

Topic		Replies	Views
Memory Consistency Model (strong or weak) CUDA Programming and Performance	5	4591	May 4, 2009
CUDA Memory Consistency CUDA Programming and Performance	23	55624	March 8, 2007
shared memory writes CUDA Programming and Performance	6	3153	December 30, 2007
I want to ask a shared memory efficiency question in Sort. CUDA Programming and Performance	3	527	October 11, 2019
General Shared Memory Question CUDA Programming and Performance	5	6626	March 4, 2010
setting bits in shared memory CUDA Programming and Performance	16	17124	June 6, 2007
Deliberate race condition CUDA Programming and Performance	4	27	January 14, 2025
Can reads be forced to serialize? CUDA Programming and Performance	4	4567	November 8, 2008
Race condition within warp CUDA Programming and Performance	9	3016	September 20, 2016
Logical OR pattern CUDA Programming and Performance	9	2215	November 4, 2009

Dealing with relaxed memory consistency model

Related topics