Dealing with relaxed memory consistency model

I was wondering if there are software solutions for dealing with CUDA’s relaxed memory consistency model. So basically I don’t want to use any kind of barrier synchronization as it might slow down my computation. Is it possible to have a forced ordering to make sure that the reads and writes to shared memory are in a specified order.

shared int temp=0;
int result;
temp = 1;
result = temp;

shared int temp = 0;
int step = 0;
int result;
if (step==0){
temp = 1;
if (step==1){
result = temp;

The i) and ii) above are basically trying to do the same thing: first write into shared memory location for variable “temp” and then read from same location. Due to CUDA’s relaxed consistency model i) can lead to any value of “result” (so it could be either 0 or 1). But my question is does ii) make sure that result will always have value 1 in other words read will always be done after preceding write has been done.

I doubt it because my understanding of the way reads and writes are done (in case of relaxed memory consistency model) is by using a buffer for writes. So write requests are added to the buffer and serviced in FIFO order. But that is not the case for read requests which are serviced as soon as one is received. I just want to know if that is the same case for CUDA as well? And if that is true then what I am trying to do in ii) won’t work!!


Nope, neither of your codes will guarantee to work. In particular another thread in another part of your code could be setting temp to any value at all at any time.

You’re fighting the wrong battle.

__syncthreads() is your friend here. It’s not expensive! It’s certainly faster than your losing battle in method (ii), as well as being guaranteed correct.

The first one will actually always return 1.

Not necessarily… there could be a race with later instructions which also update temp…

__shared__ int temp=0;

int result;

temp = 1;

result = temp;  // result may not be 1


Even in that case, I’m not sure you will ever see something besides 1 because of when warps switch.

Regardless, you should not listen to me and certainly not rely upon this behavior for real apps. There are plenty of other cases when you have to worry about these things.

Thanks a lot guys I appreciate your help… I make sure that only one thread make these changes but still then reading and writing to shared memory (temp variable) could be in any order…I had doubts if either of i) or ii) would work…but now i am sure they won’t and so I am off to using __syncthreads()