Hi,
I was wondering whether there are software solutions for dealing with CUDA’s relaxed memory consistency model. I don’t want to use any kind of barrier synchronization, as it might slow down my computation. Is it possible to force an ordering so that reads and writes to shared memory happen in a specified order?
i)
__shared__ int temp = 0;
int result;
temp = 1;
result = temp;
ii)
__shared__ int temp = 0;
int step = 0;
int result;
while (true) {
    if (step == 0) {
        temp = 1;
    }
    if (step == 1) {
        result = temp;
        break;
    }
    step++;
}
Snippets i) and ii) above both try to do the same thing: first write to the shared memory location of “temp”, then read from that same location. As I understand it, CUDA’s relaxed consistency model means i) can leave “result” with either value (0 or 1). My question is: does ii) guarantee that “result” is always 1? In other words, will the read always be performed after the preceding write has completed?
I doubt it. My understanding is that under a relaxed memory consistency model, writes go through a buffer: write requests are added to the buffer and serviced in FIFO order, whereas read requests are serviced as soon as they are received. Is that also the case for CUDA? If it is, then what I am trying to do in ii) won’t work.
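To make the question concrete, here is a minimal sketch of snippet i) as a complete program, written the way I imagine a barrier-free version would look. It is only an illustration under my own assumptions: the kernel name `writeThenRead` is made up, the kernel is launched with a single thread so no other thread touches “temp”, and I have declared “temp” as `volatile` and placed a `__threadfence_block()` (a memory fence, not a barrier, so threads do not wait for each other) between the write and the read. I am not claiming this is the recommended approach, only that it is the kind of thing I have in mind.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration; intended for a <<<1, 1>>> launch.
__global__ void writeThenRead(int *result)
{
    // volatile: ask the compiler not to keep temp in a register,
    // so each access really goes to shared memory.
    __shared__ volatile int temp;

    temp = 0;               // initial value, set at run time
    temp = 1;               // the write whose ordering I care about

    // Memory fence scoped to the thread block. Unlike __syncthreads(),
    // it does not make threads wait for one another.
    __threadfence_block();

    *result = temp;         // the read that should observe the write
}

int main()
{
    int *d_result = nullptr;
    int h_result = -1;

    cudaMalloc(&d_result, sizeof(int));
    writeThenRead<<<1, 1>>>(d_result);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(&h_result, d_result, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_result);

    printf("result = %d\n", h_result);
    return 0;
}
```

What I would like to know is whether the fence (or even the `volatile` alone) is needed here, and whether the same idea still holds once several threads in the block read and write “temp”.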
Thanks,
Abhi