Hi,
It took me a while messing with PTX to find out why my program didn't run correctly. It appears that the compiler optimizes away the following line completely:
while (a==0) {;}
__syncthreads();
The purpose of this line is to make all threads wait until a condition is met. My only solution for now is this:
volatile int dummy;
while (a==0) {dummy++;}
However, that costs registers and performance. Is there a way to tell the compiler not to remove the statement?
What could make the variable a become nonzero? If it is set by some other thread in the same block, __syncthreads() should be sufficient on its own. If it is set by a different block, you are in dangerous territory.
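For the same-block case, here is a minimal sketch of what relying on __syncthreads() alone could look like. It assumes the flag lives in shared memory and is written by thread 0; the kernel name broadcast_in_block, the host helper run_broadcast_demo and the value 42 are made up for illustration and are not from the original program.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: thread 0 sets the value, everyone else just waits at the barrier.
__global__ void broadcast_in_block(int *out)
{
    __shared__ int a;          // one copy per block, visible to all of its threads

    if (threadIdx.x == 0) {
        a = 42;                // the "condition" is produced by a single thread
    }
    __syncthreads();           // barrier; shared-memory writes made before it are
                               // visible to every thread in the block after it

    out[threadIdx.x] = a;      // no spin loop needed
}

// Hypothetical host helper: returns how many threads observed 42, or -1 on a CUDA error.
int run_broadcast_demo()
{
    const int n = 64;
    int h_out[n];
    int *d_out = nullptr;

    if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess) return -1;
    broadcast_in_block<<<1, n>>>(d_out);
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(h_out, d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) return -1;

    int ok = 0;
    for (int i = 0; i < n; ++i) ok += (h_out[i] == 42);
    return ok;
}
```

Calling run_broadcast_demo() from main() should report that all 64 threads saw the value, without any busy-wait loop for the compiler to remove.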
Is a declared volatile? If not, the compiler has just eliminated a potential endless loop.
Registers can never be volatile, as the whole point of volatile is to prevent a variable from being kept in a register, so that every access goes to memory. A volatile global variable should do.
Use __threadfence() in the thread that changes a to flush it out to global memory (or at least L2 cache).
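Putting the volatile and __threadfence() suggestions together, a rough sketch of a cross-block handoff might look like the following. Several things here are assumptions and not from the original program: the names flag_handoff and run_handoff_demo, the payload value, and above all that both blocks are resident on the GPU at the same time (two small blocks, as launched here, normally are). If the waiting block could run before the signalling block is ever scheduled, this would hang. Only one thread per block spins, so the wait does not depend on how a divergent warp is scheduled.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: block 0 signals, block 1 waits.
// volatile forces a real memory access on every read of *flag, so the compiler
// can neither hoist the load out of the loop nor drop the loop.
__global__ void flag_handoff(volatile int *flag, volatile int *payload, int *out)
{
    if (blockIdx.x == 0) {
        if (threadIdx.x == 0) {
            *payload = 42;       // 1. write the data
            __threadfence();     // 2. make the data write visible device-wide first
            *flag = 1;           // 3. then raise the flag
        }
    } else {
        if (threadIdx.x == 0) {
            while (*flag == 0) { }   // single spinning thread
            __threadfence();         // keep the payload read after the flag read
        }
        __syncthreads();             // release the other threads of this block
        out[threadIdx.x] = *payload;
    }
}

// Hypothetical host helper: returns how many threads of block 1 saw 42, or -1 on error.
int run_handoff_demo()
{
    const int n = 32;
    int h_out[n];
    int *d_sync = nullptr;   // d_sync[0] is the flag, d_sync[1] the payload
    int *d_out = nullptr;

    if (cudaMalloc(&d_sync, 2 * sizeof(int)) != cudaSuccess) return -1;
    if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess) { cudaFree(d_sync); return -1; }
    cudaMemset(d_sync, 0, 2 * sizeof(int));          // flag starts at 0

    flag_handoff<<<2, n>>>(d_sync, d_sync + 1, d_out);
    cudaError_t err = cudaMemcpy(h_out, d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_sync);
    cudaFree(d_out);
    if (err != cudaSuccess) return -1;

    int ok = 0;
    for (int i = 0; i < n; ++i) ok += (h_out[i] == 42);
    return ok;
}
```

Compared with the dummy++ workaround, the loop body stays empty, and the volatile read of the flag is what keeps the compiler from removing it.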