I’ve got a little question about working with global memory.
My kernel needs to write a variable to global memory based on a condition. I can guarantee that the condition evaluates to TRUE on no more than one thread in the whole grid.
The problem is that even when the condition is TRUE, the write never happens unless I call __syncthreads() before it. Sample code is shown below:
```
uint4 vec = CalcVec( data );
// If __syncthreads() is present, the write is performed
__syncthreads();
if( vec.x == myConst )
    *( (unsigned int*) pdOut ) = tid;
```
I really don't understand why this __syncthreads() is needed: my kernel does not use local or shared memory at all (the .cubin reports lmem=0, smem=20, reg=14).
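For anyone who wants to try reproducing this, a minimal self-contained sketch of the same single-writer pattern might look like the following. The body of `CalcVec`, the kernel name `findKernel`, the helper `runFind`, the value 1234 for `myConst` and the 16×256 launch configuration are all hypothetical placeholders, not the real kernel. The output is initialised to a sentinel and the host checks both launch and execution errors, so "the write never happened" can be told apart from "the kernel failed to run".

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the real CalcVec(), which is not shown in the post.
__device__ uint4 CalcVec(unsigned int data)
{
    return make_uint4(data, data + 1u, data + 2u, data + 3u);
}

// At most one thread in the grid satisfies the condition and writes its index.
__global__ void findKernel(unsigned int* pdOut, unsigned int myConst)
{
    unsigned int tid = blockIdx.x * blockDim.x + threadIdx.x;
    uint4 vec = CalcVec(tid);
    if (vec.x == myConst)
        *pdOut = tid;                              // no __syncthreads() before the write
}

// Runs the kernel once; returns true on success and stores the value
// read back from the device in *result.
bool runFind(unsigned int myConst, unsigned int* result)
{
    const unsigned int sentinel = 0xFFFFFFFFu;    // marks "never written"
    unsigned int* pdOut = nullptr;
    if (cudaMalloc(&pdOut, sizeof(unsigned int)) != cudaSuccess)
        return false;
    cudaError_t err = cudaMemcpy(pdOut, &sentinel, sizeof(sentinel), cudaMemcpyHostToDevice);

    if (err == cudaSuccess) {
        findKernel<<<16, 256>>>(pdOut, myConst);  // 4096 threads in total
        err = cudaGetLastError();                 // catches launch-configuration errors
    }
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();            // catches errors during execution
    if (err == cudaSuccess)
        err = cudaMemcpy(result, pdOut, sizeof(unsigned int), cudaMemcpyDeviceToHost);
    if (err != cudaSuccess)
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));

    cudaFree(pdOut);
    return err == cudaSuccess;
}

int main()
{
    unsigned int result = 0;
    if (!runFind(1234u, &result))                 // 1234 is a hypothetical myConst
        return 1;
    std::printf("result = %u\n", result);
    return 0;
}
```

If the printed value is still the sentinel (4294967295) while no CUDA error is reported, the write really was skipped; if an error is printed, the kernel never ran to completion in the first place.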