Conditional write to global memory

Hello All,

I’ve got a little question about working with global memory.

My kernel needs to write a variable to global memory based on some condition. I can guarantee that the condition evaluates to TRUE on no more than one thread in the whole grid.

The problem is that even if the condition is TRUE, the write never occurs unless I call __syncthreads() beforehand. Sample code is shown below:

uint4 vec = CalcVec( data );

// If __syncthreads() is called at this point, the write is performed

if( vec.x == myConst )
    *( (unsigned int*) pdOut ) = tid;

I really cannot understand why this __syncthreads() is needed: my kernel does not use local or shared memory at all (.cubin says: lmem=0, smem=20, reg=14).

Any ideas?

I ran into this once before. Giving each thread a different pointer to write to solved my problem that time. You could also try declaring pdOut as volatile.
However, later investigation showed that my program had buffer overflows elsewhere, so I'm not sure what exactly caused my problem. You should check your code for similar issues too.
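
A minimal sketch of the volatile suggestion follows. It is not the original kernel: the CalcVec() body, the kernel name WriteIfMatch, the host helper LaunchWriteIfMatch and the 4x64 launch configuration are all placeholders made up for illustration. Whether volatile actually cures the missing write in the original code is not known.

```cuda
#include <cuda_runtime.h>

// Hypothetical placeholder for the original CalcVec(); only .x matters here.
__device__ uint4 CalcVec(unsigned int data)
{
    return make_uint4(data, 0u, 0u, 0u);
}

// pdOut is volatile, so the compiler may not optimize away or cache
// accesses made through it.
__global__ void WriteIfMatch(volatile unsigned int* pdOut, unsigned int myConst)
{
    unsigned int tid = blockIdx.x * blockDim.x + threadIdx.x;
    uint4 vec = CalcVec(tid);

    // At most one thread in the grid satisfies this condition.
    if (vec.x == myConst)
        *pdOut = tid;
}

// Hypothetical host helper: returns the value left in pdOut after the launch,
// or 0xFFFFFFFF if no thread wrote or a CUDA call failed.
unsigned int LaunchWriteIfMatch(unsigned int myConst)
{
    const unsigned int sentinel = 0xFFFFFFFFu;
    unsigned int hOut = sentinel;
    unsigned int* pdOut = nullptr;

    if (cudaMalloc((void**)&pdOut, sizeof(unsigned int)) != cudaSuccess)
        return sentinel;
    if (cudaMemcpy(pdOut, &hOut, sizeof(hOut), cudaMemcpyHostToDevice) != cudaSuccess) {
        cudaFree(pdOut);
        return sentinel;
    }

    WriteIfMatch<<<4, 64>>>(pdOut, myConst);   // 256 threads, tid 0..255

    if (cudaDeviceSynchronize() != cudaSuccess ||
        cudaMemcpy(&hOut, pdOut, sizeof(hOut), cudaMemcpyDeviceToHost) != cudaSuccess)
        hOut = sentinel;

    cudaFree(pdOut);
    return hOut;
}
```

Pre-filling the output word with a sentinel makes a missing write easy to tell apart from a write of some thread index.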

Well, I guess I do not have buffer overflows since I do not have any buffers :) All data is stored and processed in registers.

Maybe someone from NVIDIA can explain this behaviour?

The behavior you describe shouldn't happen, so it's a bug in either your code or CUDA. Please post minimal code that reproduces the problem in a single .cu file that compiles with nvcc without other dependencies.
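
For reference, a single-file reproduction could be laid out roughly like this. It is only a skeleton under assumptions: CalcVec() here is a made-up stand-in that returns the thread index in .x, and the kernel name, the RunRepro helper, the launch configuration and the value of myConst are all hypothetical. The real CalcVec() and data would have to be substituted for the problem to show up.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Made-up stand-in for the poster's CalcVec(); replace with the real one.
__device__ uint4 CalcVec(unsigned int data)
{
    return make_uint4(data, 0u, 0u, 0u);
}

__global__ void ConditionalWrite(unsigned int* pdOut, unsigned int myConst)
{
    unsigned int tid = blockIdx.x * blockDim.x + threadIdx.x;
    uint4 vec = CalcVec(tid);

    // No __syncthreads() here: this is the variant reported as failing.
    if (vec.x == myConst)
        *pdOut = tid;
}

// Hypothetical helper: runs the kernel once and returns the contents of pdOut,
// or 0xFFFFFFFF if nothing was written or a CUDA call failed.
unsigned int RunRepro(unsigned int myConst)
{
    const unsigned int sentinel = 0xFFFFFFFFu;
    unsigned int hOut = sentinel;
    unsigned int* pdOut = nullptr;

    if (cudaMalloc((void**)&pdOut, sizeof(unsigned int)) != cudaSuccess)
        return sentinel;
    if (cudaMemcpy(pdOut, &hOut, sizeof(hOut), cudaMemcpyHostToDevice) != cudaSuccess) {
        cudaFree(pdOut);
        return sentinel;
    }

    ConditionalWrite<<<4, 64>>>(pdOut, myConst);

    // Check both the launch itself and the kernel's execution.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        hOut = sentinel;
    } else if (cudaMemcpy(&hOut, pdOut, sizeof(hOut), cudaMemcpyDeviceToHost) != cudaSuccess) {
        hOut = sentinel;
    }

    cudaFree(pdOut);
    return hOut;
}

int main()
{
    const unsigned int myConst = 100u;   // arbitrary value matched by one thread
    unsigned int result = RunRepro(myConst);

    if (result == 0xFFFFFFFFu)
        printf("no write observed\n");
    else
        printf("thread %u wrote to pdOut\n", result);
    return 0;
}
```

Checking the return codes matters here, since a failed launch or an earlier error would also leave the output untouched and look exactly like a missing write.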