Hi
What does the error message ‘incorrect use of __syncthreads()’ actually mean?
I have a kernel that runs fine on the hardware, but when I run it in device emulation mode,
some blocks terminate early with the ‘incorrect use of __syncthreads()’ message.
In what situations is this message triggered?
I’m suspicious of this construct:
```cuda
__device__ int foobar()
{
    __shared__ int someval;
    if (threadIdx.x == 0)
    {
        // calculate someval
    }
    __syncthreads();

    // early out from function (someval is shared, so every thread sees the same value)
    if (someval == 0)
        return 0;

    int reg_someval = someval;
    for (int i = 0; i < reg_someval; i++)
    {
        for (int j = 0; …)
        {
        }
        __syncthreads();
        if (threadIdx.x == 0)
        {
            // merge values on thread 0 and write result to shared memory
        }
        __syncthreads();
    }
    return result_from_shared_memory;
}

__global__ void fooKernel()
{
    int val = foobar();
    if (val > 0)
    {
        // do something on all threads
        if (threadIdx.x == 0)
        {
            // merge values from shared memory and write result to global memory
        }
        __syncthreads();
    }
    // do some other calculations
    return;
}
```
How is ‘__syncthreads()’ implemented in emulation mode?
Does the code above abuse ‘__syncthreads()’? It should be fine, since all threads in
the block always follow the same execution path (they either all take the branch or none of them do).
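One way to check the claim that every thread takes the same path is a block-wide vote on the branch condition before branching. The sketch below assumes hardware of compute capability 2.0 or later, where `__syncthreads_count(pred)` is available: it is itself a barrier and returns how many threads of the block passed a non-zero predicate, so a condition is uniform exactly when that count is 0 or equal to the block size. `block_uniform`, `check_uniform` and `count_uniform_blocks` are hypothetical names, and the 1-D launch of 2 blocks of 64 threads is an arbitrary choice.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical debug helper: true if `pred` has the same value in every
// thread of a 1-D block. __syncthreads_count() is itself a barrier, so every
// thread of the block must call this function (compute capability 2.0+).
__device__ bool block_uniform(int pred)
{
    int n = __syncthreads_count(pred != 0);  // threads with a non-zero predicate
    return n == 0 || n == (int)blockDim.x;
}

// Hypothetical test kernel; out[blockIdx.x] = 1 if the condition was uniform.
//   mode 0: condition depends only on a shared value (uniform per block)
//   mode 1: condition depends on threadIdx.x (divergent)
__global__ void check_uniform(int mode, int *out)
{
    __shared__ int someval;
    if (threadIdx.x == 0)
        someval = blockIdx.x;      // stand-in for "calculate someval"
    __syncthreads();               // publish someval to the whole block

    int cond = (mode == 0) ? (someval == 0) : (threadIdx.x == 0);
    bool uniform = block_uniform(cond);  // reached by all threads, before any branch
    if (threadIdx.x == 0)
        out[blockIdx.x] = uniform ? 1 : 0;
}

// Host helper: launches 2 blocks of 64 threads and returns how many blocks
// saw a uniform condition, or -1 on a CUDA error.
int count_uniform_blocks(int mode)
{
    const int blocks = 2, threads = 64;
    int h_out[blocks] = {0, 0};
    int *d_out = nullptr;
    if (cudaMalloc(&d_out, sizeof(h_out)) != cudaSuccess)
        return -1;
    check_uniform<<<blocks, threads>>>(mode, d_out);
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return err == cudaSuccess ? h_out[0] + h_out[1] : -1;
}
```

With mode 0 both blocks should report a uniform condition (`count_uniform_blocks(0) == 2`); with mode 1, where the condition depends on `threadIdx.x`, neither should (`count_uniform_blocks(1) == 0`). Because the vote is a barrier too, the check has to sit where every thread of the block reaches it, before the branch rather than inside it.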
__syncthreads() works only if ALL threads of the same block reach it,
e.g.:
```cuda
if (val > 0)
{
    // do something on all threads
    if (threadIdx.x == 0)
    {
        // merge values from shared memory and write result to global memory
    }
    __syncthreads();
}
```
and since val comes from shared memory, it is the same for the whole block :), so that is not the case here.
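A minimal, runnable sketch of the rule above, assuming a 1-D block: the branch condition is loaded into shared memory by thread 0 and published with a barrier, so every thread sees the same `val` and the `__syncthreads()` inside the branch is reached by all threads of the block or by none. `uniform_branch`, `run_uniform_branch`, `partial` and the doubling step are hypothetical stand-ins for the merge logic in the original code.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical kernel mirroring the snippet above. The branch condition is
// block-uniform because it is read from shared memory after a barrier.
__global__ void uniform_branch(const int *vals, int *out)
{
    __shared__ int val;      // block-wide value, stand-in for foobar()'s result
    __shared__ int partial;  // hypothetical scratch written by thread 0

    if (threadIdx.x == 0)
        val = vals[blockIdx.x];
    __syncthreads();          // every thread now sees the same `val`

    if (val > 0) {            // uniform: all threads take it or none do
        if (threadIdx.x == 0)
            partial = val * 2;               // placeholder for "merge values"
        __syncthreads();                     // legal: reached by all threads or none
        if (threadIdx.x == blockDim.x - 1)   // a different thread reads the result
            out[blockIdx.x] = partial;
    } else if (threadIdx.x == 0) {
        out[blockIdx.x] = -1;
    }

    // By contrast, `if (threadIdx.x == 0) { ...; __syncthreads(); }` puts the
    // barrier where only one thread can reach it, which is the misuse the
    // rule above forbids.
}

// Host helper: runs two blocks of 64 threads, one with val > 0 and one with
// val == 0, and copies the per-block results into res[0..1].
// Returns false on a CUDA error.
bool run_uniform_branch(int res[2])
{
    const int h_vals[2] = {5, 0};
    int *d_vals = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_vals, sizeof(h_vals)) != cudaSuccess)
        return false;
    if (cudaMalloc(&d_out, sizeof(h_vals)) != cudaSuccess) {
        cudaFree(d_vals);
        return false;
    }
    cudaError_t err = cudaMemcpy(d_vals, h_vals, sizeof(h_vals), cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        uniform_branch<<<2, 64>>>(d_vals, d_out);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess)
        err = cudaMemcpy(res, d_out, 2 * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_vals);
    cudaFree(d_out);
    return err == cudaSuccess;
}
```

With the per-block values `{5, 0}` this should give 10 for the first block and -1 for the second. The last thread, not thread 0, reads `partial`, so the result relies on the barrier separating the write from the read.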
Interestingly, when I put OutputDebugString calls around every __syncthreads() in my kernel to see which one triggers the message... surprise... the message is gone, and all blocks now finish correctly in device emulation mode. What is going on?