__syncthreads() inside an if-then block

Hello to all,

I am writing a kernel, and I wonder if it is possible to synchronize threads inside an if-then block.

For example:

[codebox]
int i = blockIdx.x * blockDim.x + threadIdx.x;

if (i > 0 && i < N)
{
    dosomething();
    __syncthreads();   // reached only by threads that pass the test
    dosomethingelse();
}
[/codebox]

Will it work?

Or do I have to implement the kernel like this:

[codebox]
if (i > 0 && i < N)
{
    dosomething();
}

__syncthreads();   // reached by every thread in the block

if (i > 0 && i < N)
{
    dosomethingelse();
}
[/codebox]

Thanks in advance

Francesco

I don’t think you can use __syncthreads in that way. If some threads branch around the synchronization call and exit early, then the kernel will probably hang.

Remember that __syncthreads() makes each thread wait until every thread in its block has reached that call. But if you ensure that the branch is taken by either ALL threads of a given block or NONE of them, you can use it that way.
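To make the "all or none" case concrete, here is a minimal sketch of a branch whose condition depends only on blockIdx.x, so every thread of a block takes the same path. The kernel name, the 256-thread block size and the host helper are made up for illustration and are not from anyone's code in this thread.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical kernel: reverses each 256-element tile of `data` in place,
// but only in even-numbered blocks. Assumes blockDim.x == 256.
__global__ void reverseEvenTiles(int *data)
{
    __shared__ int tile[256];

    // The condition depends only on blockIdx.x, so within one block either
    // all threads enter the branch or none do.
    if (blockIdx.x % 2 == 0)
    {
        int base = blockIdx.x * blockDim.x;
        tile[threadIdx.x] = data[base + threadIdx.x];
        __syncthreads();   // every thread of this block reaches this call
        data[base + threadIdx.x] = tile[blockDim.x - 1 - threadIdx.x];
    }
}

// Hypothetical host helper: runs the kernel on numBlocks * 256 ints held in
// `host` and copies the result back. Returns true if no CUDA error occurred.
bool runReverseEvenTiles(int *host, int numBlocks)
{
    assert(numBlocks > 0);
    size_t bytes = (size_t)numBlocks * 256 * sizeof(int);
    int *dev = nullptr;
    if (cudaMalloc(&dev, bytes) != cudaSuccess) return false;
    cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);
    reverseEvenTiles<<<numBlocks, 256>>>(dev);
    cudaError_t err = cudaDeviceSynchronize();
    cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return err == cudaSuccess;
}
```

If the condition used threadIdx.x instead (as in the original question), some threads of a block could skip the barrier, which is the case warned about above.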

Just an aside…can’t the compiler check for this and throw an error if you try to __syncthreads() inside of an if statement?

Compiler support would be neat, however I think emulation mode catches these invalid uses of __syncthreads(). I know I have seen error messages complaining about me doing this before…

I’m not sure you can tell in all situations at compile-time if the entire block will follow the same branch, in which case the __syncthreads() is ok.

Indeed, a switch(blockIdx.x) for example can contain syncthreads without problems. ( A nice way to have a kernel perform multiple ‘kernels’ btw ;) )

Maybe this:

[codebox]
int i = blockIdx.x * blockDim.x + threadIdx.x;
if (i > 0 && i < N)
{
    dosomething();
    __syncthreads();
    dosomethingelse();
}
else
{
    __syncthreads();
}
[/codebox]

will work?

I seem to recall tmurray saying this is not correct either (though it might work currently, not sure). You really need both branches to hit the same instruction, which is annoying because you will have to essentially cut your blocks up:

[codebox]
if (i > 0 && i < N) {
    // do something
}

__syncthreads();

if (i > 0 && i < N) {
    // do some more stuff
}
[/codebox]
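For what it's worth, a fuller sketch of that split pattern might look like the following. Everything here (the kernel name, the TILE size, the neighbour-sum computation, the host helper) is invented for illustration. The point is only that the barrier sits outside both guarded regions, so every thread in the block executes the same __syncthreads().

```cuda
#include <cassert>
#include <cuda_runtime.h>

#define TILE 256   // assumed block size for this sketch

// Hypothetical kernel: for 0 < i < n, out[i] = in[i] + the value loaded by
// the thread to the left in the same block (0 if there is none).
__global__ void addLeftNeighbour(const int *in, int *out, int n)
{
    __shared__ int tile[TILE];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    bool active = (i > 0 && i < n);

    // Phase 1: guarded work. Inactive threads leave a zero behind so that no
    // neighbour reads uninitialised shared memory.
    tile[threadIdx.x] = 0;
    if (active) {
        tile[threadIdx.x] = in[i];
    }

    __syncthreads();   // outside any branch: all threads of the block arrive

    // Phase 2: guarded work that depends on phase 1 of other threads.
    if (active) {
        int left = (threadIdx.x > 0) ? tile[threadIdx.x - 1] : 0;
        out[i] = tile[threadIdx.x] + left;
    }
}

// Hypothetical host helper. Elements the kernel does not write stay 0.
bool runAddLeftNeighbour(const int *hin, int *hout, int n)
{
    assert(n > 0);
    size_t bytes = (size_t)n * sizeof(int);
    int *din = nullptr, *dout = nullptr;
    if (cudaMalloc(&din, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&dout, bytes) != cudaSuccess) { cudaFree(din); return false; }
    cudaMemcpy(din, hin, bytes, cudaMemcpyHostToDevice);
    cudaMemset(dout, 0, bytes);
    int blocks = (n + TILE - 1) / TILE;
    addLeftNeighbour<<<blocks, TILE>>>(din, dout, n);
    cudaError_t err = cudaDeviceSynchronize();
    cudaMemcpy(hout, dout, bytes, cudaMemcpyDeviceToHost);
    cudaFree(din);
    cudaFree(dout);
    return err == cudaSuccess;
}
```

Computing the predicate once into a bool and reusing it for both guards keeps the two conditions from drifting apart.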

I tried that once and it didn’t work :lol:

Dear Denis, would you have any example code you could share? I’m wrestling with the concept of efficiently branching, with one branch being a standard array walk-through and the other an RNG func/kernel. Not sure if switching could be efficiently/correctly brought to bear here.

TIA, Vince

Well, not really at hand. I have a kernel where I have a bunch of inputs and calculate mean, max and min of 7 different values. There I have something like this:

[codebox]
switch (blockIdx.x) {
case 0:
    // get data from first array (contains __syncthreads())
    break;
case 1:
    // get data from other array
    break;
case 2:
    // get data from yet another array
    break;
// etc.
}

// perform reduction

switch (blockIdx.x) {
case 0:
    // write data for first array
    break;
case 1:
    // write data for other array
    break;
case 2:
    // write data for yet another array
    break;
// etc.
}
[/codebox]

But in general you can have completely different code in the case parts, where you use one dimension of your grid for ‘switching’ between kernels. As long as your kernels take about the same amount of time, you will probably not have too much trouble performance-wise.
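As a rough, self-contained illustration of that idea (not the actual kernel described above), the sketch below lets block 0 compute a sum and block 1 compute a maximum, each with its own __syncthreads() calls inside the case. The kernel name, TILE size, choice of operations and host helper are all assumptions made up for this example.

```cuda
#include <cassert>
#include <cuda_runtime.h>

#define TILE 256   // assumed block size (power of two) for this sketch

// Hypothetical kernel: block 0 sums a[0..TILE), block 1 takes the maximum of
// b[0..TILE). Intended to be launched with exactly 2 blocks of TILE threads.
__global__ void sumOrMax(const int *a, const int *b, int *result)
{
    __shared__ int s[TILE];
    int t = threadIdx.x;

    switch (blockIdx.x) {   // same value for every thread in the block
    case 0:
        s[t] = a[t];
        __syncthreads();
        for (int stride = TILE / 2; stride > 0; stride /= 2) {
            if (t < stride) s[t] += s[t + stride];
            __syncthreads();   // outside the per-thread if, inside the uniform loop
        }
        break;
    case 1:
        s[t] = b[t];
        __syncthreads();
        for (int stride = TILE / 2; stride > 0; stride /= 2) {
            if (t < stride) s[t] = max(s[t], s[t + stride]);
            __syncthreads();
        }
        break;
    }

    if (t == 0 && blockIdx.x < 2) result[blockIdx.x] = s[0];
}

// Hypothetical host helper: ha and hb hold TILE ints each, hres receives
// {sum of ha, max of hb}.
bool runSumOrMax(const int *ha, const int *hb, int *hres)
{
    assert(ha && hb && hres);
    size_t bytes = TILE * sizeof(int);
    int *da = nullptr, *db = nullptr, *dres = nullptr;
    if (cudaMalloc(&da, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&db, bytes) != cudaSuccess) { cudaFree(da); return false; }
    if (cudaMalloc(&dres, 2 * sizeof(int)) != cudaSuccess) {
        cudaFree(da); cudaFree(db); return false;
    }
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);
    sumOrMax<<<2, TILE>>>(da, db, dres);
    cudaError_t err = cudaDeviceSynchronize();
    cudaMemcpy(hres, dres, 2 * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(da); cudaFree(db); cudaFree(dres);
    return err == cudaSuccess;
}
```

Because the switch value is the same for all threads of a block, each barrier is reached by the whole block or by none of it. The two cases here do similar amounts of work, which matches the remark about keeping the run times of the branches comparable.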