Discard unnecessary threads

Hi,

Is there a way to discard unused CUDA threads, like the “discard” keyword in GLSL shaders?
I didn’t find anything about it in the programming guide.
If the number of threads I need is not a multiple of the block size, the last block is only partly used. I tried wrapping the whole kernel in an if-clause that filters out the unused threads. However, synchronizing threads doesn’t work that way, because the unused ones never reach the __syncthreads() call.
Just discarding those threads would be nice and simple.

Thanks for your help!

There’s no such thing (except IF’ing them out). There would also be no performance benefit from something like this unless the hardware could merge warps. At present it can’t, so discarded threads would just be windmilling NOPs, which is exactly what they do now when you IF them out. The one exception is a condition that happens to IF out an entire warp (32 consecutive threads): then the whole warp finishes instantly and is effectively discarded.
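For reference, here is a minimal sketch of what IF’ing out the surplus threads usually looks like. The kernel name `scale`, the helper `scale_on_gpu`, and the block size of 128 are made up for illustration, and error checking is left out to keep it short.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel: doubles each of the n elements.
__global__ void scale(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)              // surplus threads in the last block skip the work
        data[i] *= 2.0f;
}

// Hypothetical host helper: rounds the grid size up so all n elements are covered.
// CUDA error checking is omitted for brevity.
std::vector<float> scale_on_gpu(const std::vector<float> &in)
{
    int n = static_cast<int>(in.size());
    std::vector<float> out(n);
    if (n == 0) return out;

    float *d = nullptr;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemcpy(d, in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    const int threads = 128;                    // assumed block size
    int blocks = (n + threads - 1) / threads;   // ceiling division
    scale<<<blocks, threads>>>(d, n);

    cudaMemcpy(out.data(), d, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d);
    return out;
}
```

With n = 1000 and 128 threads per block, 8 blocks are launched and the last 24 threads fail the `i < n` test. They share a warp with 8 useful threads, so that warp still runs. With n = 900 the last three warps of the final block contain no useful thread at all, which is the whole-warp case described above.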

It is indeed tricky to use __syncthreads() in this scenario. Here’s one way to do it:

Instead of doing this:

__global__ void kernel(...)
{
	if(particular thread is useful)
	{
		do work
		...
		shared memory write
		__syncthreads(); //warning, deadlock can occur
		shared memory read
		do more work
		...
	}
}

Do this:

__global__ void kernel(...)
{
	if(particular thread is useful)
	{
		do work
		...
		shared memory write
	}

	__syncthreads(); //even the windmilling threads reach this properly

	if(particular thread is useful, same condition)
	{
		shared memory read
		do more work
		...
	}
}

Whenever you need __syncthreads(), you can break out of your condition block, do an unconditional sync, and then get back to conditional code.
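To make the pattern concrete, here is a small sketch that should compile as written. The task (an adjacent difference), the names `adjacent_diff`, `adjacent_diff_on_gpu` and `BLOCK`, and the block size are all assumptions made for illustration. The point is only the shape: guarded write, unconditional sync, guarded read with the same condition. Error checking is again omitted.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int BLOCK = 128;  // assumed block size, must match the launch below

// Hypothetical kernel: out[i] = in[i] - in[i-1], with out[0] = in[0].
__global__ void adjacent_diff(const int *in, int *out, int n)
{
    __shared__ int tile[BLOCK];
    int t = threadIdx.x;
    int i = blockIdx.x * blockDim.x + t;
    bool useful = (i < n);      // "particular thread is useful"

    if (useful)
        tile[t] = in[i];        // shared memory write

    __syncthreads();            // unconditional: every thread in the block arrives

    if (useful) {               // same condition as above
        // Left neighbour comes from shared memory, except at the block edge.
        int left = (t > 0) ? tile[t - 1]
                           : (i > 0 ? in[i - 1] : 0);
        out[i] = tile[t] - left;    // shared memory read
    }
}

// Hypothetical host helper; CUDA error checking is omitted for brevity.
std::vector<int> adjacent_diff_on_gpu(const std::vector<int> &in)
{
    int n = static_cast<int>(in.size());
    std::vector<int> out(n);
    if (n == 0) return out;

    int *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, n * sizeof(int));
    cudaMemcpy(d_in, in.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    int blocks = (n + BLOCK - 1) / BLOCK;   // last block may be partly used
    adjacent_diff<<<blocks, BLOCK>>>(d_in, d_out, n);

    cudaMemcpy(out.data(), d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return out;
}
```

Storing the condition in a local variable such as `useful` keeps the two guards identical by construction, so the second block cannot drift out of sync with the first.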