__syncthreads and return(): looking for the optimal way to safely catch excess threads

Alex_Loddoch · July 28, 2009, 6:17am

Hi everyone.

I understand that __syncthreads() in conditional code may lead to undefined results/behavior (if the condition evaluates differently across the threads of one block).
How about a __syncthreads() that follows a conditional return statement? I.e., do threads that have already returned implicitly satisfy all subsequent __syncthreads() calls (since they don’t access shared/global memory anymore)?

It “seems” to work, but…

Background for my question: I’m looking for the optimal way (if one exists) to safely handle excess threads in a block (to avoid out-of-bounds accesses, etc.).
If each thread updates/operates on one point of an N-element vector with N > blockDim.x and N % blockDim.x > 0, there seem to be different ways to handle this:

1. Pad the vector accordingly.
   Probably not advisable/possible for large 3D arrays.
2. Pad by one element and do something like tx = min(threadIdx.x, N), i.e. collate ALL excess threads on ONE dummy element.
   Causes shmem bank conflicts and wastes cycles (and doesn’t look nice).
3. Use if…then…else conditionals for everything EXCEPT the __syncthreads.
   Works, but serializes warps and obfuscates the code.
4. Return unused threads (see my question): if (threadIdx.x >= N) return;
   Safe? If so, does it cause warps to serialize?

Would the best approach be a combination of 1 and 3: pad to a multiple of the warp size and use if/then/else for the rest?
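
To make option 3 concrete, here is a minimal sketch of what I mean by guarding everything except the barrier. The kernel name shift_left_guarded, the helper run_shift_check, the BLOCK size of 128 and the "copy your right-hand neighbour" operation are all made up for illustration, and I use the global index rather than threadIdx.x alone. It assumes the kernel is launched with blockDim.x == BLOCK.

```cuda
#include <vector>
#include <cuda_runtime.h>

#define BLOCK 128  // threads per block (assumed value for this sketch)

// Each thread stages one element in shared memory, the block synchronizes,
// and each in-range thread writes out its right-hand neighbour's value.
// Excess threads skip the memory accesses but still reach __syncthreads().
__global__ void shift_left_guarded(const float *in, float *out, int n)
{
    __shared__ float tile[BLOCK + 1];   // +1 for the halo element

    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    bool active = gid < n;              // predicate instead of an early return

    if (active)
        tile[threadIdx.x] = in[gid];

    // Thread 0 fetches the first element of the next block, if it exists.
    if (threadIdx.x == 0) {
        int halo = (blockIdx.x + 1) * blockDim.x;
        tile[blockDim.x] = (halo < n) ? in[halo] : 0.0f;
    }

    __syncthreads();                    // outside any conditional: all threads arrive

    if (active)
        out[gid] = (gid + 1 < n) ? tile[threadIdx.x + 1] : 0.0f;
}

// Hypothetical host-side check: returns the number of mismatching elements,
// or -1 if device memory could not be allocated.
int run_shift_check(int n)
{
    std::vector<float> h_in(n), h_out(n, -1.0f);
    for (int i = 0; i < n; ++i) h_in[i] = static_cast<float>(i);

    float *d_in = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_in, n * sizeof(float)) != cudaSuccess) return -1;
    if (cudaMalloc(&d_out, n * sizeof(float)) != cudaSuccess) {
        cudaFree(d_in);
        return -1;
    }
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    int blocks = (n + BLOCK - 1) / BLOCK;   // round up: last block may have excess threads
    shift_left_guarded<<<blocks, BLOCK>>>(d_in, d_out, n);
    cudaMemcpy(h_out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_out);

    int errors = 0;
    for (int i = 0; i < n; ++i) {
        float expected = (i + 1 < n) ? h_in[i + 1] : 0.0f;
        if (h_out[i] != expected) ++errors;
    }
    return errors;
}
```

With n = 1000 and BLOCK = 128 this launches 8 blocks, so the last block carries 24 excess threads that do nothing except arrive at the barrier. Option 4 would replace the `active` predicate with `if (gid >= n) return;` before the barrier, which is exactly the case I am unsure about.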

Cheers, Alex
