Early return and __syncthreads() function

pengshenglin · May 14, 2024, 7:48am

The CUDA Programming Guide says that __syncthreads():

"waits until all threads in the thread block have reached this point and all global and shared memory accesses made by these threads prior to __syncthreads() are visible to all threads in the block."

However, I found that in practice the kernel does not hang if some of the threads within a block return before reaching __syncthreads() (a common case when the last block is not fully utilized). I wonder why this happens. Should the documentation be modified, or is the behaviour of the kernel undefined in this situation?

Curefab · May 14, 2024, 10:01am

__syncthreads() internally uses barrier synchronization; see the barrier instruction in PTX ISA 8.4.

The description there indicates that exited threads are just ignored:

Operand b specifies the number of threads participating in the barrier. If no thread count is specified, all threads in the CTA participate in the barrier. When specifying a thread count, the value must be a multiple of the warp size.

barrier{.cta} instruction causes executing thread to wait for all non-exited threads from its warp and marks warps’ arrival at barrier.

Robert_Crovella · May 14, 2024, 2:23pm

The PTX info provided is a sensible answer as to why this happens.

However, the documentation for CUDA C++ should be adhered to. Just because something appears to work does not mean that it is correctly written code.
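For illustration, here is a minimal sketch of a kernel that stays within the documented contract when the last block is only partly used. Out-of-range threads do not return early; they skip the global load and still reach every __syncthreads(). The kernel name block_sum, the helper run_block_sum, and the block size kBlock are hypothetical names chosen for this example, and error checking is omitted for brevity.

```cuda
#include <cassert>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

constexpr int kBlock = 256;  // hypothetical block size; the reduction assumes a power of two

// Hypothetical kernel: writes the sum of each block's slice of `in` to out[blockIdx.x].
__global__ void block_sum(const float* in, float* out, int n) {
    __shared__ float tile[kBlock];
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    // No early return: out-of-range threads contribute a neutral value instead.
    tile[threadIdx.x] = (i < n) ? in[i] : 0.0f;
    __syncthreads();  // reached by every thread of the block

    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (threadIdx.x < stride) {
            tile[threadIdx.x] += tile[threadIdx.x + stride];
        }
        __syncthreads();  // outside the `if`, so all threads still reach it
    }
    if (threadIdx.x == 0) {
        out[blockIdx.x] = tile[0];
    }
}

// Host helper (assumes a non-empty input; CUDA error checks omitted).
std::vector<float> run_block_sum(const std::vector<float>& h_in) {
    int n = static_cast<int>(h_in.size());
    int blocks = (n + kBlock - 1) / kBlock;
    float* d_in = nullptr;
    float* d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, blocks * sizeof(float));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    block_sum<<<blocks, kBlock>>>(d_in, d_out, n);
    std::vector<float> h_out(blocks);
    cudaMemcpy(h_out.data(), d_out, blocks * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}

int main() {
    // 1000 is not a multiple of 256, so the last block is only partly in range.
    std::vector<float> sums = run_block_sum(std::vector<float>(1000, 1.0f));
    assert(sums.size() == 4);
    for (size_t b = 0; b < sums.size(); ++b) {
        printf("block %zu: %.0f\n", b, sums[b]);
    }
    return 0;
}
```

With this layout the bounds check guards only the memory accesses, never the barrier, so the code does not depend on how exited threads are treated at the PTX level.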

pengshenglin · May 15, 2024, 2:57am

Thank you! That makes a lot of sense. I guess I should turn to cuda::barrier for finer synchronization control.
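A rough sketch of what that could look like with cuda::barrier from libcu++, where a thread that leaves early says so explicitly through arrive_and_drop() instead of silently skipping a barrier. The kernel shift_left, the helper run_shift_left, and the block size kBlock are hypothetical names for this example. I am assuming a toolkit that ships <cuda/barrier> and, to my knowledge, a target of compute capability 7.0 or newer; nvcc may also emit a warning about dynamic initialization of the __shared__ barrier object.

```cuda
#include <cassert>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <cuda/barrier>

constexpr int kBlock = 256;  // hypothetical block size
using block_barrier = cuda::barrier<cuda::thread_scope_block>;

// Hypothetical kernel: out[i] takes the value of the right-hand neighbour
// within the same block, or in[i] itself if there is no such neighbour.
__global__ void shift_left(const int* in, int* out, int n) {
    __shared__ block_barrier bar;
    __shared__ int tile[kBlock];

    if (threadIdx.x == 0) {
        init(&bar, blockDim.x);  // expect one arrival per thread in the block
    }
    __syncthreads();  // every thread reaches this; the barrier is now initialized

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) {
        // Counts as an arrival for this phase and removes the thread
        // from the expected count of all later phases.
        bar.arrive_and_drop();
        return;
    }

    tile[threadIdx.x] = in[i];
    bar.arrive_and_wait();  // waits until all expected arrivals (including drops) happened

    int next = threadIdx.x + 1;
    bool has_next = (next < blockDim.x) && (i + 1 < n);
    out[i] = has_next ? tile[next] : tile[threadIdx.x];
}

// Host helper (assumes a non-empty input; CUDA error checks omitted).
std::vector<int> run_shift_left(const std::vector<int>& h_in) {
    int n = static_cast<int>(h_in.size());
    int blocks = (n + kBlock - 1) / kBlock;
    int* d_in = nullptr;
    int* d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, n * sizeof(int));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    shift_left<<<blocks, kBlock>>>(d_in, d_out, n);
    std::vector<int> h_out(n);
    cudaMemcpy(h_out.data(), d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}

int main() {
    std::vector<int> in(1000);
    for (int i = 0; i < 1000; ++i) in[i] = i;
    std::vector<int> out = run_shift_left(in);
    assert(out.size() == 1000);
    printf("out[0]=%d out[255]=%d out[999]=%d\n", out[0], out[255], out[999]);
    return 0;
}
```

The only __syncthreads() in this sketch sits before any thread can return, so it is still reached by the whole block; the later synchronization goes through the barrier object, whose participant count is adjusted by the threads that drop out.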

Curefab · May 15, 2024, 1:32pm

That is a good approach: you do not just assume how it works or compiles internally, but adhere to the C++ documentation, as Robert also recommends.
