Confusion about __syncwarp() if all threads in a warp are automatically in sync?

Hi guys,

While reading the docs I came to the conclusion that threads in a warp are always synchronized, because the SM operates on warps in the SIMT model, i.e. it issues the same instruction concurrently to every thread in a warp.

So if we have a source

"
__global__ void f() {
    line_1;
    line_2;
}
"
which translates to machine code (SASS) as
"
IR_1
IR_2
...
IR_N
"

each thread in a warp will get the instruction IR_k at the same time.

So what’s the use of “__syncwarp()”? Where am I wrong?

I would suggest reading more of the docs, and all will be revealed :-) In particular, keep an eye out for the word “divergence”.

I did, and here is what it says under “4. Hardware Implementation; 4.1 SIMT Architecture”:

“Prior to Volta, warps used a single program counter shared amongst all 32 threads in the warp together with an active mask specifying the active threads of the warp. As a result, threads from the same warp in divergent regions or different states of execution cannot signal each other or exchange data, and algorithms requiring fine-grained sharing of data guarded by locks or mutexes can easily lead to deadlock, depending on which warp the contending threads come from.”

The first sentence says that a warp uses a single program counter, so at each clock cycle the same instruction is issued to the whole warp. So my question stands: how can threads be out of sync within a warp, i.e. what is __syncwarp() for?

Please help me, I don’t understand…

IIRC, __syncwarp() was introduced in CUDA 9, and it is for Volta and later architectures, not the pre-Volta ones described in the passage you quoted. :) On Volta+ each thread has its own program counter (“independent thread scheduling”), so the warp-wide lockstep you are describing is no longer guaranteed. You may be interested in this post: https://devblogs.nvidia.com/using-cuda-warp-level-primitives/
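To make “divergence” concrete, here is a minimal sketch (kernel name and constants are my own, not from any doc):

```cuda
__global__ void divergent(int *out) {
    unsigned lane = threadIdx.x % 32;   // lane index within the warp
    if (lane < 16)
        out[threadIdx.x] = 1;           // lanes 0-15 take this path
    else
        out[threadIdx.x] = 2;           // lanes 16-31 take this one
    // Pre-Volta: one PC plus an active mask; the two paths are serialized
    // and the warp reconverges at the end of the if/else.
    // Volta+: per-thread PCs, so the two paths may interleave and the
    // lockstep assumption no longer holds; explicit __syncwarp() is how
    // you reconverge the warp when you need it.
}
```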
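And a sketch of where __syncwarp() actually earns its keep: an intra-warp reduction through shared memory, assuming one warp (32 threads) per block. Names are mine; the pattern follows the blog post linked above. Pre-Volta, people often wrote this without any syncs (“implicit warp-synchronous programming”); on Volta+ that is a race.

```cuda
__global__ void warpReduce(const int *in, int *out) {
    __shared__ int s[32];
    unsigned lane = threadIdx.x;        // one warp per block assumed
    s[lane] = in[blockIdx.x * 32 + lane];
    __syncwarp();                       // make the stores visible warp-wide
    for (int offset = 16; offset > 0; offset /= 2) {
        int v = (lane < offset) ? s[lane + offset] : 0;
        __syncwarp();                   // all reads finish before any write
        if (lane < offset) s[lane] += v;
        __syncwarp();                   // all writes finish before next read
    }
    if (lane == 0) out[blockIdx.x] = s[0];
}
```

Without those __syncwarp() calls, a thread on Volta+ may read s[lane + offset] before its neighbor has written it, because the threads are no longer guaranteed to execute each line in lockstep.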