Hello, I have produced a simple kernel to study warp divergence. I am trying to force the first 16 threads of a warp to do something different from the last 16 threads of the warp. The idea is that if this leads to serialization of the first and second groups of 16 threads, the run time should be do…

Warp divergence in independent thread scheduling?

Robert_Crovella, September 7, 2021, 1:25am

Yes. All GPUs of the Volta family or newer use the Volta thread execution model (independent thread scheduling).
It is always active; you cannot disable it. (On Volta you might be able to disable it with some CUDA versions by compiling for a compute capability below 7.0, but I would not rely on that, and it would keep you from using the right compilation strategy.)
Warp divergence may still have a cost.
Here is an example.
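A hypothetical sketch of the kind of measurement described in the question (not the code from the original reply), assuming 32-thread warps and a compute-bound loop. The names `splitKernel`, `workA`, `workB` and `timeSplit`, the loop bodies, and the launch sizes are illustrative choices. The same kernel is timed twice: once with the branch splitting every warp into lanes 0-15 and 16-31, and once with the branch decided per whole warp. Both runs do the same total arithmetic, so a timing difference comes from how the branch maps onto warps.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical names and sizes; a sketch, not a reference benchmark.
// The two loop bodies use different instruction sequences so the compiler is
// less likely to merge the branches. Both stay bounded for x in [0, 0.31].
__device__ float workA(float x, int iters) {
    for (int i = 0; i < iters; ++i) x = x * 0.999f + 0.001f;
    return x;
}
__device__ float workB(float x, int iters) {
    for (int i = 0; i < iters; ++i) x = x * x * 0.5f + 0.25f;
    return x;
}

// intraWarp == true : lanes 0-15 and 16-31 of every warp take different paths,
//                     so each warp must execute both paths one after the other.
// intraWarp == false: the path is chosen per warp (even/odd warp index), so no
//                     warp diverges. Assumes 32-thread warps.
__global__ void splitKernel(float *out, int iters, bool intraWarp) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    float x = (threadIdx.x & 31) * 0.01f;
    bool takeA = intraWarp ? ((threadIdx.x & 31) < 16)
                           : (((threadIdx.x / 32) & 1) == 0);
    if (takeA) x = workA(x, iters);
    else       x = workB(x, iters);
    out[tid] = x;  // store the result so the loops are not optimized away
}

// Times one launch of splitKernel (after a warm-up launch) and returns
// milliseconds, or -1 on a CUDA error.
float timeSplit(bool intraWarp, int blocks, int threads, int iters) {
    float *d_out = nullptr;
    cudaMalloc(&d_out, (size_t)blocks * threads * sizeof(float));
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    splitKernel<<<blocks, threads>>>(d_out, iters, intraWarp);  // warm-up
    cudaEventRecord(start);
    splitKernel<<<blocks, threads>>>(d_out, iters, intraWarp);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        ms = -1.0f;
    }
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_out);
    return ms;
}

int main() {
    // threads must be a multiple of 64 so even and odd warps are balanced.
    const int blocks = 1024, threads = 256, iters = 10000;
    float msIntra   = timeSplit(true,  blocks, threads, iters);
    float msAligned = timeSplit(false, blocks, threads, iters);
    printf("intra-warp split  : %.3f ms\n", msIntra);
    printf("warp-aligned split: %.3f ms\n", msAligned);
    return 0;
}
```

It could be compiled with something like `nvcc -O3 -arch=sm_70 split.cu` (file name hypothetical). If the two halves of each warp are serialized, the intra-warp variant issues twice as many instructions per warp and should take noticeably longer, up to roughly twice as long for a compute-bound loop like this one; actual numbers depend on the GPU and compiler. Inspecting the SASS with `cuobjdump -sass` is a way to confirm that both branches survived compilation as separate paths.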
