execution within one diverged warp

Martini · February 21, 2020, 3:22pm

On Volta, there is not one PC per warp; instead, each of the 32 threads in a warp keeps its own PC. Thus, there is no implicit assumption of lock-step execution, although I understand that a scheduler optimizer will still try to get all threads to execute the same instruction for better performance. Is my understanding correct so far?

My question is as follows. Say warp 5 is handled on one SM by “Warp Scheduler A”.
Assume the warp has diverged: threads 0 through 9 are about to execute instruction “foo”, while threads 10 through 31 are about to execute instruction “bar”.
Is it true that, even on Volta, “Warp Scheduler A” still cannot issue “foo” and “bar” at the same time? And is that why the scheduler optimizer's job is to bring the threads of the warp back to executing the same instruction, since each thread, having its own PC, could otherwise go anywhere?

Robert_Crovella · February 21, 2020, 3:35pm

The two separate paths cannot both be issued in the same clock cycle.

I refer you to page 32 of the Volta white paper:

https://images.nvidia.com/content/volta-architecture/pdf/volta-architecture-whitepaper.pdf

as well as to the observation that individual warp schedulers on Volta are not dual-issue:

https://arxiv.org/pdf/1804.06826.pdf

Martini · February 21, 2020, 3:55pm

Very helpful, thank you, Robert!
