GPU architecture and warp scheduling

BulatZiganshin · February 8, 2018, 6:59pm

txbob, it may be easier for you to ask inside NVIDIA.

It’s not clearly documented. In the Fermi whitepaper, the register file (page 8) was pictured as a monolith, but page 10 shows that the left scheduler executes only even warps, while the right scheduler executes only odd warps: http://www.nvidia.com/content/PDF/fermi_white_papers/NVIDIA_Fermi_Compute_Architecture_Whitepaper.pdf
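To make the even/odd split concrete, here is a small sketch in Python. It only models the assignment described on page 10 of the whitepaper. The function names are mine, numbering the left scheduler 0 and the right scheduler 1 is an arbitrary choice, and I am assuming the parity is taken on the warp's ID within the SM, which the whitepaper does not spell out.

```python
WARP_SIZE = 32  # threads per warp on all NVIDIA GPUs discussed here


def warp_of_thread(linear_thread_id: int) -> int:
    """Warp index of a thread, given its linear thread ID."""
    return linear_thread_id // WARP_SIZE


def fermi_scheduler_for_warp(warp_id: int) -> int:
    """Hypothetical model of the Fermi picture:
    0 = left scheduler (even warps), 1 = right scheduler (odd warps)."""
    return warp_id % 2


# Warps 0..3 alternate between the two schedulers.
print([fermi_scheduler_for_warp(w) for w in range(4)])  # [0, 1, 0, 1]
```

Under this model a warp's scheduler is fixed by its ID for its whole lifetime, so nothing ever has to be decided or moved at run time.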

The Kepler whitepaper also pictured the register file as a monolith: http://www.geforce.com/Active/en_US/en_US/pdf/GeForce-GTX-680-Whitepaper-FINAL.pdf

Fortunately, hardware.fr publications contained a more exact picture of each SM, in particular for Kepler: http://www.hardware.fr/medias/photos_news/00/44/IMG0044011_1.jpg

Finally, starting with Maxwell, NVIDIA fixed their picture and started to show the register file as separate for each scheduler. See page 8 in the document below:

http://international.download.nvidia.com/geforce-com/international/pdfs/GeForce_GTX_980_Whitepaper_FINAL.PDF

And an individual register file per scheduler means that a warp cannot be quickly moved to another scheduler: that would require copying all of its register contents. The alternative is to make all registers available to all schedulers, but that would require 4x more read/write ports (a very precious resource) in the register file, and in that case it would be more logical to keep showing the register file as shared by all schedulers.
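A rough back-of-the-envelope sketch of both costs, in Python. The 32 threads per warp and 32-bit registers are standard. Everything else is an assumption for illustration: the function names are mine, `ports_per_scheduler` is a hypothetical parameter (the real port counts are not public as far as I know), and the 4 schedulers are the per-SM count from the 4x figure above.

```python
WARP_SIZE = 32          # threads per warp
BYTES_PER_REGISTER = 4  # registers are 32 bits wide


def warp_register_bytes(regs_per_thread: int) -> int:
    """Bytes of register state that would have to be copied
    to move one warp to a different scheduler's register file."""
    return regs_per_thread * WARP_SIZE * BYTES_PER_REGISTER


def shared_file_ports(ports_per_scheduler: int, num_schedulers: int = 4) -> int:
    """Ports needed if one register file had to serve every scheduler
    at the rate each private file serves its own scheduler."""
    return ports_per_scheduler * num_schedulers


# A kernel using 64 registers per thread holds 8 KB of state per warp.
print(warp_register_bytes(64))  # 8192
```

So even a mid-sized kernel means kilobytes of copying per migrated warp, which is why neither option looks attractive compared with simply pinning each warp to one scheduler.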
