What's the difference between 'wavefronts' and 'sectors/Req'?

felix_dt · January 7, 2021, 7:47am

A simplified model for the processing in L1TEX on Volta and newer architectures can be described as follows: When an SM executes a global, local, or shared memory instruction for a warp, a single request is sent to L1TEX. This request carries the information for all participating threads of the warp (up to 32 threads). For local and global memory, the access pattern and the set of participating threads determine how many cache lines the request needs to access, and which sectors within those cache lines. Internally, the L1TEX unit has multiple processing stages that operate as a pipeline.
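To make the sectors-per-request side concrete, here is a minimal Python sketch (not NVIDIA code) that counts the cache lines and sectors a single warp-wide request touches. It assumes 128-byte cache lines made of four 32-byte sectors; the function name `lines_and_sectors` is hypothetical, and the sketch ignores everything else L1TEX does (hits and misses, memory spaces, wavefront splitting).

```python
# Assumed geometry: 128-byte cache lines, each split into four 32-byte sectors.
LINE_BYTES = 128
SECTOR_BYTES = 32


def lines_and_sectors(addresses, access_bytes=4):
    """Return the sets of cache-line and sector indices touched by one request.

    addresses: byte addresses, one per participating thread (up to 32).
    access_bytes: bytes accessed per thread (e.g. 4 for a float).
    """
    lines, sectors = set(), set()
    for addr in addresses:
        # A misaligned access can straddle a sector boundary,
        # so mark every sector covered by [addr, addr + access_bytes).
        first = addr // SECTOR_BYTES
        last = (addr + access_bytes - 1) // SECTOR_BYTES
        for s in range(first, last + 1):
            sectors.add(s)
            lines.add(s * SECTOR_BYTES // LINE_BYTES)
    return lines, sectors


# 32 threads reading consecutive floats from a 128-byte-aligned address.
coalesced = [0x1000 + 4 * t for t in range(32)]
# 32 threads reading floats 128 bytes apart (one float per cache line).
strided = [0x1000 + 128 * t for t in range(32)]

for name, addrs in [("coalesced", coalesced), ("strided", strided)]:
    lines, sectors = lines_and_sectors(addrs)
    print(f"{name}: {len(lines)} cache line(s), {len(sectors)} sector(s) per request")
```

In this sketch the coalesced pattern touches 1 cache line and 4 sectors, while the strided pattern touches 32 cache lines and 32 sectors, even though both are a single request.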

A wavefront is the maximum unit of work that can pass through a pipeline stage per cycle. If not all cache lines or sectors can be accessed in a single wavefront, multiple wavefronts are created and sent for processing one by one, i.e. in a serialized manner. Limits on the work within a wavefront may include the need for a consistent memory space, a maximum number of cache lines that can be accessed, and various other reasons. Each wavefront then flows through the L1TEX pipeline and fetches the sectors handled in that wavefront. In this model, the relationships between the three key quantities are: requests:sectors is 1:N, wavefronts:sectors is 1:N, and requests:wavefronts is 1:N.

In the documentation we describe a wavefront as a (work) package that can be processed at once, i.e. there is a notion of processing one wavefront per cycle in L1TEX. Wavefronts therefore represent the number of cycles required to process the requests, while the number of sectors per request is a property of the access pattern of the memory instruction across all participating threads. For example, it is possible to have a memory instruction that requires 4 sectors per request in 1 wavefront. However, you can also have a memory instruction that needs 4 sectors per request but requires 2 or more wavefronts.
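A toy model of the wavefront splitting and of the 4-sector example above, as a minimal Python sketch under loud assumptions: it models only one of the limits mentioned (a maximum number of cache lines per wavefront), and `max_lines_per_wavefront` is a hypothetical knob, not a documented hardware value. The real L1TEX limits depend on architecture and instruction type. The 128-byte line / 32-byte sector geometry and the function name `wavefronts_for_request` are likewise assumptions for illustration.

```python
# Assumed geometry: 128-byte cache lines, each split into four 32-byte sectors.
LINE_BYTES = 128
SECTOR_BYTES = 32


def wavefronts_for_request(addresses, max_lines_per_wavefront, access_bytes=4):
    """Split one request into wavefronts under a hypothetical cache-line limit.

    Returns a list of wavefronts, each a sorted list of sector indices.
    """
    # Group the sectors this warp touches by the cache line containing them.
    by_line = {}
    for addr in addresses:
        first = addr // SECTOR_BYTES
        last = (addr + access_bytes - 1) // SECTOR_BYTES
        for s in range(first, last + 1):
            by_line.setdefault(s * SECTOR_BYTES // LINE_BYTES, set()).add(s)
    lines = sorted(by_line)
    # Serialize: each wavefront handles at most max_lines_per_wavefront lines.
    wavefronts = []
    for i in range(0, len(lines), max_lines_per_wavefront):
        chunk = lines[i:i + max_lines_per_wavefront]
        wavefronts.append(sorted(s for line in chunk for s in by_line[line]))
    return wavefronts


LIMIT = 2  # hypothetical limit, chosen only for illustration
# 4 sectors, all in one cache line: consecutive floats, 128-byte aligned.
same_line = [0x1000 + 4 * t for t in range(32)]
# 4 sectors spread over 4 cache lines: each group of 8 threads reads one sector.
spread = [0x1000 + (t // 8) * LINE_BYTES + (t % 8) * 4 for t in range(32)]

for name, addrs in [("same line", same_line), ("spread", spread)]:
    wfs = wavefronts_for_request(addrs, LIMIT)
    n_sectors = sum(len(w) for w in wfs)
    print(f"{name}: {n_sectors} sectors/request in {len(wfs)} wavefront(s)")
# same line: 4 sectors/request in 1 wavefront(s)
# spread: 4 sectors/request in 2 wavefront(s)
```

Both requests report the same sectors/request value, but in this toy model the second one occupies the pipeline stage for two wavefronts (roughly two cycles) instead of one, which is exactly the distinction between the two metrics.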
