L2 throughput metrics

Hi
I see two metrics for L2 like:

l2_read_throughput        lts__t_sectors_op_read.sum.per_second + lts__t_sectors_op_atom.sum.per_second + lts__t_sectors_op_red.sum.per_second 
l2_tex_read_throughput 	   lts__t_sectors_srcunit_tex_op_read.sum.per_second

When I check the definitions from nv-nsight-cu, I see

lts__t_sectors_op_read                 # of LTS sectors for reads       
lts__t_sectors_srcunit_tex_op_read     # of LTS sectors from unit TEX for reads   

It seems that l2_tex_read_throughput represents the throughput from TEX to L2. What does l2_read_throughput represent, then? Is it the path from DRAM to L2?

The first definition (I am not sure of its source),
l2_read_throughput = lts__t_sectors_op_read.sum.per_second + lts__t_sectors_op_atom.sum.per_second + lts__t_sectors_op_red.sum.per_second
is the throughput of all operations that read from the L2, regardless of which unit issued them. This can include requests from other units such as (a) the display controller, (b) NVLink, (c) the MMU, etc.

lts__t_sectors_op_read counts only read operations. This does not count L2 data bank reads due to atomics or reductions.

lts__t_sectors_srcunit_tex_op_read is the subset of the L2 data bank reads due to read operations that come from the SM L1TEX unit.

l2_tex_read_throughput is the read throughput limited to requests from the SM L1TEX unit. As stated above, the L2 has many memory clients, including but not limited to:

  • asynchronous engines (display, copy engines, nvdec, nvenc, …)
  • memory clients (MMU, host (CPU))
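The relationship between the counters above can be sketched as plain arithmetic. This is only an illustration: the `L2ReadRates` container, the helper function names, and the sample rates are made up for the example and are not profiler output, and the 32-byte sector size is stated as an assumption.

```python
from dataclasses import dataclass

# Assumption: one L2 sector is 32 bytes.
SECTOR_BYTES = 32


@dataclass
class L2ReadRates:
    """Hypothetical container for per-second sector rates (.sum.per_second)."""
    op_read: float              # lts__t_sectors_op_read
    op_atom: float              # lts__t_sectors_op_atom
    op_red: float               # lts__t_sectors_op_red
    srcunit_tex_op_read: float  # lts__t_sectors_srcunit_tex_op_read


def l2_read_throughput(r: L2ReadRates) -> float:
    # All operations that read from the L2: reads + atomics + reductions.
    return r.op_read + r.op_atom + r.op_red


def l2_tex_read_throughput(r: L2ReadRates) -> float:
    # Read operations issued by the SM L1TEX unit only.
    return r.srcunit_tex_op_read


def non_tex_read_rate(r: L2ReadRates) -> float:
    # Read operations issued by every other L2 client
    # (copy engines, display, MMU, host, ...).
    return r.op_read - r.srcunit_tex_op_read


# Made-up sample rates in sectors per second, for illustration only.
rates = L2ReadRates(op_read=8.0e9, op_atom=1.0e9, op_red=0.5e9,
                    srcunit_tex_op_read=6.0e9)

print("l2_read_throughput     (sectors/s):", l2_read_throughput(rates))
print("l2_tex_read_throughput (sectors/s):", l2_tex_read_throughput(rates))
print("non-TEX reads          (sectors/s):", non_tex_read_rate(rates))
print("l2_read_throughput     (bytes/s):  ",
      l2_read_throughput(rates) * SECTOR_BYTES)
```

If the non-TEX read rate is large, a noticeable share of the L2 read traffic comes from clients other than the SM.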