What are the definitions of the metrics l1tex__cycles_active and l1tex__cycles_elapsed here? Particularly what does “active” and “elapsed” mean here? When I wanted to calculate the throughput of L1 cache, I was stuck in deciding which cycles to use. I know l1tex__throughput can give L1 throughput based on elapsed cycles, but why? and why not active cycles? Thanks.
You can find the definition of the terms “elapsed” and “active” here under "Cycle Metrics. Elapsed is all the cycles that were counted for this unit, while active refers to the subset of these that processed data. You can also find more information in this recent post.
The throughput of the L1 cache is given in the metric l1tex__throughput.avg.pct_of_peak_sustained_active, please refer to the SpeedOfLight.section (GPU Speed Of Light Throughput), which shows the metrics associated with the high-level throughputs of many important HW units.
I know l1tex__throughput.avg.pct_of_peak_sustained_active is represented as percentage. Is it the percentage to maximum throughput of L1? What is the maximum throughput of L1 (GB/s)? Thanks.
Most Nsight Compute metrics have structured suffixes as explained here. The l1tex__throughput metric is computed as the maximum of its constituent sub-metrics, i.e. its breakdown. You can list these breakdown metrics e.g. by collecting
ncu --metrics breakdown:l1tex__throughput.avg.pct_of_peak_sustained_active
You can subsequently inspect any of these breakdown metrics and collect their theoretical, average, sustained maximum using the “.avg.peak_sustained” suffix, e.g. l1tex__data_bank_reads.avg.peak_sustained.