What is the difference between “SM: Pipe Tc Cycles Active [%]” and “SM: Pipe Tensor Cycles Active [%]” in Nsight Compute

qibiaowen · December 19, 2025, 3:24am

I see the following descriptions in Nsight Compute:

1) for “SM: Pipe Tc Cycles Active [%]”:

SM: Pipe Tc Cycles Active
sm__pipe_tc_cycles_active.avg.pct_of_peak_sustained_elapsed

tc: Tensor Core.
The TC pipeline executes UTCBAR, UTCCP, UTCMMA, UTCSHIFT and UTCSWS instructions.
It is different from the Tensor pipeline.

2) for “SM: Pipe Tensor Cycles Active [%]”:

SM: Pipe Tensor Cycles Active
sm__pipe_tensor_cycles_active.avg.pct_of_peak_sustained_elapsed

tensor: The Tensor pipeline executes various MMA instructions.
It is different from the Tensor Core pipeline.

These descriptions are confusing. What is the actual difference?

I have an ncu report:

Instruction:

ncu_report: bf16_9_1024_16_128_false.ncu-rep.zip (1.8 MB)

This report contains no MMA instructions, only UTC MMA instructions. Why is the Tensor pipe (metric: “SM: Pipe Tensor Cycles Active [%]”) so busy?

Thanks!

Greg · December 19, 2025, 11:50pm

The SM has three types of Tensor instructions:

1) wmma: warp-level tensor instructions ({BIHQ}MMA)
2) GH100: wgmma, warp-group-level tensor instructions
3) GB100: tcgen05.mma, CTA and CTA-pair tensor instructions (UTC{BIHQ}MMA)

In all three cases the MMA unit (Tensor Cycles) is in the SM sub-partitions.

wmma warp-level instructions execute like an FFMA instruction: the operands are read from the register file and the instruction is dispatched to the pipeline.

wgmma warp-group-level instructions come with additional instructions that signal that all dependencies for issuing the warp-group instruction have been resolved. The instruction is then issued simultaneously by all 4 SM sub-partitions, which allows inputs to be shared.

In tcgen05 there is a new controlling unit in the SM MIO called the Tensor Core unit or TC unit (an overloaded term). Instructions are dispatched from an SM sub-partition to an instruction queue. If the instruction targets a single CTA, the TC issues it on all 4 sub-partitions of the SM. If the instruction targets a CTA pair, the TC issues it on both SMs in the TPC, covering 8 sub-partitions.
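The fan-out described above can be sketched as a tiny model. This is only an illustration of the counting, not a description of the hardware interface; the names `IssueTarget` and `tc_issue_target` are hypothetical.

```python
from dataclasses import dataclass

# From the explanation above: each SM has 4 sub-partitions (SMSPs).
SUBPARTITIONS_PER_SM = 4


@dataclass(frozen=True)
class IssueTarget:
    sms: int            # number of SMs the TC unit issues on
    subpartitions: int  # total sub-partitions covered


def tc_issue_target(cta_pair: bool) -> IssueTarget:
    """Model where a tcgen05 MMA lands: one SM for a single CTA,
    both SMs of the TPC for a CTA pair."""
    sms = 2 if cta_pair else 1
    return IssueTarget(sms=sms, subpartitions=sms * SUBPARTITIONS_PER_SM)


print(tc_issue_target(cta_pair=False))
print(tc_issue_target(cta_pair=True))
```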

A tcgen05 instruction will result in both the sm__pipe_tc_cycles_active (singleton per SM) and sm[sp]__pipe_tensor_cycles_active (per SMSP) counters being updated. Note that I used the syntax sm[sp] because you can collect pipe_tensor_cycles_active at either the SM level or the SMSP (sub-partition) level. For WMMA instructions the per-SMSP values can differ. For WGMMA and tcgen05, pipe_tensor_cycles_active should be the same across all 4 sub-partitions in the SM.
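One way to use this when reading a report is to compare the four per-SMSP values of pipe_tensor_cycles_active for one SM. The sketch below assumes you have already copied those values out of the report by hand; the helper names, the 1% tolerance, and the sample numbers are all made up for illustration.

```python
def smsp_spread(values):
    """Relative spread (max - min) / max of per-SMSP counter values."""
    hi = max(values)
    if hi == 0:
        return 0.0
    return (hi - min(values)) / hi


def looks_uniform(values, rel_tol=0.01):
    """True if all sub-partitions report nearly the same active cycles."""
    return smsp_spread(values) <= rel_tol


# Hypothetical per-SMSP active-cycle counts for a single SM.
wmma_like = [1200, 950, 1100, 400]        # warp-level: SMSPs may differ
tcgen05_like = [1000, 1000, 1000, 1000]   # issued on all 4 SMSPs together

print(looks_uniform(wmma_like), looks_uniform(tcgen05_like))
```

A uniform result is what the explanation above predicts for WGMMA and tcgen05 kernels; a non-uniform one is possible with warp-level instructions.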

The TC unit is also responsible for some of the other tcgen05 (not .mma) instructions, so TC activity can exceed Tensor Cycles Active. In practice the two tend to be very close. In NCU these counters will likely be collected in different passes, so the percentages can vary somewhat from run to run: the denominator can change, and real-time stalls in the TC can change.
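The effect of collecting in different passes can be illustrated with a simplified model. It assumes the percentage behaves roughly like active cycles divided by elapsed cycles for that pass; the function name and all numbers are hypothetical and not taken from the attached report.

```python
def pct_of_elapsed(cycles_active: int, cycles_elapsed: int) -> float:
    """Active cycles as a percentage of elapsed cycles (simplified model)."""
    if cycles_elapsed <= 0:
        raise ValueError("cycles_elapsed must be positive")
    return 100.0 * cycles_active / cycles_elapsed


# Hypothetical numbers: the TC counter is collected in pass A, the tensor
# counter in pass B, and pass B happened to take slightly longer.
tc_pct = pct_of_elapsed(9_100, 10_000)      # pass A
tensor_pct = pct_of_elapsed(9_000, 10_200)  # pass B

print(f"TC: {tc_pct:.1f}%  Tensor: {tensor_pct:.1f}%")
```

In this model the gap between the two percentages comes partly from the extra non-.mma work in the TC unit and partly from the different denominators, which is why small differences between the two metrics are not necessarily meaningful.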
