Timeline View of Using PM Sampling to Get Tensor Core Utilization

jyin7 · December 17, 2025, 10:01am

Hi, I have two questions regarding PM sampling timelines in Nsight Compute.

1. I noticed that only pmsampling:sm__pipe_tensor_cycles_active_realtime_v2.avg.pct_of_peak_sustained_elapsed produces a usable PM sampling timeline (with per-sample timestamps) in the CLI raw report. Replacing avg with min or max (e.g., .min.pct_of_peak_sustained_elapsed) does not expose a timeline, even though those variants are valid metrics. Is the timeline intentionally limited to the avg aggregation for this metric family?
2. For the same metric, the PM sampling timestamps shown in the CLI raw report appear to use a different time origin than the GUI timeline. For example, the first non-zero sample appears at 260,000 ns in the raw text, while the GUI timeline shows the same transition at 4,000 ns. Am I correct that the GUI re-normalizes PM sampling timestamps relative to the NVTX range or kernel start, and that this alignment information is not exposed in the CLI raw output?

matmul_tensor_pm_timeline.txt (387.6 KB)

felix_dt · December 17, 2025, 10:25am

> Replacing avg with min or max (e.g., .min.pct_of_peak_sustained_elapsed) does not expose a timeline, even though those variants are valid metrics.

They are not valid metrics. If you run ncu --query-metrics-collection pmsampling --metrics sm__pipe_tensor_cycles_active_realtime_v2 to list all metric suffixes for this base metric name (in the context of PM sampling), you will only see sub-metrics with avg/max.pct_of_peak_sustained_elapsed. This is different from the default profiling collection mode.

> Am I correct that the GUI re-normalizes PM sampling timestamps relative to the NVTX range or kernel start, and that this alignment information is not exposed in the CLI raw output?

Yes, the UI normalizes the timestamps in the table to be relative to the first sample's timestamp, which makes it easier to match the table against the timeline. In addition, if the collection is context-switched (as in your case, see the ContextSwitched Yes entry in the table), the metrics in the UI are filtered to only show samples for the CUDA context of interest. The CLI raw output is currently not filtered like this. You can technically do the filtering yourself, either manually or with the Python Report Interface, using the metrics that track the context switch trace, but it's not trivial. You can disable the context switch filtering in the UI using the timeline's right-click context menu.
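
To make the two steps above concrete (keep only the samples of the context of interest, then shift timestamps so the first kept sample is at 0), here is a minimal sketch in plain Python. It does not use the Nsight Compute Python Report Interface. The Sample layout, the in_target_context flag, and the helper names are hypothetical, and the sample values are made up to mirror the numbers in the question rather than taken from the attached report. How the UI actually derives context activity from the context switch trace is not shown here.

```python
from typing import List, NamedTuple, Optional


class Sample(NamedTuple):
    """One PM sample. This layout is an assumption for illustration only."""
    timestamp_ns: int
    value: float
    in_target_context: bool  # hypothetical flag: was the target CUDA context active?


def filter_and_normalize(samples: List[Sample]) -> List[Sample]:
    """Drop samples outside the target context, then rebase time to the first kept sample."""
    kept = sorted(
        (s for s in samples if s.in_target_context),
        key=lambda s: s.timestamp_ns,
    )
    if not kept:
        return []
    origin = kept[0].timestamp_ns
    return [Sample(s.timestamp_ns - origin, s.value, True) for s in kept]


def first_nonzero_ns(samples: List[Sample]) -> Optional[int]:
    """Timestamp of the earliest sample with a non-zero value, or None."""
    for s in sorted(samples, key=lambda s: s.timestamp_ns):
        if s.value != 0.0:
            return s.timestamp_ns
    return None


# Made-up samples: another context occupies the GPU for the first 256 us.
raw = [
    Sample(0, 0.0, False),
    Sample(128_000, 0.0, False),
    Sample(256_000, 0.0, True),
    Sample(258_000, 0.0, True),
    Sample(260_000, 37.5, True),
    Sample(262_000, 41.0, True),
]

normalized = filter_and_normalize(raw)
print(first_nonzero_ns(raw))         # 260000 (unfiltered, raw time origin)
print(first_nonzero_ns(normalized))  # 4000 (filtered, relative to first kept sample)
```

With this made-up data, the offset between the two views equals the timestamp of the first sample that belongs to the target context, so it depends entirely on when that context became active in that particular run.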

jyin7 · December 17, 2025, 11:24pm

Is the time needed for a context switch deterministic? On my side, there is a 256 µs gap between the raw output and the GUI view (260,000 ns vs. 4,000 ns). Is that a coincidence?

Greg · December 19, 2025, 11:57pm

The time to context switch and the time other contexts run when the target context is not active on the GPU are non-deterministic.
