How to utilize PM sampling?

202476410arsmart · March 24, 2024, 9:05am

0228-output-file-full.nsight-cuprof-report.zip (1.6 MB)
I recently noticed a new feature in NCU: PM sampling.

But I am not sure how to use it in my profiling… What can I take away from it?

For example, if I use double buffering, can I tell when the load unit is waiting for the compute unit?

felix_dt · April 3, 2024, 6:53am

PM (performance monitor) sampling allows you to see the values of single-pass metrics over the runtime of your workload. This enables you to identify, e.g., tail effects (a lower number of active warps towards the end of the kernel), how metric values correlate with potential phases in your algorithm, or how different metrics correlate with each other for your workload (e.g. the compute pipeline idling while DRAM throughput is higher for loading data).

In the screenshot you shared, it seems the DRAM throughput is generally low and the compute throughput generally high, but there doesn’t seem to be a clear pattern where one drops and the other increases, or similar.
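To make "one drops and the other increases" concrete, here is a minimal Python sketch that computes the correlation between two sampled metric series. The sample values are made up for illustration and are not taken from the attached report, and the helper name `pearson` is hypothetical. How you get the numbers out of the timeline is up to you; this only shows the kind of pattern to look for.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sample series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(vx * vy)

# Hypothetical samples (percent of peak), NOT from the attached report.
dram_pct    = [10, 60, 12, 58, 11, 61, 9, 59]
compute_pct = [85, 30, 88, 33, 86, 29, 87, 31]

r = pearson(dram_pct, compute_pct)
print(f"correlation = {r:.2f}")
```

A value close to -1 would suggest alternating load and compute phases (compute idling while data is loaded). A value near 0, as the screenshot seems to suggest, means there is no such clear pattern.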

In Nsight Compute 2024.1 (CUDA 12.4), you can also collect the PmSampling set, which includes a new PM sampling section dedicated to warp stalls. This may give you more insight into warp stalls caused by compute waiting for loads to complete. Enable the set for collection and then select the corresponding entry in the PM Sampling section’s header dropdown (right-hand side).
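As a rough sketch of what enabling the set looks like on the command line, the Python helper below only builds the `ncu` invocation and prints it. The set identifier `pmsampling`, the report name, and `./my_app` are assumptions, and `pm_sampling_command` is a hypothetical helper. Please check `ncu --list-sets` on your installation for the exact identifier before relying on it.

```python
import shlex

def pm_sampling_command(app_args, report_name="pm_report", set_name="pmsampling"):
    """Build an ncu command line that collects a section set into a report file.

    set_name is an assumption; verify it with `ncu --list-sets`.
    """
    if not app_args:
        raise ValueError("app_args must contain at least the application path")
    # --set selects the section set to collect, -o names the output report.
    return ["ncu", "--set", set_name, "-o", report_name, *app_args]

# "./my_app" is a placeholder for your own executable and its arguments.
cmd = pm_sampling_command(["./my_app"])
print(shlex.join(cmd))
```

Run the printed command in a shell (or pass `cmd` to `subprocess.run`), then open the resulting report in the Nsight Compute UI.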
