Any hardware performance counters for number of cores/SMs occupied?

johnkrantkrant · January 17, 2020, 11:59am

Hello,

I am looking for a performance counter monitor that reports hardware metrics of an NVIDIA GPU at runtime. I need monitoring while the GPU application is running, not after it has finished, which is what the NVIDIA Visual Profiler and nvprof provide.

I have used nvidia-smi, which reports its metrics at runtime, but I need a GPU utilization metric that reflects the spatial percentage of the total SMs/cores in use, not the percentage of time the GPU is busy (which is what nvidia-smi reports).
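To make the distinction concrete, here is a minimal sketch of the two notions of utilization. Everything in it is hypothetical: the function names, the sample values, and the SM count of 80 are made up for illustration, and no real tool is being queried.

```python
from typing import Sequence


def temporal_utilization(busy_sms_per_sample: Sequence[int]) -> float:
    """Percent of samples in which at least one SM was busy (time-based view)."""
    if not busy_sms_per_sample:
        return 0.0
    active_samples = sum(1 for busy in busy_sms_per_sample if busy > 0)
    return 100.0 * active_samples / len(busy_sms_per_sample)


def spatial_utilization(busy_sms_per_sample: Sequence[int], total_sms: int) -> float:
    """Average percent of SMs that were busy across all samples (space-based view)."""
    if not busy_sms_per_sample or total_sms <= 0:
        return 0.0
    return 100.0 * sum(busy_sms_per_sample) / (len(busy_sms_per_sample) * total_sms)


# Hypothetical samples: number of busy SMs observed at four points in time.
samples = [4, 4, 0, 4]
total_sms = 80  # assumed SM count, for illustration only

print(temporal_utilization(samples))            # time-based utilization
print(spatial_utilization(samples, total_sms))  # spatial utilization
```

With these made-up samples, a small kernel that keeps only 4 of 80 SMs busy looks heavily utilized in the time-based view while the spatial view stays in the low single digits, which is the gap I am trying to measure.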

Is there any tool that monitors spatial GPU utilization and other hardware metrics, such as cache misses, while the CUDA application is running?

Any ideas? Thank you

mnicely · January 18, 2020, 2:19pm

Have you looked at Nsight Systems and Nsight Compute? They are the replacements for nvprof and NVVP.

https://devblogs.nvidia.com/using-nsight-compute-to-inspect-your-kernels/

Greg · January 20, 2020, 5:26pm

Nsight Compute and older versions of CUPTI are designed to monitor single kernel launches. CUPTI has a continuous mode with a selection of counters that can be collected. The latest version of CUPTI is based on the Perfworks metrics API: https://docs.nvidia.com/cupti/Cupti/r_main.html#r_host_metrics_api. The range profiling API may be able to satisfy your needs. Determining which metrics can be collected in a single pass can be challenging.

The GPU is a collection of engines. The primary engine is the GR (graphics and compute) engine. Other engines include display, copy engines, NVENC (video encoder), NVDEC (video decoder), etc.

In hierarchical order, I would recommend these counters:

gr__cycles_active: # of cycles where GR was active
sm__cycles_active: # of cycles with at least one warp in flight
sm__warps_active: cumulative # of warps in flight

with the metric

avg.pct_of_peak_sustained_elapsed

You can continue further down to look at which instruction pipes are active. Look for the counters named sm__pipe_<pipename>_cycles_active and add the same metric.
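As a rough sketch, the full metric names can be assembled by joining each counter with the metric suffix using a dot, which is the counter.rollup.submetric form the metrics API uses. The helper name and the example pipe names ("fma", "alu") are assumptions for illustration; which pipe counters actually exist depends on the GPU architecture, so check the list your tool reports (for example with `ncu --query-metrics`) before relying on any of them.

```python
from typing import Iterable, List

# Counters recommended above, in hierarchical order.
BASE_COUNTERS = ("gr__cycles_active", "sm__cycles_active", "sm__warps_active")

# Roll-up and sub-metric to append to every counter.
METRIC_SUFFIX = "avg.pct_of_peak_sustained_elapsed"

# Naming pattern for the per-pipe counters.
PIPE_TEMPLATE = "sm__pipe_{pipename}_cycles_active"


def full_metric_names(pipes: Iterable[str] = ()) -> List[str]:
    """Return '<counter>.<suffix>' for the base counters plus any requested pipes."""
    counters = list(BASE_COUNTERS)
    counters += [PIPE_TEMPLATE.format(pipename=pipe) for pipe in pipes]
    return [f"{counter}.{METRIC_SUFFIX}" for counter in counters]


# "fma" and "alu" are assumed pipe names; availability varies by architecture.
names = full_metric_names(pipes=("fma", "alu"))
print(",".join(names))
```

A comma-separated list like this is the form most tools expect when you name metrics explicitly. Whether the whole set fits in a single pass still has to be checked on the target GPU.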

The Perfworks API, exposed through CUPTI, provides hundreds of raw counters with various roll-up and throughput metrics. For monitoring, the key challenge is finding a set that can be collected in a single pass.
