Any hardware performance counters for number of cores/SMs occupied?


I am looking for a performance counter monitor that reports hardware metrics of an NVIDIA GPU at runtime. I need monitoring while the GPU application is running, not after it has finished, which is what the NVIDIA Visual Profiler and nvprof do.

I have used nvidia-smi, which reports its metrics at runtime, but I need a GPU utilization metric that gives the spatial percentage of the total SMs/cores in use, not the time percentage that nvidia-smi reports.

Is there any tool that monitors spatial GPU utilization and other hardware metrics, such as cache misses, while the CUDA application is running?

Any ideas? Thank you

Have you looked at Nsight Systems and Nsight Compute? They are the replacements for nvprof and NVVP.

Nsight Compute and older versions of CUPTI are designed to monitor single kernel launches. CUPTI also has a continuous mode with a selection of counters that can be collected. The latest version of CUPTI is based on the Perfworks metrics API, and its range profiling API may be able to satisfy your needs. Determining which metrics can be collected in a single pass can be challenging.

The GPU is a collection of engines. The primary engine is the GR (graphics and compute) engine. Other engines include the display engine, copy engines, NVENC (video encoder), NVDEC (video decoder), etc.

In hierarchical order, I would recommend these counters:

gr__cycles_active: number of cycles in which GR was active
sm__cycles_active: number of cycles with at least one warp in flight
sm__warps_active: cumulative number of warps in flight

with the metric


You can continue further down the hierarchy to look at instruction pipe activity. Look for the counters named sm__pipe_{pipename}_cycles_active and add the same metric.
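Because these counters are cumulative, a runtime monitor would sample them periodically and work with the differences between consecutive samples. The sketch below shows only the arithmetic applied after the counters have been read, not a CUPTI call sequence. It assumes the `.sum` roll-up across SMs for the sm__ counters, uses made-up sample values, treats NUM_SMS and MAX_WARPS_PER_SM as hypothetical placeholders, and assumes gr__ and sm__ cycles are counted in the same clock domain, which you should verify for your GPU.

```python
# Hypothetical sketch: derive per-interval ratios from two successive samples
# of cumulative counters. Counter names follow the list above; the ".sum"
# roll-up, the sample values, NUM_SMS and MAX_WARPS_PER_SM are assumptions.
# Also assumes gr__ and sm__ cycles share a clock domain.

NUM_SMS = 80            # hypothetical: query the real SM count of your GPU
MAX_WARPS_PER_SM = 64   # hypothetical: the limit is architecture-dependent


def deltas(prev, curr):
    """Counters are cumulative, so per-interval activity is the difference."""
    return {name: curr[name] - prev[name] for name in curr}


def derived(d, num_sms=NUM_SMS, max_warps=MAX_WARPS_PER_SM):
    gr = d["gr__cycles_active"]
    sm = d["sm__cycles_active.sum"]      # summed over all SMs
    warps = d["sm__warps_active.sum"]    # warps in flight, accumulated per cycle
    return {
        # average fraction of SMs with at least one warp while GR was active
        "sm_spatial_active": sm / (gr * num_sms) if gr else 0.0,
        # average resident warps per active SM cycle, as a fraction of the max
        "achieved_occupancy": warps / (sm * max_warps) if sm else 0.0,
    }


# Made-up samples taken at the start and end of one monitoring interval.
prev = {"gr__cycles_active": 1_000, "sm__cycles_active.sum": 40_000,
        "sm__warps_active.sum": 1_280_000}
curr = {"gr__cycles_active": 2_000, "sm__cycles_active.sum": 80_000,
        "sm__warps_active.sum": 2_560_000}
print(derived(deltas(prev, curr)))
```

The same delta-then-ratio pattern would extend to the sm__pipe_{pipename}_cycles_active counters, for example by dividing each pipe's active cycles by sm__cycles_active over the same interval.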

The Perfworks API, accessed through CUPTI, provides hundreds of raw counters along with various roll-up and throughput metrics. For monitoring, the key challenge is finding what can be collected in a single pass.