Performance Counters similar to CPU

Hi, I am looking for GPU performance counters similar to what perf_events provides on the CPU.
Specifically, I am interested in cache and TLB counters [all levels]. I would really appreciate being pointed in the right direction.
This is mainly for academic purposes.


NVIDIA GPU performance counters can be accessed through:

  • NVIDIA Visual Profiler (NVVP)
  • Nsight Compute
  • Nsight VSE Compute Profiler
  • CUDA Profiler Tools Interface (CUPTI) library

There are two mechanisms available for collection.

  1. Kernel-level profiling. Each counter value covers the full kernel execution.
  2. Program counter (PC) sampling. Samples are flat (no call stack), and the only counter available is the warp scheduler state of the sampled warp (known as the stall reason).

The various profiler tools offer L1 and L2 cache statistics. TLB and MMU counters are not available through the tools.
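
As a rough sketch of kernel-level collection, the small CUDA program below gives the profiler a kernel to measure. The kernel name stridedCopy, the array size, and the stride are all made up for illustration; they are not taken from any NVIDIA sample.

```cuda
// Illustrative target for kernel-level profiling. All names and sizes here
// are assumptions chosen for the example.
#include <cstdio>
#include <cuda_runtime.h>

// Each thread reads one element at a strided index and writes it out.
// Changing `stride` changes the access pattern that the caches see.
__global__ void stridedCopy(const float* in, float* out, int n, int stride)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // 64-bit arithmetic so that i * stride cannot overflow an int.
        size_t j = (static_cast<size_t>(i) * stride) % n;
        out[i] = in[j];
    }
}

int main()
{
    const int n = 1 << 22;   // number of floats (16 MiB per buffer)
    const int stride = 32;   // hypothetical stride, in elements
    float* in = nullptr;
    float* out = nullptr;

    if (cudaMalloc(&in, n * sizeof(float)) != cudaSuccess ||
        cudaMalloc(&out, n * sizeof(float)) != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemset(in, 0, n * sizeof(float));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    stridedCopy<<<blocks, threads>>>(in, out, n, stride);

    // Check for launch errors first, then for errors during execution.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) {
        err = cudaDeviceSynchronize();
    }
    if (err != cudaSuccess) {
        std::fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

After building with something like nvcc -o strided strided.cu, the binary could be run under the command-line profiler, for example: nvprof --metrics l2_read_transactions,l2_write_transactions ./strided. Metric names vary by GPU architecture and by tool, so treat the two shown here as assumptions and check them against the output of nvprof --query-metrics on your device. The values reported are per kernel launch, matching mechanism 1 above.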

These tools and libraries differ from perf_events. The primary data collection mechanism of perf_events is frequency-based or event-based sampling with call stack and counter collection. The GPU performance counter tools and libraries do not support event-based sampling or call stack collection.

Hi, I am currently using a 1080 Ti, and I believe these tools can be run on it.
Thanks for the input. Cheers!