Hi, I am looking for information about performance counters similar to what perf_events provides on the CPU.
Specifically, I am interested in counters for caches and TLBs (all levels). I would really appreciate being pointed in the right direction.
This is mainly for academic purposes.
NVIDIA GPU performance counters can be accessed through any of the following:
NVIDIA Visual Profiler (NVVP)
Nsight Compute
Nsight VSE Compute Profiler
CUDA Profiler Tools Interface (CUPTI) library
There are two mechanisms available for collection:
Kernel-level profiling. Each counter value covers the full kernel execution.
Program counter (PC) sampling. Samples are flat (no call stack), and the only counter available is the warp scheduler state of the sampled warp (known as the stall reason).
The various profiler tools offer L1 and L2 cache statistics. TLB and MMU counters are not available through the tools.
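As a rough illustration of kernel-level collection of L1 and L2 statistics, here is a minimal Python sketch that wraps the Nsight Compute command line (ncu). The helper names build_ncu_command and run_profile are hypothetical, and the two metric names are assumptions that may not exist on every GPU architecture or ncu version; running `ncu --query-metrics` should list what your device actually exposes.

```python
import shutil
import subprocess

# Assumed metric names; verify them with `ncu --query-metrics` on your system.
CACHE_METRICS = [
    "l1tex__t_sector_hit_rate.pct",  # L1/TEX cache sector hit rate
    "lts__t_sector_hit_rate.pct",    # L2 cache sector hit rate
]


def build_ncu_command(app_args, metrics=CACHE_METRICS):
    """Return the argv list for a kernel-level ncu run of the given application."""
    # --metrics takes a comma-separated list; --csv requests CSV-formatted output.
    return ["ncu", "--csv", "--metrics", ",".join(metrics), *app_args]


def run_profile(app_args):
    """Run ncu on the application if it is installed; return its exit code."""
    cmd = build_ncu_command(app_args)
    if shutil.which("ncu") is None:
        # Nsight Compute is not on PATH, so only show what would have been run.
        print("ncu not found; would have run:", " ".join(cmd))
        return None
    return subprocess.run(cmd).returncode
```

Calling run_profile(["./my_app"]) (where ./my_app is a placeholder for your CUDA binary) should report one value per metric for each kernel launch, which matches the kernel-level mechanism described above. No TLB or MMU metrics are requested because, as noted, the tools do not expose them.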
These tools and libraries differ from perf_events. The primary data collection mechanism of perf_events is frequency-based or event-based sampling, with call stack and counter collection. The GPU performance counter tools and libraries do not support event-based sampling or call stack collection.