Is there a way to monitor real-time usage of tensor cores through some API? I couldn’t find anything in the NVML API, and the only other option seems to be Nsight, which can’t do real-time monitoring.
The CUPTI Metric API seems to have some tensor utilization functions.
https://docs.nvidia.com/cupti/Cupti/r_main.html#r_host_raw_metrics_api
You can do this with Data Center GPU Manager (DCGM):
https://docs.nvidia.com/datacenter/dcgm/latest/dcgm-user-guide/feature-overview.html#profiling
Download from here:
.
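For anyone who wants to poll this from a script, here is a minimal sketch, assuming DCGM is installed, the host engine is running, and your GPU supports the profiling fields (worth checking, since these are mainly documented for data center parts). It shells out to `dcgmi dmon` and requests field ID 1004, which is DCGM_FI_PROF_PIPE_TENSOR_ACTIVE. The helper names `parse_dmon` and `sample_tensor_activity` are made up for illustration, and the parser assumes data rows look like `GPU <id> <value>`, which may differ between DCGM versions.

```python
import shutil
import subprocess

# Field ID of DCGM_FI_PROF_PIPE_TENSOR_ACTIVE (fraction of cycles the tensor pipe is active).
TENSOR_ACTIVE_FIELD_ID = 1004


def parse_dmon(text):
    """Extract (gpu_id, value) pairs from `dcgmi dmon` output.

    Assumption: data rows have the form "GPU <id> <value>"; header lines
    and rows with non-numeric values (e.g. N/A) are skipped.
    """
    samples = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 3 and parts[0] == "GPU":
            try:
                samples.append((int(parts[1]), float(parts[2])))
            except ValueError:
                continue
    return samples


def sample_tensor_activity(count=1, delay_ms=1000):
    """Run `dcgmi dmon` for `count` samples, `delay_ms` apart, and parse the result."""
    if shutil.which("dcgmi") is None:
        raise RuntimeError("dcgmi not found on PATH")
    cmd = ["dcgmi", "dmon",
           "-e", str(TENSOR_ACTIVE_FIELD_ID),
           "-c", str(count),
           "-d", str(delay_ms)]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_dmon(result.stdout)


if __name__ == "__main__":
    if shutil.which("dcgmi") is None:
        print("dcgmi is not installed; nothing to sample")
    else:
        for gpu_id, value in sample_tensor_activity(count=5):
            print(f"GPU {gpu_id}: tensor pipe active = {value:.3f}")
```

The value is a ratio between 0 and 1, not a percentage. Calling the command line tool keeps the sketch short; DCGM also ships language bindings if you would rather avoid parsing text.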
Profiling tools!? That is a far lower level than the basic, obvious system monitoring that usually ships with a product. So if NVML says my 4090 is 0% busy even though all my Tensor cores are 100% busy, how is that right in any way, shape, or form? Shouldn’t the tools provide accurate data? How many years have Tensor cores been present?
Documentation for Nsight Compute CLI lists a couple of tensor metrics:
sm__pipe_tensor_op_hmma_cycles_active.avg.pct_of_peak_sustained_active
sm__pipe_tensor_op_imma_cycles_active.avg.pct_of_peak_sustained_active (SM 7.2+)
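A rough sketch of how those two metrics could be requested from a script, assuming `ncu` is on your PATH. It only builds the command line using the `--metrics` option; `build_ncu_command` and `./my_app` are placeholder names, not anything from the docs. Note that this gives per-kernel numbers after a profiling run, not a live readout.

```python
import shlex

# The two tensor pipe metrics listed above.
TENSOR_METRICS = [
    "sm__pipe_tensor_op_hmma_cycles_active.avg.pct_of_peak_sustained_active",
    "sm__pipe_tensor_op_imma_cycles_active.avg.pct_of_peak_sustained_active",
]


def build_ncu_command(app_argv, metrics=TENSOR_METRICS):
    """Return an argv list that profiles `app_argv` with the given metrics."""
    return ["ncu", "--metrics", ",".join(metrics), *app_argv]


if __name__ == "__main__":
    # "./my_app" is a hypothetical application binary.
    print(shlex.join(build_ncu_command(["./my_app"])))
```

Pass the resulting list to `subprocess.run` to actually launch the profiler.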