In [1], I see a tensor-related metric for integer instructions:

tensor_int_fu_utilization: The utilization level of the multiprocessor function units that execute tensor core int8 instructions on a scale of 0 to 10. This metric is only available for device with compute capability 7.2.

On a 2080 Ti, which has compute capability 7.5, nvprof doesn't work, and Nsight Compute has no metric corresponding to it [2]. It only reports tensor utilization for FP instructions (sm__pipe_tensor_cycles_active.avg.pct_of_peak_sustained_active).
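
For reference, this is roughly how I look for tensor metrics in Nsight Compute and collect the FP one. It is only a minimal sketch: ./my_app is a placeholder for the profiled binary, and I am assuming the CLI is installed as ncu (older releases ship it as nv-nsight-cu-cli).

```shell
# Metric name as quoted above; APP is a placeholder, not a real binary.
METRIC="sm__pipe_tensor_cycles_active.avg.pct_of_peak_sustained_active"
APP="./my_app"

if command -v ncu >/dev/null 2>&1; then
  # List the metrics available on this setup and keep the tensor-related ones.
  ncu --query-metrics | grep -i tensor
  # Collect the tensor pipe utilization metric for every kernel in the app.
  ncu --metrics "$METRIC" "$APP"
else
  echo "ncu not found on PATH"
fi
```

Filtering the --query-metrics output is how I checked whether anything int8-specific shows up for this GPU.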

So, is there no way to collect this data? Why doesn't Nsight Compute support it?

Moreover, as far as I know, tensor operations generally use FP calculations. So what does integer unit utilization actually mean for tensor instructions?

[1] https://docs.nvidia.com/cuda/profiler-users-guide/index.html#metrics-reference-7x

[2] https://docs.nvidia.com/nsight-compute/NsightComputeCli/index.html#nvprof-metric-collection