Using Nsight Compute to measure metrics data

We want to measure tensor core utilization, for which we would like to get the values of the following metrics:

  • tensor_precision_fu_utilization and
  • tensor_int_fu_utilization

For our deep learning inference workload (TensorFlow), presently deployed on Tesla T4 (Turing) and RTX 2080 Ti (Turing) GPUs (both > compute capability 7.0), I read that we need to use the Nsight Compute tool (and not nvprof) to get these metric values.

We have a few technical issues using the tool and a question regarding measurement of tensor core utilization on the GPU:

  1. Unfortunately, with CUDA 10.1 and NVIDIA Corporation\Nsight Compute 2019.1\target\windows-desktop-win7-x64\nv-nsight-cu-cli.exe, we are unable to start our application (it is a batch file that invokes our application exe). We also tried invoking the exe directly, but that did not help either.
  2. To measure the above-mentioned metrics, is there a set of CUPTI APIs that could be used to get the results at application runtime for all the kernels executed during inference (perhaps on a per-kernel basis)?

Yes, you can use Nsight Compute. Please refer to the Nvprof Transition Guide -> Metric Comparison section in the Nsight Compute CLI document for the equivalent metrics.
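As a sketch, collecting the Turing equivalents from the command line would look something like the line below; the application name is just a placeholder, and please double-check the exact metric names for your chips against the Metric Comparison table:

    nv-nsight-cu-cli --csv --metrics sm__pipe_tensor_op_hmma_cycles_active.avg.pct_of_peak_sustained_active,sm__pipe_tensor_op_imma_cycles_active.avg.pct_of_peak_sustained_active MyInferenceApp.exe

The values are reported per kernel, so this also gives you the per-kernel breakdown asked about in point #2. Newer versions of Nsight Compute additionally offer a --target-processes all option, which helps when the executable is started from a wrapper such as a batch file.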

For point #1, please provide more details. Is there any error reported by Nsight Compute? It would also be good if you could move to a newer version of Nsight Compute.

For point #2, it is not clear why you want to use CUPTI. You can use Nsight Compute to profile your inference code. But yes, the same tensor core utilization metrics are also supported by CUPTI.
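To give a rough idea of what the CUPTI route involves: on Turing, metric collection goes through the CUPTI Profiling API rather than the legacy Event/Metric APIs. Below is a minimal sketch that only initializes that API; it assumes the CUPTI headers from the CUDA toolkit (extras/CUPTI/include) and linking against the cupti, nvperf_host and CUDA driver libraries. The full collection flow (building a configuration for the tensor pipe metrics, setting up counter data images, and wrapping the kernels in a profiling session) is shown in the autorange_profiling sample that ships with CUPTI in recent toolkits.

    // Minimal sketch: initialize the CUPTI Profiling API (required for
    // metric collection on Turing) and the host-side nvperf library.
    // This is not a complete metric collector; see the CUPTI samples
    // for the full configuration/collection/evaluation flow.
    #include <cstdio>
    #include <cuda.h>
    #include <cupti_profiler_target.h>
    #include <nvperf_host.h>

    int main() {
        // Initialize the CUDA driver API before using CUPTI.
        if (cuInit(0) != CUDA_SUCCESS) {
            std::fprintf(stderr, "cuInit failed\n");
            return 1;
        }

        // Initialize the CUPTI profiler (target side).
        CUpti_Profiler_Initialize_Params initParams = {
            CUpti_Profiler_Initialize_Params_STRUCT_SIZE};
        if (cuptiProfilerInitialize(&initParams) != CUPTI_SUCCESS) {
            std::fprintf(stderr, "cuptiProfilerInitialize failed\n");
            return 1;
        }

        // Initialize the host-side library used to build metric
        // configurations and evaluate collected counter data.
        NVPW_InitializeHost_Params hostParams = {
            NVPW_InitializeHost_Params_STRUCT_SIZE};
        if (NVPW_InitializeHost(&hostParams) != NVPA_STATUS_SUCCESS) {
            std::fprintf(stderr, "NVPW_InitializeHost failed\n");
            return 1;
        }

        std::printf("CUPTI Profiling API initialized; metric collection "
                    "can be configured from here.\n");

        // Tear down the profiler.
        CUpti_Profiler_DeInitialize_Params deinitParams = {
            CUpti_Profiler_DeInitialize_Params_STRUCT_SIZE};
        cuptiProfilerDeInitialize(&deinitParams);
        return 0;
    }

From there, the per-kernel values for the tensor pipe metrics can be evaluated with the nvperf host APIs, as the sample demonstrates.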