Using Nsight Compute to measure tensor core metrics

We want to measure tensor core utilization, and for that we would like to get the values of the following metrics:

  • tensor_precision_fu_utilization and
  • tensor_int_fu_utilization

Our deep learning inference (TensorFlow) is presently deployed on a Tesla T4 (Turing) and a 2080 Ti (Turing), both of which have compute capability above 7.0. I read that for these GPUs we need to use Nsight Compute (and not nvprof) to get these metric values.

We have a technical issue using the tool, and a question about measuring tensor core utilization on the GPU, as follows:

  1. Unfortunately, with CUDA 10.1 and NVIDIA Corporation\Nsight Compute 2019.1\target\windows-desktop-win7-x64\nv-nsight-cu-cli.exe, we are unable to start our application (a batch file that invokes our application exe). We also tried invoking the exe directly, but that did not help either.
  2. To measure the above-mentioned metrics, is there a set of CUPTI APIs that could be used to get the results at application runtime for all the kernels (perhaps on a per-kernel basis) that were executed during inference?
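
One check that does not depend on launching our application at all is to ask the CLI which tensor-related metrics it exposes. Below is a minimal Python sketch of that check, not a verified procedure. It assumes that `nv-nsight-cu-cli` is on the PATH and that this version supports the `--query-metrics` option. The helper names `filter_metric_lines` and `query_tensor_metrics` are hypothetical. The metric names Nsight Compute reports may differ from the nvprof names listed above.

```python
import subprocess


def filter_metric_lines(listing, keyword="tensor"):
    """Return the lines of a metric listing that mention `keyword` (case-insensitive)."""
    needle = keyword.lower()
    return [line.strip() for line in listing.splitlines() if needle in line.lower()]


def query_tensor_metrics(cli="nv-nsight-cu-cli"):
    """Run the CLI's metric query and keep only tensor-related lines.

    Assumption: `cli` resolves to the Nsight Compute command-line executable
    and that it accepts --query-metrics; adjust the path for your install.
    """
    result = subprocess.run(
        [cli, "--query-metrics"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return filter_metric_lines(result.stdout)
```

Calling `query_tensor_metrics()` on the target machine and printing the returned lines should show whether any tensor-related metrics are listed. If the call itself fails, that would suggest the problem is with the tool installation rather than with how our batch file or exe is launched.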