We want to measure tensor core utilization, for which we would like to get the values of the following metrics:

- tensor_precision_fu_utilization
- tensor_int_fu_utilization

Our deep learning inference (TensorFlow) is presently deployed on a Tesla T4 (Turing) and a 2080 Ti (Turing). Both have compute capability 7.5, and I read that for devices above compute capability 7.0 we need to use Nsight Compute (not nvprof) to get these metric values.

We have run into a technical issue with the tool, and we also have a question about measuring tensor core utilization on the GPU:

- Unfortunately, with CUDA 10.1 and NVIDIA Corporation\Nsight Compute 2019.1\target\windows-desktop-win7-x64\nv-nsight-cu-cli.exe, we are unable to start our application (a batch file that invokes our application exe). We also tried invoking the exe directly, but that did not help either.
- To measure the metrics mentioned above, is there a set of CUPTI APIs that could be used to get the results at application runtime for all kernels executed during inference (perhaps on a per-kernel basis)?