How to measure Tensor Core utilization using NVIDIA profiling tools such as Nsight Systems, DLProf, nvprof, etc.

Description

I want to get detailed Tensor Core utilization information for each layer / cuDNN API call / CUDA kernel that is invoked through the TensorRT C++ / Python APIs (which I call from my own code) while running inference on my model.

I want to know whether there is a technique or tool I can use to get the Tensor Core utilization percentage at the following levels:

  • Model level

  • Layer / CUDA kernel level
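
One way to relate the two levels, offered as a sketch rather than an official definition: if a profiler reports, for each kernel, its duration and the percentage of time its Tensor Core pipe was active, a model-level figure can be taken as the duration-weighted mean of the per-kernel figures. The kernel names and numbers below are hypothetical and are not taken from any real profile.

```python
from typing import Iterable, Tuple


def model_tensor_utilization(kernels: Iterable[Tuple[str, float, float]]) -> float:
    """Duration-weighted mean of per-kernel Tensor Core activity, in percent.

    Each entry is (kernel_name, duration_ms, tensor_active_pct).
    Returns 0.0 when there are no kernels or the total duration is zero.
    """
    total_time = 0.0
    weighted_sum = 0.0
    for _name, duration_ms, tensor_active_pct in kernels:
        total_time += duration_ms
        weighted_sum += duration_ms * tensor_active_pct
    return weighted_sum / total_time if total_time > 0 else 0.0


# Hypothetical per-kernel values, for illustration only.
example_kernels = [
    ("conv_fp16_kernel", 2.0, 60.0),
    ("elementwise_kernel", 1.0, 0.0),
    ("gemm_fp16_kernel", 1.0, 40.0),
]

print(model_tensor_utilization(example_kernels))  # 40.0
```

Weighting by duration means a long kernel that never touches the Tensor Cores pulls the model-level number down, which is usually what "utilization of the whole model" is meant to capture.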

Environment

I am using two environments:
First environment:
TensorRT Version: 8.5.3.1
GPU Type: Quadro RTX 3000
Nvidia Driver Version: R516.01 (r515_95-3) / 31.0.15.1601 (4-24-2022)
CUDA Version: 11.7
CUDNN Version: 8.9.2
Operating System + Version: Windows 10
Python Version (if applicable): 3.8.8
TensorFlow Version (if applicable): NA
PyTorch Version (if applicable): 1.13.1+cu117
Baremetal or Container (if container which image + tag): Baremetal

Second environment:
TensorRT Version: 8.5.1.7
GPU Type: GeForce RTX 3090
Nvidia Driver Version: 535.86.05
CUDA Version: 11.8
CUDNN Version: 8.7.0
Operating System + Version: Ubuntu 22.04
Python Version (if applicable): 3.8.10
TensorFlow Version (if applicable): NA
PyTorch Version (if applicable): 1.13.1+cu117
Baremetal or Container (if container which image + tag): Container - base image - NGC, nvcr.io/nvidia/tensorrt:22.11-py3

Relevant Files

DLProf screenshot:

Nsight Systems screenshot - before quantization:

Nsight Systems screenshot - after quantization to FP16 using the TRT API:

Steps To Reproduce

In my Linux environment, I studied the DLProf user guide, then installed and ran DLProf by following it:
DLProf User Guide

I successfully generated a DLProf report (see the attached screenshot),
but I cannot figure out how to get a final Tensor Core usage metric from it.

Additionally, I learned how to use Nsight Systems, which reports the SM Instructions / Tensor Active metric, to verify that the Tensor Cores are active (see the attached screenshots).
But again, I cannot figure out how to get a final Tensor Core usage metric from it.
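
To make the question concrete: the Tensor Active row is a time series of percentages, so one possible "final" number is its mean over the time window in which inference runs. The sketch below assumes the samples have already been extracted as (timestamp in seconds, percent) pairs taken at a uniform rate; the sample values and the window bounds are hypothetical, and this is not an Nsight Systems API.

```python
from typing import List, Tuple


def mean_tensor_active(
    samples: List[Tuple[float, float]], start_s: float, end_s: float
) -> float:
    """Mean of the sampled Tensor Active percentage inside [start_s, end_s].

    Assumes uniformly spaced samples, so a plain arithmetic mean is a
    reasonable estimate of the time average over the window.
    """
    in_window = [pct for t, pct in samples if start_s <= t <= end_s]
    if not in_window:
        raise ValueError("no samples fall inside the requested window")
    return sum(in_window) / len(in_window)


# Hypothetical samples: (timestamp_s, tensor_active_pct).
samples = [(0.0, 0.0), (0.1, 10.0), (0.2, 30.0), (0.3, 20.0), (0.4, 0.0)]

# Average only over the (hypothetical) inference window from 0.1 s to 0.3 s.
window_mean = mean_tensor_active(samples, 0.1, 0.3)
print(window_mean)  # 20.0
```

Restricting the window matters because idle periods before and after inference would otherwise dilute the average.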

I also tried the nvprof tool with the metric tensor_precision_fu_utilization, but it reported that the metric isn't supported on GPUs with compute capability 7.5 and above.
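
For GPUs where nvprof no longer applies, NVIDIA's Nsight Compute CLI (ncu) collects per-kernel metrics instead. The sketch below only builds such a command line and does not run it. The metric name is an assumption based on the nvprof-to-Nsight Compute metric mapping and should be confirmed on the target GPU with `ncu --query-metrics`; `infer.py` is a hypothetical inference script.

```python
import shlex
from typing import List

# Assumed Nsight Compute counterpart of nvprof's tensor_precision_fu_utilization.
# Verify that it exists on your GPU with: ncu --query-metrics
TENSOR_METRIC = (
    "sm__pipe_tensor_op_hmma_cycles_active.avg.pct_of_peak_sustained_active"
)


def build_ncu_command(app_cmd: List[str], metric: str = TENSOR_METRIC) -> List[str]:
    """Return an ncu invocation that collects one metric per kernel as CSV."""
    return ["ncu", "--metrics", metric, "--csv"] + list(app_cmd)


# "infer.py" is a placeholder for the script that runs TensorRT inference.
cmd = build_ncu_command(["python", "infer.py"])
print(shlex.join(cmd))
```

The resulting per-kernel percentages could then be combined with kernel durations to obtain a model-level figure.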

Please advise,

Hi,
Any kind of support / response / guidance will be much appreciated.
Thanks,

Hi,
Can you try running your model with the trtexec command and share the "--verbose" log if the issue persists?
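
The suggestion above can be sketched as follows. The helper only assembles a trtexec command line; `model.onnx` is a hypothetical path, and `--dumpProfile` is added here as an assumption about what is useful (it prints per-layer timing), not something the reply asked for.

```python
import shlex
from typing import List


def build_trtexec_command(onnx_path: str, fp16: bool = True) -> List[str]:
    """Return a trtexec invocation with verbose logging and per-layer timing."""
    cmd = ["trtexec", f"--onnx={onnx_path}", "--verbose", "--dumpProfile"]
    if fp16:
        # Allow FP16 kernels, which is what makes Tensor Core use likely.
        cmd.append("--fp16")
    return cmd


# "model.onnx" is a placeholder for the exported model.
trtexec_cmd = build_trtexec_command("model.onnx")
print(shlex.join(trtexec_cmd))
```

Redirecting the output of this command to a file gives the verbose log requested here.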

You can refer to the link below for the list of all supported operators. If an operator is not supported, you need to create a custom plugin to support that operation.

Also, please share your model and script, if not shared already, so that we can help you better.

Meanwhile, for some common errors and queries, please refer to the link below:

Thanks!

Hello,
The topic isn't related to a specific model; it is a general question about how to measure the Tensor Core utilization percentage.

I would be happy if this metric could be obtained at the level of a specific layer or any CUDA kernel / API, but if it is only provided at the level of the entire model, that will also be OK for my needs.

I want to learn how to correctly use tools such as DLProf or Nsight to achieve this metric for any model.

Above, I tried to describe what I managed to do with these tools, but I wasn't satisfied: the results were not detailed or clear enough for me.

Thanks,

Hi,
I hope this document helps.

Thanks