Description
I want to get detailed Tensor Core utilization information for each layer, cuDNN API call, and CUDA kernel that is launched through the TensorRT C++/Python APIs (which I call from my own code) while running inference on my model.
I want to know whether there is a technique or tool I can use to get the Tensor Core utilization percentage at these levels:
- Model level
- Layer/CUDA kernel level
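To make the two levels concrete, here is a minimal sketch of the aggregation I have in mind. It assumes a profiler can already give me, for each kernel, its duration and the percentage of that time the Tensor Cores were active. The `KernelSample` type, its field names, and the sample numbers are hypothetical and do not come from any tool's output format. The model-level figure is taken as a duration-weighted average of the per-kernel figures, which is one possible definition, not an official one.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class KernelSample:
    """One profiled kernel launch (hypothetical record layout)."""
    layer: str                # TensorRT layer the kernel belongs to
    kernel: str               # CUDA kernel name
    duration_us: float        # kernel duration in microseconds
    tensor_active_pct: float  # Tensor Core active percentage, 0 to 100


def weighted_utilization(samples: List[KernelSample]) -> float:
    """Duration-weighted average of the Tensor Core active percentage."""
    total = sum(s.duration_us for s in samples)
    if total == 0:
        return 0.0
    return sum(s.duration_us * s.tensor_active_pct for s in samples) / total


def per_layer_utilization(samples: List[KernelSample]) -> Dict[str, float]:
    """Group kernels by layer and compute the weighted average per layer."""
    groups: Dict[str, List[KernelSample]] = {}
    for s in samples:
        groups.setdefault(s.layer, []).append(s)
    return {layer: weighted_utilization(g) for layer, g in groups.items()}


# Made-up numbers, for illustration only.
samples = [
    KernelSample("conv1", "kernel_a", 100.0, 80.0),
    KernelSample("conv1", "kernel_b", 100.0, 40.0),
    KernelSample("relu1", "kernel_c", 200.0, 0.0),
]
model_level = weighted_utilization(samples)
layer_level = per_layer_utilization(samples)
```

Weighting by duration means a long kernel that never touches the Tensor Cores pulls the model-level number down more than a short one does, which is the behavior I would expect from a "final" utilization metric.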
Environment
I am using two environments:
First environment:
TensorRT Version: 8.5.3.1
GPU Type: Quadro RTX 3000
Nvidia Driver Version: R516.01 (r515_95-3) / 31.0.15.1601 (4-24-2022)
CUDA Version: 11.7
CUDNN Version: 8.9.2
Operating System + Version: Windows 10
Python Version (if applicable): 3.8.8
TensorFlow Version (if applicable): NA
PyTorch Version (if applicable): 1.13.1+cu117
Baremetal or Container (if container which image + tag): Baremetal
Second environment:
TensorRT Version: 8.5.1.7
GPU Type: GeForce RTX 3090
Nvidia Driver Version: 535.86.05
CUDA Version: 11.8
CUDNN Version: 8.7.0
Operating System + Version: Ubuntu 22.04
Python Version (if applicable): 3.8.10
TensorFlow Version (if applicable): NA
PyTorch Version (if applicable): 1.13.1+cu117
Baremetal or Container (if container which image + tag): Container - base image - NGC, nvcr.io/nvidia/tensorrt:22.11-py3
Relevant Files
DLProf screenshot:
Nsight Systems screenshot - before quantization:
Nsight Systems screenshot - after quantization to FP16 using the TRT API:
Steps To Reproduce
In my Linux environment, I studied the DLProf User Guide and successfully installed and ran DLProf by following it:
DLProf User Guide
I successfully generated a DLProf report (see the attached screenshot), but I could not figure out how to get a final Tensor Core usage metric from it.
Additionally, I learned how to use Nsight Systems, which reports the SM Instructions / Tensor Active metric, in order to verify that the Tensor Cores are active (see the attached screenshot).
But again, I could not figure out how to get a final Tensor Core usage metric from it.
I also tried the nvprof tool with the tensor_precision_fu_utilization metric, but it reported that the metric is not supported on GPUs with compute capability 7.5 and above.
Please advise.