How to measure Tensor Core utilization using NVIDIA profiling tools such as Nsight Systems, DLProf, nvprof, etc.

orong13 · November 6, 2023, 8:29am

Description

I want to get detailed Tensor Core utilization information for each layer / cuDNN API call / CUDA kernel that is invoked through the TensorRT C++/Python APIs (which I call from my own code) while running inference on my model.

I want to know whether there is a technique or tool I can use to get the Tensor Core utilization percentage at the following levels:

Model level
Layer / CUDA kernel level

Environment

I am using two environments:
First environment:
TensorRT Version: 8.5.3.1
GPU Type: Quadro RTX 3000
Nvidia Driver Version: R516.01 (r515_95-3) / 31.0.15.1601 (4-24-2022)
CUDA Version: 11.7
CUDNN Version: 8.9.2
Operating System + Version: Windows 10
Python Version (if applicable): 3.8.8
TensorFlow Version (if applicable): NA
PyTorch Version (if applicable): 1.13.1+cu117
Baremetal or Container (if container which image + tag): Baremetal

Second environment:
TensorRT Version: 8.5.1.7
GPU Type: GeForce RTX 3090
Nvidia Driver Version: 535.86.05
CUDA Version: 11.8
CUDNN Version: 8.7.0
Operating System + Version: Ubuntu 22.04
Python Version (if applicable): 3.8.10
TensorFlow Version (if applicable): NA
PyTorch Version (if applicable): 1.13.1+cu117
Baremetal or Container (if container which image + tag): Container - base image - NGC, nvcr.io/nvidia/tensorrt:22.11-py3

Relevant Files

DLProf screenshot:

Nsight Systems screenshot, before quantization:

Nsight Systems screenshot, after quantization to FP16 using the TRT API:

Steps To Reproduce

On my Linux environment, I studied the DLProf user guide and successfully installed and used DLProf by following it:
DLProf User Guide

I successfully generated a DLProf report (see the attached screenshot),
but I cannot figure out how to get a final Tensor Core usage metric from it.

Additionally, I learned how to use Nsight Systems, which reports the SM Instructions / Tensor Active metric, in order to verify that the Tensor Cores are active (see the attached screenshot).
But again, I cannot figure out how to get a final Tensor Core usage metric from it.

I also tried the nvprof tool with the metric tensor_precision_fu_utilization, but it reported that this metric is not supported for GPUs with compute capability 7.5 and above.
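
For reference, on GPUs with compute capability 7.5 and above, metric collection moved from nvprof to Nsight Compute (the `ncu` command line). The sketch below only assembles an `ncu` command line and prints it; it does not run the profiler. The metric name is the one I understand the Nsight Compute documentation to list as the counterpart of `tensor_precision_fu_utilization` in its nvprof transition table. Treat it as an assumption and confirm it exists on your GPU with `ncu --query-metrics`. `infer.py` is a hypothetical inference script.

```python
import shlex

# Assumed Nsight Compute counterpart of nvprof's tensor_precision_fu_utilization.
# Availability differs per GPU architecture; verify with `ncu --query-metrics`.
TENSOR_METRIC = "sm__pipe_tensor_op_hmma_cycles_active.avg.pct_of_peak_sustained_active"


def build_ncu_command(app_args, metric=TENSOR_METRIC, csv=True):
    """Return the argument list for an ncu run that collects one metric per kernel."""
    cmd = ["ncu", "--metrics", metric]
    if csv:
        cmd.append("--csv")  # print results as CSV for post-processing
    return cmd + list(app_args)


# Hypothetical application: a Python script that runs TensorRT inference.
command = build_ncu_command(["python3", "infer.py"])
print(shlex.join(command))
```

Nsight Compute reports the metric once per profiled kernel launch, so the CSV output would give a kernel-level view rather than a single model-level number.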

Please advise,

orong13 · November 27, 2023, 5:09am

Hi,
Any kind of support/response/guidance will be much appreciated.
Thanks,

AakankshaS · November 27, 2023, 7:43am

Hi,
Could you try running your model with the trtexec command and share the "--verbose" log if the issue persists?
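
A minimal sketch of such an invocation, assembled in Python so it can be reused from a script. It only builds and prints the command; `model.onnx` is a hypothetical file name, and `--fp16` is included on the assumption that the FP16 build mentioned earlier in the thread is the one of interest.

```python
import shlex


def build_trtexec_command(onnx_path, fp16=True):
    """Assemble a trtexec invocation with verbose logging enabled."""
    cmd = ["trtexec", f"--onnx={onnx_path}", "--verbose"]
    if fp16:
        cmd.append("--fp16")  # allow FP16 precision when building the engine
    return cmd


command = build_trtexec_command("model.onnx")  # hypothetical model file
print(shlex.join(command))
```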

You can refer to the link below for the list of supported operators. If an operator is not supported, you need to create a custom plugin to support that operation.

github.com

onnx/onnx-tensorrt/blob/main/docs/operators.md

<!--- SPDX-License-Identifier: Apache-2.0 -->

# Supported ONNX Operators

TensorRT 8.6 supports operators up to Opset 17. Latest information of ONNX operators can be found [here](https://github.com/onnx/onnx/blob/master/docs/Operators.md)

TensorRT supports the following ONNX data types: DOUBLE, FLOAT32, FLOAT16, INT8, and BOOL

> Note: There is limited support for INT32, INT64, and DOUBLE types. TensorRT will attempt to cast down INT64 to INT32 and DOUBLE down to FLOAT, clamping values to `+-INT_MAX` or `+-FLT_MAX` if necessary.

See below for the support matrix of ONNX operators in ONNX-TensorRT.

## Operator Support Matrix

| Operator | Supported | Supported Types   | Restrictions |
|----------|-----------|-------------------|--------------|
| Abs      | Y         | FP32, FP16, INT32 |              |
| Acos     | Y         | FP32, FP16        |              |
| Acosh    | Y         | FP32, FP16        |              |
| Add      | Y         | FP32, FP16, INT32 |              |

Also, please share your model and script, if not already shared, so that we can help you better.

Meanwhile, for some common errors and queries, please refer to the link below:

Thanks!

orong13 · December 4, 2023, 5:53am

Hello,
The topic isn't related to a specific model but to general knowledge: how can I measure the Tensor Core utilization percentage metric?

I would be happy if this metric could be obtained at the level of a specific layer or CUDA kernel/API, but if it is only provided at the level of the entire model, that will also be OK for my needs.
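
To make the two levels concrete: one way to turn per-kernel readings into a single model-level figure is a duration-weighted average, where each kernel's tensor-active percentage counts in proportion to how long the kernel ran. This is only a sketch of that arithmetic, not a method documented by any of the tools, and the kernel names and numbers are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class KernelSample:
    name: str                 # kernel (or layer) name, hypothetical here
    duration_us: float        # kernel run time in microseconds
    tensor_active_pct: float  # tensor pipe active percentage for that kernel


def model_level_utilization(samples):
    """Duration-weighted mean of per-kernel tensor-active percentages."""
    total = sum(s.duration_us for s in samples)
    if total == 0:
        return 0.0
    return sum(s.duration_us * s.tensor_active_pct for s in samples) / total


# Illustrative values only; not measured on any GPU.
samples = [
    KernelSample("conv_fp16", 300.0, 60.0),
    KernelSample("relu", 100.0, 0.0),
    KernelSample("gemm_fp16", 100.0, 40.0),
]
print(model_level_utilization(samples))
```

Weighting by duration keeps short kernels that never touch the Tensor Cores from dragging the figure down as much as a plain average would.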

I want to learn how to correctly use tools such as DLProf or Nsight to obtain this metric for any model.

Above, I tried to describe what I managed to do with these tools, but I wasn't satisfied: the results were not detailed or clear enough for me.

Thanks,

AakankshaS · January 31, 2024, 10:01am

Hi,
I hope the document may help.

Thanks
