Speed up or measure progress of the network profiling/building phase

Description

Note: this question applies to any ONNX model so I’m not providing one here.

When building a network, the profiling step can take a long time because TensorRT benchmarks different tactics and algorithms against the machine's hardware configuration. This makes deploying a model to a client's machine awkward, since users may have to wait an unpredictable amount of time. Thus:

  1. In general, are there any tips & tricks for speeding up this process?
  2. Is there a reliable way of measuring the progress of the profiling process? One possible route is to inspect the verbose logs and count the percentage of layers completed, but that may not be an accurate measurement.
  3. Per the documentation section on compatibility of serialized engines, TensorRT has a way to check whether a pre-built engine may be compatible with the current hardware. Are these checks available somewhere as a library/API call? We'd like to host several versions of the model, each optimized for a different setup, and use these checks to load the one most likely to be compatible (a rough sketch of the kind of fallback we have in mind follows below).
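
For illustration, here is a rough sketch of that fallback (file names are hypothetical; deserialize_cuda_engine returning None is used as a proxy for "not compatible", not an official compatibility-check API):

    import tensorrt as trt

    # Candidate engines pre-built for different hardware setups (hypothetical names).
    CANDIDATE_ENGINES = ["model_sm86.engine", "model_sm75.engine", "model_generic.engine"]

    logger = trt.Logger(trt.Logger.WARNING)
    runtime = trt.Runtime(logger)

    engine = None
    for path in CANDIDATE_ENGINES:
        with open(path, "rb") as f:
            # deserialize_cuda_engine returns None if the serialized engine
            # cannot be used with the current GPU / TensorRT version.
            engine = runtime.deserialize_cuda_engine(f.read())
        if engine is not None:
            print(f"Loaded compatible engine: {path}")
            break

    if engine is None:
        raise RuntimeError("No pre-built engine is compatible; fall back to building from ONNX.")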

Environment

TensorRT Version: 8.2.2.1
GPU Type: Variable
Nvidia Driver Version: Variable
CUDA Version: 11.6
CUDNN Version: 8
Operating System + Version: Ubuntu 20.04, Windows 10
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Hi,

Could you please share the model, script, profiler, and performance output (if not already shared) so that we can help you better?

Alternatively, you can try running your model with the trtexec command.
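
For example (the model and engine file names are placeholders):

    trtexec --onnx=model.onnx --saveEngine=model.engine --verbose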

While measuring the model performance, make sure you consider the latency and throughput of the network inference, excluding the data pre- and post-processing overhead.
Please refer to the links below for more details:
https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-803/best-practices/index.html#measure-performance

https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-803/best-practices/index.html#model-accuracy
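
For example, a trtexec run such as the following (flag values are only illustrative) reports latency and throughput for the inference loop alone, without any pre- or post-processing:

    trtexec --loadEngine=model.engine --warmUp=500 --duration=30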

Thanks!

Hi, the question applies to all models, so I'm looking for solutions/practices that can be applied generally. The profiling performance I'm referring to can be reproduced with the trtexec command. If a specific model is still needed, one can e.g. use any of these ResNet models.

Also, to clarify the question further: it's not about inference performance but about the time it takes TensorRT to profile the network during the build phase.
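
Concretely, the time in question is the builder call itself. A minimal sketch with the Python API (the ONNX path is a placeholder) that times only this phase:

    import time
    import tensorrt as trt

    logger = trt.Logger(trt.Logger.INFO)
    builder = trt.Builder(logger)
    network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, logger)

    with open("model.onnx", "rb") as f:  # placeholder path
        if not parser.parse(f.read()):
            raise RuntimeError("Failed to parse the ONNX model")

    config = builder.create_builder_config()

    start = time.time()
    # Tactic/algorithm profiling happens inside this call.
    serialized_engine = builder.build_serialized_network(network, config)
    print(f"Build/profiling phase took {time.time() - start:.1f} s")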

For (1), we can try to use the timing cache (a sketch follows below).
For (2) and (3), we don't have such APIs, or plans for them, yet.
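
For example, a minimal sketch of persisting and reusing a timing cache with the 8.x builder-config API (the cache path is a placeholder); subsequent builds on the same machine can then skip re-profiling tactics that are already in the cache:

    import os
    import tensorrt as trt

    logger = trt.Logger(trt.Logger.INFO)
    builder = trt.Builder(logger)
    config = builder.create_builder_config()

    CACHE_PATH = "timing.cache"  # placeholder

    # Load an existing cache if present, otherwise start from an empty one.
    cache_data = b""
    if os.path.exists(CACHE_PATH):
        with open(CACHE_PATH, "rb") as f:
            cache_data = f.read()
    timing_cache = config.create_timing_cache(cache_data)
    config.set_timing_cache(timing_cache, ignore_mismatch=False)

    # ... parse the network and call builder.build_serialized_network(network, config) ...

    # Persist the (possibly updated) cache for the next build.
    with open(CACHE_PATH, "wb") as f:
        f.write(config.get_timing_cache().serialize())

Recent trtexec builds also expose a --timingCacheFile option that serves the same purpose.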

Thank you.