Speed up or measure progress of the network profiling/building phase

Description

Note: this question applies to any ONNX model so I’m not providing one here.

When building a network, the profiling step can take a long time because TensorRT benchmarks different tactics and algorithms against the machine's hardware configuration. This makes deploying a model to a client's machine awkward, since users may have to wait an unpredictable amount of time. Thus:

  1. In general, are there any tips & tricks for speeding up this process?
  2. Is there a reliable way of measuring the progress of the profiling process? One possible route is to inspect the verbose logs and count the percentage of layers completed, but that may not be an accurate measurement.
  3. Per the documentation section on compatibility of serialized engines, TensorRT has a way to check whether a pre-built engine may be compatible with the current hardware. Are these checks available somewhere as a library/API call? We'd like to host several versions of the model, each optimized for a different setup, and use these checks to load the one most likely to be compatible (a rough sketch of the kind of fallback we have in mind follows below).
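
For illustration, here is a rough sketch of that fallback (file names are hypothetical; deserialize_cuda_engine returning None is used as a proxy for "not compatible", not an official compatibility-check API):

    import tensorrt as trt

    # Candidate engines pre-built for different hardware setups (hypothetical names).
    CANDIDATE_ENGINES = ["model_sm86.engine", "model_sm75.engine", "model_generic.engine"]

    logger = trt.Logger(trt.Logger.WARNING)
    runtime = trt.Runtime(logger)

    engine = None
    for path in CANDIDATE_ENGINES:
        with open(path, "rb") as f:
            # deserialize_cuda_engine returns None if the serialized engine
            # cannot be used with the current GPU / TensorRT version.
            engine = runtime.deserialize_cuda_engine(f.read())
        if engine is not None:
            print(f"Loaded compatible engine: {path}")
            break

    if engine is None:
        raise RuntimeError("No pre-built engine is compatible; fall back to building from ONNX.")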

Environment

TensorRT Version: 8.2.2.1
GPU Type: Variable
Nvidia Driver Version: Variable
CUDA Version: 11.6
CUDNN Version: 8
Operating System + Version: Ubuntu 20.04, Windows 10
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Hi,

Could you please share the model, script, profiler, and performance output (if not already shared) so that we can help you better?

Alternatively, you can try running your model with the trtexec command.
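
For example (the model and engine file names are placeholders):

    trtexec --onnx=model.onnx --saveEngine=model.engine --verbose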

While measuring the model performance, make sure you consider the latency and throughput of the network inference, excluding the data pre- and post-processing overhead.
Please refer to the links below for more details:
https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-803/best-practices/index.html#measure-performance

https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-803/best-practices/index.html#model-accuracy
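
For example, a trtexec run such as the following (flag values are only illustrative) reports latency and throughput for the inference loop alone, without any pre- or post-processing:

    trtexec --loadEngine=model.engine --warmUp=500 --duration=30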

Thanks!

Hi, the question applies to all models, so I'm looking for solutions/practices that can be applied generally. The profiling performance I'm referring to can be reproduced with the trtexec command. If a specific model is still needed, one can e.g. use any of these ResNet models.

Also, to clarify the question further: it's not about inference performance but about the time it takes TensorRT to profile the network during the build phase.
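
Concretely, the time in question is the builder call itself. A minimal sketch with the Python API (the ONNX path is a placeholder) that times only this phase:

    import time
    import tensorrt as trt

    logger = trt.Logger(trt.Logger.INFO)
    builder = trt.Builder(logger)
    network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, logger)

    with open("model.onnx", "rb") as f:  # placeholder path
        if not parser.parse(f.read()):
            raise RuntimeError("Failed to parse the ONNX model")

    config = builder.create_builder_config()

    start = time.time()
    # Tactic/algorithm profiling happens inside this call.
    serialized_engine = builder.build_serialized_network(network, config)
    print(f"Build/profiling phase took {time.time() - start:.1f} s")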

For (1), we can try to use the timing cache (a sketch follows below).
For (2) and (3), we don't have such APIs, or plans for them, yet.
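
For example, a minimal sketch of persisting and reusing a timing cache with the 8.x builder-config API (the cache path is a placeholder); subsequent builds on the same machine can then skip re-profiling tactics that are already in the cache:

    import os
    import tensorrt as trt

    logger = trt.Logger(trt.Logger.INFO)
    builder = trt.Builder(logger)
    config = builder.create_builder_config()

    CACHE_PATH = "timing.cache"  # placeholder

    # Load an existing cache if present, otherwise start from an empty one.
    cache_data = b""
    if os.path.exists(CACHE_PATH):
        with open(CACHE_PATH, "rb") as f:
            cache_data = f.read()
    timing_cache = config.create_timing_cache(cache_data)
    config.set_timing_cache(timing_cache, ignore_mismatch=False)

    # ... parse the network and call builder.build_serialized_network(network, config) ...

    # Persist the (possibly updated) cache for the next build.
    with open(CACHE_PATH, "wb") as f:
        f.write(config.get_timing_cache().serialize())

Recent trtexec builds also expose a --timingCacheFile option that serves the same purpose.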

Thank you.