Our customers use different devices (1080 Ti, 2080 Ti, 3080, Jetson AGX, …), and I want to know how to shorten the engine build time.
I’m upgrading my TensorRT version from 7 to 8, and I found there is a new feature called the timing cache. Does this feature mean I can use the cache across different devices, e.g. generate the cache on a 2080 Ti and use it on a 3080? If not, what is the best way to shorten the engine build time?
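For context, `trtexec` (bundled with TensorRT 8) can save and reload a serialized timing cache via `--timingCacheFile`, which speeds up rebuilds on the same device; a minimal sketch, where `model.onnx` and the file paths are placeholders (whether a cache built on one GPU model is reusable on another is exactly the question here):

```shell
# First build: tactic timings are measured and written to timing.cache.
trtexec --onnx=model.onnx --saveEngine=model.plan \
        --timingCacheFile=timing.cache

# A later rebuild on the same GPU reloads timing.cache and skips
# most of the kernel auto-tuning work, so the build finishes faster.
trtexec --onnx=model.onnx --saveEngine=model_fp16.plan --fp16 \
        --timingCacheFile=timing.cache
```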
Environment
TensorRT Version: 8.4.1.5
GPU Type: 1080 Ti, 2080 Ti, 3080, Jetson AGX
Nvidia Driver Version: 516.40
CUDA Version: 11.1
CUDNN Version: 8
Operating System + Version: Windows 10 + Ubuntu 20.04 + Jetson
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):
Hi,
My model needs to support all of these devices (1080, 2080, 3080). I remember that a plan file cannot be shared across devices with different compute capabilities, right?
We give customers our ONNX models and they generate the plan file on their own devices. They find it takes a long time to generate the plan file, and I think the timing cache might solve this. Is that correct?
If not, do you have any ideas for solving this problem?
Hi @spolisetty ,
If I want to shorten the engine build time for FP16, is it enough to just enable the timing cache and increase the workspace?
If so, how much would you recommend increasing the workspace by?