(1) We have computers with a GPU, and a model that we have converted to a TensorRT engine and cached on disk. Imagine we want to update the version of CUDA, cuDNN, or TensorRT. For these changes, must we re-convert the model? Are there other changes that would require us to re-convert? Under what conditions would re-converting simply be a good idea, but not a requirement?
(2) I believe it is recommended to convert models to TensorRT on the GPU architecture you will deploy on, e.g. if we are deploying on a Tegra, we should do the conversion on that specific type of Tegra. Is that correct? If we deploy to multiple GPU architectures, it seems to behoove us to ship a generic model format (e.g. ONNX) and then convert it on each device. Some of our models are large and take tens of seconds to convert, so it would seem that our system is forced into downtime whenever we upgrade a model. What is the recommended way to avoid downtime?
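For reference, the on-device conversion we have in mind looks roughly like this. This is a minimal sketch against the TensorRT 8.x Python API; the paths and the 1 GB workspace limit are placeholders, not our real settings:

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_engine_from_onnx(onnx_path: str, engine_path: str) -> None:
    """Parse an ONNX model and serialize a TensorRT engine to disk."""
    builder = trt.Builder(TRT_LOGGER)
    # The ONNX parser requires an explicit-batch network.
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)

    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise RuntimeError("failed to parse %s" % onnx_path)

    config = builder.create_builder_config()
    # Placeholder workspace limit; tune for the target board.
    config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 30)

    # This is the slow step (tens of seconds for our larger models).
    serialized = builder.build_serialized_network(network, config)
    with open(engine_path, "wb") as f:
        f.write(serialized)
```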
Thanks for your quick answer. I think you are saying:
It doesn’t matter where we generate the TensorRT engine files; it could be on a machine with the target GPU or on a machine with a different GPU.
There is backwards compatibility: newer TensorRT versions can read engine files produced by older versions. The only reason to re-convert would be if a newer version of TensorRT failed to read an older file, which should not happen.
So, those points look incorrect to me. It looks like when we upgrade to a newer version of TensorRT, we should re-convert, and it also looks like we should do a separate conversion for each GPU architecture.
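If that is right, we would re-convert whenever the recorded build environment changes. Here is a minimal sketch of the staleness check we are considering; it assumes pycuda is available for querying the device, and the metadata file next to the cached engine is our own convention:

```python
import json
import os

import pycuda.driver as cuda
import tensorrt as trt

def current_build_key() -> dict:
    """Describe the environment the cached engine must match."""
    cuda.init()
    dev = cuda.Device(0)
    return {
        "tensorrt": trt.__version__,
        "gpu": dev.name(),
        "compute_capability": "%d.%d" % dev.compute_capability(),
    }

def engine_is_stale(meta_path: str) -> bool:
    """True if the cached engine was built for a different environment."""
    if not os.path.exists(meta_path):
        return True
    with open(meta_path) as f:
        return json.load(f) != current_build_key()

def record_build_key(meta_path: str) -> None:
    """Write metadata next to a freshly built engine."""
    with open(meta_path, "w") as f:
        json.dump(current_build_key(), f)
```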
(1) How does one minimize downtime when uploading a new network, given that converting it to TensorRT takes a while? (Our rough idea is sketched after question (2) below.)
(2) I would guess one would download pre-converted TensorRT engine files to the board. If that is true, I need to know which Tegra I am running on. What are best practices for figuring out which board you are running on? (Our current guess is also sketched below.)
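On question (1), the approach we are leaning toward is to build the new engine in the background while the old one keeps serving, then swap pointers under a lock. This is a sketch only; engine.run() and build_fn are hypothetical stand-ins for our inference wrapper and for a builder like the one above:

```python
import threading

class HotSwapRunner:
    """Serve with the current engine; swap in a new one without downtime."""

    def __init__(self, engine):
        self._engine = engine
        self._lock = threading.Lock()

    def infer(self, inputs):
        with self._lock:
            engine = self._engine
        # run() is a hypothetical wrapper around a TensorRT execution context.
        return engine.run(inputs)

    def upgrade_async(self, onnx_path, build_fn):
        """Build the new engine off the serving path, then swap it in."""
        def _worker():
            new_engine = build_fn(onnx_path)  # the slow part: tens of seconds
            with self._lock:
                self._engine = new_engine     # the swap itself is instant
        threading.Thread(target=_worker, daemon=True).start()
```

One thing we are unsure about is whether the board has enough memory to hold both engines plus the builder workspace during the overlap.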
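On question (2), the simplest check we have found is the device-tree model string that JetPack exposes on Tegra boards, with the CUDA device name as a fallback. A sketch under those assumptions (the /proc path may differ across L4T releases):

```python
import pycuda.driver as cuda

def detect_board() -> str:
    """Return the device-tree model string on Jetson/Tegra, if present."""
    try:
        with open("/proc/device-tree/model") as f:
            # The device tree pads the model string with a trailing NUL.
            return f.read().rstrip("\x00\n")
    except FileNotFoundError:
        # Fall back to the CUDA device name and compute capability.
        cuda.init()
        dev = cuda.Device(0)
        return "%s (sm_%d%d)" % ((dev.name(),) + dev.compute_capability())
```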