Question regarding Tensorrt engine build vs inference environment (TensorRT version, Platform, etc)

kgksl · October 12, 2021, 1:21am

Description

We like to convert models trained with TAO and save them as tensorrt engine files.
These saved .engine files will be used with trition server docker containers for inferencing (on the same host machine on which the models were built - same GPU). We would upgrade triton docker base image as new images are released, but we would like to use already converted modes (.engine files) with the new versions of docker images whenever its possible. Therefore, we would like to know which specific changes in model build environment vs inference environment are acceptable and which would result in mismatch issues.

TensorRT documentation(Python API> building an engine and saving .engine file) states following:

Serialized engines are not portable across platforms or TensorRT versions. 
Engines are specific to the exact GPU model they were built on (in addition to the platform and the TensorRT version).

We would like to know:

what is referred as platform here ? (what constitutes it, does it include changes in CUDA version, CuDNN version etc)
TensorRT versions - does is have to match exact versions (build vs inference environment) or only need to match major/ major-minor versions?

NVES · October 12, 2021, 5:38am

Hi,
Can you try running your model with trtexec command, and share the “”–verbose"" log in case if the issue persist

You can refer below link for all the supported operators list, in case any operator is not supported you need to create a custom plugin to support that operation

github.com

onnx/onnx-tensorrt/blob/main/docs/operators.md

<!--- SPDX-License-Identifier: Apache-2.0 -->

# Supported ONNX Operators

TensorRT 8.4 supports operators up to Opset 17. Latest information of ONNX operators can be found [here](https://github.com/onnx/onnx/blob/master/docs/Operators.md)

TensorRT supports the following ONNX data types: DOUBLE, FLOAT32, FLOAT16, INT8, and BOOL

> Note: There is limited support for INT32, INT64, and DOUBLE types. TensorRT will attempt to cast down INT64 to INT32 and DOUBLE down to FLOAT, clamping values to `+-INT_MAX` or `+-FLT_MAX` if necessary.

See below for the support matrix of ONNX operators in ONNX-TensorRT.

## Operator Support Matrix

| Operator                  | Supported  | Supported Types | Restrictions                                                                                                           |
|---------------------------|------------|-----------------|------------------------------------------------------------------------------------------------------------------------|
| Abs                       | Y          | FP32, FP16, INT32 |
| Acos                      | Y          | FP32, FP16 |
| Acosh                     | Y          | FP32, FP16 |
| Add                       | Y          | FP32, FP16, INT32 |

This file has been truncated. show original

Also, request you to share your model and script if not shared already so that we can help you better.

Meanwhile, for some common errors and queries please refer to below link:

Thanks!

kgksl · October 13, 2021, 7:44am

thanks for the quick reply.

I think you have misunderstood my question. I am not facing an issue at the moment. What I have described above is a question about definitions for some facts given under TensorRT documentation. And it would be great to clarify the definitions so that we can add some validation checks to my code to see if the converted engines are compatible with inference environment.

TensorRT documentation(Python API> building an engine and saving .engine file) states following:

Serialized engines are not portable across platforms or TensorRT versions. 
Engines are specific to the exact GPU model they were built on (in addition to the platform and the TensorRT version).

We would like to know:

what is referred as platform here ? (what constitutes it, does it include changes in CUDA version, CuDNN version etc)
TensorRT versions - does is have to match exact versions (build vs inference environment) or only need to match major/ major-minor versions?

Thanks alot

spolisetty · October 18, 2021, 5:39pm

Hi,

Sorry for the delayed response.

Platform refers to OS here. However we have to make sure above. Also versions should be exactly same.

Thank you.

kgksl · October 21, 2021, 10:16pm

Thank you

Topic		Replies	Views
How can I access the same TensorRT engine model in different thread TensorRT cudnn	1	533	November 27, 2023
When to update a tensorrt engine file? TensorRT	5	912	May 11, 2022
TensorRT Inference form a .etlt model on Python TAO Toolkit tensorrt	7	1195	November 16, 2021
Tensorrt Execution Provider TensorRT tensorrt , cudnn , onnx	1	720	November 27, 2023
ONNX Model and Tensorrt Engine gives different output TensorRT tensorrt , onnx	13	5143	June 29, 2022
Tensorrt8.5 inference different with origin onnx model TensorRT	6	1052	December 13, 2022
Darknet YoloV4-tiny model in TensorRT 8 inference TensorRT tensorrt , onnx	7	2148	October 22, 2021
Engine upgrade TensorRT	2	340	March 27, 2023
nVidia release versions compatibility TensorRT	4	877	September 20, 2023
TensorRT gives diffent results than ONNX and Pytorch TensorRT	8	1439	September 28, 2023

Question regarding Tensorrt engine build vs inference environment (TensorRT version, Platform, etc)

Description

Related topics