Unable to convert torchvision model to a TRT engine compatible with the latest version of DeepStream on Jetson

Please provide complete information as applicable to your setup.

• Hardware Platform: Jetson Xavier
• DeepStream Version: 6.2
• JetPack Version: 5.1
• TensorRT Version: 8.5.2 (TensorRT v8502 per the trtexec log below)
• Issue Type: Question/bug
• How to reproduce the issue?
To reproduce, download the torchvision RetinaNet model and export it as an ONNX file (I have tried every conversion method I can find, and all have this issue).
Then attempt to convert this file into a TRT engine (using trtexec, DeepStream's automatic conversion, torch2trt, etc.) on a Jetson Xavier running JetPack 5.1.

I am attempting to convert the torchvision RetinaNet model to a TRT engine that can be run on the Jetson.
https://pytorch.org/vision/main/models/retinanet.html

Exporting with torch.onnx.export and then converting with trtexec results in an error caused by ConstantOfShape nodes having mismatched datatypes.

According to the ONNX documentation (onnx/Operators.md at main · onnx/onnx · GitHub), ConstantOfShape only works with INT64 as input.

However, TRT will only accept a ConstantOfShape value of type FP32, per the TensorRT ONNX operator support documentation.

I have tried scripting and tracing the model before export, as well as using polygraphy surgeon to clean up the graph before conversion, with no effect.

Is there a way to convert and run this model? Otherwise this seems like a bug, as the requirements of the latest supported ONNX opset do not match trtexec's capabilities.

Which DeepStream sample are you testing? Could you share the whole test log and the configuration file?

Hi,
At this point the DeepStream sample and config are irrelevant, as I am still working on getting a TRT engine. I am using trtexec to create this engine.

The only part of this that is DeepStream-on-Jetson-specific is the set of ONNX, TensorRT, and CUDA versions that are available.

However, if you could provide some insight on the issues I am having with the conversion from Torchvision’s RetinaNet model to a TRT engine, that would be great.

Thank you!

Hey, which opset version did you use for onnx.export? Could you try with opset=13?

Hi,
I was previously using opset=16, but I just tried with opset=13.
The error is pretty much the same, provided below:

[04/26/2023-12:52:46] [I] [TRT] ----------------------------------------------------------------
[04/26/2023-12:52:46] [I] [TRT] Input filename: exported_model.onnx
[04/26/2023-12:52:46] [I] [TRT] ONNX IR version: 0.0.7
[04/26/2023-12:52:46] [I] [TRT] Opset version: 13
[04/26/2023-12:52:46] [I] [TRT] Producer name: pytorch
[04/26/2023-12:52:46] [I] [TRT] Producer version: 2.0.0
[04/26/2023-12:52:46] [I] [TRT] Domain:
[04/26/2023-12:52:46] [I] [TRT] Model version: 0
[04/26/2023-12:52:46] [I] [TRT] Doc string:
[04/26/2023-12:52:46] [I] [TRT] ----------------------------------------------------------------
[04/26/2023-12:52:46] [W] [TRT] onnx2trt_utils.cpp:375: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[04/26/2023-12:52:47] [W] [TRT] onnx2trt_utils.cpp:403: One or more weights outside the range of INT32 was clamped
[04/26/2023-12:52:51] [E] Error[1]: [network.cpp::setWeightsName::3366] Error Code 1: Internal Error (Error: Weights of same values but of different types are used in the network!)
[04/26/2023-12:52:51] [E] [TRT] ModelImporter.cpp:726: While parsing node number 1120 [ConstantOfShape -> "/anchor_generator/ConstantOfShape_output_0"]:
[04/26/2023-12:52:51] [E] [TRT] ModelImporter.cpp:727: --- Begin node ---
[04/26/2023-12:52:51] [E] [TRT] ModelImporter.cpp:728: input: "/anchor_generator/Constant_12_output_0"
output: "/anchor_generator/ConstantOfShape_output_0"
name: "/anchor_generator/ConstantOfShape"
op_type: "ConstantOfShape"
attribute {
  name: "value"
  t {
    dims: 1
    data_type: 7
    raw_data: "\000\000\000\000\000\000\000\000"
  }
  type: TENSOR
}
doc_string: "/usr/local/lib/python3.8/dist-packages/torchvision-0.16.0a0+0d75d9e-py3.8-linux-aarch64.egg/torchvision/models/detection/anchor_utils.py(121): \n/usr/local/lib/python3.8/dist-packages/torchvision-0.16.0a0+0d75d9e-py3.8-linux-aarch64.egg/torchvision/models/detection/anchor_utils.py(119): forward\n/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py(1488): _slow_forward\n/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py(1501): _call_impl\n/usr/local/lib/python3.8/dist-packages/torchvision-0.16.0a0+0d75d9e-py3.8-linux-aarch64.egg/torchvision/models/detection/retinanet.py(636): forward\n/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py(1488): _slow_forward\n/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py(1501): _call_impl\n/usr/local/lib/python3.8/dist-packages/torch/jit/_trace.py(118): wrapper\n/usr/local/lib/python3.8/dist-packages/torch/jit/_trace.py(127): forward\n/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py(1501): _call_impl\n/usr/local/lib/python3.8/dist-packages/torch/jit/_trace.py(1260): _get_trace_graph\n/usr/local/lib/python3.8/dist-packages/torch/onnx/utils.py(893): _trace_and_get_graph_from_model\n/usr/local/lib/python3.8/dist-packages/torch/onnx/utils.py(989): _create_jit_graph\n/usr/local/lib/python3.8/dist-packages/torch/onnx/utils.py(1113): _model_to_graph\n/usr/local/lib/python3.8/dist-packages/torch/onnx/utils.py(1533): _export\n/usr/local/lib/python3.8/dist-packages/torch/onnx/utils.py(506): export\nconvert_pickle.py(37): \n"
[04/26/2023-12:52:51] [E] [TRT] ModelImporter.cpp:729: --- End node ---
[04/26/2023-12:52:51] [E] [TRT] ModelImporter.cpp:731: ERROR: ModelImporter.cpp:172 In function parseGraph:
[6] Invalid Node - /anchor_generator/ConstantOfShape
[network.cpp::setWeightsName::3366] Error Code 1: Internal Error (Error: Weights of same values but of different types are used in the network!)
[04/26/2023-12:52:51] [E] Failed to parse onnx file
[04/26/2023-12:52:51] [I] Finish parsing network model
[04/26/2023-12:52:51] [E] Parsing model failed
[04/26/2023-12:52:51] [E] Failed to create engine from model or file.
[04/26/2023-12:52:51] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8502] # trtexec --onnx=exported_model.onnx --saveEngine=converted.engine

I tried folding constants with the command below; the ConstantOfShape error is then gone, but trtexec fails on a TopK whose K is not an initializer.
polygraphy surgeon sanitize --fold-constants -o RetinaNet_constantsfolded.onnx RetinaNet.onnx
You may try it; please make sure onnxruntime is installed in your environment.

Thank you for the suggestion.
I encountered the same error ("topK whose K is not an initializer") after folding constants.

Do you know how to proceed from that point? I was unable to find any information on solving the topK error, so that seemed like another dead-end.

Sorry for the late reply, I was OOTO last week. It looks like the TopK operators are part of NMS; one way around that error is to replace those NMS ops with an NMS plugin, as sketched below.

Thank you, I will look into that.

There has been no update from you for a while, so we assume this is no longer an issue and are closing this topic. If you need further support, please open a new one. Thanks

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.