Gather node output wrong when converting with TensorRT

Description

Hi, during conversion of an ONNX model (exported from PyTorch with opset 11) to TensorRT, the parser reports an error:

[06/15/2022-23:45:05] [E] Error[4]: [graphShapeAnalyzer.cpp::analyzeShapes::1285] Error Code 4: Miscellaneous (IShuffleLayer Reshape_1179: reshape changes volume. Reshaping [343728000] to [1,4200].)
[06/15/2022-23:45:05] [E] [TRT] ModelImporter.cpp:773: While parsing node number 357 [Reshape -> "2462"]:
[06/15/2022-23:45:05] [E] [TRT] ModelImporter.cpp:774: --- Begin node ---
[06/15/2022-23:45:05] [E] [TRT] ModelImporter.cpp:775: input: "2460"
input: "2461"
output: "2462"
name: "Reshape_1179"
op_type: "Reshape"
[06/15/2022-23:45:05] [E] [TRT] ModelImporter.cpp:776: --- End node ---

This issue seems to be caused by an upstream node (Gather_1175) outputting a (1, 4200, 81840) tensor:

[06/15/2022-23:45:05] [V] [TRT] Parsing node: Gather_1175 [Gather]
[06/15/2022-23:45:05] [V] [TRT] Searching for input: 2453
[06/15/2022-23:45:05] [V] [TRT] Searching for input: 2455
[06/15/2022-23:45:05] [V] [TRT] Gather_1175 [Gather] inputs: [2453 -> (1, 81840)[FLOAT]], [2455 -> (1, 4200)[INT32]],
[06/15/2022-23:45:05] [V] [TRT] Using Gather axis: 0
[06/15/2022-23:45:05] [V] [TRT] Registering layer: Gather_1175 for ONNX node: Gather_1175
[06/15/2022-23:45:05] [V] [TRT] Registering tensor: 2456 for ONNX tensor: 2456
[06/15/2022-23:45:05] [V] [TRT] Gather_1175 [Gather] outputs: [2456 -> (1, 4200, 81840)[FLOAT]],

instead of the (1, 4200, 1) shape that netron shows when inspecting the ONNX file (see the image below).

Does anyone know why this may be?
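For anyone comparing the two shapes: ONNX Gather on axis 0 behaves like numpy's np.take, so the output shape is indices.shape + data.shape[1:]. The short sketch below (shapes taken from the verbose log above; the (81840, 1) data shape is an assumption about what the Flatten upstream was supposed to produce) reproduces both the shape trtexec reports and the one netron shows:

```python
import numpy as np

# Shapes from the verbose log above.
data_trt = np.zeros((1, 81840), dtype=np.float32)  # Flatten output as TensorRT sees it
indices = np.zeros((1, 4200), dtype=np.int64)

# ONNX Gather(axis=0) matches np.take along axis 0:
# output shape = indices.shape + data.shape[1:]
out_trt = np.take(data_trt, indices, axis=0)
print(out_trt.shape)  # (1, 4200, 81840) -- matches the trtexec log

# Hypothetical: if the upstream Flatten had produced the transposed
# (81840, 1) instead, Gather would give the (1, 4200, 1) netron shows.
data_alt = np.zeros((81840, 1), dtype=np.float32)
out_alt = np.take(data_alt, indices, axis=0)
print(out_alt.shape)  # (1, 4200, 1)
```

So the Gather itself is consistent with the ONNX spec in both cases; the difference comes entirely from the shape of its data input.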

Environment

TensorRT Version: 8.2.1
GPU Type: Xavier NX
CUDA Version: 10.2
Operating System + Version: L4T 32.6
Python Version (if applicable):
PyTorch Version (if applicable): 1.9
Baremetal or Container (if container which image + tag): nvcr.io/nvidia/l4t-tensorrt:r8.2.1-runtime

Relevant Files

Full output log attached
trtexec_reshape_failure_output.txt (277.8 KB)

Steps To Reproduce

  1. Export a fasterrcnn_resnet50 model from torchvision to ONNX using torch.onnx.export
  2. Using the NVIDIA-provided container, convert the ONNX model to a TensorRT engine with trtexec

Thanks for your time,
Mark

Hi,
The UFF and Caffe parsers have been deprecated from TensorRT 7 onwards, so we request that you try the ONNX parser.
Please check the link below for the same.

Thanks!

Actually, the discrepancy between the output of trtexec and netron/ONNX starts one step back in the graph:
in the Flatten op that feeds the Gather, trtexec's input and output tensor shapes are identical:
[06/15/2022-23:45:05] [V] [TRT] Flatten_1172 [Flatten] inputs: [2337 -> (1, 81840)[FLOAT]],
[06/15/2022-23:45:05] [V] [TRT] Registering tensor: 2453 for ONNX tensor: 2453
[06/15/2022-23:45:05] [V] [TRT] Flatten_1172 [Flatten] outputs: [2453 -> (1, 81840)[FLOAT]],

But the dims are switched in netron:
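For reference, ONNX Flatten always produces a 2-D tensor of shape (prod(d[:axis]), prod(d[axis:])). Given the shapes in this thread, one plausible explanation (an assumption, not confirmed here) is that the exporter emitted Flatten with axis equal to the input rank, a case TensorRT 8.2 may be mishandling. A numpy emulation of the spec produces both shapes seen above:

```python
import numpy as np

def onnx_flatten(x, axis):
    """Emulate ONNX Flatten: reshape to (prod(shape[:axis]), prod(shape[axis:]))."""
    d0 = int(np.prod(x.shape[:axis], dtype=np.int64))
    d1 = int(np.prod(x.shape[axis:], dtype=np.int64))
    return x.reshape(d0, d1)

x = np.zeros((1, 81840), dtype=np.float32)
print(onnx_flatten(x, 1).shape)  # (1, 81840) -- the shape trtexec reports
print(onnx_flatten(x, 2).shape)  # (81840, 1) -- the swapped dims netron shows
```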

Hi,

Could you please share the ONNX model that reproduces the issue and the trtexec --verbose logs with us for better debugging?

Thank you.

Hi,

The trtexec --verbose log was attached in the original post.
The model is just torchvision's fasterrcnn_resnet50 exported to ONNX using torch.onnx.export. Shall I upload the ONNX file here?

I will attempt to use onnx-tensorrt, but it should be noted that trtexec is presented in your developer and quickstart guides as the method to use for ONNX-to-TensorRT conversion. Please update them if it is truly deprecated.

Thanks,
Mark

I forgot to mention that I get the same error using the Python API:

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
# ONNX parsing requires an explicit-batch network
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
if not parser.parse(onnx_model.SerializeToString()):
    for i in range(parser.num_errors):
        print(parser.get_error(i))

OK, I have tried onnx-tensorrt's onnx2trt executable, and the error is the same:


Input filename: models/simplified_thermal.onnx
ONNX IR version: 0.0.6
Opset version: 11
Producer name: pytorch
Producer version: 1.9
Domain:
Model version: 0
Doc string:

Parsing model
[2022-06-16 22:08:57 WARNING] onnx2trt_utils.cpp:366: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[2022-06-16 22:08:57 WARNING] onnx2trt_utils.cpp:392: One or more weights outside the range of INT32 was clamped
[the warning above repeats 15 times in the full log]
[2022-06-16 22:08:57 ERROR] [graphShapeAnalyzer.cpp::analyzeShapes::1285] Error Code 4: Miscellaneous (IShuffleLayer Reshape_1179: reshape changes volume. Reshaping [343728000] to [1,4200].)
While parsing node number 357 [Reshape -> "2462"]:
ERROR: ModelImporter.cpp:179 In function parseGraph:
[6] Invalid Node - Reshape_1179
[graphShapeAnalyzer.cpp::analyzeShapes::1285] Error Code 4: Miscellaneous (IShuffleLayer Reshape_1179: reshape changes volume. Reshaping [343728000] to [1,4200].)

Yes, please. We would like to try reproducing the issue from our end for better debugging.

Thank you.

simplified_thermal.zip.001 (10 MB)
simplified_thermal.zip.002 (10 MB)
simplified_thermal.zip.003 (10 MB)
simplified_thermal.zip.004 (10 MB)
simplified_thermal.zip.005 (10 MB)
simplified_thermal.zip.006 (10 MB)
simplified_thermal.zip.007 (10 MB)
simplified_thermal.zip.008 (10 MB)
simplified_thermal.zip.009 (10 MB)
simplified_thermal.zip.010 (10 MB)
simplified_thermal.zip.011 (10 MB)
simplified_thermal.zip.012 (10 MB)
simplified_thermal.zip.013 (10 MB)
simplified_thermal.zip.014 (10 MB)
simplified_thermal.zip.015 (8.3 MB)

Thanks, Mark

@spolisetty have you had a chance to look at this?

For some wider context, we really need an open-source object detection model (that we can modify) that will convert to a TensorRT engine (so we can run it in DeepStream on Xavier NX). We have tried many models and conversion tools without success.

We would appreciate any advice on this; if anyone knows of a decent PyTorch detection model repository that definitely converts to TRT without error (on L4T), please let me know.

Is there any update on this issue? I have encountered the same problem.

Hi @Mark_Bentley ,

It looks like the zip files are corrupted. Please share the ONNX model that reproduces the issue with us.

Thank you.

I have tried to convert torchvision's pretrained maskrcnn_resnet50_fpn to ONNX. The same issue occurred. When I check the ONNX graph, it looks correct.

The error that I encountered:

[06/21/2022-13:44:50] [E] Error[4]: [graphShapeAnalyzer.cpp::analyzeShapes::1300] Error Code 4: Miscellaneous (IShuffleLayer Reshape_1318: reshape changes volume. Reshaping [1152020331] to [1,4741].)
[06/21/2022-13:44:50] [E] [TRT] ModelImporter.cpp:748: While parsing node number 357 [Reshape -> "onnx::Sigmoid_2657"]:
[06/21/2022-13:44:50] [E] [TRT] ModelImporter.cpp:749: --- Begin node ---
[06/21/2022-13:44:50] [E] [TRT] ModelImporter.cpp:750: input: "onnx::Reshape_2655"
input: "onnx::Reshape_2656"
output: "onnx::Sigmoid_2657"
name: "Reshape_1318"
op_type: "Reshape"
attribute {
  name: "allowzero"
  i: 0
  type: INT
}

[06/21/2022-13:44:50] [E] [TRT] ModelImporter.cpp:751: --- End node ---
[06/21/2022-13:44:50] [E] [TRT] ModelImporter.cpp:754: ERROR: ModelImporter.cpp:179 In function parseGraph:
[6] Invalid Node - Reshape_1318
[graphShapeAnalyzer.cpp::analyzeShapes::1300] Error Code 4: Miscellaneous (IShuffleLayer Reshape_1318: reshape changes volume. Reshaping [1152020331] to [1,4741].)
[06/21/2022-13:44:50] [E] Failed to parse onnx file
[06/21/2022-13:44:50] [I] Finish parsing network model
[06/21/2022-13:44:50] [E] Parsing model failed
[06/21/2022-13:44:50] [E] Failed to create engine from model.
[06/21/2022-13:44:50] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8400]

Could you please share the ONNX model and the trtexec command you're using with us?

Model can be downloaded from the drive link

The trtexec command that I am executing is:

/usr/src/tensorrt/bin/trtexec --onnx=/home/fugurcal/L4-2022-ws/deployment/deployment_guide/models/maskrcnn_resnet50_fpn_simplified.onnx --workspace=2048 --warmUp=1000 --noDataTransfers

After creating the ONNX file, I used polygraphy to do constant folding:

polygraphy surgeon sanitize --fold-constants maskrcnn_resnet50_fpn.onnx  -o maskrcnn_resnet50_fpn_simplified.onnx

Are there any updates on this issue? @spolisetty

I was looking for the same topic. Is there any update? @spolisetty

Hi,

We could reproduce the same error. Please allow us some time to work on this issue.
Will get back to you shortly.

Thank you.


thanks @spolisetty.

If this is a bug in the Flatten op implementation in TensorRT, will a fix only be available in >v8.4?