Gather node output wrong when converting with TensorRT

Description

Hi, during conversion of an ONNX model (exported from PyTorch with opset 11) to TensorRT, the parser reports an error:

[06/15/2022-23:45:05] [E] Error[4]: [graphShapeAnalyzer.cpp::analyzeShapes::1285] Error Code 4: Miscellaneous (IShuffleLayer Reshape_1179: reshape changes volume. Reshaping [343728000] to [1,4200].)
[06/15/2022-23:45:05] [E] [TRT] ModelImporter.cpp:773: While parsing node number 357 [Reshape -> "2462"]:
[06/15/2022-23:45:05] [E] [TRT] ModelImporter.cpp:774: --- Begin node ---
[06/15/2022-23:45:05] [E] [TRT] ModelImporter.cpp:775: input: "2460"
input: "2461"
output: "2462"
name: "Reshape_1179"
op_type: "Reshape"
[06/15/2022-23:45:05] [E] [TRT] ModelImporter.cpp:776: --- End node ---

This issue seems to be caused by an upstream node (Gather_1175) outputting a (1, 4200, 81840) tensor:

[06/15/2022-23:45:05] [V] [TRT] Parsing node: Gather_1175 [Gather]
[06/15/2022-23:45:05] [V] [TRT] Searching for input: 2453
[06/15/2022-23:45:05] [V] [TRT] Searching for input: 2455
[06/15/2022-23:45:05] [V] [TRT] Gather_1175 [Gather] inputs: [2453 -> (1, 81840)[FLOAT]], [2455 -> (1, 4200)[INT32]],
[06/15/2022-23:45:05] [V] [TRT] Using Gather axis: 0
[06/15/2022-23:45:05] [V] [TRT] Registering layer: Gather_1175 for ONNX node: Gather_1175
[06/15/2022-23:45:05] [V] [TRT] Registering tensor: 2456 for ONNX tensor: 2456
[06/15/2022-23:45:05] [V] [TRT] Gather_1175 [Gather] outputs: [2456 -> (1, 4200, 81840)[FLOAT]],

instead of the (1, 4200, 1) shape that netron shows when inspecting the ONNX file (see the image below).

Does anyone know why this may be?
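For anyone comparing the two shapes: ONNX Gather on axis 0 behaves like numpy's np.take, so the output shape is indices.shape + data.shape[1:]. The short sketch below (shapes taken from the verbose log above; the (81840, 1) data shape is an assumption about what the Flatten upstream was supposed to produce) reproduces both the shape trtexec reports and the one netron shows:

```python
import numpy as np

# Shapes from the verbose log above.
data_trt = np.zeros((1, 81840), dtype=np.float32)  # Flatten output as TensorRT sees it
indices = np.zeros((1, 4200), dtype=np.int64)

# ONNX Gather(axis=0) matches np.take along axis 0:
# output shape = indices.shape + data.shape[1:]
out_trt = np.take(data_trt, indices, axis=0)
print(out_trt.shape)  # (1, 4200, 81840) -- matches the trtexec log

# Hypothetical: if the upstream Flatten had produced the transposed
# (81840, 1) instead, Gather would give the (1, 4200, 1) netron shows.
data_alt = np.zeros((81840, 1), dtype=np.float32)
out_alt = np.take(data_alt, indices, axis=0)
print(out_alt.shape)  # (1, 4200, 1)
```

So the Gather itself is consistent with the ONNX spec in both cases; the difference comes entirely from the shape of its data input.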

Environment

TensorRT Version: 8.2.1
GPU Type: Xavier NX
CUDA Version: 10.2
Operating System + Version: L4T 32.6
Python Version (if applicable):
PyTorch Version (if applicable): 1.9
Baremetal or Container (if container which image + tag): nvcr.io/nvidia/l4t-tensorrt:r8.2.1-runtime

Relevant Files

Full output log attached
trtexec_reshape_failure_output.txt (277.8 KB)

Steps To Reproduce

  1. Export a fasterrcnn_resnet50 model from torchvision to ONNX using torch.onnx.export
  2. Using the NVIDIA-provided container, convert the ONNX model to a TensorRT engine with trtexec

Thanks for your time,
Mark

Hi,
The UFF and Caffe parsers have been deprecated from TensorRT 7 onwards, so we request that you try the ONNX parser.
Please check the link below for the same.

Thanks!

Actually, the discrepancy between the output of trtexec and netron/ONNX starts one step back in the graph:
in the Flatten op that feeds the Gather, trtexec's input and output tensor shapes are identical:
[06/15/2022-23:45:05] [V] [TRT] Flatten_1172 [Flatten] inputs: [2337 -> (1, 81840)[FLOAT]],
[06/15/2022-23:45:05] [V] [TRT] Registering tensor: 2453 for ONNX tensor: 2453
[06/15/2022-23:45:05] [V] [TRT] Flatten_1172 [Flatten] outputs: [2453 -> (1, 81840)[FLOAT]],

But the dims are switched in netron:
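For reference, ONNX Flatten always produces a 2-D tensor of shape (prod(d[:axis]), prod(d[axis:])). Given the shapes in this thread, one plausible explanation (an assumption, not confirmed here) is that the exporter emitted Flatten with axis equal to the input rank, a case TensorRT 8.2 may be mishandling. A numpy emulation of the spec produces both shapes seen above:

```python
import numpy as np

def onnx_flatten(x, axis):
    """Emulate ONNX Flatten: reshape to (prod(shape[:axis]), prod(shape[axis:]))."""
    d0 = int(np.prod(x.shape[:axis], dtype=np.int64))
    d1 = int(np.prod(x.shape[axis:], dtype=np.int64))
    return x.reshape(d0, d1)

x = np.zeros((1, 81840), dtype=np.float32)
print(onnx_flatten(x, 1).shape)  # (1, 81840) -- the shape trtexec reports
print(onnx_flatten(x, 2).shape)  # (81840, 1) -- the swapped dims netron shows
```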

Hi,

Could you please share the ONNX model that reproduces the issue and the trtexec --verbose logs with us for better debugging?

Thank you.

Hi,

The trtexec --verbose log was attached in the original post.
The model is just torchvision's fasterrcnn_resnet50 exported to ONNX using torch.onnx.export. Shall I upload the ONNX file here?

I will attempt to use onnx-tensorrt, but it should be noted that trtexec is presented in your developer and quickstart guides as the method to use for ONNX-to-TensorRT conversion. Please update them if it is truly deprecated.

Thanks,
Mark

I forgot to mention that I get the same error using the Python API:

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
# ONNX parsing requires an explicit-batch network
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
if not parser.parse(onnx_model.SerializeToString()):
    for i in range(parser.num_errors):
        print(parser.get_error(i))

OK, I have tried onnx-tensorrt's onnx2trt executable, and the error is the same:


Input filename: models/simplified_thermal.onnx
ONNX IR version: 0.0.6
Opset version: 11
Producer name: pytorch
Producer version: 1.9
Domain:
Model version: 0
Doc string:

Parsing model
[2022-06-16 22:08:57 WARNING] onnx2trt_utils.cpp:366: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[2022-06-16 22:08:57 WARNING] onnx2trt_utils.cpp:392: One or more weights outside the range of INT32 was clamped
[the warning above repeats 15 times in the full log]
[2022-06-16 22:08:57 ERROR] [graphShapeAnalyzer.cpp::analyzeShapes::1285] Error Code 4: Miscellaneous (IShuffleLayer Reshape_1179: reshape changes volume. Reshaping [343728000] to [1,4200].)
While parsing node number 357 [Reshape -> "2462"]:
ERROR: ModelImporter.cpp:179 In function parseGraph:
[6] Invalid Node - Reshape_1179
[graphShapeAnalyzer.cpp::analyzeShapes::1285] Error Code 4: Miscellaneous (IShuffleLayer Reshape_1179: reshape changes volume. Reshaping [343728000] to [1,4200].)

Yes, please. We would like to try reproducing the issue from our end for better debugging.

Thank you.

simplified_thermal.zip.001 (10 MB)
simplified_thermal.zip.002 (10 MB)
simplified_thermal.zip.003 (10 MB)
simplified_thermal.zip.004 (10 MB)
simplified_thermal.zip.005 (10 MB)
simplified_thermal.zip.006 (10 MB)
simplified_thermal.zip.007 (10 MB)
simplified_thermal.zip.008 (10 MB)
simplified_thermal.zip.009 (10 MB)
simplified_thermal.zip.010 (10 MB)
simplified_thermal.zip.011 (10 MB)
simplified_thermal.zip.012 (10 MB)
simplified_thermal.zip.013 (10 MB)
simplified_thermal.zip.014 (10 MB)
simplified_thermal.zip.015 (8.3 MB)

Thanks, Mark

@spolisetty have you had a chance to look at this?

For some wider context, we really need an open-source object detection model (that we can modify) that will convert to a TensorRT engine (so we can run it in DeepStream on Xavier NX). We have tried many models and conversion tools without success.

We would appreciate any advice on this; if anyone knows of a decent PyTorch detection model repository that definitely converts to TRT without error (on L4T), please let me know.

Is there any update on this issue? I have encountered the same problem.

Hi @Mark_Bentley ,

It looks like the zip files are corrupted. Please share the ONNX model that reproduces the issue with us.

Thank you.

I have tried to convert torchvision's pretrained maskrcnn_resnet50_fpn to ONNX. The same issue occurred. When I check the ONNX graph, it looks correct.

The error that I encountered:

[06/21/2022-13:44:50] [E] Error[4]: [graphShapeAnalyzer.cpp::analyzeShapes::1300] Error Code 4: Miscellaneous (IShuffleLayer Reshape_1318: reshape changes volume. Reshaping [1152020331] to [1,4741].)
[06/21/2022-13:44:50] [E] [TRT] ModelImporter.cpp:748: While parsing node number 357 [Reshape -> "onnx::Sigmoid_2657"]:
[06/21/2022-13:44:50] [E] [TRT] ModelImporter.cpp:749: --- Begin node ---
[06/21/2022-13:44:50] [E] [TRT] ModelImporter.cpp:750: input: "onnx::Reshape_2655"
input: "onnx::Reshape_2656"
output: "onnx::Sigmoid_2657"
name: "Reshape_1318"
op_type: "Reshape"
attribute {
  name: "allowzero"
  i: 0
  type: INT
}

[06/21/2022-13:44:50] [E] [TRT] ModelImporter.cpp:751: --- End node ---
[06/21/2022-13:44:50] [E] [TRT] ModelImporter.cpp:754: ERROR: ModelImporter.cpp:179 In function parseGraph:
[6] Invalid Node - Reshape_1318
[graphShapeAnalyzer.cpp::analyzeShapes::1300] Error Code 4: Miscellaneous (IShuffleLayer Reshape_1318: reshape changes volume. Reshaping [1152020331] to [1,4741].)
[06/21/2022-13:44:50] [E] Failed to parse onnx file
[06/21/2022-13:44:50] [I] Finish parsing network model
[06/21/2022-13:44:50] [E] Parsing model failed
[06/21/2022-13:44:50] [E] Failed to create engine from model.
[06/21/2022-13:44:50] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8400]

Could you please share the ONNX model and the trtexec command you're using with us?

Model can be downloaded from the drive link

The trtexec command that I am executing is:

/usr/src/tensorrt/bin/trtexec --onnx=/home/fugurcal/L4-2022-ws/deployment/deployment_guide/models/maskrcnn_resnet50_fpn_simplified.onnx --workspace=2048 --warmUp=1000 --noDataTransfers

After creating the ONNX file, I used polygraphy to do constant folding:

polygraphy surgeon sanitize --fold-constants maskrcnn_resnet50_fpn.onnx  -o maskrcnn_resnet50_fpn_simplified.onnx

Are there any updates on this issue? @spolisetty

I was looking for the same topic. Is there any update? @spolisetty

Hi,

We could reproduce the same error. Please allow us some time to work on this issue.
Will get back to you shortly.

Thank you.


thanks @spolisetty.

If this is a bug in the Flatten op implementation in TensorRT, will a fix only be available in >v8.4?