Cannot broadcast shapes that have different ranks (ONNX => TensorRT)

Description

I have a TensorFlow 2 model in SavedModel format, and I followed this guide: https://developer.nvidia.com/blog/speeding-up-deep-learning-inference-using-tensorflow-onnx-and-tensorrt/

When I try to build the engine (ONNX to TensorRT), the parser fails with:
In node 66 (isBroadcastValid): UNSUPPORTED_NODE: Cannot broadcast shapes that have different ranks!

I also tried it with trtexec, but the result is the same.

Any help is appreciated.

Environment

Module: Jetson Nano (4GB)
Jetpack: 4.6
TensorRT Version: 8.0.1.6
GPU Type: Tegra X1
Nvidia Driver Version:
CUDA Version: 10.2.300
CUDNN Version: 8.2.1.32
Operating System + Version: Ubuntu 18.04 Bionic Beaver
Python Version (if applicable): 3.6.9
TensorFlow Version (if applicable): 2.5.0+nv21.8

Relevant Files

summary of the model:

SavedModel:
https://drive.google.com/drive/folders/1Fut3t5JwMHd_IkmnhtAAjMR2vMCqg7Sh?usp=sharing

ONNX:

Steps To Reproduce

As the onnx/tensorflow-onnx GitHub repo (Convert TensorFlow, Keras, Tensorflow.js and Tflite models to ONNX) indicates for a SavedModel:
python3 -m tf2onnx.convert --saved-model tensorflow-model-path --output model.onnx
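For reference, tf2onnx also accepts an explicit --opset flag (the log below shows this model was exported at opset 13); the same command with the opset pinned, tensorflow-model-path still being a placeholder:

python3 -m tf2onnx.convert --saved-model tensorflow-model-path --opset 13 --output model.onnx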

then,
trtexec --onnx=onnx-model

&&&& RUNNING TensorRT.trtexec [TensorRT v8001] # trtexec --onnx=/home/jetson/Desktop/DIP/trt/aocr/lpr.onnx
[03/09/2023-09:51:20] [I] === Model Options ===
[03/09/2023-09:51:20] [I] Format: ONNX
[03/09/2023-09:51:20] [I] Model: /home/jetson/Desktop/DIP/trt/aocr/lpr.onnx
[03/09/2023-09:51:20] [I] Output:
[03/09/2023-09:51:20] [I] === Build Options ===
[03/09/2023-09:51:20] [I] Max batch: explicit
[03/09/2023-09:51:20] [I] Workspace: 16 MiB
[03/09/2023-09:51:20] [I] minTiming: 1
[03/09/2023-09:51:20] [I] avgTiming: 8
[03/09/2023-09:51:20] [I] Precision: FP32
[03/09/2023-09:51:20] [I] Calibration:
[03/09/2023-09:51:20] [I] Refit: Disabled
[03/09/2023-09:51:20] [I] Sparsity: Disabled
[03/09/2023-09:51:20] [I] Safe mode: Disabled
[03/09/2023-09:51:20] [I] Restricted mode: Disabled
[03/09/2023-09:51:20] [I] Save engine:
[03/09/2023-09:51:20] [I] Load engine:
[03/09/2023-09:51:20] [I] NVTX verbosity: 0
[03/09/2023-09:51:20] [I] Tactic sources: Using default tactic sources
[03/09/2023-09:51:20] [I] timingCacheMode: local
[03/09/2023-09:51:20] [I] timingCacheFile:
[03/09/2023-09:51:20] [I] Input(s)s format: fp32:CHW
[03/09/2023-09:51:20] [I] Output(s)s format: fp32:CHW
[03/09/2023-09:51:20] [I] Input build shapes: model
[03/09/2023-09:51:20] [I] Input calibration shapes: model
[03/09/2023-09:51:20] [I] === System Options ===
[03/09/2023-09:51:20] [I] Device: 0
[03/09/2023-09:51:20] [I] DLACore:
[03/09/2023-09:51:20] [I] Plugins:
[03/09/2023-09:51:20] [I] === Inference Options ===
[03/09/2023-09:51:20] [I] Batch: Explicit
[03/09/2023-09:51:20] [I] Input inference shapes: model
[03/09/2023-09:51:20] [I] Iterations: 10
[03/09/2023-09:51:20] [I] Duration: 3s (+ 200ms warm up)
[03/09/2023-09:51:20] [I] Sleep time: 0ms
[03/09/2023-09:51:20] [I] Streams: 1
[03/09/2023-09:51:20] [I] ExposeDMA: Disabled
[03/09/2023-09:51:20] [I] Data transfers: Enabled
[03/09/2023-09:51:20] [I] Spin-wait: Disabled
[03/09/2023-09:51:20] [I] Multithreading: Disabled
[03/09/2023-09:51:20] [I] CUDA Graph: Disabled
[03/09/2023-09:51:20] [I] Separate profiling: Disabled
[03/09/2023-09:51:20] [I] Time Deserialize: Disabled
[03/09/2023-09:51:20] [I] Time Refit: Disabled
[03/09/2023-09:51:20] [I] Skip inference: Disabled
[03/09/2023-09:51:20] [I] Inputs:
[03/09/2023-09:51:20] [I] === Reporting Options ===
[03/09/2023-09:51:20] [I] Verbose: Disabled
[03/09/2023-09:51:20] [I] Averages: 10 inferences
[03/09/2023-09:51:20] [I] Percentile: 99
[03/09/2023-09:51:20] [I] Dump refittable layers:Disabled
[03/09/2023-09:51:20] [I] Dump output: Disabled
[03/09/2023-09:51:20] [I] Profile: Disabled
[03/09/2023-09:51:20] [I] Export timing to JSON file:
[03/09/2023-09:51:20] [I] Export output to JSON file:
[03/09/2023-09:51:20] [I] Export profile to JSON file:
[03/09/2023-09:51:20] [I]
[03/09/2023-09:51:20] [I] === Device Information ===
[03/09/2023-09:51:20] [I] Selected Device: NVIDIA Tegra X1
[03/09/2023-09:51:20] [I] Compute Capability: 5.3
[03/09/2023-09:51:20] [I] SMs: 1
[03/09/2023-09:51:20] [I] Compute Clock Rate: 0.9216 GHz
[03/09/2023-09:51:20] [I] Device Global Memory: 3964 MiB
[03/09/2023-09:51:20] [I] Shared Memory per SM: 64 KiB
[03/09/2023-09:51:20] [I] Memory Bus Width: 64 bits (ECC disabled)
[03/09/2023-09:51:20] [I] Memory Clock Rate: 0.01275 GHz
[03/09/2023-09:51:20] [I]
[03/09/2023-09:51:20] [I] TensorRT version: 8001
[03/09/2023-09:51:21] [I] [TRT] [MemUsageChange] Init CUDA: CPU +203, GPU +1, now: CPU 221, GPU 2592 (MiB)
[03/09/2023-09:51:21] [I] Start parsing network model
[03/09/2023-09:51:21] [I] [TRT] ----------------------------------------------------------------
[03/09/2023-09:51:21] [I] [TRT] Input filename: /home/jetson/Desktop/DIP/trt/aocr/lpr.onnx
[03/09/2023-09:51:21] [I] [TRT] ONNX IR version: 0.0.7
[03/09/2023-09:51:21] [I] [TRT] Opset version: 13
[03/09/2023-09:51:21] [I] [TRT] Producer name: tf2onnx
[03/09/2023-09:51:21] [I] [TRT] Producer version: 1.13.0 2c1db5
[03/09/2023-09:51:21] [I] [TRT] Domain:
[03/09/2023-09:51:21] [I] [TRT] Model version: 0
[03/09/2023-09:51:21] [I] [TRT] Doc string:
[03/09/2023-09:51:21] [I] [TRT] ----------------------------------------------------------------
[03/09/2023-09:51:21] [W] [TRT] onnx2trt_utils.cpp:364: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[03/09/2023-09:51:21] [E] [TRT] ModelImporter.cpp:720: While parsing node number 66 [If -> "If__282:0"]:
[03/09/2023-09:51:21] [E] [TRT] ModelImporter.cpp:721: --- Begin node ---
[03/09/2023-09:51:21] [E] [TRT] ModelImporter.cpp:722: input: "Equal__275:0"
output: "If__282:0"
name: "If__282"
op_type: "If"
attribute {
  name: "then_branch"
  g {
    node {
      input: "StatefulPartitionedCall/model/tf.ones/ones:0"
      output: "Identity__277:0"
      name: "Identity__277"
      op_type: "Identity"
      domain: ""
    }
    name: "tf2onnx__276"
    doc_string: "graph for If__282 then_branch"
    output {
      name: "Identity__277:0"
      type {
        tensor_type {
          elem_type: 1
          shape {
            dim {
              dim_param: "unk__1024"
            }
            dim {
              dim_value: 10
            }
          }
        }
      }
    }
  }
  type: GRAPH
}
attribute {
  name: "else_branch"
  g {
    node {
      input: "StatefulPartitionedCall/model/tf.ones/ones:0"
      input: "const_axes__260"
      output: "Unsqueeze__280:0"
      name: "Unsqueeze__280"
      op_type: "Unsqueeze"
      domain: ""
    }
    name: "tf2onnx__279"
    doc_string: "graph for If__282 else_branch"
    output {
      name: "Unsqueeze__280:0"
      type {
        tensor_type {
          elem_type: 1
          shape {
            dim {
              dim_param: "unk__1025"
            }
            dim {
              dim_param: "unk__1026"
            }
            dim {
              dim_param: "unk__1027"
            }
          }
        }
      }
    }
  }
  type: GRAPH
}
domain: ""

[03/09/2023-09:51:21] [E] [TRT] ModelImporter.cpp:723: --- End node ---
[03/09/2023-09:51:21] [E] [TRT] ModelImporter.cpp:726: ERROR: onnx2trt_utils.cpp:190 In function isBroadcastValid:
[8] Cannot broadcast shapes that have different ranks!
[03/09/2023-09:51:21] [E] Failed to parse onnx file
[03/09/2023-09:51:21] [I] Finish parsing network model
[03/09/2023-09:51:21] [E] Parsing model failed
[03/09/2023-09:51:21] [E] Engine creation failed
[03/09/2023-09:51:21] [E] Engine set up failed
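
For reference, the node dump above already shows the mismatch: the then_branch output has rank 2 (unk__1024, 10) while the else_branch output has rank 3 (unk__1025, unk__1026, unk__1027), so the two branches of If__282 cannot be broadcast together. A minimal sketch to confirm the branch output ranks from the model itself (assuming the onnx Python package is installed):

import onnx

model = onnx.load("/home/jetson/Desktop/DIP/trt/aocr/lpr.onnx")
for node in model.graph.node:
    if node.op_type == "If" and node.name == "If__282":
        for attr in node.attribute:  # then_branch and else_branch subgraphs
            out = attr.g.output[0]
            rank = len(out.type.tensor_type.shape.dim)
            print(attr.name, out.name, "rank:", rank)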

Hi,
Request you to share the ONNX model and the script, if not shared already, so that we can assist you better.
Meanwhile, you can try a few things:

1. Validate your model with the snippet below.

check_model.py

import sys
import onnx

filename = sys.argv[1]  # path to your ONNX model
model = onnx.load(filename)
onnx.checker.check_model(model)

2. Try running your model with the trtexec command.

In case you are still facing the issue, please share the trtexec --verbose log for further debugging.
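If the checker passes but the parser still fails on the If node, folding constants can sometimes remove data-dependent control flow before TensorRT sees it. A hedged sketch using Polygraphy (assuming it is installed, e.g. via pip install polygraphy; lpr.onnx is the model from your log):

polygraphy surgeon sanitize lpr.onnx --fold-constants -o lpr_folded.onnx

Then retry trtexec on lpr_folded.onnx.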
Thanks!

Hi,

I attached the ONNX model and the SavedModel.

In the meantime, check_model does not throw any error.
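
For completeness, a stricter check that also validates inferred shapes can be run; a minimal sketch, assuming the same onnx package (full_check enables shape-inference validation):

import onnx
from onnx import shape_inference

model = onnx.load("lpr.onnx")  # the attached model
inferred = shape_inference.infer_shapes(model)
onnx.checker.check_model(inferred, full_check=True)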

output of --verbose:
https://drive.google.com/drive/folders/1vMhIVxgkmdeTMR_KbLntA9yk5b9qauiY?usp=sharing

Thanks in advance

Hi @dogu.budak ,
Apologies for the delay. Are you still facing the issue?

Hi,

Yes, I am still facing the issue.