Description
I have a TensorFlow 2 model in SavedModel format, and I followed this guide: https://developer.nvidia.com/blog/speeding-up-deep-learning-inference-using-tensorflow-onnx-and-tensorrt/
When I try to build the engine (ONNX to TensorRT), the parser fails with:
In node 66 (isBroadcastValid): UNSUPPORTED_NODE: Cannot broadcast shapes that have different ranks!
I also tried it with trtexec, but the result is the same.
Any help is appreciated.
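To illustrate what the parser seems to be complaining about, here is a minimal pure-Python sketch (my own reconstruction, not TensorRT's actual check): the dims are copied from the node dump below, where the then_branch output is rank 2 but the else_branch adds a dimension with Unsqueeze, giving rank 3.

```python
# Output shapes of the two If branches, as declared in the node dump for
# If__282 ("unk__*" are symbolic/unknown dimensions from tf2onnx).
then_branch_out = ["unk__1024", 10]                        # Identity__277:0
else_branch_out = ["unk__1025", "unk__1026", "unk__1027"]  # Unsqueeze__280:0

def rank(dims):
    # Rank is just the number of dimensions, known or symbolic.
    return len(dims)

# TensorRT's isBroadcastValid requires matching ranks, so 2 vs 3 fails.
print(rank(then_branch_out), rank(else_branch_out))  # 2 3
```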
Environment
Module: Jetson Nano ( 4GB)
Jetpack: 4.6
TensorRT Version: 8.0.1.6
GPU Type: Tegra X1
Nvidia Driver Version:
CUDA Version: 10.2.300
CUDNN Version: 8.2.1.32
Operating System + Version: Ubuntu 18.04 Bionic Beaver
Python Version (if applicable): 3.6.9
TensorFlow Version (if applicable): 2.5.0+nv21.8
Relevant Files
summary of the model:
savedmodel:
https://drive.google.com/drive/folders/1Fut3t5JwMHd_IkmnhtAAjMR2vMCqg7Sh?usp=sharing
onnx:
Steps To Reproduce
As the tensorflow-onnx README (GitHub - onnx/tensorflow-onnx: Convert TensorFlow, Keras, Tensorflow.js and Tflite models to ONNX) indicates for a SavedModel:
python3 -m tf2onnx.convert --saved-model tensorflow-model-path --output model.onnx
then,
trtexec --onnx=model.onnx
&&&& RUNNING TensorRT.trtexec [TensorRT v8001] # trtexec --onnx=/home/jetson/Desktop/DIP/trt/aocr/lpr.onnx
[03/09/2023-09:51:20] [I] === Model Options ===
[03/09/2023-09:51:20] [I] Format: ONNX
[03/09/2023-09:51:20] [I] Model: /home/jetson/Desktop/DIP/trt/aocr/lpr.onnx
[03/09/2023-09:51:20] [I] Output:
[03/09/2023-09:51:20] [I] === Build Options ===
[03/09/2023-09:51:20] [I] Max batch: explicit
[03/09/2023-09:51:20] [I] Workspace: 16 MiB
[03/09/2023-09:51:20] [I] minTiming: 1
[03/09/2023-09:51:20] [I] avgTiming: 8
[03/09/2023-09:51:20] [I] Precision: FP32
[03/09/2023-09:51:20] [I] Calibration:
[03/09/2023-09:51:20] [I] Refit: Disabled
[03/09/2023-09:51:20] [I] Sparsity: Disabled
[03/09/2023-09:51:20] [I] Safe mode: Disabled
[03/09/2023-09:51:20] [I] Restricted mode: Disabled
[03/09/2023-09:51:20] [I] Save engine:
[03/09/2023-09:51:20] [I] Load engine:
[03/09/2023-09:51:20] [I] NVTX verbosity: 0
[03/09/2023-09:51:20] [I] Tactic sources: Using default tactic sources
[03/09/2023-09:51:20] [I] timingCacheMode: local
[03/09/2023-09:51:20] [I] timingCacheFile:
[03/09/2023-09:51:20] [I] Input(s)s format: fp32:CHW
[03/09/2023-09:51:20] [I] Output(s)s format: fp32:CHW
[03/09/2023-09:51:20] [I] Input build shapes: model
[03/09/2023-09:51:20] [I] Input calibration shapes: model
[03/09/2023-09:51:20] [I] === System Options ===
[03/09/2023-09:51:20] [I] Device: 0
[03/09/2023-09:51:20] [I] DLACore:
[03/09/2023-09:51:20] [I] Plugins:
[03/09/2023-09:51:20] [I] === Inference Options ===
[03/09/2023-09:51:20] [I] Batch: Explicit
[03/09/2023-09:51:20] [I] Input inference shapes: model
[03/09/2023-09:51:20] [I] Iterations: 10
[03/09/2023-09:51:20] [I] Duration: 3s (+ 200ms warm up)
[03/09/2023-09:51:20] [I] Sleep time: 0ms
[03/09/2023-09:51:20] [I] Streams: 1
[03/09/2023-09:51:20] [I] ExposeDMA: Disabled
[03/09/2023-09:51:20] [I] Data transfers: Enabled
[03/09/2023-09:51:20] [I] Spin-wait: Disabled
[03/09/2023-09:51:20] [I] Multithreading: Disabled
[03/09/2023-09:51:20] [I] CUDA Graph: Disabled
[03/09/2023-09:51:20] [I] Separate profiling: Disabled
[03/09/2023-09:51:20] [I] Time Deserialize: Disabled
[03/09/2023-09:51:20] [I] Time Refit: Disabled
[03/09/2023-09:51:20] [I] Skip inference: Disabled
[03/09/2023-09:51:20] [I] Inputs:
[03/09/2023-09:51:20] [I] === Reporting Options ===
[03/09/2023-09:51:20] [I] Verbose: Disabled
[03/09/2023-09:51:20] [I] Averages: 10 inferences
[03/09/2023-09:51:20] [I] Percentile: 99
[03/09/2023-09:51:20] [I] Dump refittable layers:Disabled
[03/09/2023-09:51:20] [I] Dump output: Disabled
[03/09/2023-09:51:20] [I] Profile: Disabled
[03/09/2023-09:51:20] [I] Export timing to JSON file:
[03/09/2023-09:51:20] [I] Export output to JSON file:
[03/09/2023-09:51:20] [I] Export profile to JSON file:
[03/09/2023-09:51:20] [I]
[03/09/2023-09:51:20] [I] === Device Information ===
[03/09/2023-09:51:20] [I] Selected Device: NVIDIA Tegra X1
[03/09/2023-09:51:20] [I] Compute Capability: 5.3
[03/09/2023-09:51:20] [I] SMs: 1
[03/09/2023-09:51:20] [I] Compute Clock Rate: 0.9216 GHz
[03/09/2023-09:51:20] [I] Device Global Memory: 3964 MiB
[03/09/2023-09:51:20] [I] Shared Memory per SM: 64 KiB
[03/09/2023-09:51:20] [I] Memory Bus Width: 64 bits (ECC disabled)
[03/09/2023-09:51:20] [I] Memory Clock Rate: 0.01275 GHz
[03/09/2023-09:51:20] [I]
[03/09/2023-09:51:20] [I] TensorRT version: 8001
[03/09/2023-09:51:21] [I] [TRT] [MemUsageChange] Init CUDA: CPU +203, GPU +1, now: CPU 221, GPU 2592 (MiB)
[03/09/2023-09:51:21] [I] Start parsing network model
[03/09/2023-09:51:21] [I] [TRT] ----------------------------------------------------------------
[03/09/2023-09:51:21] [I] [TRT] Input filename: /home/jetson/Desktop/DIP/trt/aocr/lpr.onnx
[03/09/2023-09:51:21] [I] [TRT] ONNX IR version: 0.0.7
[03/09/2023-09:51:21] [I] [TRT] Opset version: 13
[03/09/2023-09:51:21] [I] [TRT] Producer name: tf2onnx
[03/09/2023-09:51:21] [I] [TRT] Producer version: 1.13.0 2c1db5
[03/09/2023-09:51:21] [I] [TRT] Domain:
[03/09/2023-09:51:21] [I] [TRT] Model version: 0
[03/09/2023-09:51:21] [I] [TRT] Doc string:
[03/09/2023-09:51:21] [I] [TRT] ----------------------------------------------------------------
[03/09/2023-09:51:21] [W] [TRT] onnx2trt_utils.cpp:364: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[03/09/2023-09:51:21] [E] [TRT] ModelImporter.cpp:720: While parsing node number 66 [If -> "If__282:0"]:
[03/09/2023-09:51:21] [E] [TRT] ModelImporter.cpp:721: --- Begin node ---
[03/09/2023-09:51:21] [E] [TRT] ModelImporter.cpp:722: input: "Equal__275:0"
output: "If__282:0"
name: "If__282"
op_type: "If"
attribute {
  name: "then_branch"
  g {
    node {
      input: "StatefulPartitionedCall/model/tf.ones/ones:0"
      output: "Identity__277:0"
      name: "Identity__277"
      op_type: "Identity"
      domain: ""
    }
    name: "tf2onnx__276"
    doc_string: "graph for If__282 then_branch"
    output {
      name: "Identity__277:0"
      type {
        tensor_type {
          elem_type: 1
          shape {
            dim {
              dim_param: "unk__1024"
            }
            dim {
              dim_value: 10
            }
          }
        }
      }
    }
  }
  type: GRAPH
}
attribute {
  name: "else_branch"
  g {
    node {
      input: "StatefulPartitionedCall/model/tf.ones/ones:0"
      input: "const_axes__260"
      output: "Unsqueeze__280:0"
      name: "Unsqueeze__280"
      op_type: "Unsqueeze"
      domain: ""
    }
    name: "tf2onnx__279"
    doc_string: "graph for If__282 else_branch"
    output {
      name: "Unsqueeze__280:0"
      type {
        tensor_type {
          elem_type: 1
          shape {
            dim {
              dim_param: "unk__1025"
            }
            dim {
              dim_param: "unk__1026"
            }
            dim {
              dim_param: "unk__1027"
            }
          }
        }
      }
    }
  }
  type: GRAPH
}
domain: ""
[03/09/2023-09:51:21] [E] [TRT] ModelImporter.cpp:723: --- End node ---
[03/09/2023-09:51:21] [E] [TRT] ModelImporter.cpp:726: ERROR: onnx2trt_utils.cpp:190 In function isBroadcastValid:
[8] Cannot broadcast shapes that have different ranks!
[03/09/2023-09:51:21] [E] Failed to parse onnx file
[03/09/2023-09:51:21] [I] Finish parsing network model
[03/09/2023-09:51:21] [E] Parsing model failed
[03/09/2023-09:51:21] [E] Engine creation failed
[03/09/2023-09:51:21] [E] Engine set up failed
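One thing I plan to try next (untested on this model, so treat it as an assumption): constant-folding the graph with Polygraphy before parsing, in case the shape-dependent If subgraph that tf2onnx emits can be resolved away at conversion time. The file names here are just my own.

```shell
# Untested workaround sketch: fold constants so the data-dependent If node
# may be simplified out of the graph before TensorRT parses it.
python3 -m pip install polygraphy onnx onnx-graphsurgeon
polygraphy surgeon sanitize lpr.onnx --fold-constants -o lpr_folded.onnx
trtexec --onnx=lpr_folded.onnx
```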