Can't convert TensorFlow Object Detection API models to ONNX with dynamic batch size

Description

I have been using this guide from TensorRT to convert TF Object Detection API models to ONNX. For explicit batch sizes it works perfectly. However, we also want to create an ONNX model with a dynamic batch size input. When we run the create_onnx.py script with --batch_size=-1, it fails. From what I read in the onnx-graphsurgeon source code, -1 is used to denote a dynamic size.

Here is the model I am trying to convert to onnx.

# Cut most of the beginning of the log due to the character limit.
INFO:tf2onnx.optimizer:Optimizing ONNX model
INFO:tf2onnx.optimizer:After optimization: BatchNormalization -53 (60->7), Cast -481 (2037->1556), Const -430 (3364->2934), Gather +6 (488->494), Identity -193 (193->0), Less -2 (99->97), Mul -2 (504->502), Reshape +2 (388->390), Shape -8 (216->208), Slice -7 (427->420), Squeeze -22 (342->320), Transpose -253 (276->23), Unsqueeze -166 (478->312)
INFO:ModelHelper:TF2ONNX graph created successfully
INFO:ModelHelper:Model is ssd_mobilenet_v2_keras
INFO:ModelHelper:Height is 300
INFO:ModelHelper:Width is 300
INFO:ModelHelper:First NMS score threshold is 9.99999993922529e-09
INFO:ModelHelper:First NMS iou threshold is 0.20000000298023224
INFO:ModelHelper:First NMS max proposals is 100
Warning: Unsupported operator TensorListStack. No schema registered for this operator.
Warning: Unsupported operator TensorListStack. No schema registered for this operator.
INFO:ModelHelper:ONNX graph input shape: [-1, 300, 300, 3] [NCHW format set]
INFO:ModelHelper:Found Conv node 'StatefulPartitionedCall/ssd_mobile_net_v2keras_feature_extractor/model/Conv1/Conv2D' as stem entry
Warning: Unsupported operator TensorListStack. No schema registered for this operator.
Traceback (most recent call last):
  File "create_onnx.py", line 672, in <module>
    main(args)
  File "create_onnx.py", line 647, in main
    effdet_gs.update_preprocessor(args.batch_size, args.input_format)
  File "create_onnx.py", line 265, in update_preprocessor
    self.sanitize()
  File "create_onnx.py", line 159, in sanitize
    self.graph.fold_constants(fold_shapes=True)
  File "/usr/local/lib/python3.8/dist-packages/onnx_graphsurgeon/ir/graph.py", line 646, in fold_constants
    shape_of = lower_shape(tensor)
  File "/usr/local/lib/python3.8/dist-packages/onnx_graphsurgeon/ir/graph.py", line 639, in lower_shape
    shape = fold_func(tensor)
  File "/usr/local/lib/python3.8/dist-packages/onnx_graphsurgeon/ir/graph.py", line 554, in handle_shape
    inp = get_input(get_producer(tensor, "Shape"))
  File "/usr/local/lib/python3.8/dist-packages/onnx_graphsurgeon/ir/graph.py", line 544, in get_input
    inp = node.inputs[index]
IndexError: list index out of range

This is logged if I use onnx-graphsurgeon==0.3.16.

# Cut most of the beginning of the log due to the character limit.
INFO:tf2onnx.optimizer:Optimizing ONNX model
INFO:tf2onnx.optimizer:After optimization: BatchNormalization -53 (60->7), Cast -481 (2037->1556), Const -430 (3364->2934), Gather +6 (488->494), Identity -193 (193->0), Less -2 (99->97), Mul -2 (504->502), Reshape +2 (388->390), Shape -8 (216->208), Slice -7 (427->420), Squeeze -22 (342->320), Transpose -253 (276->23), Unsqueeze -166 (478->312)
INFO:ModelHelper:TF2ONNX graph created successfully
INFO:ModelHelper:Model is ssd_mobilenet_v2_keras
INFO:ModelHelper:Height is 300
INFO:ModelHelper:Width is 300
INFO:ModelHelper:First NMS score threshold is 9.99999993922529e-09
INFO:ModelHelper:First NMS iou threshold is 0.20000000298023224
INFO:ModelHelper:First NMS max proposals is 100
Warning: Unsupported operator TensorListStack. No schema registered for this operator.
Warning: Unsupported operator TensorListStack. No schema registered for this operator.
INFO:ModelHelper:ONNX graph input shape: [-1, 300, 300, 3] [NCHW format set]
INFO:ModelHelper:Found Conv node 'StatefulPartitionedCall/ssd_mobile_net_v2keras_feature_extractor/model/Conv1/Conv2D' as stem entry
Warning: Unsupported operator TensorListStack. No schema registered for this operator.
Warning: Unsupported operator TensorListStack. No schema registered for this operator.
INFO:ModelHelper:Found Concat node 'StatefulPartitionedCall/concat_1' as the tip of BoxPredictor/ConvolutionalClassHead_
INFO:ModelHelper:Found Squeeze node 'StatefulPartitionedCall/Squeeze' as the tip of BoxPredictor/ConvolutionalBoxHead_
[W] colored module is not installed, will not use colors when logging. To enable colors, please install the colored module: python3 -m pip install colored
[W] 'Shape tensor cast elision' routine failed with: None
[W] colored module is not installed, will not use colors when logging. To enable colors, please install the colored module: python3 -m pip install colored
[W] 'Shape tensor cast elision' routine failed with: None
[W] colored module is not installed, will not use colors when logging. To enable colors, please install the colored module: python3 -m pip install colored
[W] 'Shape tensor cast elision' routine failed with: None
[W] colored module is not installed, will not use colors when logging. To enable colors, please install the colored module: python3 -m pip install colored
[W] 'fold_shape' routine failed with:
list index out of range
[W] colored module is not installed, will not use colors when logging. To enable colors, please install the colored module: python3 -m pip install colored
[W] Inference failed. You may want to try enabling partitioning to see better results. Note: Error was:
[ONNXRuntimeError] : 10 : INVALID_GRAPH : This is an invalid model. Error in Node:StatefulPartitionedCall/map/Shape : Node (StatefulPartitionedCall/map/Shape) has input size 0 not in range [min=1, max=1].
[W] colored module is not installed, will not use colors when logging. To enable colors, please install the colored module: python3 -m pip install colored
[W] 'fold_shape' routine failed with:
list index out of range
[W] colored module is not installed, will not use colors when logging. To enable colors, please install the colored module: python3 -m pip install colored
[W] Inference failed. You may want to try enabling partitioning to see better results. Note: Error was:
[ONNXRuntimeError] : 10 : INVALID_GRAPH : This is an invalid model. Error in Node:StatefulPartitionedCall/map/Shape : Node (StatefulPartitionedCall/map/Shape) has input size 0 not in range [min=1, max=1].
Traceback (most recent call last):
  File "create_onnx.py", line 672, in <module>
    main(args)
  File "create_onnx.py", line 648, in main
    effdet_gs.process_graph(args.first_nms_threshold, args.second_nms_threshold)
  File "create_onnx.py", line 621, in process_graph
    self.graph.outputs = first_nms(-1, True, first_nms_threshold)
  File "create_onnx.py", line 485, in first_nms
    anchors_tensor = self.extract_anchors_tensor(box_net_split)
  File "create_onnx.py", line 311, in extract_anchors_tensor
    anchors_y = get_anchor(0, "Add")
  File "create_onnx.py", line 300, in get_anchor
    if (node.inputs[1].values).size == 1: 
AttributeError: 'Variable' object has no attribute 'values'
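The final frame hints at the failure mode: in onnx-graphsurgeon, only `gs.Constant` tensors carry concrete `.values`; a `gs.Variable` does not. With a dynamic batch, constant folding fails earlier (the `fold_shape` warnings above), so the anchor input is still a Variable when `get_anchor` assumes a Constant. The classes below are simplified stand-ins for illustration only, not the real graphsurgeon API:

```python
# Simplified stand-ins mirroring onnx-graphsurgeon's tensor classes
# (illustrative only): a Constant holds concrete `.values`, a Variable
# is a symbolic tensor with no materialized data.
class Variable:
    def __init__(self, name):
        self.name = name

class Constant:
    def __init__(self, name, values):
        self.name = name
        self.values = values  # concrete data, e.g. a list or ndarray

def anchor_size(tensor):
    # Defensive version of the failing access in get_anchor(): check the
    # tensor kind before touching `.values`. The unguarded
    # `node.inputs[1].values` raises AttributeError exactly when folding
    # left the input as a Variable.
    if isinstance(tensor, Constant):
        return len(tensor.values)
    return None  # not folded to a constant; values unavailable statically
```

This doesn't make the conversion work, it only shows why the traceback ends where it does: the root cause is the upstream folding failure on the dynamic-batch graph.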

Environment

TensorRT Version: 8.0.3 (on docker container)
GPU Type: Geforce 1060
Nvidia Driver Version: 510.54
CUDA Version: 11.6 (on host machine), 11.4 (on docker container)
CUDNN Version: 8.2.4 (on docker container)
Operating System + Version: Linux 5.15.28
Python Version (if applicable): 3.8.10
TensorFlow Version (if applicable): 2.6
Baremetal or Container (if container which image + tag): nvcr.io/nvidia/tensorflow:21.10-tf2-py3

Steps To Reproduce

Create a work environment as described here. Alternatively, you can use the Dockerfile below:

FROM nvcr.io/nvidia/tensorflow:21.10-tf2-py3

RUN apt update && apt install -y \
    wget \
    unzip \
    git 

RUN mkdir -p /workspace && \
    cd /workspace && \
    wget https://github.com/protocolbuffers/protobuf/releases/download/v3.15.4/protoc-3.15.4-linux-x86_64.zip && \
    unzip protoc*.zip bin/protoc -d /usr/local && \
    git clone https://github.com/tensorflow/models.git --single-branch && \
    cd /workspace/models/research && \
    git checkout 08b6803 && \
    protoc object_detection/protos/*.proto --python_out=. && \
    cp object_detection/packages/tf2/setup.py ./ && \
    pip install tensorflow_text==2.6.0 && \
    pip install tf-models-official==2.6.0 && \
    pip --use-deprecated=legacy-resolver install .

RUN cd /workspace && \
    git clone https://github.com/NVIDIA/TensorRT.git && \
    cd TensorRT/samples/python/tensorflow_object_detection_api && \
    pip install -r requirements.txt && \
    pip install onnx-graphsurgeon==0.3.10 --index-url https://pypi.ngc.nvidia.com

Then run these commands:

mkdir -p /tf2onnx
wget http://download.tensorflow.org/models/object_detection/tf2/20200711/ssd_mobilenet_v2_320x320_coco17_tpu-8.tar.gz -O /tf2onnx/ssd_mobilenet_v2_320x320_coco17_tpu-8.tar.gz
cd /tf2onnx
mkdir -p mn2_re
tar -xvf ssd_mobilenet_v2_320x320_coco17_tpu-8.tar.gz
cd /workspace/models/research/object_detection
python exporter_main_v2.py \
    --input_type float_image_tensor \
    --trained_checkpoint_dir /tf2onnx/ssd_mobilenet_v2_320x320_coco17_tpu-8/checkpoint \
    --pipeline_config_path /tf2onnx/ssd_mobilenet_v2_320x320_coco17_tpu-8/pipeline.config \
    --output_directory /tf2onnx/mn2_re
cd /workspace/TensorRT/samples/python/tensorflow_object_detection_api
python create_onnx.py     --pipeline_config /tf2onnx/mn2_re/pipeline.config     --saved_model /tf2onnx/mn2_re/saved_model     --onnx /tf2onnx/mn2.onnx --first_nms_threshold 0.5  --batch_size -1 2>&1 | tee /tf2onnx/fail.log

You should see the logs shown above.

Hi,
Could you share the ONNX model and the script, if not shared already, so that we can assist you better?
Alongside, you can try a few things:

  1. Validate your model with the snippet below.

check_model.py

import onnx

# "your_model.onnx" is a placeholder for the path to your ONNX model.
model = onnx.load("your_model.onnx")
onnx.checker.check_model(model)

  2. Try running your model with the trtexec command.

In case you are still facing the issue, please share the trtexec --verbose log for further debugging.
Thanks!

Hello,
I think there is a misunderstanding. I am trying to convert a TensorFlow (more specifically, a TensorFlow Object Detection API) model to an ONNX model using the TensorRT guide. You can find the TensorFlow model I am trying to convert here.

Hi,

We recommend you raise your concern on Issues · NVIDIA/TensorRT · GitHub to get better help.

Thank you.