CenterNet MobileNetV2 from TFOD - failure when converting ONNX to TRT

I am interested in running the CenterNet MobileNetV2 model from the TF2 object detection zoo: pretrained model link

I can convert the saved_model to ONNX with:
python -m tf2onnx.convert --saved-model saved_model/ --output model.onnx --opset 11

Then I replace the input layer's dtype with FP32 using the graphsurgeon API.

However, converting the resulting ONNX model to TRT fails with the message:

[8] Assertion failed: cond.is_weights() && cond.weights().count() == 1 && "If condition must be a initializer!"

ONNX model (pre and post graphsurgeon) and verbose trtexec output are attached.

Running on a Jetson Nano 4GB with JetPack 4.5.1.

trt_verbose.txt (72.5 KB)
model.onnx (9.1 MB)
updated_model.onnx (9.1 MB)

Hi,

This is a known issue caused by the “If” layer used in the model.
You can find some details below:

To work around this, we provide an example that converts EfficientDet with a customized parser.
It is also a TensorFlow object detection model, and it gets stuck on the “If” operation in the same way.

You can check the example below for the details:

Thanks.

In the example you mentioned, when using the --legacy_plugins flag, I get:
Warning: Unsupported operator ResizeNearest_TRT. No schema registered for this operator.
TensorRT version: 7.2.1
TensorFlow version: 2.5

Thanks, that is really helpful. If I understand correctly, the problematic part is at the very start of the network, so the sample replaces the whole block before the first convolution. I tried adapting the EfficientDet sample code to the CenterNet model I’m trying to use. The conversion worked (and the graph looks OK on visual inspection), but trtexec fails with:

...
[07/20/2021-00:21:25] [V] [TRT] ModelImporter.cpp:119: Searching for input: map/while/unstack__69:0
ERROR: ModelImporter.cpp:120 In function parseGraph:
[5] Assertion failed: ctx->tensors().count(inputName)

Any idea what is going on? trtexec and the model I’m feeding it are attached:

trtverbose.txt (49.4 KB)
out.onnx (9.1 MB)

Looking at it a bit more, it appears that the input tensor in this version of CenterNet is connected to a subgraph, so the code in the EfficientDet example does not clear it properly, in particular this loop: TensorRT/create_onnx.py at c2668947ea9ba4c73eb1182c162101f09ff250fd · NVIDIA/TensorRT · GitHub

I tried recursing into that subgraph and clearing the input there as well, but that failed with:

[ONNXRuntimeError] : 10 : INVALID_GRAPH : This is an invalid model. Error in Node:StatefulPartitionedCall/map/while_loop : Node (map/while/TensorArrayV2Read/TensorListGetItem) has input size 0 not in range [min=2, max=2].

Any help is appreciated, I’m not familiar with ONNX format at all.

Having removed the If node, I’m running into the following two issues:

  1. The Resize op only supports the "floor" nearest mode:
ERROR: builtin_op_importers.cpp:2523 In function importResize:
[8] Assertion failed: (mode != "nearest" || nearest_mode == "floor") && "This version of TensorRT only supports floor nearest_mode!"
  2. GatherND is not implemented in TRT:
INVALID_ARGUMENT: getPluginCreator could not find plugin GatherND version 1

After adjusting the model architecture to remove some GatherND ops (from the multiclass-score output branch), replacing the remaining ones with Gather, and hardcoding a batch size of 1, I was able to create an ONNX model that trtexec could convert to an engine.

Inference time is 91 ms (see centernetProfile.json (30.6 KB)) on a Jetson Nano with 512x512 inputs. I have not checked whether the outputs are correct, though.

Hi oryjkov,

I am getting the same error for the GatherND operator with TensorRT 8 on a Jetson Nano. My ONNX model was generated from TensorFlow 2.x with ONNX opset 11.
Can you please explain the exact procedure you followed to remove the GatherND layers?

Unfortunately I didn’t keep my hacky code for making TF’s CenterNet run on Jetson, since I decided to stick with MobileNetV2 SSD instead.

Regarding the GatherND problem, perhaps you will find this helpful:

One thing I noticed is that the TF CenterNet model used to have Gather nodes instead of GatherND until this commit, so you could try building from a revision before it.

The other option is to work out what GatherND does and change it to Gather. If you have a batch size of 1 (which was enough for my use case), this is not difficult; try comparing the semantics of Gather and GatherND.
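A small numpy sketch of why the replacement works in the batch-size-1 case: a GatherND with indices of shape [k, 1] indexing axis 0 selects exactly the same rows as a plain Gather with the squeezed 1-D index vector. The data and indices below are illustrative, not taken from the actual model:

```python
import numpy as np

data = np.arange(12).reshape(6, 2)       # e.g. flattened per-location features
nd_indices = np.array([[4], [0], [2]])   # GatherND-style [k, 1] indices

gather_nd = data[nd_indices[:, 0]]                       # what GatherND computes here
gather = np.take(data, nd_indices.squeeze(-1), axis=0)   # plain Gather, same result

print(np.array_equal(gather_nd, gather))  # → True
```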

Here are the notes I took when I did the conversion:

  • Remove all GatherND ops from the model: modify the CenterNet meta-arch so that all tf.gather_nd calls are replaced with tf.gather with batch_dims=0, assuming batch_size=1 (modified center_net_meta_arch.py).
  • Remove the multiclass_scores output, as I didn’t need it and that branch contained some GatherNDs.
  • Use graph surgeon to remove the image resizing and replace it with RGB normalization and an NCHW reorder (modified create_onnx.py from the EfficientDet sample).
  • Use graph surgeon to make the Resize nodes use the floor mode.
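For the preprocessing step in those notes, the computation that gets baked into the graph in place of the removed resizing subgraph looks roughly like the following numpy sketch. The exact mean/scale values are assumptions (the TFOD feature extractors use a similar per-channel normalization to roughly [-1, 1]):

```python
import numpy as np

def preprocess(img_nhwc: np.ndarray) -> np.ndarray:
    # Scale uint8 RGB to roughly [-1, 1] (assumed normalization constants),
    # then transpose NHWC -> NCHW as TensorRT expects.
    x = img_nhwc.astype(np.float32) / 127.5 - 1.0
    return x.transpose(0, 3, 1, 2)

batch = np.zeros((1, 512, 512, 3), dtype=np.uint8)  # hypothetical input batch
out = preprocess(batch)
print(out.shape)  # → (1, 3, 512, 512)
```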

@oryjkov, since you decided to stick with the MobileNetV2 SSD: that model also has a GatherND node. How did you replace it with Gather?

A code snippet would be very helpful.