TF-TRT: static model creation fails even when every layer's input size is known

Hello, I have tried changing my model's input layer from the default (None, None, 1) to (1408, 960, 1), so that every layer's input shape is known and TF-TRT can create a static model (roughly as in the sketch below).
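(A minimal sketch of how I fix the input shape; here model is my already-built RetinaNet, and the variable names are just for illustration:)

from tensorflow import keras

# Re-apply the trained model to a fixed-size input so that every
# intermediate shape becomes static instead of (None, None, 1).
fixed_input = keras.layers.Input(shape=(1408, 960, 1))
static_model = keras.Model(fixed_input, model(fixed_input))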
My model is RetinaNet with a ResNet-50 backbone, but when I use the TF-TRT API to create the static model, some problems occur. The log is:

2019-12-23 18:25:24.610102: W tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:737] TensorRT node gn2a_branch2c/TRTEngineOp_69 added for segment 69 consisting of 15 nodes failed: Internal: Input shapes must be fully defined when in static mode. Please try is_dynamic_op=True (shape was [?,32,?,?,8]). Fallback to TF…
2019-12-23 18:25:24.610109: W tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:737] TensorRT node gn2a_branch2c/TRTEngineOp_70 added for segment 70 consisting of 4 nodes failed: Internal: Input shapes must be fully defined when in static mode. Please try is_dynamic_op=True (shape was [?,32,?,?,8]). Fallback to TF…
2019-12-23 18:25:24.610115: W tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:737] TensorRT node gn2b_branch2a/TRTEngineOp_71 added for segment 71 consisting of 15 nodes failed: Internal: Input shapes must be fully defined when in static mode. Please try is_dynamic_op=True (shape was [?,32,?,?,2]). Fallback to TF…

This means that many inputs in the graph still have undefined shapes, and only 17 TRTEngineOps are generated (whereas with is_dynamic_op=True, 204 TRTEngineOps are generated). Why does this happen? My TF-TRT code is below:

from tensorflow.python.compiler.tensorrt import trt_convert as trt  # TF 1.x API

converter = trt.TrtGraphConverter(
    input_graph_def=frozen_graph,
    nodes_blacklist=return_elements,  # output nodes
    max_batch_size=32,
    is_dynamic_op=False,
    max_workspace_size_bytes=1 << 30,
    precision_mode=trt.TrtPrecisionMode.FP32,
    minimum_segment_size=1,
    maximum_cached_engines=100)
trt_graph = converter.convert()
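(Side note: the TRTEngineOp counts above come from counting the engine nodes in the converted graph, e.g.:)

# Count the TensorRT engine ops produced by the conversion.
num_engines = len([n for n in trt_graph.node if n.op == 'TRTEngineOp'])
print('TRTEngineOp count:', num_engines)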

And my model is:

model.inputs
model.outputs
model.summary()

[<tf.Tensor 'input_1:0' shape=(?, 1408, 960, 1) dtype=float32>]
[<tf.Tensor 'filtered_detections/map/TensorArrayStack/TensorArrayGatherV3:0' shape=(?, 300, 4) dtype=float32>, <tf.Tensor 'filtered_detections/map/TensorArrayStack_1/TensorArrayGatherV3:0' shape=(?, 300) dtype=float32>, <tf.Tensor 'filtered_detections/map/TensorArrayStack_2/TensorArrayGatherV3:0' shape=(?, 300) dtype=int32>]


Layer (type)                             Output Shape                                 Param #    Connected to
====================================================================================================================
input_1 (InputLayer)                     (None, 1408, 960, 1)                         0
padding_conv1 (ZeroPadding2D)            (None, 1414, 966, 1)                         0          input_1[0][0]
conv1 (Conv2D)                           (None, 704, 480, 64)                         3136       padding_conv1[0][0]
gn_conv1 (GroupNormalization)            (None, 704, 480, 64)                         128        conv1[0][0]
conv1_relu (Activation)                  (None, 704, 480, 64)                         0          gn_conv1[0][0]
pool1 (MaxPooling2D)                     (None, 352, 240, 64)                         0          conv1_relu[0][0]
res2a_branch2a (Conv2D)                  (None, 352, 240, 64)                         4096       pool1[0][0]
gn2a_branch2a (GroupNormalization)       (None, 352, 240, 64)                         128        res2a_branch2a[0][0]
res2a_branch2a_relu (Activation)         (None, 352, 240, 64)                         0          gn2a_branch2a[0][0]
padding2a_branch2b (ZeroPadding2D)       (None, 354, 242, 64)                         0          res2a_branch2a_relu[0][0]
res2a_branch2b (Conv2D)                  (None, 352, 240, 64)                         36864      padding2a_branch2b[0][0]
gn2a_branch2b (GroupNormalization)       (None, 352, 240, 64)                         128        res2a_branch2b[0][0]
res2a_branch2b_relu (Activation)         (None, 352, 240, 64)                         0          gn2a_branch2b[0][0]
res2a_branch2c (Conv2D)                  (None, 352, 240, 256)                        16384      res2a_branch2b_relu[0][0]
res2a_branch1 (Conv2D)                   (None, 352, 240, 256)                        16384      pool1[0][0]
gn2a_branch2c (GroupNormalization)       (None, 352, 240, 256)                        512        res2a_branch2c[0][0]
gn2a_branch1 (GroupNormalization)        (None, 352, 240, 256)                        512        res2a_branch1[0][0]
res2a (Add)                              (None, 352, 240, 256)                        0          gn2a_branch2c[0][0], gn2a_branch1[0][0]
res2a_relu (Activation)                  (None, 352, 240, 256)                        0          res2a[0][0]
res2b_branch2a (Conv2D)                  (None, 352, 240, 64)                         16384      res2a_relu[0][0]
gn2b_branch2a (GroupNormalization)       (None, 352, 240, 64)                         128        res2b_branch2a[0][0]
res2b_branch2a_relu (Activation)         (None, 352, 240, 64)                         0          gn2b_branch2a[0][0]
padding2b_branch2b (ZeroPadding2D)       (None, 354, 242, 64)                         0          res2b_branch2a_relu[0][0]
...
boxes (RegressBoxes)                     (None, 1125520, 4)                           0          anchors[0][0], regression[0][0]
classification_submodel (Model)          multiple                                     4573120    P3[0][0], P4[0][0], P5[0][0], P6[0][0], P7[0][0]
clipped_boxes (ClipBoxes)                (None, 1125520, 4)                           0          input_1[0][0], boxes[0][0]
classification (Concatenate)             (None, 1125520, 24)                          0          classification_submodel[1][0], classification_submodel[2][0], classification_submodel[3][0], classification_submodel[4][0], classification_submodel[5][0]
filtered_detections (FilterDetections)   [(None, 300, 4), (None, 300), (None, 300)]  0          clipped_boxes[0][0], classification[0][0]
====================================================================================================================
Total params: 44,381,604
Trainable params: 44,381,604
Non-trainable params: 0

What should I do if I want to create a static model?
Thank you in advance.


Hi,

For static mode, TensorRT requires all shapes in the model to be fully defined.
But even if you are using dynamic mode, an engine can be reused for a new input if:

  • the engine batch size is greater than or equal to the batch size of the new input, and
  • the non-batch dims match those of the new input

If your input dimensions are not going to vary, the same engine will be reused even when dynamic mode is used for graph generation. Compared to static mode, the only extra cost is engine initialization time during the first run.

The argument "maximum_cached_engines" controls how many engines are stored at a time.
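For example, a sketch of your conversion in dynamic mode (only is_dynamic_op changes; the other arguments are copied from your snippet):

converter = trt.TrtGraphConverter(
    input_graph_def=frozen_graph,
    nodes_blacklist=return_elements,   # output nodes
    max_batch_size=32,
    is_dynamic_op=True,                # build engines at runtime, once actual shapes are known
    max_workspace_size_bytes=1 << 30,
    precision_mode=trt.TrtPrecisionMode.FP32,
    minimum_segment_size=1,
    maximum_cached_engines=100)        # up to 100 cached engines per TRTEngineOp
trt_graph = converter.convert()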

Please refer to the link below for more details:
https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#static-dynamic-mode

Thanks