Invalid argument: Input shapes are inconsistent on the batch dimension, for TRTEngineOp_0 (for secondary inference)


• Hardware Platform: RTX 2080
• DeepStream Version: 5.0
• NVIDIA GPU Driver Version: 440.33.01
Hi,
I want to run my primary and secondary models with the following accelerator parameters added to their config.pbtxt files:

optimization { execution_accelerators {
  gpu_execution_accelerator : [ {
    name : "tensorrt"
    parameters { key: "precision_mode" value: "FP16" }
    parameters { key: "max_workspace_size_bytes" value: "512000000" }
  }]
}}

Both models work fine without the accelerator block above, with the rest of the config and pipeline untouched. In both cases (with and without the added portion of the config) the primary model has its batch size set to 2 and the secondary model has its batch size set to 16.

With the accelerator added, the primary model runs fine with better throughput, but the secondary model gives the error:

Invalid argument: Input shapes are inconsistent on the batch dimension, for TRTEngineOp_0.

This topic is related to TF-TRT optimization: TF-TRT may not be able to convert every layer into the TRT engine. Suggestions:
a. Try batch-size 1 to see whether TF-TRT can work.
b. If (a) works, it is likely that some layer cannot support multi-batch in TF-TRT. A workaround is to tune the segment size (minimum_segment_size is the minimum number of TF nodes a subgraph must contain before it is converted into a TRT engine) to see whether the offending layer can be excluded, e.g. (see the full accelerator block after this list):
parameters { key: "minimum_segment_size" value: "30"}
c. If (a) does not work, refer to GitHub - NVIDIA-AI-IOT/tf_trt_models: TensorFlow models accelerated with NVIDIA TensorRT to re-export the TF model offline; sometimes this helps with multi-batch and also with performance.
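For example, a minimal sketch of the accelerator block from your first post with minimum_segment_size added (30 is only a starting value to tune):

optimization { execution_accelerators {
  gpu_execution_accelerator : [ {
    name : "tensorrt"
    parameters { key: "precision_mode" value: "FP16" }
    parameters { key: "max_workspace_size_bytes" value: "512000000" }
    parameters { key: "minimum_segment_size" value: "30" }
  }]
}}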

Hi @bcao,
I could not follow up on the issue earlier, as I did not have access to the 2080 GPU during that period.

I tried the above suggestions, and it did indeed work with batch_size 1, so I tried adjusting the minimum_segment_size value. However, it seems the whole graph gets converted below a given value, while above that value none of the nodes change. For example, in my case, if I set minimum_segment_size > 15 the graph does not change, and if I set minimum_segment_size <= 15 it throws the error.
I also converted the graph to a TF-TRT graph separately (offline), but if I use it with the same config.pbtxt it throws the following error:

Executor failed to create kernel. Invalid argument: The TF function for the TRT segment could not be empty
[[{{node TRTEngineOp_0}}]]
ERROR: infer_trtis_server.cpp:202 TRTIS: failed to get response status, trtis_err_str:INTERNAL, err_msg:The TF function for the TRT segment could not be empty
[[{{node TRTEngineOp_0}}]]
ERROR: infer_trtis_backend.cpp:515 TRTIS server failed to parse response with request-id:0 model:
ERROR: infer_trtis_backend.cpp:359 failed to specify dims after running inference failed on model:sgie_fp16, nvinfer error:NVDSINFER_TRTIS_ERROR

I could not find any dedicated platform type for a TF-TRT graph, so I set it to tensorflow_graphdef.
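For reference, a minimal sketch of the relevant config.pbtxt fields (the model name is taken from the error log above; the input/output sections are omitted). As far as I understand, a TF-TRT optimized graph is still a plain TensorFlow GraphDef with embedded TRTEngineOp nodes, so tensorflow_graphdef should be the right platform:

name: "sgie_fp16"
platform: "tensorflow_graphdef"
max_batch_size: 16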

Any further help will be appreciated.

We will check internally.

Hey, can you create a topic under Deep Learning (Training & Inference) - NVIDIA Developer Forums and/or create an issue at Issues · NVIDIA-AI-IOT/tf_trt_models · GitHub?

Hi @bcao,
I got my model optimized offline and it worked in the DeepStream pipeline, although the original model with the changes to config.pbtxt still does not work.

The following post (the section about optimization) was helpful for converting the FP32 TF model to an optimized FP16 TF-TRT graph:
https://developer.nvidia.com/blog/deploying-models-from-tensorflow-model-zoo-using-deepstream-and-triton-inference-server/
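For anyone who lands here, a minimal sketch of the kind of offline conversion the post describes, assuming TensorFlow 1.15; the file paths and the output node name are placeholders for your own model:

# Offline FP32 -> FP16 TF-TRT conversion of a frozen TensorFlow graph.
# "frozen_graph.pb" and "output_node" are placeholders for illustration.
import tensorflow as tf
from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Load the frozen GraphDef from disk.
with tf.io.gfile.GFile("frozen_graph.pb", "rb") as f:
    frozen_graph = tf.compat.v1.GraphDef()
    frozen_graph.ParseFromString(f.read())

converter = trt.TrtGraphConverter(
    input_graph_def=frozen_graph,
    nodes_blacklist=["output_node"],      # placeholder: your model's output node(s)
    precision_mode="FP16",
    max_workspace_size_bytes=512000000,   # matches the config in this thread
    max_batch_size=16,                    # secondary model batch size from this thread
    minimum_segment_size=3,               # default; tune as discussed above
    is_dynamic_op=False)
trt_graph = converter.convert()

# Triton's tensorflow_graphdef platform expects the file to be named
# model.graphdef inside the model's version directory.
with tf.io.gfile.GFile("model.graphdef", "wb") as f:
    f.write(trt_graph.SerializeToString())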

Great, so can we close this topic? You can create a new topic as per my last comment if you still need help.

Hi @bcao,
Yes, we can close the topic. Will raise a topic if further assistance is required.

Thanks.