Failed to run dynamic batch in DeepStream

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU)
Jetson Nano
• DeepStream Version
5.0.1
• JetPack Version (valid for Jetson only)
JetPack 4.4
• TensorRT Version
7.1.3

I created a TensorRT engine from an ONNX file in DeepStream, and the following info was printed:
INFO: [FullDims Engine Info]: layers num: 2
0 INPUT kFLOAT input.1 3x320x320 min: 1x3x320x320 opt: 4x3x320x320 Max: 4x3x320x320
1 OUTPUT kFLOAT 869 8400x5 min: 0 opt: 0 Max: 0

Therefore I think the dynamic input has been set up in DeepStream successfully. However, when I started my DeepStream app, it failed with this error:

ERROR: [TRT]: Reshape_13: reshaping failed for tensor: 441
ERROR: [TRT]: shapeMachine.cpp (160) - Shape Error in executeReshape: reshape would change volume
ERROR: [TRT]: Instruction: RESHAPE_ZERO_IS_PLACEHOLDER{2 116 40 40} {4 2 58 40 40}
ERROR: Failed to enqueue trt inference batch
ERROR: Infer context enqueue buffer failed, nvinfer error:NVDSINFER_TENSORRT_ERROR
0:04:33.074389733 4391 0x5569176c50 WARN nvinfer gstnvinfer.cpp:1251:gst_nvinfer_input_queue_loop:<primary_gie> error: Failed to queue input batch for inferencing
ERROR from primary_gie: Failed to queue input batch for inferencing
Debug info: gstnvinfer.cpp(1251): gst_nvinfer_input_queue_loop (): /GstPipeline:pipeline/GstBin:primary_gie_bin/GstNvInfer:primary_gie
Quitting

Within this DeepStream app config, I set batch-size=4 and num-sources=4 (camera MP4 files). My guess is that, because of inference delay, only 3 of the 4 sources arrived in the batch at that moment, so the engine failed to handle the incomplete batch; the reshape target {4 2 58 40 40} in the error appears to hard-code a batch of 4.
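For reference, the batching here is controlled by the usual deepstream-app entries, roughly like this (a sketch of the relevant sections, not my full config; 40000 is just the typical sample value). Note that nvstreammux pushes a partial batch once the timeout expires, which would produce exactly the situation I described:

[streammux]
batch-size=4
# after this many microseconds, nvstreammux pushes the batch even if
# fewer than batch-size frames have arrived
batched-push-timeout=40000

[primary-gie]
batch-size=4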

Suggestions and help are needed. Thanks!

Hey, can you run your TensorRT engine file with trtexec successfully?

How do I use it? I have never used trtexec before. I viewed some docs, but it is still not clear. How do I run a TRT engine file directly with trtexec? More specifically, I don't know how to set the input and run it. I just did trtexec --loadEngine=xxx, and it printed a similar error (screenshot attached).

More Info:
When I run this app (batch_size=4) on my PC, it runs well.
When I set batch_size=1 on the Jetson Nano, it also runs well.

I added the parameter --batch=4, and got similar results to the above.

Hey, can you share the specific command you are using and the whole log (instead of the screenshot) with me? For how to use trtexec, you can run trtexec --help to check the details.

Hi, I used this command: /usr/src/tensorrt/bin/trtexec --loadEngine=trt.engine --batch=4
The whole log is attached here.
trtexec_log.txt (2.9 KB)

Hope this will help.

Thanks, would you mind sharing your ONNX file and the whole repro with us?
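
One note on your command: as far as I know, --batch only applies to engines built with an implicit batch dimension. Your engine was built with explicit batch / dynamic shapes (hence the min/opt/max profile in your first log), so trtexec ignores --batch; you need to pass the actual input shape instead, something like:

/usr/src/tensorrt/bin/trtexec --loadEngine=trt.engine --shapes=input.1:4x3x320x320

Here input.1 is the input binding name from your engine info. If the reshape error still shows up for a batch other than 4, the problem is in the model itself rather than in the command.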

I just followed this official repo: GitHub - NVIDIA-AI-IOT/deepstream-occupancy-analytics (a sample application for counting people entering/leaving a building using the DeepStream SDK, Transfer Learning Toolkit (TLT), and pre-trained models, based on the deepstream-test5 sample).

The config and ONNX files are attached here.

ds_debug.zip (3.3 MB)

Can you try to apply the patches from Deepstream 5.0 patches - #4 by bcao and try again?

What patches do you mean? Where can I find them?

I just solved this problem by building the TRT network layer by layer with the TensorRT API. But it takes too much work.
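Roughly, the layer-by-layer approach looks like this (a minimal sketch with the TensorRT Python API; the convolution, weights, and dimensions below are placeholders, not my real network). The point is the -1 batch dimension in add_input and in the reshape, plus the optimization profile:

import numpy as np
import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
# explicit batch is required for dynamic shapes
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
config = builder.create_builder_config()
config.max_workspace_size = 1 << 28

# -1 marks the batch dimension as dynamic
inp = network.add_input("input.1", trt.float32, (-1, 3, 320, 320))

# example layer: a convolution with placeholder weights
w = np.zeros((16, 3, 3, 3), dtype=np.float32)
b = np.zeros((16,), dtype=np.float32)
conv = network.add_convolution(inp, 16, (3, 3), w, b)
conv.stride = (1, 1)
conv.padding = (1, 1)

# the key point: use -1 for the batch dimension in reshapes instead of
# a hard-coded value, so any actual batch size works
shuffle = network.add_shuffle(conv.get_output(0))
shuffle.reshape_dims = (-1, 16 * 320 * 320)
network.mark_output(shuffle.get_output(0))

# optimization profile matching the min/opt/max from the engine log
profile = builder.create_optimization_profile()
profile.set_shape("input.1",
                  min=(1, 3, 320, 320),
                  opt=(4, 3, 320, 320),
                  max=(4, 3, 320, 320))
config.add_optimization_profile(profile)

engine = builder.build_engine(network, config)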

Check the last comment in the topic I shared above.

Hello, Mr. Cao, I have tried it. It didn't work for my case. In my case, the input batch size (e.g. 3) is smaller than the targeted batch size (e.g. 4), so it failed to go through the TRT network. I will try to export the ONNX again.
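For the re-export, my plan is roughly this (a sketch assuming the model is a PyTorch module; TinyNet and the opset below are placeholders, not my real detector). The important parts are marking axis 0 as dynamic via dynamic_axes, and making sure the model's forward() does not bake a concrete batch size into view()/reshape() calls, e.g. using x.size(0) or -1 instead of a fixed number:

import torch
import torch.nn as nn

# hypothetical stand-in for the real detector; only the export call matters
class TinyNet(nn.Module):
    def forward(self, x):
        # use x.size(0) / -1 rather than a fixed batch size, otherwise the
        # exporter bakes the constant into the ONNX Reshape node
        return x.view(x.size(0), -1)

model = TinyNet().eval()
dummy = torch.randn(1, 3, 320, 320)

torch.onnx.export(
    model,
    dummy,
    "model_dynamic.onnx",
    input_names=["input.1"],
    output_names=["869"],
    # mark axis 0 (the batch) as dynamic on both input and output
    dynamic_axes={"input.1": {0: "batch"}, "869": {0: "batch"}},
    opset_version=11,
)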