How to create a multi-batch secondary model when the number of primary instances is unknown?

I have a primary-gie and a secondary-gie for inference. When multiple primary instances are found, the same number of secondary inferences is triggered. As a result, the secondary-gie takes significantly more time than the primary-gie, even though the secondary-gie uses a smaller model.

I tried creating a secondary model with batch-size 2, but it ended up producing very poor results. So how can I improve the latency of the secondary-gie?

My setup is the following:

Jetson Xavier
DeepStream 5.0
JetPack 4.4
TensorRT 7.1.3
CUDA 10.2

Hi,
The links below might be useful for you:
https://docs.nvidia.com/deeplearning/tensorrt/best-practices/index.html#thread-safety

https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__STREAM.html
For multi-threading/streaming, we suggest using DeepStream or Triton.
For more details, we recommend raising the query on the DeepStream or Triton forum.
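As a starting point, the secondary-gie batch size is normally set in its Gst-nvinfer configuration file rather than by changing the model itself; nvinfer then batches the objects cropped from primary detections up to that limit, and the TensorRT engine must be built for the same max batch size. A minimal sketch (property names follow the Gst-nvinfer config-file format; the file name, engine path, and values are illustrative assumptions, not your actual setup):

```ini
# Sketch of a batched secondary-gie config (Gst-nvinfer config-file format).
# All paths and values here are illustrative assumptions.
[property]
gpu-id=0
# Upper bound on how many detected objects are batched into one inference;
# the TensorRT engine must be built with a matching max batch size.
batch-size=16
# 2 = operate on objects cropped from primary detections, not full frames
process-mode=2
# Unique id of this secondary gie, and the id of the primary gie it consumes
gie-unique-id=2
operate-on-gie-id=1
# Pre-built engine; name typically encodes batch size and precision
model-engine-file=model_b16_gpu0_fp16.engine
# 0 = FP32, 1 = INT8, 2 = FP16
network-mode=2
```

If a matching engine file is not found, nvinfer will rebuild one at startup for the configured batch size, so accuracy should not degrade simply from raising `batch-size`; poor results usually point to a mismatch between the engine's batch dimension and the configured value.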

Thanks!

Thanks! I'll raise it on the DeepStream forum.