YOLOV3 example in DeepStream-Triton Integration

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU) T4
• DeepStream Version docker image nvcr.io/nvidia/deepstream:5.1-21.02-triton
• JetPack Version (valid for Jetson only) n/a
• TensorRT Version 7.2.1
• NVIDIA GPU Driver Version (valid for GPU only) 450.51.06

• Issue Type (questions, new requirements, bugs)
Is there an example of how to properly configure the deployment of a YOLOv3 model with DeepStream-Triton? I used the objectDetector_Yolo sample included in the docker image to generate the engines, and then deployed the optimized TensorRT INT8 engines with DS-Triton 5.1, but the throughput went down, and I even got different results across the modes below:

Throughput FPS (avg) | INT8 | BS=1

  • Running TensorRT engine with DeepStream 5.1: 292
  • Running TensorRT engine in standalone mode (trtexec): 201
  • Running TensorRT engine with DeepStream-Triton 5.1: 93
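For reference, deploying the engine with DS-Triton means placing it in a Triton model repository. A minimal sketch of the layout and of a config.pbtxt, with a hypothetical model name "yolov3" (for a tensorrt_plan model, Triton can auto-complete the input/output tensor specs from the engine itself when strict model config is disabled):

    trtis_model_repo/
    └── yolov3/
        ├── config.pbtxt
        └── 1/
            └── model.plan      # the serialized TensorRT INT8 engine

    # config.pbtxt (minimal sketch)
    name: "yolov3"
    platform: "tensorrt_plan"
    max_batch_size: 1
    default_model_filename: "model.plan"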

Also, why do the config files say batch-size=1 is recommended? I need to deploy the model with dynamic batching, so the batch size must be BS>1.
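For context, the recommendation I am referring to is in the nvinfer config of the objectDetector_Yolo sample; the relevant [property] lines look roughly like this (engine name as generated on my setup):

    [property]
    # 0=FP32, 1=INT8, 2=FP16 mode
    network-mode=1
    batch-size=1
    model-engine-file=model_b1_gpu0_int8.engine

Raising batch-size here makes DeepStream look for (or rebuild) a matching model_b8_gpu0_int8.engine.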

Throughput FPS (avg) | INT8 | BS=8
I ran a second test with BS=8 vs BS=1, running the TensorRT engine with DeepStream 5.1, and it showed poor relative performance (about 0.24x):

BS =1 → **PERF: 246.29 (245.98)
BS =8 → **PERF: 60.31 (60.63)
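(For reference, these **PERF numbers come from deepstream-app's built-in measurement, enabled in the application config like so:)

    [application]
    enable-perf-measurement=1
    perf-measurement-interval-sec=5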

Sorry for the late response. Have you managed to get the issue resolved?

Hi @kayccc, regarding the BS=1 vs BS=8 differences, I think the problem is how I am generating the TRT INT8 model with trtexec for dynamic batching. Is there a sample for converting a YOLO model to a TensorRT INT8 engine (extra INT8 calibration needed) with dynamic batching? I am trying to replicate this pipeline: ONNX → TensorRT INT8 (dynamic batching).
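For illustration, this is roughly the trtexec invocation I have in mind; the input tensor name "input" and the 416x416 dims are assumptions from my ONNX export, and note that --calib only consumes a pre-generated calibration cache, it does not run calibration over raw images:

    trtexec --onnx=yolov3.onnx \
            --explicitBatch \
            --int8 \
            --calib=yolov3_calibration.cache \
            --minShapes=input:1x3x416x416 \
            --optShapes=input:8x3x416x416 \
            --maxShapes=input:8x3x416x416 \
            --workspace=4096 \
            --saveEngine=yolov3_int8_bs8.engine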

Suspect Triton in DS does not support dynamic shape… will check and get back to you.

Hi @mchi / @kayccc, I need to deploy the model in INT8 mode with dynamic batching on DS-Triton, but the YOLOV4 example in DeepStream says: "Following properties are always recommended: # batch-size (Default=1)".

As you said, first we need to double-check whether DS-Triton supports dynamic batching. Second, is there a sample that shows how to optimize a YOLOv4 PyTorch-ONNX model to a TensorRT engine in INT8 mode, with full INT8 calibration and dynamic input shapes? I have generated the TensorRT INT8 engine at runtime with DeepStream, but it does not build the engine with dynamic input shapes. Using trtexec, the engine was also built with static input shapes (default BS=1), and trtexec does not provide calibration capability.
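For illustration, a minimal sketch of what I am trying to achieve with the TensorRT Python API instead of trtexec, i.e. full INT8 calibration plus a dynamic-batch optimization profile (the input tensor name "input", the 416x416 dims, and all file names are hypothetical):

    import numpy as np
    import pycuda.autoinit  # noqa: F401 -- creates a CUDA context for the calibrator
    import pycuda.driver as cuda
    import tensorrt as trt

    TRT_LOGGER = trt.Logger(trt.Logger.INFO)

    class YoloEntropyCalibrator(trt.IInt8EntropyCalibrator2):
        """Feeds preprocessed NCHW float32 batches to the builder and caches the scales."""

        def __init__(self, batches, cache_file):
            super().__init__()
            self.batches = iter(batches)      # iterable of (1, 3, 416, 416) float32 arrays
            self.cache_file = cache_file
            self.dev_buf = None

        def get_batch_size(self):
            return 1                          # calibration batch size

        def get_batch(self, names):
            try:
                batch = np.ascontiguousarray(next(self.batches))
            except StopIteration:
                return None                   # no more data -> calibration finished
            if self.dev_buf is None:
                self.dev_buf = cuda.mem_alloc(batch.nbytes)
            cuda.memcpy_htod(self.dev_buf, batch)
            return [int(self.dev_buf)]

        def read_calibration_cache(self):
            try:
                with open(self.cache_file, "rb") as f:
                    return f.read()
            except FileNotFoundError:
                return None

        def write_calibration_cache(self, cache):
            with open(self.cache_file, "wb") as f:
                f.write(cache)

    def build_int8_dynamic_engine(onnx_path, calibrator, engine_path):
        builder = trt.Builder(TRT_LOGGER)
        network = builder.create_network(
            1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
        parser = trt.OnnxParser(network, TRT_LOGGER)
        with open(onnx_path, "rb") as f:
            if not parser.parse(f.read()):
                raise RuntimeError(parser.get_error(0))

        config = builder.create_builder_config()
        config.max_workspace_size = 4 << 30   # 4 GiB
        config.set_flag(trt.BuilderFlag.INT8)
        config.int8_calibrator = calibrator

        # One optimization profile covering batch 1..8 on the dynamic batch dim.
        profile = builder.create_optimization_profile()
        profile.set_shape("input",
                          (1, 3, 416, 416),   # min
                          (8, 3, 416, 416),   # opt
                          (8, 3, 416, 416))   # max
        config.add_optimization_profile(profile)
        config.set_calibration_profile(profile)  # shapes used while calibrating

        engine = builder.build_engine(network, config)
        with open(engine_path, "wb") as f:
            f.write(engine.serialize())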

Hi @virsg ,
DS 5.0 nvinferserver does not support dynamic batching, but DS 5.1 does.
So, could you upgrade to DS 5.1 and give it a try?

Thanks!

Hi @mchi, this particular reported issue is already using DS 5.1, please see above. It seems DS 5.1 does not support dynamic shape or batch size > 1 either.

Hi @virsg ,
I think you have got what you want in
YOLOV4- DS-TRITON | Configuration specified max-batch 4 but TensorRT engine only supports max-batch 1, right?
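(For context, that error means the Triton model config is requesting a larger max batch than the serialized engine was built for; if the engine supports it, batching is requested in config.pbtxt roughly like this, values hypothetical:)

    name: "yolov3"
    platform: "tensorrt_plan"
    # must not exceed the max batch dimension the .plan was built with
    max_batch_size: 8
    dynamic_batching {
      preferred_batch_size: [ 4, 8 ]
      max_queue_delay_microseconds: 100
    }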

Thanks!

Hi @mchi, thanks for your follow-up. I have further questions; could you please take a look at YOLOV4- DS-TRITON | Configuration specified max-batch 4 but TensorRT engine only supports max-batch 1.

Thanks!