Please provide complete information as applicable to your setup.
• Hardware Platform (Jetson / GPU): T4
• DeepStream Version: docker image nvcr.io/nvidia/deepstream:5.1-21.02-triton
• JetPack Version (valid for Jetson only): n/a
• TensorRT Version: 7.2.1
• NVIDIA GPU Driver Version (valid for GPU only): 450.51.06
• Issue Type (questions, new requirements, bugs): questions
Is there an example showing how to properly configure the deployment of a YOLOv3 model with DeepStream-Triton? I used the objectDetector_Yolo sample included in the docker image to generate the engines, and later deployed the optimized TensorRT INT8 engines with DS-Triton 5.1, but the throughput went down, and the results even differ across the modes below:
Throughput FPS (avg) | INT8 | BS=1
- TensorRT engine with DeepStream 5.1: 292
- TensorRT engine in standalone mode (trtexec): 201
- TensorRT engine with DeepStream-Triton 5.1: 93
Also, why do the config files say batch-size=1 is recommended? I need to deploy the model with dynamic batching, so the batch size must be BS > 1.
Throughput FPS (avg) | INT8 | BS=8
I ran a second test with BS=8 vs. BS=1, running the TensorRT engine with DeepStream 5.1, and it exposed poor scaling: 0.24x.
Hi @kayccc, regarding the BS=1 vs. BS=8 differences, I think the problem is how I am generating the TensorRT INT8 model with trtexec for dynamic batching. Is there a sample that converts a YOLO model to a TensorRT INT8 engine (extra INT8 calibration needed) with dynamic batching? I am trying to replicate this pipeline: ONNX -> TensorRT INT8 (dynamic batching).
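For reference, this is roughly the trtexec invocation I would expect to build a dynamic-batch INT8 engine; the file names, the input tensor name `input`, and the 3x608x608 shape are placeholders for my model, and the calibration cache would have to be generated separately:

```shell
# Sketch, assuming the ONNX model was exported with a dynamic batch
# dimension and its input tensor is named "input" (3x608x608, YOLO-style).
# yolov4.onnx and calib.cache are placeholder file names.
trtexec --onnx=yolov4.onnx \
        --int8 \
        --calib=calib.cache \
        --minShapes=input:1x3x608x608 \
        --optShapes=input:8x3x608x608 \
        --maxShapes=input:8x3x608x608 \
        --saveEngine=yolov4_int8_dynamic.engine
```

The min/opt/max shape triplet is what makes the resulting engine accept batch sizes from 1 to 8 instead of the static default.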
Hi @mchi / @kayccc, I need to deploy the model in INT8 mode with dynamic batching on DS-Triton, but the YOLOv4 example in DeepStream says: "Following properties are always recommended: # batch-size (Default=1)".
As you said, first we need to double-check whether DS-Triton supports dynamic batching. Secondly, is there a sample that shows how to optimize a YOLOv4 PyTorch-ONNX model to a TensorRT engine in INT8 mode, with full INT8 calibration and dynamic input shapes? I have generated the INT8 TensorRT engine at runtime with DeepStream, but it does not build the engine with dynamic input shapes. Using trtexec, the engine was also built with static input shapes (default BS=1), and it does not provide calibration capability.
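For context, standalone Triton enables dynamic batching through the model's config.pbtxt rather than the engine itself; a minimal sketch of what I have in mind (the model name and batch values below are placeholders for my setup, and I am unsure how much of this Gst-nvinferserver passes through) would be:

```
name: "yolov4_int8"
platform: "tensorrt_plan"
max_batch_size: 8
dynamic_batching {
  preferred_batch_size: [ 4, 8 ]
  max_queue_delay_microseconds: 100
}
```

This only helps, of course, if the underlying TensorRT plan was built with a dynamic batch dimension up to max_batch_size.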
Hi @mchi, this particular reported issue already uses DS 5.1; please see above. It seems DS 5.1 does not support dynamic shapes or batch size > 1 either.