Please provide complete information as applicable to your setup.
• Hardware Platform (Jetson / GPU): GPU
• DeepStream Version: Docker 6.3
• JetPack Version (valid for Jetson only):
• TensorRT Version:
• NVIDIA GPU Driver Version (valid for GPU only):
• Issue Type (questions, new requirements, bugs):
• How to reproduce the issue? (This is for bugs. Include which sample app is used, the configuration file contents, the command line used, and other details for reproducing.)
• Requirement details (This is for a new requirement. Include the module name — for which plugin or which sample application — and the function description.)
I’m running deepstream-app -c deepstream_app_config.txt, and when I increase the batch-size in config_infer_primary_yoloV8.txt (provided by this repo), the primary_gie component latency doesn’t change, which confuses me. When the batch-size increases, shouldn’t the component latency decrease in theory?
So I would like to know how DeepStream uses batching during inference: does GstNvStreamMux feed the batched frames to GstNvInfer serially or in parallel?
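For reference, the two batch-size settings involved live in different files. This is a minimal sketch of the relevant sections, assuming the usual DeepStream config layout (your section contents and file names may differ):

```
# deepstream_app_config.txt (excerpt)
[streammux]
batch-size=8          # frames batched by nvstreammux

[primary-gie]
config-file=config_infer_primary_yoloV8.txt

# config_infer_primary_yoloV8.txt (excerpt)
[property]
batch-size=8          # batch size the nvinfer engine is built/run with
```

The question below is essentially about how these two values interact.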
That is not the case. The nvinfer batch-size only affects the TensorRT inference time.
The nvinfer component is a GStreamer plugin; the pre-processing, inference, and post-processing are all done inside nvinfer, asynchronously in different threads. See DeepStream SDK FAQ - Intelligent Video Analytics / DeepStream SDK - NVIDIA Developer Forums.
For YOLOv7 and YOLOv8 the post-processing is relatively complicated, and when it is done on the CPU it takes much longer than the TensorRT inference. So the nvinfer latency is determined mainly by the post-processing, not by the TensorRT inference, and the nvinfer batch-size only affects the TensorRT inference.
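The effect can be sketched with arithmetic. The numbers below are hypothetical (and the per-call TensorRT time is held constant regardless of batch size, a simplification): when CPU post-processing dominates, even an 8x reduction in the number of inference calls barely moves the total component latency.

```python
# Hypothetical timings (illustration only, not measured DeepStream numbers):
# why component latency barely changes when CPU post-processing dominates.
def component_latency_ms(n_frames, engine_batch, infer_ms_per_call, post_ms_per_frame):
    # Number of serial TensorRT calls needed to cover the streammux batch.
    n_calls = -(-n_frames // engine_batch)  # ceiling division
    return n_calls * infer_ms_per_call + n_frames * post_ms_per_frame

# 8 frames, assumed ~3 ms per TensorRT call, ~20 ms/frame CPU post-processing.
serial  = component_latency_ms(8, engine_batch=1, infer_ms_per_call=3, post_ms_per_frame=20)
batched = component_latency_ms(8, engine_batch=8, infer_ms_per_call=3, post_ms_per_frame=20)
print(serial, batched)  # 184 163 -- inference calls drop 8x, total drops only ~11%
```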
That depends on both the nvinfer batch-size and the nvstreammux batch-size. E.g., if you batch the frames with an nvstreammux batch-size of 4 while you run inference with an nvinfer batch-size of 1, the frames in the batch will be inferred one by one.
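The serial behavior described above can be sketched as follows. This is an illustration, not the DeepStream API: a streammux batch larger than the engine batch size is split into sub-batches that are inferred one after another.

```python
# Illustrative sketch (not DeepStream code): how a streammux batch is
# consumed when the nvinfer engine batch size is smaller than the batch.
def split_into_engine_batches(frames, engine_batch_size):
    """Split a streammux batch into the sub-batches nvinfer runs serially."""
    return [frames[i:i + engine_batch_size]
            for i in range(0, len(frames), engine_batch_size)]

streammux_batch = ["frame0", "frame1", "frame2", "frame3"]  # batched by nvstreammux
sub_batches = split_into_engine_batches(streammux_batch, engine_batch_size=1)
print(len(sub_batches))  # 4 serial TensorRT invocations with batch-size 1
```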
Yes. With 8 sources, I set the nvstreammux batch-size to 8 and then changed the batch-size in the PGIE config file from 1 to 8; at batch-size = 8, the component latency is almost unchanged. Am I right to understand that the primary_gie component latency is not exactly YOLOv8n’s inference time for the 8 images batched upstream by nvstreammux?
And since the PGIE component latency is composed of pre-processing, inference, and post-processing as you said, the pre- and post-processing time should not change much if the nvstreammux batch-size stays the same. With a larger batch-size in the inference config file, the model’s inference time should be shorter, so shouldn’t the overall PGIE component latency also be shorter? I’m not sure whether I’m understanding this correctly; if there is a problem with my reasoning, I hope you can give me a more detailed explanation.
Meanwhile, if I just want to measure the model’s pure inference time, how can I do that?
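One common way to time pure TensorRT inference, outside the DeepStream pipeline, is trtexec against the engine that nvinfer generated. The engine filename below is a placeholder; substitute the .engine file produced in your setup:

```
# Engine path is a placeholder; use the engine file nvinfer serialized.
/usr/src/tensorrt/bin/trtexec --loadEngine=model_b8_gpu0_fp16.engine \
    --iterations=100 --avgRuns=10
```

This reports GPU compute time per inference call without any DeepStream pre- or post-processing.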
Thanks again, sincerely looking forward to your reply.
I ran the CLI command as the repo suggested, but the result doesn’t look quite right.
The video shows the results before post-processing, and the Frame Latency and dequeueOutputBatch time look worse than before; it doesn’t seem to have the desired effect.