Please provide complete information as applicable to your setup.
• Hardware Platform (Jetson / GPU): Jetson AGX Orin
• DeepStream Version: 6.3
• JetPack Version (valid for Jetson only): 5.1.2
I have Python code running a model that has been converted to TensorRT using torch2trt(). This runs at about 60 FPS. When I profile the model with trtexec, it reports roughly the same throughput. What kind of improvement in throughput and/or end-to-end latency would I get if I moved this model into the DeepStream framework? Can I get some estimate on this?
If all the other modules in your DeepStream pipeline are faster than the TRT engine, the pipeline can run at 60 FPS. If any other module is slower than 60 FPS, the pipeline will be limited by the slowest module.
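To illustrate the point above: when the stages run in a pipelined fashion (each stage overlapping with the next frame's earlier stages), the steady-state throughput is bounded by the slowest stage, not by the sum of all stages. A minimal sketch, with made-up per-stage times (the 16.7 ms figure stands in for a ~60 FPS detector engine; the other numbers are purely illustrative):

```python
# Hypothetical per-frame stage times in milliseconds; these are
# illustrative assumptions, not measurements from the reported setup.
stage_ms = {"decode": 5.0, "detector_trt": 16.7, "tracker": 8.0}

# With stages overlapped, pipeline throughput is 1 / (slowest stage),
# not 1 / (sum of stages).
bottleneck = max(stage_ms, key=stage_ms.get)
pipeline_fps = 1000.0 / stage_ms[bottleneck]

print(bottleneck, round(pipeline_fps, 1))  # detector_trt 59.9
```

So if the detector engine itself caps out at ~60 FPS, no framework can push the same engine past that; a framework only helps by keeping the other stages off the critical path.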
I do not have it running in DeepStream yet. I want to know if I will get a significant improvement in throughput and/or end-to-end latency. Is there any way to make it run faster than 60 FPS in DeepStream, or is 60 FPS the best I can get?
I mean, how did you get the 60 FPS number? Which modules are running in the Python code? What is the pipeline of your Python code, and what is its configuration?
Thank you for your replies.
I have a detector, a classifier, and a tracker in my pipeline. 60 FPS is what trtexec reports for the detector alone. The classifier runs much faster and is therefore not the bottleneck.
The Python code runs inference using PyTorch. The pipeline takes camera frames from shared memory and performs inference on them. There is a detector and a classifier in the pipeline, along with tracking. When all these elements are introduced, the throughput of the detector drops to about 40 FPS. If I move this pipeline into DeepStream, will it give me any benefit?
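One way to see where the 60 → 40 FPS drop comes from is to time each stage of the existing Python pipeline before porting anything. A minimal sketch of such instrumentation (the stage names and the stand-in functions below are placeholders; the real code would call the actual detector, classifier, and tracker):

```python
import time
from collections import defaultdict

class StageTimer:
    """Accumulate wall-clock time per pipeline stage so the real
    bottleneck (detector, classifier, tracker, or frame copies from
    shared memory) can be identified."""
    def __init__(self):
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    def measure(self, name, fn, *args, **kwargs):
        t0 = time.perf_counter()
        out = fn(*args, **kwargs)
        self.totals[name] += time.perf_counter() - t0
        self.counts[name] += 1
        return out

    def report(self):
        # Average milliseconds per frame for each stage.
        return {n: 1000.0 * self.totals[n] / self.counts[n]
                for n in self.totals}

# Usage with stand-in stages (replace the lambdas with real calls):
timer = StageTimer()
for frame in range(10):
    dets = timer.measure("detector", lambda f: f, frame)
    timer.measure("tracker", lambda d: d, dets)
print(timer.report())
```

If the per-stage times show that the detector still takes ~16.7 ms but frame copies or CPU-side pre/post-processing add the rest, that is the part DeepStream's zero-copy GPU buffers can remove; if the drop is inside the detector itself, moving frameworks will not help.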
Since we don’t know how you implemented your Python code, we can’t tell you whether DeepStream will improve it or not.
DeepStream guarantees that GPU hardware buffers are shared between the different modules. DeepStream accelerates video operations such as decoding, scaling, and inferencing with NVIDIA hardware, and avoids buffer copies between components.
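As a rough illustration of what the equivalent detector + classifier + tracker pipeline could look like in DeepStream, here is a gst-launch sketch. This is a config fragment under stated assumptions, not a drop-in command: the input file, resolution, and the two `config-file-path` files are placeholders that would have to point at configs referencing your TensorRT engines.

```shell
# Sketch only: placeholder file names and config paths.
# Frames stay in NVMM (GPU) memory from decode through inference and
# tracking, so no CPU-side buffer copies occur between stages.
gst-launch-1.0 \
  filesrc location=sample.h264 ! h264parse ! nvv4l2decoder \
  ! m.sink_0 nvstreammux name=m batch-size=1 width=1920 height=1080 \
  ! nvinfer config-file-path=detector_config.txt \
  ! nvtracker ll-lib-file=/opt/nvidia/deepstream/deepstream/lib/libnvds_nvmultiobjecttracker.so \
  ! nvinfer config-file-path=classifier_config.txt process-mode=2 \
  ! fakesink
```

Here the first `nvinfer` acts as the primary detector and the second, with `process-mode=2`, runs as a secondary classifier on the detected objects, which mirrors the detector/classifier/tracker structure you described.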