Benchmarking DeepStream custom pipeline and AI models

Please provide complete information as applicable to your setup.

• Hardware Platform (Jetson / GPU) : Jetson Orin NX
• DeepStream Version : DeepStream 7.1
• JetPack Version (valid for Jetson only) : JetPack 6.2
• TensorRT Version : TensorRT 10.3.0
• NVIDIA GPU Driver Version (valid for GPU only)
• Issue Type( questions, new requirements, bugs) : Benchmarking performance
• How to reproduce the issue ? (This is for bugs. Including which sample app is using, the configuration files content, the command line used and other details for reproducing)
• Requirement details( This is for new requirement. Including the module name-for which plugin or for which sample application, the function description)

Hi, I have developed a custom DeepStream C++ application on a Jetson Orin NX. The pipeline uses three CSI cameras (IMX219) connected via a custom daughterboard that splits the cameras into two lanes of four; out of these, three cameras are used.

Each camera operates at 3280 × 2464 resolution, 21 FPS, with a 160° wide field of view. The pipeline runs two primary inference models:

  • People detection using PeopleNet

  • Vehicle detection using TrafficCamNet

Both primary models are followed by secondary inference models. For people detection, secondary models include ReID, crowd analytics, pose estimation, and vandalism detection. For vehicle detection, a classification model is applied.

Additionally, the pipeline:

  • Splits the streams to record each camera independently

  • Publishes an RTMP stream to OvenMediaServer

This describes the overall pipeline architecture.

I would now like to benchmark the performance of this pipeline as well as the overall Jetson system performance. Are there any recommended tools or methodologies to collect performance metrics such as:

  • Inference performance

  • End-to-end pipeline throughput

  • Latency

  • Frame drops

  • GPU, CPU, and memory utilization

Any guidance on performance profiling and benchmarking for this setup would be greatly appreciated.

  1. “Inference performance”, are you talking about the whole inferencing speed of the pipeline or the inferencing speed of the models?
  2. “Latency”: are you talking about the end-to-end frame latency?
  3. “Frame drops”: There are different components in the pipeline that can drop frames. Which components are you referring to?
  4. “GPU, CPU, and memory utilization”: There is the “jtop” tool for monitoring CPU, GPU, and memory usage … A method to install jtop on Thor without --break-system-packages

From the whole-pipeline view, FPS is the normal way we measure the performance of the whole pipeline. DeepStream SDK FAQ - Intelligent Video Analytics / DeepStream SDK - NVIDIA Developer Forums
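For the reference deepstream-app, per-stream FPS reporting can be switched on in the configuration file. This is only a sketch: these keys belong to the reference app's [application] group, so a custom C++ pipeline would need its own measurement probe instead.

```ini
[application]
# Print per-stream FPS at a fixed interval while the pipeline runs
enable-perf-measurement=1
perf-measurement-interval-sec=5
```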

Hi,

  1. I am using the TAO model that comes prebuilt with the board. I would like to understand the benchmark details for this model and also learn how to conduct benchmarking to evaluate the model’s performance within my pipeline.

  2. Regarding latency, I would like to measure the end-to-end (E2E) frame latency, as well as the latency introduced specifically at the inference stage.

  3. I am referring to overall frame drops across the entire pipeline. I would like to know how many frames are being dropped end-to-end and, if possible, identify which pipeline element is responsible for the drops.

  4. Thank you, I will look into it.

The single-model performance can be measured with “trtexec” and the generated TensorRT engine; please refer to the TensorRT documentation. For model performance within a pipeline: if there is only one model, the model's TensorRT engine performance will not change. If there are multiple models in one pipeline, the engines share the GPU resources, so each model's performance is impacted by the others; from the pipeline point of view, only the total performance is meaningful.
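A minimal trtexec invocation might look like the following. The engine file name here is hypothetical; use the engine DeepStream generated for your model (typically cached next to the nvinfer config).

```shell
# Measure a single engine's standalone throughput and latency with trtexec
# (path is the usual JetPack install location).
/usr/src/tensorrt/bin/trtexec \
    --loadEngine=peoplenet_b1_gpu0_fp16.engine \
    --iterations=200 --avgRuns=10
```

trtexec prints per-run GPU compute time and end-to-end host latency, which gives a baseline for each model before it competes for GPU resources inside the pipeline.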

The E2E latency depends on the actual use case and how many components and resources are involved. For example, if you feed an RTSP video source into a DeepStream inferencing pipeline, the Ethernet transfer latency should also be counted toward the E2E latency, even though that part is not controlled by DeepStream.
DeepStream only provides tooling to measure DeepStream component latency. DeepStream SDK FAQ - Intelligent Video Analytics / DeepStream SDK - NVIDIA Developer Forums
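The component-latency measurement is enabled through environment variables before launching the application; a sketch (the application binary name is hypothetical):

```shell
# Per-frame latency inside the DeepStream pipeline
export NVDS_ENABLE_LATENCY_MEASUREMENT=1
# Additional per-plugin breakdown of that latency
export NVDS_ENABLE_COMPONENT_LATENCY_MEASUREMENT=1
./my-deepstream-app   # hypothetical binary name
```

With these set, the latency metadata is attached to each buffer and can be read out in a probe, as described in the FAQ link above.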

It also depends on your actual use case. DeepStream plugins can work with many third-party GStreamer plugins to construct different pipelines, and only a few DeepStream plugins can drop frames (e.g. nvv4l2decoder); the other DeepStream plugins never drop frames. It is easy to get the frame-drop data from DeepStream plugins, but for third-party GStreamer plugins you need to refer to their functions and documentation.
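As a quick cross-check on the sink side, GStreamer's fpsdisplaysink reports rendered vs. dropped frame counts for its own sink element. This is only a sketch with a test source; drops happening upstream (decoder, mux, etc.) still need per-element pad probes or the elements' own counters.

```shell
# -v prints fps-measurements messages including rendered and dropped counts;
# videotestsrc stands in for the real camera branch here.
gst-launch-1.0 -v videotestsrc num-buffers=300 ! \
    fpsdisplaysink text-overlay=false video-sink=fakesink
```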

Okay, thank you for the support. We are planning to launch this product to customers as a camera streaming solution. Before doing so, we would like to benchmark the system’s limitations and overall performance.

Are there any additional metrics that you would recommend capturing and evaluating before deployment? Any guidance or support on this would be very helpful.

Regarding latency and frame drops, we are using directly connected CSI cameras. Network latency is observed on the sink side when the stream is sent to the server, and we can monitor those logs on the server. However, we would like to measure the latency occurring within the device itself, from the camera input up to the RTMP sink.
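Once per-frame latencies have been collected (e.g. from the component-latency logs), a small script can summarize them into the numbers worth tracking before deployment. This is a sketch; the sample values are hypothetical, and the p95 index is a simple nearest-rank approximation.

```python
import statistics

def summarize_latencies(latencies_ms):
    """Return simple summary statistics for a list of per-frame latencies (ms)."""
    ordered = sorted(latencies_ms)
    # Nearest-rank style p95 index (floor), adequate for log-level summaries
    p95_index = int(0.95 * (len(ordered) - 1))
    return {
        "frames": len(ordered),
        "mean_ms": statistics.mean(ordered),
        "max_ms": ordered[-1],
        "p95_ms": ordered[p95_index],
    }

# Hypothetical per-frame latency samples (ms) pulled from the latency logs
samples = [33.1, 33.7, 33.9, 34.2, 34.5, 34.8, 35.1, 35.4, 36.0, 90.2]
print(summarize_latencies(samples))
```

Tracking mean, p95, and max separately is useful here: an occasional slow frame (like the 90 ms outlier above) inflates the max while barely moving the mean, which is exactly the kind of behavior to catch before shipping.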

Thank you for the details and your continued support.

From the DeepStream point of view, the way we measure performance is introduced in Performance — DeepStream documentation

There is no update from you for a period, assuming this is not an issue anymore. Hence we are closing this topic. If need further support, please open a new one. Thanks.