Hello,
I am trying to understand why there are always 4 GstNvInferServer processes running in parallel when using DeepStream on AGX Orin.
I see the same behaviour with the GstNvInfer plugin.
Here is the .nsys-rep file of one trial.
deepstream_profile_2_instances.zip (6.4 MB)
Whatever value I set in config.pbtxt, it doesn't change anything:
instance_group [
{
kind: KIND_GPU
count: 1
}
]
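For reference, here is roughly where that block sits in the model's config.pbtxt (a minimal sketch with placeholder name, platform and batch size, not my actual file):
name: "my_model"            # placeholder model name
platform: "tensorrt_plan"   # placeholder backend
max_batch_size: 4           # placeholder
instance_group [
  {
    kind: KIND_GPU
    count: 1
  }
]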
Thank you
yuweiw
June 12, 2025, 2:09am
Could you attach your whole pipeline and the version of DeepStream?
As I can't share my own pipeline because of an NDA, I tried to reproduce the same behaviour using the deepstream_python_apps repo.
I used this example: deepstream_python_apps/apps/deepstream-test1-rtsp-out/deepstream_test1_rtsp_out.py at master · NVIDIA-AI-IOT/deepstream_python_apps · GitHub
and modified it as follows:
...
pgie = Gst.ElementFactory.make("nvinferserver", "primary-inference")
...
pgie.set_property('config-file-path', "pgie_config_triton_grpc.txt")
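For context, a minimal sketch of the swap as I made it (the error check mirrors the sample's style; the config file name is my own):
# Gst and sys are already imported by the sample
pgie = Gst.ElementFactory.make("nvinferserver", "primary-inference")
if not pgie:
    sys.stderr.write(" Unable to create nvinferserver \n")
# Point the element at the Triton gRPC config instead of the nvinfer config
pgie.set_property('config-file-path', "pgie_config_triton_grpc.txt")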
DeepStream 7.1
Here is the nvinferserver config file:
pgie_config_triton_grpc.txt (879 Bytes)
If I launch the command: nsys profile --trace=cuda,nvtx,osrt --output=deepstream_profile python deepstream_test_1_rtsp_out.py -i /opt/nvidia/deepstream/deepstream/samples/streams/sample_720p.h264
I get these results:
deepstream_profile_test_1_rtsp_out_triton.zip (8.3 MB)
You can see the same behaviour here.
Thanks for your help.
yuweiw
June 17, 2025, 8:59am
That's the timeline view. In fact, there is only one thread. You can check that with Show In Events View; they all belong to one TID.
You can also check our source code below.
sources\gst-plugins\gst-nvinferserver\gstnvinferserver.cpp
static GstFlowReturn gst_nvinfer_server_submit_input_buffer(
GstBaseTransform* btrans, gboolean discont, GstBuffer* inbuf) {
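To illustrate the idea in generic terms (this is just a sketch of the pattern, not DeepStream code): a single thread submits batches without waiting for each one to finish, so several batches can be in flight at once even though there is only one submitting thread.
import concurrent.futures
import time

def infer(batch_id):
    time.sleep(0.01)  # stand-in for asynchronous GPU/Triton inference
    return batch_id

with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
    # One submitting thread (this one) queues batches without blocking on results,
    # so up to 4 batches overlap on a timeline, similar to what the nsys view shows.
    futures = [pool.submit(infer, i) for i in range(16)]
    for f in concurrent.futures.as_completed(futures):
        print(f"batch {f.result()} done")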
I agree with you, but why are 4 batches handled in parallel while sometimes there is only one?
If you take the previous example (deepstream_test1_rtsp_out.py), what is the bottleneck, based on the nsys profile analysis?
Thank you
yuweiw
June 18, 2025, 3:20am
What bottleneck do you need to solve in your scenario? You can also get the latency of each plugin to find the bottleneck. Please refer to our Enable Latency measurement for deepstream sample apps.
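For the C sample apps, latency measurement is enabled through environment variables along these lines (please check that doc page for the exact steps on your DeepStream version):
export NVDS_ENABLE_LATENCY_MEASUREMENT=1
export NVDS_ENABLE_COMPONENT_LATENCY_MEASUREMENT=1   # per-plugin latency
deepstream-app -c <your_config.txt>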
I have run some tests on my side, and I would like to understand what is causing the following drop in performance.
Here is my setup:
AGX Orin 1 (DeepStream pipeline - example deepstream_test_1_rtsp_out.py + Triton server 25.05)
AGX Orin 2 (Triton server 25.05)
10 Gbit/s switch in between
I run the example using a 120 fps 1920x1080 H.264 video.
TEST 1 - Only AGX Orin 1 (pipeline + Triton on the same device)
The pipeline runs fine and the actual FPS is 120 fps.
Here is the nsys file:
deepstream_test_1_rtsp_out_120fps_local.zip (7.8 MB)
TEST 2 - AGX Orin 1 (pipeline) + AGX Orin 2 (Triton)
It is exactly the same config as TEST 1, except for the Triton server URL in the nvinferserver config file (see the config sketch after this test).
The pipeline FPS drops to around 93 fps.
Here is the nsys file:
deepstream_test_1_rtsp_out_120fps_remote.zip (8.3 MB)
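For clarity, the only part of pgie_config_triton_grpc.txt that changes between TEST 1 and TEST 2 is the Triton endpoint; a trimmed sketch of that section (other fields omitted):
infer_config {
  backend {
    triton {
      model_name: "trafficcamnet"
      grpc {
        url: "192.168.121.2:8001"   # AGX Orin 2; for TEST 1 this points to the local server
      }
    }
  }
}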
TEST 3 - AGX Orin 1 (perf_analyzer) + AGX Orin 2 (Triton)
Here is the command:
perf_analyzer -m trafficcamnet -i grpc -u 192.168.121.2:8001 --concurrency-range 1:8 -b 1
and the results:
...
Request concurrency: 3
Client:
Request count: 2575
Throughput: 119.757 infer/sec
Avg latency: 21630 usec (standard deviation 3304 usec)
p50 latency: 21279 usec
p90 latency: 25983 usec
p95 latency: 27480 usec
p99 latency: 30475 usec
Avg gRPC time: 21591 usec ((un)marshal request/response 715 usec + response wait 20876 usec)
Server:
Inference count: 2574
Execution count: 2574
Successful request count: 2574
Avg request latency: 7235 usec (overhead 398 usec + queue 98 usec + compute input 1080 usec + compute infer 5030 usec + compute output 628 usec)
...
As you can see, perf_analyzer can reach 120 fps with the remote Triton server, which would suggest this is not a network issue.
How, then, do you explain the FPS drop between the local and remote Triton server, given the successful perf_analyzer results?
Thank you.
yuweiw
June 24, 2025, 3:13am
Could you please do the following tests separately to help analyze this issue?
Set the instance_group count to 3 and try again.
Use perf_analyzer to get the perf data in the Only AGX Orin 1 case.
Try setting buffer-pool-size to 8 for the nvstreammux plugin and check whether that improves the perf (see the snippet below).
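For the third point, a minimal sketch of what that change would look like in the Python app (only the buffer-pool-size line is new; the sample's other nvstreammux properties stay as they are):
streammux = Gst.ElementFactory.make("nvstreammux", "Stream-muxer")
# New: enlarge the muxer's output buffer pool
streammux.set_property("buffer-pool-size", 8)
# ... keep the sample's existing width/height/batch-size settings ...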
The instance_group count was already set to 5, but yes, it has an impact if I reduce it to 1 or 2.
In my case, I think jetson_clocks was not always enabled between tests (after reboots), and that was causing the drop in performance.
I have made a service to set the clocks automatically at boot. Since then, performance has been much more consistent.
yuweiw
July 15, 2025, 9:25am
Glad to hear that. If possible, could you share how you made a service to set them automatically at boot, so that others can refer to it? Thanks.
Enable jetson_clocks:
sudo /usr/bin/jetson_clocks
To run this automatically at boot, you can create a systemd service:
sudo nano /etc/systemd/system/jetson_clocks.service
[Unit]
Description=Jetson Clocks Service
After=multi-user.target
[Service]
Type=oneshot
ExecStart=/usr/bin/jetson_clocks
RemainAfterExit=true
[Install]
WantedBy=multi-user.target
Then reload systemd and enable the service:
sudo systemctl daemon-reload
sudo systemctl enable jetson_clocks.service
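To verify without waiting for a reboot:
sudo systemctl start jetson_clocks.service
sudo jetson_clocks --show   # prints the current clock configuration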