AGX Orin - Optimisation of GPU usage

Hello,

I’m using:

  • Nvidia Jetson AGX Orin Dev Kit
  • Jetpack 6 DP
  • Deepstream pipeline
  • Inference with Triton Server 2.43 (docker image)

I’m using a segmentation model developed in my project and I’m getting about 50-55 FPS. I’d like to understand how I can improve the performance, because I noticed that the GPU (through jtop) is only used around 20%.

I used Nsight Systems to investigate my CPU/GPU usage. I noticed a low average GPU usage (about 20-25%), but I have trouble identifying what is causing the GPU to be under-utilized; I’d like to understand why the GPU is idle:

[Screenshot: nsight_extract]

The Nsight report is around 2 GB.

This is the first time I’m using Nsight, so any tips are welcome. I watched some webinars on it, so I understand how the tool works, but I’m unable to draw conclusions.

It would be really great to have a little help from you to continue my work.

Hi,

Which backend do you use? TensorRT, PyTorch, or TensorFlow?
Would you mind attaching the nsys output so we can check the details?
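If the full 2 GB report is hard to attach, a shorter capture is usually enough. A minimal sketch of how to record one (the application name and config path are placeholders for your own pipeline):

$ nsys profile -t cuda,nvtx,osrt -d 20 -o report deepstream-app -c config.txt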

Thanks.

Hello,

I’m using TensorRT.
Here is the link

Hi,

We tried to download the file but failed several times.
Could you help us re-check it?

Thanks.

Here is a new link

Hi,

Thanks a lot for re-uploading the file.

Confirmed that we can download and open the file.
Will share more info with you later.

Hi,

Could you share more about the model and use case?
Based on your nsys file, NVDEC takes 16.3~16.7 ms per frame and seems to be the bottleneck of the pipeline. At ~16.5 ms per frame, decoding alone caps the pipeline at roughly 60 fps, which is consistent with the 50-55 FPS you observe.

Thanks.

Hi AastaLLL,

  • Could you share more about the model and use case?

The model is a custom segmentation model. The use case is human/object detection.

  • Based on your nsys file, NVDEC takes 16.3~16.7 ms per frame and seems to be the bottleneck of the pipeline.

Thanks for the analysis.
I’m using a video file that is encoded in 1080p 30 fps (H.264).
How can I make it run faster and spend less time on decoding?

Hi,

Based on our spec here, H.264 decoding can reach a higher throughput.
Which nvpmodel do you use? Could you maximize the device’s performance and try it again?

$ sudo nvpmodel -m 0
$ sudo jetson_clocks
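
To double-check that the settings are applied, you can query the current state:

$ sudo nvpmodel -q
$ sudo jetson_clocks --show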

Could you also turn off the Triton inference and test the decoding fps separately?
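For the decode-only measurement, a rough gst-launch sketch like the one below reports the decoding fps (assuming an MP4 container; the file path is a placeholder):

$ gst-launch-1.0 -v filesrc location=/path/to/video.mp4 ! qtdemux ! h264parse ! \
      nvv4l2decoder ! fpsdisplaysink video-sink=fakesink text-overlay=false sync=false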
We also have some recommendations for performance optimization.
Please also give it a try.

https://docs.nvidia.com/metropolis/deepstream/6.3/dev-guide/text/DS_ref_app_deepstream.html#performance-optimization

Thanks.

Hello,

  • Which nvpmodel do you use? Could you maximize the device’s performance and try it again?
$ sudo nvpmodel -m 0
$ sudo jetson_clocks

Yes, I’ve already done that.

  • Could you also turn off the Triton inference and test the decoding fps separately?

What do you mean? Using GStreamer only? Or still doing the inference?

Hi,

You should be able to turn a component on/off by modifying the config file.
For example:

https://github.com/NVIDIA-AI-IOT/deepstream_triton_model_deploy/blob/master/faster_rcnn_inception_v2/config/source1_primary_faster_rcnn_inception_v2.txt

...
[primary-gie]
enable=0
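
The same enable flag works for other groups in the reference-app config (for example [osd] or [tiled-display]), so components can be toggled one at a time to isolate the bottleneck:

[osd]
enable=0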

Thanks.

I’m not sure how that changes anything, since the inference happens after the video decoding.

Anyway, when I use the config file you referenced, I get this error:

Creating nvstreammux
Creating nvinfer
Could not find group property
** ERROR: <gst_nvinfer_parse_config_file:1382>: failed.

Hi,

Could you share your config/sample/model with us so we can check it further?
Please also share the reproducible steps as well.

Thanks.

Sorry for the delay.
I found the issue and was able to perform the test.

As a reference, here is the config file I used:
config_file.txt (2.0 KB)
And here is a link to the report.

I checked and it is much faster.

To share the code that I used for this request, how can I transfer it privately? The project is under an NDA.

Hello,

Do you have any news for me?

Hello,

Can I please get an update?

Hi,

Sorry for the late update.

In the config file you shared, the primary-gie is disabled.
Do you get a much faster result based on it?

Are you able to share the code via a private message, or do you need some extra agreement?
Thanks.

Yes, it was really fast. The report is in the post from May 2.

  • Are you able to share the code via a private message, or do you need some extra agreement?

A private message should be fine. How do we get in touch there?

Hi,

Are you able to measure the time without inference?
Since we originally thought the bottleneck came from NVDEC, the decoding time should not change (~16 ms) even with the primary-gie disabled.
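
A quick way to get a per-frame decode figure is to run the decode-only pipeline and divide the wall time by the frame count (a sketch; the file path is a placeholder, and a 60-second 30 fps clip has 1800 frames):

$ time gst-launch-1.0 filesrc location=/path/to/video.mp4 ! qtdemux ! h264parse ! \
      nvv4l2decoder ! fakesink sync=false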

A private message is also a topic where comments can be added, so it should be fine for discussion.

Thanks.

I’ll try it.
I sent you a PM with the files so you can try to reproduce.