Hello
We want to run a model on DLA0, DLA1, and the GPU of an NVIDIA Jetson AGX Orin 64GB at the same time. We are following the thread below for the implementation.
Link: DLA and GPU cores at the same time - #17 by angless
When we run inference on DLA0, DLA1, and the GPU simultaneously, we observe very low throughput even though utilization is at its maximum.
Can someone suggest what could be the reason for the drop in throughput?
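For reference, this is roughly how we launch the three inference processes (a minimal sketch using trtexec from TensorRT; model.onnx and the engine names are placeholders):
$ # Build one engine per accelerator
$ trtexec --onnx=model.onnx --fp16 --saveEngine=model_gpu.engine
$ trtexec --onnx=model.onnx --fp16 --useDLACore=0 --allowGPUFallback --saveEngine=model_dla0.engine
$ trtexec --onnx=model.onnx --fp16 --useDLACore=1 --allowGPUFallback --saveEngine=model_dla1.engine
$ # Run all three engines concurrently and collect the reported throughput
$ trtexec --loadEngine=model_gpu.engine &
$ trtexec --loadEngine=model_dla0.engine --useDLACore=0 &
$ trtexec --loadEngine=model_dla1.engine --useDLACore=1 &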
Hi,
Have you maximized the device performance first?
$ sudo nvpmodel -m 0
$ sudo jetson_clocks
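To confirm the settings took effect, you can check the active power mode and clock activity with the standard Jetson tools:
$ sudo nvpmodel -q
$ sudo tegrastats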
Thanks.
Yes. The device is already running at maximum performance.
Hi,
Could you share the benchmark data with us as well?
Thanks.
Sure. I have attached a screenshot that includes the FPS and the Jetson Power GUI information.
Hi,
Have you tried to infer the model with INT8 mode?
If not, would you mind giving it a try?
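For a quick INT8 throughput test, the engine can be rebuilt with trtexec (a sketch; model.onnx is a placeholder, and without a calibration cache trtexec uses dummy scales, so this is suitable for performance measurement only, not accuracy):
$ trtexec --onnx=model.onnx --int8 --useDLACore=0 --allowGPUFallback --saveEngine=model_dla0_int8.engine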
Also, could you try enlarging the CUDA wait queue size to see if it helps?
https://docs.nvidia.com/deploy/mps/index.html#topic_5_2_4
export CUDA_DEVICE_MAX_CONNECTIONS=32
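For example, export the variable before launching the concurrent processes, so that each inference process inherits it (engine names are placeholders, as in the sketch above):
$ export CUDA_DEVICE_MAX_CONNECTIONS=32
$ trtexec --loadEngine=model_gpu.engine &
$ trtexec --loadEngine=model_dla0.engine --useDLACore=0 &
$ trtexec --loadEngine=model_dla1.engine --useDLACore=1 &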
Thanks.
system, November 29, 2022:
This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
ramc, February 14, 2023:
Also check out the DLA GitHub page for samples and resources, or to report issues: Recipes and tools for running deep learning workloads on NVIDIA DLA cores for inference applications.
We have a FAQ page that addresses some common questions that we see developers run into: Deep-Learning-Accelerator-SW/FAQ