Low performance while running model on DLA0, DLA1, and GPU at the same time on Jetson AGX Orin 64 GB

Hello
We want to run a model on DLA0, DLA1, and the GPU of an NVIDIA Jetson AGX Orin 64 GB at the same time. We are following the thread below for the implementation.
Link: DLA and GPU cores at the same time - #17 by angless

When we run inference on DLA0, DLA1, and the GPU simultaneously, we observe very low throughput even though utilization is at its maximum.
Can someone suggest what could be causing the drop in throughput?
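For reference, a minimal sketch of how the three inferences might be launched concurrently with trtexec. The engine file names are hypothetical placeholders; the trtexec path is its usual JetPack location, and `--useDLACore`/`--allowGPUFallback`/`--loadEngine`/`--iterations` are standard trtexec options:

```shell
# Hypothetical engine paths; substitute your own prebuilt engines.
GPU_ENGINE=model_gpu.engine
DLA0_ENGINE=model_dla0.engine
DLA1_ENGINE=model_dla1.engine

# trtexec ships under /usr/src/tensorrt/bin on JetPack.
TRTEXEC=/usr/src/tensorrt/bin/trtexec

if [ -x "$TRTEXEC" ]; then
  # One trtexec process per accelerator, run in parallel.
  "$TRTEXEC" --loadEngine="$GPU_ENGINE" --iterations=1000 &
  "$TRTEXEC" --loadEngine="$DLA0_ENGINE" --useDLACore=0 --allowGPUFallback --iterations=1000 &
  "$TRTEXEC" --loadEngine="$DLA1_ENGINE" --useDLACore=1 --allowGPUFallback --iterations=1000 &
  wait  # block until all three benchmarks finish
else
  echo "trtexec not found; run this on the Jetson"
fi
```

Note that `--allowGPUFallback` lets layers the DLA cannot run fall back to the GPU, which itself adds GPU contention and can lower aggregate throughput.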

Hi,

Have you maximized the device performance first?

$ sudo nvpmodel -m 0
$ sudo jetson_clocks
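To confirm the settings took effect, the current power mode and clock state can be queried (guarded here so the sketch is harmless off-device; `nvpmodel -q` and `jetson_clocks --show` are the standard JetPack query flags):

```shell
if command -v nvpmodel >/dev/null 2>&1; then
  sudo nvpmodel -q           # prints the active power mode (expect mode 0 / MAXN)
  sudo jetson_clocks --show  # prints current vs. max clock frequencies
  JETSON=1
else
  JETSON=0                   # not a Jetson; nothing to query
fi
```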

Thanks.

Yes. The device is already running at maximum performance.

Hi,

Could you share the benchmark data with us as well?

Thanks.

Sure. I have attached a screenshot that includes the FPS and the Jetson Power GUI information.

Hi,

Have you tried running the model in INT8 mode?
If not, would you mind giving it a try?
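A minimal sketch of building an INT8 engine for a DLA core with trtexec, assuming an ONNX model file (`model.onnx` is a placeholder). Without a calibration cache, `--int8` uses dummy scales, which is fine for measuring performance but not for accuracy:

```shell
TRTEXEC=/usr/src/tensorrt/bin/trtexec   # usual JetPack location

if [ -x "$TRTEXEC" ]; then
  # Build an INT8 engine targeted at DLA core 0; unsupported layers
  # fall back to the GPU via --allowGPUFallback.
  "$TRTEXEC" --onnx=model.onnx --int8 \
             --useDLACore=0 --allowGPUFallback \
             --saveEngine=model_dla0_int8.engine
else
  echo "trtexec not found; run this on the Jetson"
fi
```

Repeat with `--useDLACore=1` (and without `--useDLACore` for the GPU engine) to get one engine per accelerator.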

Also, could you try enlarging the CUDA wait queue size to see if it helps?
https://docs.nvidia.com/deploy/mps/index.html#topic_5_2_4

export CUDA_DEVICE_MAX_CONNECTIONS=32
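One detail worth noting: CUDA reads this variable when the context is created, so it must be present in the environment of each inference process before it is launched (the default is 8 connections). A sketch:

```shell
# Must be exported in the same shell (or service unit) that starts
# each inference worker; set before any CUDA context is created.
export CUDA_DEVICE_MAX_CONNECTIONS=32

# Then launch the workers from this shell, e.g.:
# /usr/src/tensorrt/bin/trtexec --loadEngine=model_gpu.engine &
```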

Thanks.
