We want to run a model on DLA0, DLA1, and the GPU simultaneously on an NVIDIA Jetson AGX Orin 64GB. We are following the thread below for the implementation.
Link: DLA and GPU cores at the same time - #17 by angless
When we run inference on DLA0, DLA1, and the GPU at the same time, we observe very low throughput even though utilization is at its maximum.
Can someone suggest what could be causing the drop in throughput?
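For context, our setup follows the approach from the linked thread: one engine per accelerator, with three trtexec processes running concurrently. A minimal sketch of what we are doing (model path, precision, and engine names are placeholders on our side, not from the thread):

```shell
# Build one engine per accelerator (FP16 shown; DLA requires FP16 or INT8).
# model.onnx is a placeholder for the actual network.
trtexec --onnx=model.onnx --fp16 --useDLACore=0 --allowGPUFallback --saveEngine=model_dla0.engine
trtexec --onnx=model.onnx --fp16 --useDLACore=1 --allowGPUFallback --saveEngine=model_dla1.engine
trtexec --onnx=model.onnx --fp16 --saveEngine=model_gpu.engine

# Run all three engines concurrently and wait for them to finish.
trtexec --loadEngine=model_dla0.engine --useDLACore=0 &
trtexec --loadEngine=model_dla1.engine --useDLACore=1 &
trtexec --loadEngine=model_gpu.engine &
wait
```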
Have you maximized the device performance first?
$ sudo nvpmodel -m 0
$ sudo jetson_clocks
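If it helps to double-check, the active power mode and clock state can be queried afterwards (assuming standard JetPack tooling):

```shell
# Query the active nvpmodel power mode (mode 0 = MAXN on AGX Orin).
sudo nvpmodel -q

# Show the current clock configuration to confirm clocks are locked at maximum.
sudo jetson_clocks --show
```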
Yes. The device is already running at maximum performance.
Could you share the benchmark data with us as well?
Sure. I have attached the screenshot which includes FPS and Jetson Power GUI information.
Have you tried to infer the model with INT8 mode?
If not, would you mind giving it a try?
Also, could you try enlarging the CUDA wait queue size to see if it helps?
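For the INT8 suggestion, a hedged sketch of what an INT8 build on a DLA core might look like with trtexec (engine name is a placeholder; without a calibration cache, trtexec assigns random dynamic ranges, which is acceptable for throughput measurement but not for accuracy):

```shell
# Build and benchmark an INT8 engine on DLA core 0; layers DLA cannot run
# fall back to the GPU. model.onnx is a placeholder for the actual network.
trtexec --onnx=model.onnx --int8 --useDLACore=0 --allowGPUFallback \
        --saveEngine=model_dla0_int8.engine
```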
This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.