How to use the GPU + 2 DLAs to reach 100 FPS for YOLOv3 on Xavier


This continues an earlier topic.

The report at this link shows YOLOv3 (608x608) inference running at close to 100 FPS on AGX Xavier with TensorRT (Figure 3).

We tried several more methods in this issue topic, but none of them solved the problem, so we created a new topic focused on “How to use GPU + 2 DLA to reach 100 FPS for YOLOv3 on Xavier”.

So far we have tried running on the GPU + 2 DLAs with the following commands:

sudo nvpmodel -m 0
sudo jetson_clocks

Terminal 1 command:

./trtexec --onnx='yolov3.onnx' --workspace=26 --int8 --useSpinWait --iterations=100

Terminal 2 command:

./trtexec --onnx=yolov3.onnx --workspace=30 --int8 --useSpinWait --iterations=100 --useDLACore=0 --allowGPUFallback

Terminal 3 command:

./trtexec --onnx=yolov3.onnx --workspace=26 --int8 --useSpinWait --iterations=100 --useDLACore=1 --allowGPUFallback

The resulting FPS is shown in this figure:
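A minimal Python sketch (not official NVIDIA code) of starting the three trtexec runs above at the same time from one script instead of three terminals. It assumes `./trtexec` and `yolov3.onnx` are in the current directory; the helper names `build_commands`, `run_concurrently`, `aggregate_fps` and `meets_target` are hypothetical. Because each process benchmarks its own engine independently, the combined rate is the sum of the three per-engine FPS values, and that sum is what has to reach 100 FPS. The sketch does not parse trtexec output; the per-engine FPS values have to be read from each run's log.

```python
import subprocess
from typing import List


def build_commands(onnx_path: str = "yolov3.onnx",
                   trtexec: str = "./trtexec") -> List[List[str]]:
    """Rebuild the three terminal commands above (GPU, DLA core 0, DLA core 1)."""
    common = [f"--onnx={onnx_path}", "--int8", "--useSpinWait", "--iterations=100"]
    gpu = [trtexec, *common, "--workspace=26"]
    dla0 = [trtexec, *common, "--workspace=30", "--useDLACore=0", "--allowGPUFallback"]
    dla1 = [trtexec, *common, "--workspace=26", "--useDLACore=1", "--allowGPUFallback"]
    return [gpu, dla0, dla1]


def run_concurrently(commands: List[List[str]]) -> List[int]:
    """Start every command at once (like three terminals) and return their exit codes."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [p.wait() for p in procs]


def aggregate_fps(per_engine_fps: List[float]) -> float:
    """Independent engines each process their own frames, so total throughput is the sum."""
    return sum(per_engine_fps)


def meets_target(per_engine_fps: List[float], target: float = 100.0) -> bool:
    """True if the combined GPU + DLA throughput reaches the target FPS."""
    return aggregate_fps(per_engine_fps) >= target
```

Usage, from the directory containing trtexec: `run_concurrently(build_commands())`, then pass the three FPS values reported by the runs to `meets_target`.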

Could NVIDIA provide actual example code?
This problem has been going on for a long time.



It looks like GPU performance drops once the DLA is launched.
We are reproducing this issue and will update with more information ASAP.




Sorry for keeping you waiting.

The performance drop is caused by GPU resources being occupied by the DLA's GPU-fallback layers.
We are checking this with our internal team and will update you with more information later.



Ok, we will wait for your reply!



Sorry for keeping you waiting.

We have a new software release (JetPack 4.4 GA) and a benchmark script here:

Using this script, we get 1098 FPS on YOLOv3-tiny at 416 resolution.
For more detail, please check our latest benchmark report here:



@AastaLLL I was able to get the benchmark scripts working!
How can I develop a customized YOLOv3 (tiny or full) that actually achieves this FPS,
i.e. running on a video stream?

Hi mhk5,

Please help to open a new topic for your issue. Thanks