Increase FPS on YOLOv4 model

Dear all,

Greetings! This is my first project on a Jetson Xavier developer kit – JetPack 4.6, L4T 32.6.1 (with a 512 GB NVMe M.2 SSD installed). I am running YOLOv4 with Darknet on a custom-trained model (wall-crack images). The model detects wall cracks in live Tello drone video. The YOLOv4 custom model weights file is 256 MB.
During inference, with the Xavier in 30W 6-core power mode, I am getting only 10 FPS, and because of this low FPS there are some detection issues. In 15W desktop mode it gives 7 to 8 FPS. For reasonable accuracy I need at least 18 to 20 FPS or above. When I showed this demo to my boss, he asked why the FPS is so low and wondered whether the model is running on the Xavier CPU or GPU (he has high hopes for the Jetson Xavier).
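Before anything else, it is worth confirming the board is actually in its highest-performance state. A minimal sketch of checking and maximizing the power mode from Python, assuming the standard nvpmodel and jetson_clocks tools shipped with JetPack (the `parse_nvpmodel_query` helper is my own, hypothetical):

```python
import re
import subprocess

def parse_nvpmodel_query(output: str) -> str:
    """Extract the active power-mode name from `nvpmodel -q` output."""
    match = re.search(r"NV Power Mode:\s*(\S+)", output)
    return match.group(1) if match else "UNKNOWN"

def maximize_performance():
    # Query the current mode, then switch to MAXN (mode 0 on AGX Xavier)
    # and lock the clocks at their maximum. Both commands need root.
    query = subprocess.run(["nvpmodel", "-q"], capture_output=True, text=True)
    print("current mode:", parse_nvpmodel_query(query.stdout))
    subprocess.run(["sudo", "nvpmodel", "-m", "0"], check=True)  # MAXN
    subprocess.run(["sudo", "jetson_clocks"], check=True)        # pin clocks

if __name__ == "__main__":
    maximize_performance()
```

MAXN removes the 30W cap entirely, so it is a fairer baseline when comparing against a desktop RTX 2060.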

Though my script executed without any specific instruction to run with --gpus 0,
as explained in this blog -

Here is the data flow → the Tello drone flies and sends video to → the Jetson Xavier → that video is fed into the → YOLOv4 model during inference (instead of a webcam, the video comes from an external source). All runs in a single script ( ) with 4 threads.

python3 --weightspath yolov4-custom_final.weights --configpath yolov4-custom.cfg --datapath /home/jetson/darknet/data/CrackDetection/
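For context, a minimal sketch of what such a pipeline loop looks like. The `FPSMeter` helper is my own; the `darknet` Python bindings and their `load_network` / `detect_image` names ship with the Darknet repo but vary by version, and the UDP address for the Tello stream is an assumption:

```python
import time
from collections import deque

class FPSMeter:
    """Rolling-average FPS over the last `window` frames."""
    def __init__(self, window=30):
        self.stamps = deque(maxlen=window)

    def tick(self, now=None):
        self.stamps.append(time.time() if now is None else now)

    def fps(self):
        if len(self.stamps) < 2:
            return 0.0
        span = self.stamps[-1] - self.stamps[0]
        return (len(self.stamps) - 1) / span if span > 0 else 0.0

def main():
    # Heavy imports kept inside main() so the FPS helper stays standalone.
    import cv2      # assumption: OpenCV built with FFmpeg/GStreamer support
    import darknet  # darknet.py from the Darknet repo; API varies by version

    network, class_names, _ = darknet.load_network(
        "yolov4-custom.cfg", "obj.data", "yolov4-custom_final.weights")
    cap = cv2.VideoCapture("udp://0.0.0.0:11111")  # assumed Tello video port
    meter = FPSMeter()
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # ... resize frame to the network size, convert to a darknet IMAGE, then:
        # detections = darknet.detect_image(network, class_names, dn_image)
        meter.tick()
        print(f"{meter.fps():.1f} FPS")

if __name__ == "__main__":
    main()
```

Measuring FPS as a rolling average over the whole loop (capture + inference + drawing) shows whether the bottleneck is really the model or the video I/O.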

Running this same model on a Windows laptop - i7-10750H CPU @ 2.60GHz, RTX 2060 (6GB) - I am getting 16 to 18 FPS.

So my questions:

  1. Is there any specific way to check/ensure that the model runs on the GPU on the Jetson Xavier?
  2. Are there any basic optimization rules of thumb (power mode?) when running heavy models like YOLOv4?
  3. Or am I missing some basic points? Any suggestion is welcome.



1. The simplest way is to check the GPU loading with the tegrastats tool.
If the inference is well optimized, it's expected that the GPU utilization reaches 99% (GR3D_FREQ 99%@1377).

$ sudo tegrastats
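To make that check programmatic, the GR3D_FREQ field in each tegrastats line is the GPU load. A minimal parser (the `gpu_utilization` helper name is my own):

```python
import re

def gpu_utilization(tegrastats_line: str):
    """Return the GPU utilization percent from one tegrastats line, or None.

    tegrastats reports the GPU as e.g. 'GR3D_FREQ 99%@1377'.
    """
    match = re.search(r"GR3D_FREQ (\d+)%", tegrastats_line)
    return int(match.group(1)) if match else None

if __name__ == "__main__":
    import subprocess
    # Sample a few lines from a live tegrastats run while inference is going.
    proc = subprocess.Popen(["tegrastats"], stdout=subprocess.PIPE, text=True)
    for _ in range(5):
        line = proc.stdout.readline()
        print("GPU load:", gpu_utilization(line), "%")
    proc.terminate()
```

If Darknet was really built with GPU support, this should sit near 99% during inference; if it stays near 0% while the CPU cores are pegged, the build fell back to the CPU.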

2. It is recommended to use our TensorRT library for inference.
You can also use DeepStream to get a better-optimized pipeline on Jetson.
Below is a YOLOv4 sample for your reference:
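As a rough sketch of the TensorRT route (not the linked sample itself): a common workflow is to export the Darknet model to ONNX with a converter and then build an engine with trtexec. The flags below are stock trtexec options, but the helper function and file names are my own assumptions:

```python
import subprocess

def build_trtexec_cmd(onnx_path: str, engine_path: str, fp16: bool = True):
    """Assemble a trtexec command to build a TensorRT engine from an ONNX file."""
    cmd = ["/usr/src/tensorrt/bin/trtexec",   # default trtexec location on Jetson
           f"--onnx={onnx_path}",
           f"--saveEngine={engine_path}"]
    if fp16:
        cmd.append("--fp16")  # half precision is usually a large speedup on Xavier
    return cmd

if __name__ == "__main__":
    # Assumes yolov4.onnx was produced by a Darknet-to-ONNX converter beforehand.
    subprocess.run(build_trtexec_cmd("yolov4.onnx", "yolov4_fp16.engine"),
                   check=True)
```

trtexec also prints throughput numbers while building, which gives a quick upper bound on what FPS the engine can reach before wiring it into the drone pipeline.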


DeepStream is interesting. As a beginner with a short project schedule, I may not have time to try it this time. In the link, I saw TWO methods to run the TensorRT sample of YOLOv4.
Is integrating YOLOv4 with DeepStream 5.0 more straightforward and beginner-friendly?

Update: I have tried the YOLOv4-Tiny model, with which I am getting 30-40 FPS and reasonable prediction accuracy.

