YOLOv3 on AGX Xavier - How to increase FPS?

Hi everybody,

For real-time object detection I installed JetPack 4.2.3 (including DeepStream and TensorFlow) and YOLOv3 / darknet (https://github.com/AlexeyAB/darknet) with

GPU=1
CUDNN=1
OPENCV=1
CUDNN_HALF=1

on Jetson AGX Xavier.

To reach a higher frame rate, I reduced the input width and height in yolov3.cfg and switched the board to MAXN power mode.
For the input size I tested values between 32 and 416; YOLOv3 reaches 30 FPS at 288 x 288 or lower.
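For reference, this is the part of yolov3.cfg I changed (a sketch; darknet requires width and height to be multiples of 32, and the batch/subdivisions values shown are the usual ones for inference):

```ini
# yolov3.cfg, [net] section
[net]
batch=1
subdivisions=1
width=288
height=288
```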

Now my question is whether there are other ways to increase the frame rate. For example, in the following post somebody reports around 50 FPS:

https://devtalk.nvidia.com/default/topic/1049402/deepstream-sdk/deepstream-yolo-app-performance-vs-tensor-core-optimized-yolo-darknet/post/5395416/#5395416

I read that some networks can be switched from FP32 to FP16 or INT8. Is there a possibility to do that with YOLOv3?

Or does anyone have other ideas for how I could reach a higher FPS?

(I know I could change to tiny-yolov3, but I have to use yolov3)

I am thankful for any ideas, since I am still a bit confused trying to understand everything.

Hi,

It’s recommended to give our DeepStream SDK a try first.
You can find the YOLO sample in /opt/nvidia/deepstream/deepstream-4.0/sources/objectDetector_Yolo/.

Unlike the darknet pipeline, DeepStream has an optimized buffer architecture for the Jetson platform.
It will give you much better performance.

Furthermore, you can check this topic on reducing the network size from 608 to 416 for acceleration:
https://devtalk.nvidia.com/default/topic/1064871/deepstream-sdk/deepstream-gst-nvstreammux-change-width-and-height-doesn-t-affect-fps/post/5392823/#5392823
Then you can also run inference in INT8 mode for extra performance.
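As a sketch, the relevant keys in the sample's config_infer_primary_yoloV3.txt look like this (the file names follow the default objectDetector_Yolo sample and may differ on your setup):

```ini
# config_infer_primary_yoloV3.txt
# network-mode: 0=FP32, 1=INT8, 2=FP16
network-mode=1
int8-calib-file=yolov3-calibration.table.trt5.1
model-engine-file=model_b1_int8.engine
```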

Thanks.

Hi AastaLL,

first of all, thank you for your suggestion. I completed all the instructions in the README of objectDetector_Yolo, and it is now possible to start YOLOv3 via DeepStream.

Now I would like to use a webcam (Microsoft LifeCam HD-3000) for real time detection.

I saw that in deepstream_app_config_yoloV3.txt, at line 42, there is the reference to the video source used by the yolo-app:

[source0]
enable=1
#Type - 1=CameraV4L2 2=URI 3=MultiURI
type=3
uri=file://../../samples/streams/sample_1080p_h264.mp4
num-sources=1
gpu-id=0

Do you know which line I have to add instead of:

uri=file://../../samples/streams/sample_1080p_h264.mp4

I can see my camera at

/dev/video0

Is it right that I also have to set:

type=1

Are there some more settings I have to change for real time detection?

Thanks again for your help!

Hi,

YES.

Please set type=1 for a USB camera and also set the camera-v4l2-dev-node property.

[source0]
enable=1
#Type - 1=CameraV4L2 2=URI 3=MultiURI
type=1
camera-width=1280
camera-height=720
camera-fps-n=30
camera-fps-d=1
# camera-v4l2-dev-node=0 selects /dev/video0
camera-v4l2-dev-node=0

Thanks.

Hi AastaLL,

thanks for this reply, it was really helpful. The webcam works fine and YOLOv3 runs.

Now I would like to come back to my initial question, which was how to reach a higher FPS with YOLOv3 on the AGX.
With DeepStream, YOLOv3 runs at around 26 FPS. I varied the network size between 416, 320 and 160, but all gave the same FPS (please see the attached pictures).
Could it be that I missed changing something in the .cfg file, in config_infer_primary_yoloV3.txt or in deepstream_app_config_yoloV3.txt? (See also the attached files.)

For comparison, with darknet, changing the input size makes a huge difference in FPS: at 416 YOLO reaches around 20 FPS, at 320 around 27 FPS, and at 160 around 31 FPS. (See also the attached files.)

Do you have an idea what I could have done wrong? Right now I don’t see any improvement from using DeepStream instead of darknet.

Thanks again!

deepstream_app_config_yoloV3.txt (3.68 KB)
config_infer_primary_yoloV3.txt (3.13 KB)
yolov3_Deepstream.cfg.txt (8.15 KB)
yolov3_Darknet.cfg.txt (8.17 KB)

Hi,

Please remember to delete the previously generated TensorRT engine file.

TensorRT converts the YOLO model into a TensorRT engine at the first launch.
To save compile time, the generated engine is serialized to disk and simply deserialized on subsequent launches.

So if you update the network architecture, please make sure the engine is rebuilt from the model rather than deserialized from disk.
The easiest way is to delete the engine file each time you launch a new architecture; it is the file named by this setting in your config:

model-engine-file=model_b1_int8.engine
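For example, a minimal shell sketch (the directory and engine-file pattern are assumptions based on the default sample; match them to the model-engine-file value in your config):

```shell
# Assumed location of the DeepStream YOLO sample; adjust to your setup.
YOLO_DIR=${YOLO_DIR:-/opt/nvidia/deepstream/deepstream-4.0/sources/objectDetector_Yolo}

# Remove any previously serialized engine so TensorRT rebuilds it
# from the updated network config on the next launch.
rm -f "$YOLO_DIR"/model_b1_*.engine
```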

Thanks.