Yolov3 fps rather low on TX2

HIVE · July 25, 2020, 12:34pm

Dear all,
I intend to run yolov3 on a TX2 via DeepStream, with a USB camera as the input and an RTSP sink as the output. The code looks like this:

deploy.txt (13.8 KB)

The frame rate is only around 5 FPS, which is rather low. Could someone tell me whether something is wrong with my code? In my opinion, if the GPU were running at full speed, yolov3 on the TX2 should reach more than 15 FPS.

• Hardware Platform (Jetson / GPU) Jetson TX2
• DeepStream Version 5.0
• JetPack Version (valid for Jetson only) 4.4
• TensorRT Version aligned with 4.4
• NVIDIA GPU Driver Version (valid for GPU only) aligned with 4.4

mchi · July 26, 2020, 1:24pm

Could you share your pgie config (config.txt)?
Did you run the model in FP16 mode?
To measure the inference perf, you could refer to GitHub - NVIDIA-AI-IOT/deepstream_tao_apps: Sample apps to demonstrate how to deploy models trained with TAO on DeepStream

Also, the YoloV3 provided by TLT could have better inference perf.

Thanks!

HIVE · July 26, 2020, 1:45pm

Of course. The config file is:

[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
#0=RGB, 1=BGR
model-color-format=0
custom-network-config=prune_32_shortcut_yolov3.cfg
model-engine-file=model_full_gpu0_fp32.engine
labelfile-path=labels.txt
model-file=prune_32_shortcut_sparse-yolov3-full-mAP48.1.weights
#int8-calib-file=yolov3-calibration.table.trt7.0

#0=FP32, 1=INT8, 2=FP16
network-mode=0
num-detected-classes=80
gie-unique-id=1
network-type=0
is-classifier=0

cluster-mode=2
maintain-aspect-ratio=1
parse-bbox-func-name=NvDsInferParseCustomYoloV3
custom-lib-path=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
engine-create-func-name=NvDsInferYoloCudaEngineGet
#scaling-filter=0
#scaling-compute-hw=0

[class-attrs-all]
nms-iou-threshold=0.3
threshold=0.7

I use a pruned yolov3 model. It works fine, but the FPS is low. I will start measuring the performance as you suggested. FP16 mode behaves the same, with no obvious difference in FPS. The same code on a Titan V runs at around 30 FPS, which is also not normal.
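For readers following along, here is a minimal Python sketch of how the precision setting in the config above could be checked. It assumes the usual Gst-nvinfer meaning of network-mode (0=FP32, 1=INT8, 2=FP16) and copies only a few keys from the posted config. The helper name describe_precision and the idea of comparing the mode against the engine file name are illustrative assumptions, not part of DeepStream.

```python
import configparser

# Assumed meaning of the nvinfer "network-mode" key.
NETWORK_MODES = {0: "FP32", 1: "INT8", 2: "FP16"}

# Trimmed copy of the precision-related keys from the config posted above.
PGIE_CONFIG = """
[property]
gpu-id=0
model-engine-file=model_full_gpu0_fp32.engine
#int8-calib-file=yolov3-calibration.table.trt7.0
network-mode=0

[class-attrs-all]
nms-iou-threshold=0.3
threshold=0.7
"""


def describe_precision(config_text):
    """Return (mode name, engine file, whether the file name mentions the mode)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    prop = parser["property"]
    mode = NETWORK_MODES[prop.getint("network-mode")]
    engine = prop.get("model-engine-file", "")
    # Hypothetical sanity check: assumes the precision is part of the engine name.
    name_matches = mode.lower() in engine.lower()
    return mode, engine, name_matches


mode, engine, name_matches = describe_precision(PGIE_CONFIG)
print(f"network-mode -> {mode}, engine file -> {engine}, name matches: {name_matches}")
```

With the config as posted, both the mode and the engine file name point to FP32, so this file by itself does not exercise FP16.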

mchi · July 26, 2020, 1:55pm

When you run trtexec to profile the inference time of the FP32 and FP16 TRT engines, please add the “--dumpProfile” option to dump the per-layer time so that we can find out why the FP16 perf is almost the same as the FP32 perf.

How to add the “--dumpProfile” option:
https://elinux.org/TensorRT/PerfIssues#trtexec
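As a rough illustration of the request above, the following Python sketch assembles a trtexec command line for an already-built engine. The helper name trtexec_profile_cmd is hypothetical, the FP16 engine file name is an assumed counterpart of the FP32 name in the posted config, and the optional --plugins argument is included on the assumption that the installed trtexec supports it and that the YOLO engine needs the custom library to load.

```python
import shlex


def trtexec_profile_cmd(engine_path, plugin_lib=None):
    """Build a trtexec argument list that loads an engine and dumps per-layer times."""
    cmd = ["trtexec", f"--loadEngine={engine_path}", "--dumpProfile"]
    if plugin_lib:
        # Assumption: the engine contains custom layers that live in this library.
        cmd.append(f"--plugins={plugin_lib}")
    return cmd


# FP32 engine name comes from the posted config; the FP16 name is a guess.
for engine in ("model_full_gpu0_fp32.engine", "model_full_gpu0_fp16.engine"):
    cmd = trtexec_profile_cmd(
        engine,
        plugin_lib="nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so",
    )
    # Print the command for copy/paste; it could also be passed to subprocess.run(cmd).
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Comparing the per-layer output of the two runs should show whether any layers actually execute faster in FP16.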

mchi · July 29, 2020, 5:20am

Hi @HIVE,
Did you check the perf?

HIVE · July 29, 2020, 8:59am

@mchi I did, but it did not work out. Compiling TRT OSS is always problematic.

I tried to pinpoint the part of the pipeline that drags down the FPS: with the USB cam only and no RTSP, the FPS is still low, so I suspect the problem stems from the USB cam pipeline in deepstream-test1-usb-cam. If I abandon DeepStream and just run yolov3 (also reading the USB cam as input), the FPS is normal with both the pruned and the non-pruned yolov3 model (the pruned model gets 2x the FPS of the non-pruned model).

I strongly suspect that the USB cam app has a bug in its pipeline.
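One way to make this kind of stage-by-stage comparison concrete is to count frames at a chosen point in each setup and compare the rates. The sketch below is a generic frame-rate counter in Python; the class name FpsMeter is hypothetical, and where tick() would be called (for example once per captured frame, or once per inferred frame) is left as an assumption about the application.

```python
import time


class FpsMeter:
    """Average frames per second between the first and the last tick."""

    def __init__(self):
        self.first = None
        self.last = None
        self.count = 0

    def tick(self, now=None):
        # Call once per frame; a timestamp can be injected for testing.
        now = time.monotonic() if now is None else now
        if self.first is None:
            self.first = now
        self.last = now
        self.count += 1

    def fps(self):
        # Needs at least two ticks spanning a non-zero interval.
        if self.count < 2 or self.last == self.first:
            return 0.0
        return (self.count - 1) / (self.last - self.first)


# Simulated frames arriving every 0.2 s, i.e. the 5 FPS reported in this thread.
meter = FpsMeter()
for t in [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]:
    meter.tick(t)
print(meter.fps())  # 5.0
```

Measuring the camera-only rate and the camera-plus-inference rate with the same counter would show whether capture or inference is the limiting stage.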

mchi · July 30, 2020, 10:54am

Hi @HIVE,
Sorry for the delay!
Please try changing “network-mode=0” to “network-mode=1” in your pgie configuration.

Thanks!
