YOLOv3 FPS rather low on TX2

Dear all,
I intend to run YOLOv3 on a TX2 via DeepStream, with a USB camera as input and an RTSP output sink. The code looks like this:

deploy.txt (13.8 KB)

The frame rate is only around 5 FPS, which is rather low. Could someone tell me whether something is wrong with my code? In my opinion, with the GPU running at full speed, YOLOv3 on the TX2 should reach more than 15 FPS.

• Hardware Platform (Jetson / GPU): Jetson TX2
• DeepStream Version: 5.0
• JetPack Version (valid for Jetson only): 4.4
• TensorRT Version: as bundled with JetPack 4.4
• NVIDIA GPU Driver Version (valid for GPU only): as bundled with JetPack 4.4
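To compare figures like "around 5 FPS" against the expected 15 FPS on equal terms, it helps to measure the frame rate the same way in every setup (DeepStream pipeline, camera only, plain YOLOv3). Below is a minimal Python sketch of a rolling FPS meter. It assumes you can call a hook once per output frame, for example from a GStreamer buffer probe. The class `FpsMeter` and its parameters are hypothetical and not part of DeepStream.

```python
import time
from collections import deque


class FpsMeter:
    """Rolling frames-per-second estimate over the last `window` frames.

    Hypothetical helper: call tick() once per processed frame, e.g. from a
    buffer probe on the sink pad of the last element in the pipeline.
    """

    def __init__(self, window=30, clock=time.perf_counter):
        # Keep window + 1 timestamps so that `window` frame intervals are covered.
        self._stamps = deque(maxlen=window + 1)
        self._clock = clock

    def tick(self):
        """Record one frame and return the current FPS estimate."""
        self._stamps.append(self._clock())
        return self.fps()

    def fps(self):
        """Frames per second over the stored window (0.0 until two frames are seen)."""
        if len(self._stamps) < 2:
            return 0.0
        elapsed = self._stamps[-1] - self._stamps[0]
        if elapsed <= 0:
            return 0.0
        return (len(self._stamps) - 1) / elapsed


if __name__ == "__main__":
    # Simulate a pipeline that delivers one frame every 200 ms.
    fake_time = [0.0]
    meter = FpsMeter(window=10, clock=lambda: fake_time[0])
    for _ in range(20):
        meter.tick()
        fake_time[0] += 0.2
    print(f"{meter.fps():.1f} FPS")
```

The injectable `clock` is only there so the meter can be exercised without a camera. In a real pipeline, the default `time.perf_counter` is used.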

Could you share your PGIE config (config.txt)?
Did you run the model in FP16 mode?
To measure the inference performance, you could refer to: GitHub - NVIDIA-AI-IOT/deepstream_tao_apps: Sample apps to demonstrate how to deploy models trained with TAO on DeepStream

Also, the YOLOv3 provided by TLT may give better inference performance.

Thanks!

Of course. The config file is:

[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
#0=RGB, 1=BGR
model-color-format=0
custom-network-config=prune_32_shortcut_yolov3.cfg
model-engine-file=model_full_gpu0_fp32.engine
labelfile-path=labels.txt
model-file=prune_32_shortcut_sparse-yolov3-full-mAP48.1.weights
#int8-calib-file=yolov3-calibration.table.trt7.0

network-mode=0
num-detected-classes=80
gie-unique-id=1
network-type=0
is-classifier=0

cluster-mode=2
maintain-aspect-ratio=1
parse-bbox-func-name=NvDsInferParseCustomYoloV3
custom-lib-path=nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
engine-create-func-name=NvDsInferYoloCudaEngineGet
#scaling-filter=0
#scaling-compute-hw=0

[class-attrs-all]
nms-iou-threshold=0.3
threshold=0.7

I use a pruned YOLOv3 model. It works fine, but the FPS is low. I will start measuring the performance as you suggested. FP16 mode behaves the same, with no obvious difference in FPS. The same code runs at around 30 FPS on a Titan V, which also seems abnormally low.

When you run trtexec to profile the inference time of the FP32 and FP16 TensorRT engines, please add the “--dumpProfile” option to dump the per-layer timings, so that we can find out why FP16 performance is almost the same as FP32 performance.

How to add the “--dumpProfile” option:
https://elinux.org/TensorRT/PerfIssues#trtexec
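A minimal Python sketch that builds the two trtexec profiling commands side by side. `--loadEngine`, `--plugins` and `--dumpProfile` are standard trtexec options. The FP16 engine file name is a hypothetical placeholder, and passing the custom YOLO library via `--plugins` is an assumption: it should only be needed if the engine contains the custom YOLO output layer and the library registers that layer with TensorRT.

```python
import shlex
import shutil
import subprocess

# The FP32 engine name comes from the pgie config above. The FP16 name is a
# hypothetical placeholder for an engine built with network-mode=2.
ENGINES = {
    "fp32": "model_full_gpu0_fp32.engine",
    "fp16": "model_full_gpu0_fp16.engine",
}
# Assumption: the custom YOLO plugin library must be loaded so that trtexec
# can deserialize an engine containing the custom YOLO output layer.
PLUGIN_LIB = "nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so"


def trtexec_cmd(engine_path, plugin_lib=PLUGIN_LIB):
    """Build a trtexec argument list that loads a serialized engine and dumps per-layer timings."""
    return [
        "trtexec",
        f"--loadEngine={engine_path}",
        f"--plugins={plugin_lib}",
        "--dumpProfile",
    ]


if __name__ == "__main__":
    for precision, engine in ENGINES.items():
        cmd = trtexec_cmd(engine)
        # shlex.quote keeps this compatible with the Python 3.6 shipped on JetPack 4.4.
        print(f"[{precision}]", " ".join(shlex.quote(arg) for arg in cmd))
        # Only run the profile if trtexec is on PATH (on Jetson it is
        # typically found in /usr/src/tensorrt/bin).
        if shutil.which("trtexec"):
            subprocess.run(cmd, check=False)
```

If the FP16 engine's per-layer times are close to the FP32 ones, the engine may not actually have been built in FP16. Comparing the two dumps layer by layer should make that visible.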

Hi @HIVE,
Did you check the performance?

@mchi I tried, but it did not work out: compiling TRT OSS keeps failing for me.

I tried to pinpoint which part of the pipeline drags down the FPS. With only the USB camera and no RTSP output, the FPS is still low, so I suspect the problem stems from the USB camera pipeline in deepstream-test1-usb-cam. If I drop DeepStream and run YOLOv3 directly (still reading the USB camera as input), the FPS is normal with both the pruned and the non-pruned YOLOv3 model (the pruned model reaches about 2x the FPS of the non-pruned one).

I strongly suspect that the USB camera app has a bug in its pipeline.
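One way to narrow this down further is to measure the camera's capture rate with no DeepStream elements at all. If bare capture is already near 5 FPS, the bottleneck is the camera mode rather than nvinfer. A minimal Python sketch that builds such a `gst-launch-1.0` command using the standard `v4l2src` and `fpsdisplaysink` elements. The device path, the optional caps string and the function name are assumptions for illustration.

```python
import shlex


def camera_only_pipeline(device="/dev/video0", caps=None):
    """Build a gst-launch-1.0 command that captures from a V4L2 camera and
    reports the achieved frame rate without any DeepStream elements.

    `caps` is an optional, hypothetical caps string such as
    "video/x-raw,width=1280,height=720,framerate=30/1"; list the modes your
    camera actually supports with `v4l2-ctl --list-formats-ext`.
    """
    elements = [f"v4l2src device={device}"]
    if caps:
        elements.append(caps)
    # fpsdisplaysink measures the rate, fakesink discards the frames, and
    # sync=false keeps the sink from throttling to buffer timestamps.
    elements.append("fpsdisplaysink text-overlay=false video-sink=fakesink sync=false")
    # -v makes fpsdisplaysink's measurements visible on the console;
    # "!" links consecutive elements.
    return ["gst-launch-1.0", "-v"] + shlex.split(" ! ".join(elements))


if __name__ == "__main__":
    cmd = camera_only_pipeline()
    # Paste the printed command into a terminal and stop it with Ctrl+C.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Trying a few caps strings (resolution and frame rate) shows whether a particular camera mode is limited to a low rate before any inference happens.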

Hi @HIVE,
Sorry for the delay!
Please try changing “network-mode=0” (FP32) to “network-mode=1” (INT8) in your PGIE configuration.
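In the Gst-nvinfer config, `network-mode` selects the engine precision: 0 is FP32, 1 is INT8, 2 is FP16. INT8 also needs a calibration table via `int8-calib-file`, which is commented out in the config above. Check the TensorRT support matrix before relying on INT8: it lists INT8 as unsupported on compute capability 6.2 (Jetson TX2), in which case FP16 (`network-mode=2`) is the reduced-precision option to compare. Below is a minimal Python sketch that rewrites the `[property]` section with the standard-library `configparser`. The function name, the engine-file renaming convention and the output file name are assumptions. `configparser` drops comments when it writes, so keep the original file.

```python
import configparser
import io
import os

# Gst-nvinfer network-mode values.
NETWORK_MODES = {"fp32": 0, "int8": 1, "fp16": 2}


def set_precision(config_text, precision, calib_file=None):
    """Return pgie config text with network-mode switched to `precision` (hypothetical helper)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep keys exactly as written
    parser.read_string(config_text)
    prop = parser["property"]
    prop["network-mode"] = str(NETWORK_MODES[precision])
    if precision == "int8":
        if not calib_file:
            raise ValueError("INT8 mode needs an int8-calib-file")
        prop["int8-calib-file"] = calib_file
    # Assumed naming convention mirroring the existing *_fp32.engine name,
    # so each precision gets its own engine file.
    engine = prop.get("model-engine-file")
    if engine:
        for suffix in NETWORK_MODES:
            engine = engine.replace(f"_{suffix}.engine", f"_{precision}.engine")
        prop["model-engine-file"] = engine
    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()


if __name__ == "__main__":
    # Hypothetical file names: adjust to your setup.
    src, dst = "config.txt", "config_int8.txt"
    if os.path.exists(src):
        with open(src) as f:
            text = f.read()
        with open(dst, "w") as f:
            f.write(set_precision(text, "int8", "yolov3-calibration.table.trt7.0"))
        print("wrote", dst)
```

Calling `set_precision(text, "fp16")` instead produces the FP16 variant, which needs no calibration table.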

Thanks!