The FPS is only around 5, which is rather low. Could someone tell me whether something is wrong with my code? With the GPU running at full speed, I would expect yolov3 on the TX2 to reach more than 15 fps.
• Hardware Platform (Jetson / GPU): Jetson TX2
• DeepStream Version: 5.0
• JetPack Version (valid for Jetson only): 4.4
• TensorRT Version: aligned with 4.4
• NVIDIA GPU Driver Version (valid for GPU only): aligned with 4.4
I use a pruned yolov3 model. It works fine, but the fps is low. I will start measuring the performance as you suggested. FP16 mode behaves the same, with no noticeable difference in FPS. The same code on a Titan V runs at around 30 fps, which is also not normal.
When you run trtexec to profile the inference time of the FP32 and FP16 TRT engines, please use the "--dumpProfile" option to dump the per-layer times so that we can find out why the FP16 perf is almost the same as the FP32 perf.
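For reference, the profiling request above could be scripted roughly as follows. This is only a sketch: the engine file names are hypothetical placeholders, and the trtexec path is the usual JetPack location, which may differ on your install. It only builds and prints the command lines; `--loadEngine` and `--dumpProfile` are the trtexec options being relied on.

```python
# Default trtexec location on JetPack installs (assumption; adjust if yours differs).
TRTEXEC = "/usr/src/tensorrt/bin/trtexec"


def trtexec_profile_cmd(engine_path, trtexec=TRTEXEC):
    """Return the argument list that profiles a serialized engine layer by layer."""
    return [trtexec, "--loadEngine={}".format(engine_path), "--dumpProfile"]


if __name__ == "__main__":
    # Hypothetical engine file names, one built in FP32 and one in FP16.
    for engine in ("yolov3_fp32.engine", "yolov3_fp16.engine"):
        print(" ".join(trtexec_profile_cmd(engine)))
```

Running the two printed commands and comparing the per-layer tables side by side should show whether any layers actually get faster in the FP16 engine.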
@mchi I did, but it did not work out. Compiling TRT OSS is always problematic.
I tried to pinpoint the part of the pipeline that drags down the fps. With only the USB camera and no RTSP, the fps is still low, so I suspect the problem stems from the USB camera pipeline in deepstream-test1-usb-cam. If I drop DeepStream and just run yolov3 (still reading the USB camera as input), the fps is normal with both the pruned and non-pruned yolov3 models (the pruned model gets 2x the fps of the non-pruned model).
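To make this kind of stage-by-stage comparison repeatable, a small frame counter can be placed around each stage (camera capture alone, capture plus inference, and so on). The sketch below is not taken from DeepStream or from my app; `FpsMeter` is a hypothetical helper, and the loop in the demo only simulates frames with a sleep.

```python
import time


class FpsMeter:
    """Counts frames and reports the average frames per second between ticks."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock   # injectable so the meter can be tested without waiting
        self._start = None    # timestamp of the first tick
        self._last = None     # timestamp of the most recent tick
        self._frames = 0      # frame intervals completed since the first tick

    def tick(self):
        """Call once per frame, for example right after a frame is captured."""
        now = self._clock()
        if self._start is None:
            self._start = now
        else:
            self._frames += 1
        self._last = now

    def fps(self):
        """Average FPS between the first and the latest tick (0.0 if undefined)."""
        if self._start is None or self._frames == 0:
            return 0.0
        elapsed = self._last - self._start
        return self._frames / elapsed if elapsed > 0 else 0.0


if __name__ == "__main__":
    meter = FpsMeter()
    for _ in range(20):
        time.sleep(0.01)  # stand-in for grabbing one frame from the camera
        meter.tick()
    print("average fps: {:.1f}".format(meter.fps()))
```

For scale, 5 fps means each frame takes about 200 ms end to end, while 15 fps needs each frame to finish in about 67 ms, so whichever stage measures slower than that on its own is the one to look at.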
I strongly suspect that the USB camera app has a bug in its pipeline.