Low FPS for FRCNN model

• Hardware Platform (Jetson / GPU): Jetson TX2
• DeepStream Version: 5.1
• JetPack Version (valid for Jetson only): 4.5.1
• TensorRT Version: 7.1.3

Hi, currently I have two trained object detection models (YOLOv4 and FRCNN), both from TLT 3.0. I managed to export both models in FP32 and to run both in the DeepStream SDK. However, the problem is that the average FPS for the FRCNN model is around 1.5, while the average FPS for the YOLOv4 model is around 14. The input size for the YOLOv4 model is 544x960 and the input size for the FRCNN model is 540x960. I was wondering if this is common, or whether there is an issue with my work. If it is an issue, can you help advise on how to solve it? Here are my code, engine file and configuration settings if required…
pgie_frcnn_tlt_config.txt (2.1 KB)
frcnn_resnet_18.epoch74_fp32.etlt_b1_gpu0_fp32.engine (72.3 MB)
deepstream_frcnn_drone_v5.py (12.7 KB)

I couldn't upload my encoded video file as the file size is above 100 MB.

Hi,

Would you mind uploading the data to an online drive and sharing the link with us?

Please note that we have a newer DeepStream 6.0 package.
It’s always recommended to use our latest software for the best performance.

Here are two things we would like to check with you first.
1. Have you maximized the device performance?

$ sudo nvpmodel -m 2
$ sudo jetson_clocks

2. Could you double-check whether the TensorRT engine is re-generated on every launch, or whether it de-serializes the cached engine directly? (A sketch of the relevant config keys is shown below.)
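
For reference, this is roughly how a pre-built engine is usually referenced in the Gst-nvinfer configuration; the file names below are simply taken from the attachments above and may differ in your setup:

[property]
tlt-model-key=nvidia_tlt
tlt-encoded-model=frcnn_resnet_18.epoch74_fp32.etlt
model-engine-file=frcnn_resnet_18.epoch74_fp32.etlt_b1_gpu0_fp32.engine

If model-engine-file points to an existing serialized engine, nvinfer de-serializes it directly and you should not see the long engine-building step at start-up.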

Thanks.

Hello, thank you for your reply. I am unable to move to DeepStream 6.0 as the deadline for this project is coming to an end. Yes, I did try to maximize the device performance, but the FPS for the FRCNN model remains unchanged. As for the TensorRT engine, it is not re-generated on every launch, since I commented out the tlt-encoded-model variable in the configuration file. Here is the link to the encoded video file: https://drive.google.com/drive/folders/1FbgJQHZNzawtbMUTj9EQ1XsvwKdnLroR?usp=sharing
Thank you so much for your help

Hi,

We want to reproduce this issue in our environment to check it in-depth.
Would you mind sharing the frcnn_resnet_18.epoch74_fp32.etlt model with us as well?

Thanks.

Here is the exported model
frcnn_resnet_18.epoch74_fp32.etlt (47.4 MB)

Hi,

We saw that you have improved the performance of the YOLOv4 case.
Does the fix also work for the FRCNN case?

Thanks.

The fix doesn't work for the FRCNN model. I have tried running the FRCNN model on a .h264 video file instead of a live recording, but the processing rate still remains at ~1 FPS.

Hi,

We ran your model with TensorRT through tao-converter.

$ ./tao-converter -k nvidia_tlt -d 3,540,960 frcnn_resnet_18.epoch74_fp32.etlt -t fp32 -e tmp.engine
$ /usr/src/tensorrt/bin/trtexec --loadEngine=tmp.engine

The model is relatively complicated, and it takes around 489 ms to finish one inference.
So the ~1 FPS performance sounds reasonable once all the components in the pipeline are included.

[12/28/2021-08:00:34] [I] === Performance summary ===
[12/28/2021-08:00:34] [I] Throughput: 2.04354 qps
[12/28/2021-08:00:34] [I] Latency: min = 485.931 ms, max = 496.887 ms, mean = 489.338 ms, median = 488.046 ms, percentile(99%) = 496.887 ms
[12/28/2021-08:00:34] [I] End-to-End Host Latency: min = 485.94 ms, max = 496.895 ms, mean = 489.346 ms, median = 488.053 ms, percentile(99%) = 496.895 ms
[12/28/2021-08:00:34] [I] Enqueue Time: min = 2.13623 ms, max = 2.2998 ms, mean = 2.22013 ms, median = 2.21877 ms, percentile(99%) = 2.2998 ms
[12/28/2021-08:00:34] [I] H2D Latency: min = 0.301758 ms, max = 0.308838 ms, mean = 0.305035 ms, median = 0.305237 ms, percentile(99%) = 0.308838 ms
[12/28/2021-08:00:34] [I] GPU Compute Time: min = 485.62 ms, max = 496.578 ms, mean = 489.029 ms, median = 487.739 ms, percentile(99%) = 496.578 ms
[12/28/2021-08:00:34] [I] D2H Latency: min = 0.00292969 ms, max = 0.00537109 ms, mean = 0.00452881 ms, median = 0.0045166 ms, percentile(99%) = 0.00537109 ms
[12/28/2021-08:00:34] [I] Total Host Walltime: 4.89347 s
[12/28/2021-08:00:34] [I] Total GPU Compute Time: 4.89029 s
[12/28/2021-08:00:34] [I] Explanations of the performance metrics are printed in the verbose logs.
[12/28/2021-08:00:34] [I]
&&&& PASSED TensorRT.trtexec [TensorRT v8001] # /usr/src/tensorrt/bin/trtexec --loadEngine=tmp.engine
[12/28/2021-08:00:34] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 692, GPU 5988 (MiB)

Below are some improvements you can try:

  • Use FP16 mode instead (a conversion sketch is shown right after this list).
  • Apply pruning at training time.
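
As a rough sketch only, based on the tao-converter command above (same key and input dimensions; the output engine name is just an example), an FP16 engine can be generated and benchmarked like this:

$ ./tao-converter -k nvidia_tlt -d 3,540,960 frcnn_resnet_18.epoch74_fp32.etlt -t fp16 -e frcnn_fp16.engine
$ /usr/src/tensorrt/bin/trtexec --loadEngine=frcnn_fp16.engine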

Thanks.

Hi, I did apply pruning when training my model and also tried exporting in FP16 mode. However, my FP16 model still runs at only ~2 FPS, which is still pretty slow. I was wondering whether there is any other way to speed up the FPS.

I have another model, YOLOv4, trained on the same data, and the input size is relatively similar (544x960). Both the FP16 YOLOv4 and FRCNN models run with the same DeepStream code; the FPS for YOLOv4 is ~16, but for FRCNN it is ~2. May I know the reason why?

Hi,

That’s because FRCNN is a relatively complicated model.
The inference time depends on the depth and the layer types used in the model.

For pruning, below is the document for your reference (a rough sketch of the command follows the link):
https://docs.nvidia.com/tao/archive/tlt-20/tlt-user-guide/text/pruning_model.html
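
As a very rough sketch with the TLT 3.0 launcher (the model paths, spec file, key, and threshold below are placeholders; please follow the linked document for the exact arguments for faster_rcnn), pruning is run on the trained model and is then followed by retraining before export:

$ tlt faster_rcnn prune -m frcnn_resnet_18_unpruned.tlt \
                        -o frcnn_resnet_18_pruned.tlt \
                        -e frcnn_train_spec.txt \
                        -k nvidia_tlt \
                        -pth 0.4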

Thanks.
