Hello everyone.
For some time now I have been benchmarking the performance of several models on an object detection task.
I tested the models on videos of different quality and different frame rates. What I observe is that large models reach the same FPS on all videos. On the other hand, if I use a small model, I get a lower FPS on low-quality videos and a higher FPS on high-quality videos.
Why does this happen?
Note that I use TensorRT as the optimizer.
Thank you.
Hi,
Would you mind sharing more details about your comparison?
Do you use trtexec?
Which model, precision and data format do you use?
It will be good if you can share a sample and model to reproduce the issue directly.
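For reference, a typical trtexec benchmark run looks something like the sketch below (the file names are placeholders, not your actual models):

```shell
# Benchmark an ONNX model with FP16 enabled (model.onnx is a placeholder)
trtexec --onnx=model.onnx --fp16

# Or benchmark an already-serialized engine (model.engine is a placeholder)
trtexec --loadEngine=model.engine
```

trtexec reports per-inference latency and throughput for the network alone, independent of any video decoding or rendering, which makes it useful for isolating where the time goes.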
Thanks.
Thanks for the reply.
For my tests I use the SSD-Mobilenet-V1 model, which is already included in the repository and used through detectNet. The overall size of this model is around 30.7 MB.
Using the GetNetworkFPS() function on several videos, I measure a certain inference rate.
Now, if I replace the model with a simplified version of SSD-Mobilenet-V1, about 3.5 MB in size, the FPS is obviously higher than before. What I can't understand is why I get more FPS on high-quality videos than on low-quality ones.
Note that TensorRT is used for both models (I work with serialized ".engine" files).
I am attaching an image in which the comparison is shown. The red bars indicate the FPS number of the larger model, while the green bars indicate the number of FPS obtained on the simplified model.
Videos (V) and webcams (W) have the following resolutions:
-V1: 240p_60fps.mp4,
-V2: 360p_30fps.mp4,
-V3: 480p_30fps.mp4,
-V4: 720p_30fps.mp4,
-V5: 1080p_30fps.mp4,
-V6: 1080p_60fps.mp4,
-W1: 720p_60fps,
-W2: 1080p_30fps
As for precision, I think it’s FP16.
I hope I have been detailed enough.
Thank you.
Hi,
In general, the network input size is fixed at training time (e.g. 300x300 for SSD-Mobilenet-V1).
Whatever the input resolution, pre-processing first rescales the frame to the network input size before inference.
So the performance of a model mainly depends on the network size.
Based on your image, we would expect performance like the red bars (large model) on all videos.
Do you find any difference in the detected objects among the videos?
It’s possible that the performance is affected by the rendering steps.
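One way to check whether rendering (or any other stage) is responsible is to time each stage of the pipeline separately. A minimal sketch with stand-in stage functions (not the actual detectNet code) could look like this:

```python
import time

def timed(fn, *args):
    """Run fn and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

# Stand-in stages; in a real pipeline these would be the
# pre-processing, TensorRT inference, and rendering steps.
def preprocess(frame): return frame
def infer(tensor): return []
def render(detections): pass

frame = object()
totals = {"pre": 0.0, "infer": 0.0, "render": 0.0}
n_frames = 100
for _ in range(n_frames):
    tensor, dt = timed(preprocess, frame); totals["pre"] += dt
    dets, dt = timed(infer, tensor); totals["infer"] += dt
    _, dt = timed(render, dets); totals["render"] += dt

for stage, t in totals.items():
    print(f"{stage}: {1000 * t / n_frames:.3f} ms/frame")
```

If the per-frame rendering time grows with the video resolution while the inference time stays flat, that would explain why the small (fast) model is more sensitive to the video than the large one.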
Thanks.
Thanks for the reply.
The results I showed you are obtained without using the following commands:
$ sudo nvpmodel -m 0
$ sudo jetson_clocks
After executing these commands, the results are totally different: on every video the large model's FPS increases by about 280% compared to before.
How come I get similar results on videos of different quality?
Thank you.
Hi,
Thanks for the update.
That's because the network input size is fixed (e.g. 300x300).
So for the different input videos, the first step is to downscale the resolution to the network input size
(e.g. 1080p → 300x300, 720p → 300x300).
Since the main computational cost comes from the inference, the execution time will be very similar for the same network input resolution.
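This effect can be illustrated with a toy per-frame cost model: total time = rescale cost (which scales with the source pixel count) + a fixed inference cost set by the network size. The numbers below are made up for illustration, not measurements:

```python
# Toy cost model (illustrative numbers, not measurements):
# per-frame time = rescale cost (scales with source pixels) + fixed inference cost.
RESCALE_NS_PER_PIXEL = 2  # hypothetical rescale cost per source pixel

def fps(width, height, inference_ms):
    rescale_ms = width * height * RESCALE_NS_PER_PIXEL / 1e6
    return 1000.0 / (rescale_ms + inference_ms)

for w, h in [(426, 240), (1280, 720), (1920, 1080)]:
    print(f"{h}p  large: {fps(w, h, 25.0):6.1f} FPS   small: {fps(w, h, 2.0):6.1f} FPS")
```

With a large (slow) model the fixed inference cost dominates, so FPS barely changes across resolutions; with a small (fast) model the resolution-dependent rescale cost becomes a significant fraction of the frame time, so FPS varies much more between videos.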
Thanks.
This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.