Slow FPS using SSD-Mobilenetv2

Running the objectDetector_SSD sample on DeepStream, I was only able to achieve 20 FPS in FP16 mode, which is less than the 39 FPS promised in the official docs. I am using the recommended power supply, have boosted the clocks, and am running in maximum-performance mode. I was, however, able to get around 37 FPS by following the official benchmarking instructions.
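
For completeness, this is roughly how I applied those performance settings, wrapped in Python purely for illustration; nvpmodel and jetson_clocks are the standard Jetson commands and would normally just be run in a shell:

```python
# Sketch: select the max-performance power mode and pin the clocks.
# Standard Jetson commands; assumes sudo is available.
import subprocess

subprocess.run(["sudo", "nvpmodel", "-m", "0"], check=True)  # max-performance (10 W) mode on Nano
subprocess.run(["sudo", "jetson_clocks"], check=True)        # lock CPU/GPU/EMC clocks at their maximums
```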

I converted the weights from http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v2_coco_2018_03_29.tar.gz.
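
For reference, the conversion path I followed is roughly the sketch below. It assumes the uff package that ships with TensorRT on the Nano and a config.py plugin-mapping script like the one in the TensorRT SSD sample; the file names are placeholders from my setup:

```python
# Sketch of the TensorFlow -> UFF conversion, using the `uff` package that
# ships with TensorRT. `config.py` maps unsupported TF ops (e.g. the SSD
# post-processing) onto TensorRT plugin nodes, as in the sampleUffSSD config.
import uff

uff.from_tensorflow_frozen_model(
    "frozen_inference_graph.pb",           # from the model zoo tarball above
    output_nodes=["NMS"],                  # plugin node defined in config.py
    preprocessor="config.py",              # graphsurgeon plugin-mapping script
    output_filename="ssd_mobilenet_v2.uff",
)
```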

Is there any way to get 39 FPS on SSD-Mobilenet-v2 using DeepStream?

P.S.: I noticed that the weights I converted are 60 MB, which is 40 MB larger than the ones used in the official benchmarking instructions. The weights there were downloaded from https://nvidia.box.com/shared/static/8oqvmd79llr6lq1fr43s4fu1ph37v8nt.gz. I was not able to get the deepstream-app working with the latter.

Hi sivashiere96, ~20 FPS is the expected performance on Nano for the 90-class MS COCO SSD-Mobilenet-v2 model (see here).

39 FPS is the expected performance on the 37-class PETS SSD-Mobilenet-v2 model (see here).

So if you do not require the full 90 classes (which is probably more classes than a typical analytics application needs), you could re-train the model with fewer classes and achieve higher performance.

Thank you dusty_nv. Is there any official documentation on how to do this? I have been trying to convert a model trained with TensorFlow so it can be deployed in DeepStream, but I haven't been able to, as the application keeps running into errors. This is what happens when I try to convert the model and run the application: https://devtalk.nvidia.com/default/topic/1066726/jetson-nano/convert-ssd-mobilenet-to-uff/post/5402769/#5402769. I think NVIDIA should provide official TLT training in Docker, since many people buy the Jetson Nano based on the SSD-Mobilenet benchmarks that have been advertised so prominently.

Aasta will follow up in your other topic to hopefully get your model successfully converted. In the meantime, if you haven't already, you may also want to check out these links:

Isn't this an attempt to present the device in a positive light by deceiving customers?

Figure 1 shows results from inference benchmarks across popular models available online.

The models available online have 90 classes, not 37.

I bought a Jetson Nano because of the 39 FPS SSD-Mobilenet-v2 benchmark that NVIDIA has advertised. Inference actually takes 40 ms, so the theoretical maximum is only 25 FPS, and the practical figure is even lower.
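
The arithmetic behind that bound is trivial (the 40 ms figure is my own measurement):

```python
# Per-frame inference latency caps the achievable frame rate, before any
# pre/post-processing or pipeline overhead is even counted.
latency_ms = 40.0                 # measured inference time per frame
max_fps = 1000.0 / latency_ms
print(f"theoretical maximum: {max_fps:.0f} FPS")  # -> 25 FPS
```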

Thus, the Jetson Nano is not much faster than its competitors.

Hi @asmirnou, the results from the Nano benchmarks page use the same models across the different platforms, so in that case all devices were using the 37-class PETS model.

In practice, the 90-class COCO model isn't often deployed in real-world scenarios; it has too many classes for a practical application. It is more likely that the model would be re-trained with only the classes the application uses, which improves performance by reducing the number of object classes.
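
As a rough illustration of why the class count matters: SSD's classification head scores every anchor box against every class, and the NMS stage then sorts and filters those scores, so both scale with the number of classes. A back-of-the-envelope sketch (1917 is the usual anchor count for a 300x300 SSD-Mobilenet, assumed here):

```python
# Back-of-the-envelope: class scores the SSD head produces per frame.
# 1917 anchors is the standard figure for 300x300 SSD-Mobilenet (assumption).
num_anchors = 1917
for num_classes in (91, 37):  # 90 COCO classes + background vs. the PETS model
    print(f"{num_classes:3d} classes -> {num_anchors * num_classes:7d} scores per frame")
```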