Object Detection with MobileNet-SSD slower than the advertised speed

I am using the same model in jetson-inference and also get 22 FPS here with SSD-Mobilenet-v2, with both detectnet-console and detectnet-camera, so I am not sure why it is running slower for zeyuchen2016.

zeyuchen2016, what camera are you using and at what resolution? Is your Nano running in 5W mode or 10W?

I am on a JetBot, but I just ran ./detectnet-camera.

RAM 2264/3963MB (lfb 54x4MB) SWAP 54/4096MB (cached 3MB) IRAM 0/252kB(lfb 252kB) CPU [67%@921,63%@921,off,off] EMC_FREQ 6%@1600 GR3D_FREQ 94%@76 APE 25 PLL@24C CPU@27.5C iwlwifi@32C PMIC@100C GPU@26C AO@33C thermal@26.75C POM_5V_IN 2688/2688 POM_5V_GPU 120/120 POM_5V_CPU 560/560

Raspberry Pi Camera v2 (IMX219 sensor)

jetbot@jetbot:~/test/jetson-inference/build/aarch64/bin$ ./detectnet-camera
[gstreamer] initialized gstreamer, version 1.14.5.0
[gstreamer] gstCamera attempting to initialize with GST_SOURCE_NVARGUS, camera 0
[gstreamer] gstCamera pipeline string:
nvarguscamerasrc sensor-id=0 ! video/x-raw(memory:NVMM), width=(int)1280, height=(int)720, framerate=30/1, format=(string)NV12 ! nvvidconv flip-method=2 ! video/x-raw ! appsink name=mysink
[gstreamer] gstCamera successfully initialized with GST_SOURCE_NVARGUS, camera 0

detectnet-camera:  successfully initialized camera device
    width:  1280
   height:  720
    depth:  12 (bpp)

TensorRT version 5.0.6-1+cuda10.0

head -n 1 /etc/nv_tegra_release

R32(release)



Your tegrastats output shows two CPU cores offline and the active cores at only 921 MHz, so the CPU and GPU don't seem to be running at maximum speed on your Jetson Nano. Try setting it to MAXN mode and re-run the test:

$ sudo nvpmodel -m 0    # switch to the MAXN power mode
$ sudo jetson_clocks    # lock CPU/GPU/EMC clocks at their maximums

OkI shutdown .

Run above commandlineThe SSD-MobileNet-V2 speed up to 18-19FPS!

It could still be faster, though.

I have tried SSD-Mobilenet-v2 following https://devtalk.nvidia.com/default/topic/1050377/jetson-nano/deep-learning-inference-benchmarking-instructions/

Time taken for inference is 26.2917 ms. That is about 38 FPS for the network alone.

By default, JetBot runs the Nano in 5W mode because of the battery power supply, so the SSD detector runs slower than it would in 10W mode.

sudo nvpmodel -m 0    # switch to MAXN
sudo nvpmodel -q      # query the current power mode

NV Power Mode:MAXN
0

In MAXN mode, SSD-mobilenet-v2 runs at ~19 FPS.

In 5W mode, I only get ~14 FPS.

Another question: the Jetson Benchmarks page on NVIDIA Developer lists SSD-Mobilenet-v2 at three different input sizes (960×544, 480×272, and 300×300), each with a different speed.

But when I run the commands from detectnet-camera-2.md in dusty-nv/jetson-inference on GitHub:

./detectnet-camera                             # using SSD-Mobilenet-v2, default MIPI CSI camera (1280x720)

and

./detectnet-camera --width=640 --height=480    # using SSD-Mobilenet-v2, default MIPI CSI camera (640x480)

their speeds are the same.

How can I convert my TensorFlow .pb file for use with detectnet-camera?

What resources could I reference?

Thanks

Update 1:

I followed this code: https://github.com/AastaNV/TRT_object_detection/blob/master/config/model_ssd_mobilenet_v2_coco_2018_03_29.py#L11

But I could not understand why graph.remove is used to drop some nodes. Are they not supported by TensorRT?
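
From reading the config, my understanding is that the removed/collapsed nodes are TF ops the UFF parser cannot handle (the NMS post-processing, the dynamic input placeholder), which get mapped to TensorRT plugin nodes instead. A rough sketch of the pattern; the node names and plugin parameters below are illustrative, not the exact values in that file:

import tensorflow as tf
import graphsurgeon as gs

# Fixed-shape input node to replace the dynamic image_tensor placeholder.
Input = gs.create_plugin_node(name="Input", op="Placeholder",
                              dtype=tf.float32, shape=[1, 3, 300, 300])

# TensorRT's NMS plugin stands in for the unsupported TF Postprocessor
# subgraph. (Parameter names/values here are illustrative.)
NMS = gs.create_plugin_node(name="NMS", op="NMS_TRT",
                            numClasses=91, topK=100, keepTopK=100)

namespace_plugin_map = {
    "image_tensor": Input,
    "Postprocessor": NMS,
}

def preprocess(dynamic_graph):
    # Collapse the unsupported namespaces into the plugin nodes above, then
    # remove the old graph outputs that the NMS plugin now produces itself.
    dynamic_graph.collapse_namespaces(namespace_plugin_map)
    dynamic_graph.remove(dynamic_graph.graph_outputs,
                         remove_exclusive_dependencies=False)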

I used the tmp.uff generated by main.py with ./detectnet-camera --model=/path/to/tmp.uff, but some errors were reported, like an input shape mismatch.


Update 2:

I looked at samples/opensource/sampleUffSSD at tag v5.1.5 in the NVIDIA/TensorRT repository on GitHub, but could not find convert-to-uff on the Jetson Nano.

On the Jetson Nano:
sudo python3 convert_to_uff.py ~/test/TRT_object_detection/model/ssd_mobilenet_v2_coco_2018_03_29/frozen_inference_graph.pb -o hello.uff -O NMS -p /usr/src/tensorrt/samples/sampleUffSSD/config.py
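
(The same conversion can also be done programmatically with the uff package that ships with JetPack; a rough equivalent of the command above, with keyword names assumed from the bundled converter:)

import uff

# Rough programmatic equivalent of the convert_to_uff.py call above.
uff.from_tensorflow_frozen_model(
    'frozen_inference_graph.pb',   # the exported TF detection graph
    output_nodes=['NMS'],          # plugin node created by config.py
    preprocessor='/usr/src/tensorrt/samples/sampleUffSSD/config.py',
    output_filename='hello.uff')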

Then

./detectnet-camera --model=./networks/hello.uff --class_labels=./networks/tmp/ssd_coco_labels.txt
[TRT]   TensorRT version 5.0.6
[TRT]   loading NVIDIA plugins...
[TRT]   completed loading NVIDIA plugins.
[TRT]   detected model format - UFF  (extension '.uff')
[TRT]   desired precision specified for GPU: FASTEST
[TRT]   requested fasted precision for device GPU without providing valid calibrator, disabling INT8
[TRT]   native precisions detected for GPU:  FP32, FP16
[TRT]   selecting fastest native precision for GPU:  FP16
[TRT]   attempting to open engine cache file ./networks/hello.uff.1.1.GPU.FP16.engine
[TRT]   cache file not found, profiling network model on device GPU
[TRT]   device GPU, loading /home/jetbot/test/jetson-inference/build/aarch64/bin/ ./networks/hello.uff
[TRT]   FeatureExtractor/MobilenetV2/Conv/Relu6: elementwise inputs must have same dimensions or follow broadcast rules (input dimensions were [1,32,150,150] and [1,1,1])
[TRT]   FeatureExtractor/MobilenetV2/expanded_conv/depthwise/depthwise: at least three non-batch dimensions are required for input
[TRT]   UFFParser: Parser error: FeatureExtractor/MobilenetV2/expanded_conv/depthwise/BatchNorm/batchnorm/mul_1: The input to the Scale Layer is required to have a minimum of 3 dimensions.
[TRT]   failed to parse UFF model './networks/hello.uff'
[TRT]   device GPU, failed to load ./networks/hello.uff

Could anyone help me?

How can I generate a .uff file that works with detectnet-camera?

Thank you very much!

I have implemented a video-pipelining design in my TensorRT SSD demo program. The new code is ‘trt_ssd_async.py’. Compared with my previous (non-async) implementation, FPS improved from 22.8 to 26 when I tested ssd_mobilenet_v1_coco on the huskies.jpg image!

$ python3 trt_ssd_async.py --model ssd_mobilenet_v1_coco \
                           --image \
                           --filename ${HOME}/project/tf_trt_models/examples/detection/data/huskies.jpg

Check out details in: https://github.com/jkjung-avt/tensorrt_demos
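
The core idea, as a stripped-down sketch (not the exact trt_ssd_async.py code; cam.read() and ssd.detect() are stand-ins for the real camera and TensorRT wrappers):

import threading
import queue

frame_q = queue.Queue(maxsize=1)  # hold only the freshest frame

def capture_loop(cam):
    # Producer thread: grab frames continuously, so capture overlaps
    # with inference instead of running back-to-back with it.
    while True:
        frame = cam.read()  # stand-in for the real camera reader
        try:
            frame_q.get_nowait()  # drop a stale frame if one is waiting
        except queue.Empty:
            pass
        frame_q.put(frame)

def inference_loop(ssd):
    # Main thread: run TensorRT inference on the newest available frame.
    while True:
        frame = frame_q.get()
        boxes, confs, clss = ssd.detect(frame, conf_th=0.3)
        # ... draw the detections and display ...

# threading.Thread(target=capture_loop, args=(cam,), daemon=True).start()
# inference_loop(ssd)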

Hi dusty, I followed the guide, but encountered the following problem:

Compiling: sampleUffSSD.cpp
sampleUffSSD.cpp:22:15: error: ‘gLogger’ was declared ‘extern’ and later ‘static’ [-fpermissive]
 static Logger gLogger;
               ^~~~~~~
In file included from ../common/common.h:55:0,
                 from BatchStreamPPM.h:9,
                 from sampleUffSSD.cpp:12:
../common/logger.h:55:15: note: previous declaration of ‘gLogger’
 extern Logger gLogger;
               ^~~~~~~
../Makefile.config:173: recipe for target '../../bin/dchobj/sampleUffSSD.o' failed
make: *** [../../bin/dchobj/sampleUffSSD.o] Error 1

But when I comment out line 22 as "//static Logger gLogger;" in sampleUffSSD.cpp, I encounter the following problem instead.
Can you help me? Thanks!

Compiling: sampleUffSSD.cpp
Linking: ../../bin/sample_uff_ssd_rect_debug
../../bin/dchobj/sampleUffSSD.o: In function `loadModelAndCreateEngine(char const*, int, nvuffparser::IUffParser*, nvinfer1::IHostMemory*&)':
/usr/src/tensorrt/samples/sampleUffSSD_rect/sampleUffSSD.cpp:141: undefined reference to `gLogger'
/usr/src/tensorrt/samples/sampleUffSSD_rect/sampleUffSSD.cpp:141: undefined reference to `gLogger'
/usr/src/tensorrt/samples/sampleUffSSD_rect/sampleUffSSD.cpp:148: undefined reference to `gLogger'
/usr/src/tensorrt/samples/sampleUffSSD_rect/sampleUffSSD.cpp:148: undefined reference to `gLogger'
/usr/src/tensorrt/samples/sampleUffSSD_rect/sampleUffSSD.cpp:185: undefined reference to `gLogger'
../../bin/dchobj/sampleUffSSD.o:/usr/src/tensorrt/samples/sampleUffSSD_rect/sampleUffSSD.cpp:185: more undefined references to `gLogger' follow
collect2: error: ld returned 1 exit status
../Makefile.config:161: recipe for target '../../bin/sample_uff_ssd_rect_debug' failed
make: *** [../../bin/sample_uff_ssd_rect_debug] Error 1

Hello:
The sample at /usr/src/tensorrt/samples/sampleUffSSD/ can only test one image. Can anyone help me test multiple images using this sample? I am not familiar with C++. Thanks!!!

My demo #3 (ssd) in the jkjung-avt/tensorrt_demos GitHub repository is implemented purely in Python. It already supports a video file, image file, or camera as input. Check out the links below:

https://github.com/jkjung-avt/tensorrt_demos
https://jkjung-avt.github.io/tensorrt-ssd/
https://jkjung-avt.github.io/speed-up-trt-ssd/
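
For example, looping over multiple image files is just a few lines with the TrtSSD wrapper from the repo's utils/ssd.py (a sketch; treat the exact class name and detect() signature as per the repo):

import cv2
from utils.ssd import TrtSSD  # wrapper class from the tensorrt_demos repo

# Build/load the TensorRT engine once, then reuse it for every image.
ssd = TrtSSD('ssd_mobilenet_v1_coco', (300, 300))  # model name, input H x W
for path in ['dog.jpg', 'huskies.jpg', 'person.jpg']:
    img = cv2.imread(path)
    boxes, confs, clss = ssd.detect(img, conf_th=0.3)
    print('%s: %d detections' % (path, len(boxes)))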


Thanks very much, I'm following this approach.

Hi jkjung!
I have seen your guide for installing TensorFlow 1.12.2 on the Jetson Nano. Following that approach, can I install TensorFlow 1.10.1? I want to install TensorFlow 1.10.1 on the Nano first and then test whether it can run the ssd_mobilenet_v2 demo.
Or could you provide a TensorFlow 1.10.1 install guide? Thanks very much!

@grand_yanx, I suggest you use TensorFlow 1.12.x, as I stated in the README.md.

Hi grand_yanx, try just removing "static" rather than commenting out the entire line; logger.h declares gLogger as extern, so a definition with external linkage must still exist:

//static Logger gLogger;
Logger gLogger;

Hi jkjung13,
My other (non-Jetson) system has TensorFlow 1.10.1 installed, so I want to use the same version. If it turns out not to be compatible, I will install TensorFlow 1.12.x instead. Thanks!!

Hi dusty_nv, after removing "static" (leaving Logger gLogger;), it runs correctly, and the average inference time is 27.7185 ms. But when I swap sample_unpruned_mobilenet_v2.uff for my own .uff file, converted from the ssd_mobilenet_v2 model with "python3.6 /usr/lib/python3.6/dist-packages/uff/bin/convert_to_uff.py --input_file …", the inference time becomes 39.8804 ms. How was your sample_unpruned_mobilenet_v2.uff file converted? Or can you share the method for generating a .uff file like sample_unpruned_mobilenet_v2.uff?

Thanks very much!!!