Object Detection with MobileNet-SSD slower than the advertised speed

I am using the same model in jetson-inference and also get 22 FPS here with SSD-Mobilenet-v2, with both detectnet-console and detectnet-camera, so I am not sure why it is running slower for zeyuchen2016.

zeyuchen2016, what camera are you using and at what resolution? Is your Nano running in 5W mode or 10W?

I am on a JetBot, but I just ran ./detectnet-camera.

RAM 2264/3963MB (lfb 54x4MB) SWAP 54/4096MB (cached 3MB) IRAM 0/252kB(lfb 252kB) CPU [67%@921,63%@921,off,off] EMC_FREQ 6%@1600 GR3D_FREQ 94%@76 APE 25 PLL@24C CPU@27.5C iwlwifi@32C PMIC@100C GPU@26C AO@33C thermal@26.75C POM_5V_IN 2688/2688 POM_5V_GPU 120/120 POM_5V_CPU 560/560

Raspberry Pi Camera v2 (IMX219 sensor)

jetbot@jetbot:~/test/jetson-inference/build/aarch64/bin$ ./detectnet-camera
[gstreamer] initialized gstreamer, version 1.14.5.0
[gstreamer] gstCamera attempting to initialize with GST_SOURCE_NVARGUS, camera 0
[gstreamer] gstCamera pipeline string:
nvarguscamerasrc sensor-id=0 ! video/x-raw(memory:NVMM), width=(int)1280, height=(int)720, framerate=30/1, format=(string)NV12 ! nvvidconv flip-method=2 ! video/x-raw ! appsink name=mysink
[gstreamer] gstCamera successfully initialized with GST_SOURCE_NVARGUS, camera 0

detectnet-camera:  successfully initialized camera device
    width:  1280
   height:  720
    depth:  12 (bpp)

TensorRT version 5.0.6-1+cuda10.0

head -n 1 /etc/nv_tegra_release

R32(release)



Your tegrastats output shows two CPU cores offline and the active cores at only 921 MHz, so the CPU and GPU don't seem to be running at maximum speed on your Jetson Nano. Try setting it to MAXN mode and re-run the test:

$ sudo nvpmodel -m 0    # switch to the MAXN power mode
$ sudo jetson_clocks    # lock CPU/GPU/EMC clocks at their maximums

OkI shutdown .

Run above commandlineThe SSD-MobileNet-V2 speed up to 18-19FPS!

It could still be faster, though.

I have tried SSD-Mobilenet-v2 following https://devtalk.nvidia.com/default/topic/1050377/jetson-nano/deep-learning-inference-benchmarking-instructions/

Time taken for inference is 26.2917 ms. That is about 38 FPS for the network alone.

By default, JetBot runs the Nano in 5W mode because of the battery power supply, so the SSD detector runs slower than it would in 10W mode.

sudo nvpmodel -m 0    # switch to MAXN
sudo nvpmodel -q      # query the current power mode

NV Power Mode:MAXN
0

In MAXN mode, SSD-mobilenet-v2 runs at ~19 FPS.

In 5W mode, I only get ~14 FPS.

Another question: the Jetson Benchmarks page on NVIDIA Developer lists SSD-Mobilenet-v2 at three different input sizes (960×544, 480×272, and 300×300), each with a different speed.

But when I run the commands from detectnet-camera-2.md in dusty-nv/jetson-inference on GitHub:

./detectnet-camera                             # using SSD-Mobilenet-v2, default MIPI CSI camera (1280x720)

and

./detectnet-camera --width=640 --height=480    # using SSD-Mobilenet-v2, default MIPI CSI camera (640x480)

their speeds are the same.

How can I convert my TensorFlow .pb file for use with detectnet-camera?

What resources could I reference?

Thanks

Update 1:

I followed this code: https://github.com/AastaNV/TRT_object_detection/blob/master/config/model_ssd_mobilenet_v2_coco_2018_03_29.py#L11

But I could not understand why graph.remove is used to drop some nodes. Are they not supported by TensorRT?
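
From reading the config, my understanding is that the removed/collapsed nodes are TF ops the UFF parser cannot handle (the NMS post-processing, the dynamic input placeholder), which get mapped to TensorRT plugin nodes instead. A rough sketch of the pattern; the node names and plugin parameters below are illustrative, not the exact values in that file:

import tensorflow as tf
import graphsurgeon as gs

# Fixed-shape input node to replace the dynamic image_tensor placeholder.
Input = gs.create_plugin_node(name="Input", op="Placeholder",
                              dtype=tf.float32, shape=[1, 3, 300, 300])

# TensorRT's NMS plugin stands in for the unsupported TF Postprocessor
# subgraph. (Parameter names/values here are illustrative.)
NMS = gs.create_plugin_node(name="NMS", op="NMS_TRT",
                            numClasses=91, topK=100, keepTopK=100)

namespace_plugin_map = {
    "image_tensor": Input,
    "Postprocessor": NMS,
}

def preprocess(dynamic_graph):
    # Collapse the unsupported namespaces into the plugin nodes above, then
    # remove the old graph outputs that the NMS plugin now produces itself.
    dynamic_graph.collapse_namespaces(namespace_plugin_map)
    dynamic_graph.remove(dynamic_graph.graph_outputs,
                         remove_exclusive_dependencies=False)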

I used the tmp.uff generated by main.py with ./detectnet-camera --model=/path/to/tmp.uff, but some errors were reported, like an input shape mismatch.


Update 2:

I looked at samples/opensource/sampleUffSSD at tag v5.1.5 in the NVIDIA/TensorRT repository on GitHub, but could not find convert-to-uff on the Jetson Nano.

On the Jetson Nano:
sudo python3 convert_to_uff.py ~/test/TRT_object_detection/model/ssd_mobilenet_v2_coco_2018_03_29/frozen_inference_graph.pb -o hello.uff -O NMS -p /usr/src/tensorrt/samples/sampleUffSSD/config.py
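
(The same conversion can also be done programmatically with the uff package that ships with JetPack; a rough equivalent of the command above, with keyword names assumed from the bundled converter:)

import uff

# Rough programmatic equivalent of the convert_to_uff.py call above.
uff.from_tensorflow_frozen_model(
    'frozen_inference_graph.pb',   # the exported TF detection graph
    output_nodes=['NMS'],          # plugin node created by config.py
    preprocessor='/usr/src/tensorrt/samples/sampleUffSSD/config.py',
    output_filename='hello.uff')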

Then

./detectnet-camera --model=./networks/hello.uff --class_labels=./networks/tmp/ssd_coco_labels.txt
[TRT]   TensorRT version 5.0.6
[TRT]   loading NVIDIA plugins...
[TRT]   completed loading NVIDIA plugins.
[TRT]   detected model format - UFF  (extension '.uff')
[TRT]   desired precision specified for GPU: FASTEST
[TRT]   requested fasted precision for device GPU without providing valid calibrator, disabling INT8
[TRT]   native precisions detected for GPU:  FP32, FP16
[TRT]   selecting fastest native precision for GPU:  FP16
[TRT]   attempting to open engine cache file ./networks/hello.uff.1.1.GPU.FP16.engine
[TRT]   cache file not found, profiling network model on device GPU
[TRT]   device GPU, loading /home/jetbot/test/jetson-inference/build/aarch64/bin/ ./networks/hello.uff
[TRT]   FeatureExtractor/MobilenetV2/Conv/Relu6: elementwise inputs must have same dimensions or follow broadcast rules (input dimensions were [1,32,150,150] and [1,1,1])
[TRT]   FeatureExtractor/MobilenetV2/expanded_conv/depthwise/depthwise: at least three non-batch dimensions are required for input
[TRT]   UFFParser: Parser error: FeatureExtractor/MobilenetV2/expanded_conv/depthwise/BatchNorm/batchnorm/mul_1: The input to the Scale Layer is required to have a minimum of 3 dimensions.
[TRT]   failed to parse UFF model './networks/hello.uff'
[TRT]   device GPU, failed to load ./networks/hello.uff

Could anyone help me?

How can I generate a .uff file that works with detectnet-camera?

Thank you very much!

I have implemented a video-pipelining design in my TensorRT SSD demo program. The new code is ‘trt_ssd_async.py’. Compared with my previous (non-async) implementation, FPS improved from 22.8 to 26 when I tested ssd_mobilenet_v1_coco on the huskies.jpg image!

$ python3 trt_ssd_async.py --model ssd_mobilenet_v1_coco \
                           --image \
                           --filename ${HOME}/project/tf_trt_models/examples/detection/data/huskies.jpg

Check out details in: https://github.com/jkjung-avt/tensorrt_demos
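
The core idea, as a stripped-down sketch (not the exact trt_ssd_async.py code; cam.read() and ssd.detect() are stand-ins for the real camera and TensorRT wrappers):

import threading
import queue

frame_q = queue.Queue(maxsize=1)  # hold only the freshest frame

def capture_loop(cam):
    # Producer thread: grab frames continuously, so capture overlaps
    # with inference instead of running back-to-back with it.
    while True:
        frame = cam.read()  # stand-in for the real camera reader
        try:
            frame_q.get_nowait()  # drop a stale frame if one is waiting
        except queue.Empty:
            pass
        frame_q.put(frame)

def inference_loop(ssd):
    # Main thread: run TensorRT inference on the newest available frame.
    while True:
        frame = frame_q.get()
        boxes, confs, clss = ssd.detect(frame, conf_th=0.3)
        # ... draw the detections and display ...

# threading.Thread(target=capture_loop, args=(cam,), daemon=True).start()
# inference_loop(ssd)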

Hi dusty, I followed the guide, but encountered the following problem:

Compiling: sampleUffSSD.cpp
sampleUffSSD.cpp:22:15: error: ‘gLogger’ was declared ‘extern’ and later ‘static’ [-fpermissive]
 static Logger gLogger;
               ^~~~~~~
In file included from ../common/common.h:55:0,
                 from BatchStreamPPM.h:9,
                 from sampleUffSSD.cpp:12:
../common/logger.h:55:15: note: previous declaration of ‘gLogger’
 extern Logger gLogger;
               ^~~~~~~
../Makefile.config:173: recipe for target '../../bin/dchobj/sampleUffSSD.o' failed
make: *** [../../bin/dchobj/sampleUffSSD.o] Error 1

But when I comment out line 22 as "//static Logger gLogger;" in sampleUffSSD.cpp, I encounter the following problem instead.
Can you help me? Thanks!

Compiling: sampleUffSSD.cpp
Linking: ../../bin/sample_uff_ssd_rect_debug
../../bin/dchobj/sampleUffSSD.o: In function `loadModelAndCreateEngine(char const*, int, nvuffparser::IUffParser*, nvinfer1::IHostMemory*&)':
/usr/src/tensorrt/samples/sampleUffSSD_rect/sampleUffSSD.cpp:141: undefined reference to `gLogger'
/usr/src/tensorrt/samples/sampleUffSSD_rect/sampleUffSSD.cpp:141: undefined reference to `gLogger'
/usr/src/tensorrt/samples/sampleUffSSD_rect/sampleUffSSD.cpp:148: undefined reference to `gLogger'
/usr/src/tensorrt/samples/sampleUffSSD_rect/sampleUffSSD.cpp:148: undefined reference to `gLogger'
/usr/src/tensorrt/samples/sampleUffSSD_rect/sampleUffSSD.cpp:185: undefined reference to `gLogger'
../../bin/dchobj/sampleUffSSD.o:/usr/src/tensorrt/samples/sampleUffSSD_rect/sampleUffSSD.cpp:185: more undefined references to `gLogger' follow
collect2: error: ld returned 1 exit status
../Makefile.config:161: recipe for target '../../bin/sample_uff_ssd_rect_debug' failed
make: *** [../../bin/sample_uff_ssd_rect_debug] Error 1

Hello:
The sample at /usr/src/tensorrt/samples/sampleUffSSD/ can only test one image. Can anyone help me test multiple images using this sample? I am not familiar with C++. Thanks!!!

My demo #3 (ssd) in the jkjung-avt/tensorrt_demos GitHub repository is implemented purely in Python. It already supports a video file, image file, or camera as input. Check out the links below:

https://github.com/jkjung-avt/tensorrt_demos
https://jkjung-avt.github.io/tensorrt-ssd/
https://jkjung-avt.github.io/speed-up-trt-ssd/
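
For example, looping over multiple image files is just a few lines with the TrtSSD wrapper from the repo's utils/ssd.py (a sketch; treat the exact class name and detect() signature as per the repo):

import cv2
from utils.ssd import TrtSSD  # wrapper class from the tensorrt_demos repo

# Build/load the TensorRT engine once, then reuse it for every image.
ssd = TrtSSD('ssd_mobilenet_v1_coco', (300, 300))  # model name, input H x W
for path in ['dog.jpg', 'huskies.jpg', 'person.jpg']:
    img = cv2.imread(path)
    boxes, confs, clss = ssd.detect(img, conf_th=0.3)
    print('%s: %d detections' % (path, len(boxes)))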


Thanks very much, I'm following this approach.

Hi jkjung!
I have seen your guide for installing TensorFlow 1.12.2 on the Jetson Nano. Following that approach, can I install TensorFlow 1.10.1? I want to install TensorFlow 1.10.1 on the Nano first and then test whether it can run the ssd_mobilenet_v2 demo.
Or could you provide a TensorFlow 1.10.1 install guide? Thanks very much!

@grand_yanx, I suggest you use TensorFlow 1.12.x, as I stated in the README.md.

Hi grand_yanx, try just removing "static" rather than commenting out the entire line; logger.h declares gLogger as extern, so a definition with external linkage must still exist:

//static Logger gLogger;
Logger gLogger;

Hi jkjung13,
My other (non-Jetson) system has TensorFlow 1.10.1 installed, so I want to use the same version. If it turns out not to be compatible, I will install TensorFlow 1.12.x instead. Thanks!!

Hi dusty_nv, after removing "static" (leaving Logger gLogger;), it runs correctly, and the average inference time is 27.7185 ms. But when I swap sample_unpruned_mobilenet_v2.uff for my own .uff file, converted from the ssd_mobilenet_v2 model with "python3.6 /usr/lib/python3.6/dist-packages/uff/bin/convert_to_uff.py --input_file …", the inference time becomes 39.8804 ms. How was your sample_unpruned_mobilenet_v2.uff file converted? Or can you share the method for generating a .uff file like sample_unpruned_mobilenet_v2.uff?

Thanks very much!!!