How to increase inference speed on Jetson Nano (4GB)

@Morganh @andiyael
I am using an SSD (Caffe) model for person detection on Jetson Nano.
I have 2 files:

  1. Mobile_SSD_deploy.caffemodel
  2. Mobile_SSD_deploy_prototxt.txt

I am using Python for this project.
I am using OpenCV with CUDA backend for video processing.

When I run the script with this model, inference is very slow. Please suggest how I can improve the performance so that it runs smoothly on Jetson Nano.

Hi,

First, please make sure you have maximized the device performance:

$ sudo nvpmodel -m 0
$ sudo jetson_clocks

Are you also using OpenCV for DNN inference?
If so, we recommend using TensorRT instead.

For example, below are the commands for benchmarking the MNIST model; the second one adds --fp16 to benchmark in FP16 mode.
You can use them to check the performance before switching.

$ /usr/src/tensorrt/bin/trtexec --deploy=/usr/src/tensorrt/data/mnist/mnist.prototxt --model=/usr/src/tensorrt/data/mnist/mnist.caffemodel --output=prob
$ /usr/src/tensorrt/bin/trtexec --deploy=/usr/src/tensorrt/data/mnist/mnist.prototxt --model=/usr/src/tensorrt/data/mnist/mnist.caffemodel --output=prob --fp16 
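
Since the project is in Python, the two benchmark commands above could also be assembled and launched from a script. The following is only a minimal sketch: the helper names build_trtexec_cmd and run_benchmark are hypothetical, and it uses only the trtexec flags already shown above (--deploy, --model, --output, --fp16).

```python
import subprocess

# Path taken from the commands above; adjust if TensorRT lives elsewhere.
TRTEXEC = "/usr/src/tensorrt/bin/trtexec"


def build_trtexec_cmd(deploy, model, outputs, fp16=False):
    """Build the trtexec argument list for a Caffe model (hypothetical helper)."""
    cmd = [TRTEXEC, "--deploy={}".format(deploy), "--model={}".format(model)]
    # One --output flag per output blob name.
    cmd += ["--output={}".format(name) for name in outputs]
    if fp16:
        cmd.append("--fp16")
    return cmd


def run_benchmark(cmd):
    """Run trtexec and return its combined stdout/stderr as text."""
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,  # works on Python 3.6 as well
        check=False,
    )
    return result.stdout
```

For the MNIST sample, build_trtexec_cmd("/usr/src/tensorrt/data/mnist/mnist.prototxt", "/usr/src/tensorrt/data/mnist/mnist.caffemodel", ["prob"], fp16=True) yields the same arguments as the second command above, and passing that list to run_benchmark returns the raw trtexec log for inspection.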

Thanks.

Hi @AastaLLL
Thanks for the quick reply.
I have tried the following command:

/usr/src/tensorrt/bin/trtexec --deploy=/home/hp/Documents/model/MobileNetSSD_deploy.prototxt.txt --model=/home/hp/Documents/model/MobileNetSSD_deploy.caffemodel --output=prob

But I am getting the following error:
Caffe Parser: Invalid axis in softmax layer - TensorRT expects NCHW input. Negative axis is not supported in TensorRT, please use positive axis indexing
error parsing layer type Softmax index 116
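
To see which layer the parser is pointing at, the layers in the prototxt can be listed with their position. This is a rough sketch, not a confirmed diagnosis: list_layers is a hypothetical helper, it assumes every layer block starts with name: followed by type: (as in this prototxt), and it assumes the reported index counts layer blocks from zero in file order, which the error message does not state.

```python
import re

# Matches the opening of a layer block: layer { name: "..." type: "..."
# Assumption: name comes first and type second inside every layer block.
LAYER_HEADER = re.compile(r'layer\s*\{\s*name:\s*"([^"]+)"\s*type:\s*"([^"]+)"')


def list_layers(prototxt_text):
    """Return (index, type, name) for each layer, in file order."""
    return [
        (index, layer_type, name)
        for index, (name, layer_type) in enumerate(LAYER_HEADER.findall(prototxt_text))
    ]


def print_layers(path):
    """Print one line per layer so a given index can be looked up."""
    with open(path) as handle:
        for index, layer_type, name in list_layers(handle.read()):
            print(index, layer_type, name)
```

Calling print_layers on the prototxt file and looking at the entries around 116 should show whether a Softmax layer sits there and which layers feed it.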

I am using the following files:

MobileNetSSD_deploy.caffemodel (22.1 MB)
MobileNetSSD_deploy.prototxt.txt (28.7 KB)

@AastaLLL @Morganh please help.

Thanks.

Hi,

We checked your issue in a JetPack 4.6 environment.

The error is caused by the unsupported Flatten layer.
As a workaround, you can rewrite the Flatten layer as a Reshape layer on the same axis.

For example:

--- a/MobileNetSSD_deploy.prototxt.txt
+++ b/MobileNetSSD_deploy.prototxt.txt
@@ -1183,11 +1183,16 @@ layer {
 }
 layer {
   name: "conv11_mbox_loc_flat"
-  type: "Flatten"
+  type: "Reshape"
   bottom: "conv11_mbox_loc_perm"
   top: "conv11_mbox_loc_flat"
-  flatten_param {
-    axis: 1
+  reshape_param {
+    shape {
+      dim: 0
+      dim: -1
+      dim: 1
+      dim: 1
+    }
   }
 }
 layer {

After this update, we can run your model successfully with the following command:

$ /usr/src/tensorrt/bin/trtexec --deploy=MobileNetSSD_deploy.prototxt.txt --model=MobileNetSSD_deploy.caffemodel --output=detection_out --output=keep_count

The modified prototxt file is attached for your reference: MobileNetSSD_deploy.prototxt.txt (29.4 KB)

Thanks.