Deep Learning Inference Benchmarking Instructions

dusty_nv · April 17, 2019, 11:42pm

Hi all, below you will find the procedures to run the Jetson Nano deep learning inferencing benchmarks from this blog post with TensorRT.

note: for updated JetPack 4.4 benchmarks, please use github.com/NVIDIA-AI-IOT/jetson_benchmarks

While using one of the recommended power supplies, make sure your Nano is in 10W performance mode (which is the default mode):

$ sudo nvpmodel -m 0
$ sudo jetson_clocks

Using other lower-capacity power supplies may lead to system instabilities or shutdown during the benchmarks.
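To confirm the power mode from a script before starting a long run, a small check can help. The sketch below assumes that `sudo nvpmodel -q` prints an `NV Power Mode:` line followed by the numeric mode ID on a line of its own; that output format and the helper name `power_mode_id` are assumptions, so verify them on your JetPack release.

```shell
# Hypothetical helper: read `nvpmodel -q` output on stdin and print the
# numeric mode ID (assumed to be the last line made up only of digits).
power_mode_id() {
    awk '/^[0-9]+$/ { id = $0 }
         END { if (id != "") { print id } else { exit 1 } }'
}

# Usage on the Nano (not run here):
#   sudo nvpmodel -q | power_mode_id
# Mode 0 is the 10W mode selected by `sudo nvpmodel -m 0` above.
```

If the helper prints anything other than 0, re-run `sudo nvpmodel -m 0` before benchmarking.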

SSD-Mobilenet-V2

Copy the ssd-mobilenet-v2 archive from here to the ~/Downloads folder on Nano.

$ cd ~/Downloads/
$ wget --no-check-certificate 'https://nvidia.box.com/shared/static/8oqvmd79llr6lq1fr43s4fu1ph37v8nt.gz' -O ssd-mobilenet-v2.tar.gz
$ tar -xvf ssd-mobilenet-v2.tar.gz
$ cd ssd-mobilenet-v2
$ sudo cp -R sampleUffSSD_rect /usr/src/tensorrt/samples
$ sudo cp sample_unpruned_mobilenet_v2.uff /usr/src/tensorrt/data/ssd/
$ sudo cp image1.ppm /usr/src/tensorrt/data/ssd/

Apply the following patches to the sample, depending on your JetPack version:

JetPack 4.4 or newer

patch for /usr/src/tensorrt/samples/sampleUffSSD_rect/sampleUffSSD.cpp

20,21d19
< using namespace sample;
< using namespace std;
23c21
< /*static Logger gLogger;*/
---
> static Logger gLogger;
171c169
<     builder->setMaxWorkspaceSize(1024 * 1024 * 128); // We need about 1GB of scratch space for the plugin layer for batch size 5.
---
>     builder->setMaxWorkspaceSize(128_MB); // We need about 1GB of scratch space for the plugin layer for batch size 5.

patch for /usr/src/tensorrt/samples/sampleUffSSD_rect/Makefile

3d2
< EXTRA_DIRECTORIES = ../common

JetPack 4.3 or JetPack 4.2.1

patch for /usr/src/tensorrt/samples/sampleUffSSD_rect/sampleUffSSD.cpp

19a20
> using namespace std;
21c22
< static Logger gLogger;
---
> /*static*/ Logger gLogger;
169c170
<     builder->setMaxWorkspaceSize(128_MB); // We need about 1GB of scratch space for the plugin layer for batch size 5.
---
>     builder->setMaxWorkspaceSize(1024 * 1024 * 128); // We need about 1GB of scratch space for the plugin layer for batch size 5.
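The two spellings of the workspace size in these patches request the same number of bytes. As a quick check, shell arithmetic shows that `1024 * 1024 * 128` is 128 MiB; reading `128_MB` as the sample's literal for that same value is an assumption based on the patches swapping one for the other.

```shell
# Bytes requested by setMaxWorkspaceSize(1024 * 1024 * 128)
workspace_bytes=$((1024 * 1024 * 128))
echo "$workspace_bytes bytes = $((workspace_bytes / 1024 / 1024)) MiB"
```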

Compile the sample

$ cd /usr/src/tensorrt/samples/sampleUffSSD_rect
$ sudo make

Run the sample to measure inference performance

$ cd /usr/src/tensorrt/bin
$ sudo ./sample_uff_ssd_rect

Image Classification (ResNet-50, Inception V4, VGG-19)

The resources needed to run these models are available here. Copy each of these .prototxt files to the /usr/src/tensorrt/data/googlenet folder on your Jetson Nano.

ResNet-50

$ cd /usr/src/tensorrt/bin
$ ./trtexec --output=prob --deploy=../data/googlenet/ResNet50_224x224.prototxt --fp16 --batch=1

Inception V4

$ cd /usr/src/tensorrt/bin
$ ./trtexec --output=prob --deploy=../data/googlenet/inception_v4.prototxt --fp16 --batch=1

VGG-19

$ cd /usr/src/tensorrt/bin
$ ./trtexec --output=prob --deploy=../data/googlenet/VGG19_N2.prototxt --fp16 --batch=1
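The three commands above differ only in the .prototxt name, so they can be run back to back from one loop. This is a minimal sketch, not part of the original instructions; the function name `run_classification_benchmarks` is hypothetical, and it assumes you call it from /usr/src/tensorrt/bin with the .prototxt files already copied to ../data/googlenet.

```shell
# Hypothetical wrapper: run the three classification benchmarks in turn.
# The optional first argument is the trtexec binary (defaults to ./trtexec).
run_classification_benchmarks() {
    trtexec_bin=${1:-./trtexec}
    for model in ResNet50_224x224 inception_v4 VGG19_N2; do
        echo "== $model =="
        # Same flags as the individual commands; stop at the first failure.
        "$trtexec_bin" --output=prob \
            --deploy="../data/googlenet/${model}.prototxt" \
            --fp16 --batch=1 || return 1
    done
}

# Usage on the Nano (not run here):
#   cd /usr/src/tensorrt/bin && run_classification_benchmarks
```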

U-Net Segmentation

Copy the output_graph.uff model file from here to the home folder on your Jetson Nano or any directory of your preference.

Run the U-Net inference benchmark:

$ cd /usr/src/tensorrt/bin
$ sudo ./trtexec --uff=$HOME/output_graph.uff --uffInput=input_1,1,512,512 --output=conv2d_19/Sigmoid --fp16

Pose Estimation

Copy the pose_estimation.prototxt file from here to the /usr/src/tensorrt/data/googlenet folder of your Nano.

Run the OpenPose inference benchmark:

$ cd /usr/src/tensorrt/bin/
$ sudo ./trtexec --output=Mconv7_stage2_L2 --deploy=../data/googlenet/pose_estimation.prototxt --fp16 --batch=1

Super Resolution

Download the required files to run inference on the Super Resolution neural network.

$ sudo wget --no-check-certificate 'https://nvidia.box.com/shared/static/a99l8ttk21p3tubjbyhfn4gh37o45rn8.gz' -O Super-Resolution-BSD500.tar.gz

Unzip the downloaded file

$ sudo tar -xvf Super-Resolution-BSD500.tar.gz

Run the Super Resolution inferencing benchmark:

$ cd /usr/src/tensorrt/bin
$ sudo ./trtexec --output=output_0 --onnx=<path to the .onnx file in the unzipped folder above> --fp16 --batch=1
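If you are not sure where the .onnx file ended up after unpacking, `find` can locate it. The sketch below is an illustration under assumptions: the helper name `find_onnx` is hypothetical, and the directory you pass is wherever you ran the `tar` command above.

```shell
# Hypothetical helper: print the first .onnx file found under a directory
# (defaults to the current directory).
find_onnx() {
    find "${1:-.}" -type f -name '*.onnx' | sort | head -n 1
}

# Usage on the Nano (not run here), from /usr/src/tensorrt/bin:
#   sudo ./trtexec --output=output_0 \
#       --onnx="$(find_onnx <folder where you unpacked the archive>)" \
#       --fp16 --batch=1
```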

Tiny YOLO v3

Install pre-requisite packages

$ sudo apt-get install libgstreamer-plugins-base1.0-dev libgstreamer1.0-dev libgflags-dev

Download trt-yolo-app

$ cd ~
$ git clone -b restructure https://github.com/NVIDIA-AI-IOT/deepstream_reference_apps

If you are using JetPack 4.3 or newer, apply the following git patch to the deepstream_reference_apps source:

diff --git a/yolo/config/yolov3-tiny.txt b/yolo/config/yolov3-tiny.txt
index ec12c53..47e46a6 100644
--- a/yolo/config/yolov3-tiny.txt
+++ b/yolo/config/yolov3-tiny.txt
@@ -47,7 +47,7 @@
 # nms_thresh : IOU threshold for bounding box candidates. Default value is 0.5
 
 #Uncomment the lines below to use a specific config param
-#--precision=kINT8
+--precision=kHALF
 #--calibration_table_path=data/calibration/yolov3-tiny-calibration.table
 #--engine_file_path=
 #--print_prediction_info=true
diff --git a/yolo/lib/ds_image.cpp b/yolo/lib/ds_image.cpp
index 36a394c..9e4ff5b 100644
--- a/yolo/lib/ds_image.cpp
+++ b/yolo/lib/ds_image.cpp
@@ -88,7 +88,7 @@ DsImage::DsImage(const std::string& path, const int& inputH, const int& inputW)
     cv::copyMakeBorder(m_LetterboxImage, m_LetterboxImage, m_YOffset, m_YOffset, m_XOffset,
                        m_XOffset, cv::BORDER_CONSTANT, cv::Scalar(128, 128, 128));
     // converting to RGB
-    cv::cvtColor(m_LetterboxImage, m_LetterboxImage, CV_BGR2RGB);
+    cv::cvtColor(m_LetterboxImage, m_LetterboxImage, cv::COLOR_BGR2RGB);
 }
 
 void DsImage::addBBox(BBoxInfo box, const std::string& labelName)
@@ -106,7 +106,7 @@ void DsImage::addBBox(BBoxInfo box, const std::string& labelName)
         = cv::getTextSize(labelName, cv::FONT_HERSHEY_COMPLEX_SMALL, 0.5, 1, nullptr);
     cv::rectangle(m_MarkedImage, cv::Rect(x, y, tsize.width + 3, tsize.height + 4), color, -1);
     cv::putText(m_MarkedImage, labelName.c_str(), cv::Point(x, y + tsize.height),
-                cv::FONT_HERSHEY_COMPLEX_SMALL, 0.5, cv::Scalar(255, 255, 255), 1, CV_AA);
+                cv::FONT_HERSHEY_COMPLEX_SMALL, 0.5, cv::Scalar(255, 255, 255), 1, cv::LINE_AA);
 }
 
 void DsImage::showImage() const
@@ -142,4 +142,4 @@ std::string DsImage::exportJson() const
             json << "}";
     }
     return json.str();
-}
\ No newline at end of file
+}
diff --git a/yolo/lib/trt_utils.h b/yolo/lib/trt_utils.h
index 359bfea..96a5a39 100644
--- a/yolo/lib/trt_utils.h
+++ b/yolo/lib/trt_utils.h
@@ -28,11 +28,12 @@ SOFTWARE.
 #define __TRT_UTILS_H__
 
 /* OpenCV headers */
-#include <opencv/cv.h>
+//#include <opencv/cv.h>
 #include <opencv2/core/core.hpp>
 #include <opencv2/dnn/dnn.hpp>
 #include <opencv2/highgui/highgui.hpp>
 #include <opencv2/imgproc/imgproc.hpp>
+#include <opencv2/imgcodecs/legacy/constants_c.h>
 
 #include <set>
 
diff --git a/yolo/lib/yolo.cpp b/yolo/lib/yolo.cpp
index 117a49f..2b7435e 100644
--- a/yolo/lib/yolo.cpp
+++ b/yolo/lib/yolo.cpp
@@ -423,7 +423,7 @@ void Yolo::createYOLOEngine(const nvinfer1::DataType dataType, Int8EntropyCalibr
               << " precision : " << m_Precision << " and batch size :" << m_BatchSize << std::endl;
 
     m_Builder->setMaxBatchSize(m_BatchSize);
-    m_Builder->setMaxWorkspaceSize(1 << 20);
+    m_Builder->setMaxWorkspaceSize(1024 * 1024 * 8);
 
     if (dataType == nvinfer1::DataType::kINT8)
     {

Install other requirements

$ cd ~/deepstream_reference_apps/yolo
$ sudo sh prebuild.sh

Compile and install app

$ cd apps/trt-yolo
$ mkdir build && cd build
$ cmake -D CMAKE_BUILD_TYPE=Release ..
$ make && sudo make install
$ cd ../../..

For the sample image data set, you can download 500 images (they need to be in .png format) to any folder on your Jetson Nano, use just 1 image file, or use the test set of 5 images that we've provided here.
- Navigate your terminal to:
```
$ cd ~/deepstream_reference_apps/yolo/data
```
- Open the file “test_images.txt”
- In the above file, you need to provide the full path to each of the 500 images you downloaded. For example, if your first image is located in the Downloads directory, the path you would enter in line 1 would be:
```
/home/<username>/Downloads/<image file name>.png
```
- Alternatively, you could provide the path to just one image and copy that line 500 times in that file.
- A sample set of images (5 images of varying resolutions, repeated 100 times) along with the test_images.txt file have been uploaded here. You can use this data set if you don’t want to download your own images.
- Go to the folder ‘config’ and open file ‘yolov3-tiny.txt’
- In the file yolov3-tiny.txt, search for “--precision=kINT8” and replace “kINT8” with “kHALF” to change the inference precision to FP16 mode. Also you will need to uncomment this line. (if you applied the patch for JetPack 4.3 above, this step has already been done)
- Save the file
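Filling in test_images.txt by hand is tedious, so the file can be generated instead. The sketch below writes one absolute path per line, cycling through the .png files in a folder until it reaches the requested count (500 by default, matching the steps above). The helper name `make_image_list` and its arguments are hypothetical.

```shell
# Hypothetical helper: make_image_list IMAGE_DIR OUTPUT_FILE [COUNT]
# Writes COUNT lines (default 500), one absolute .png path per line,
# repeating the images in IMAGE_DIR as often as needed.
make_image_list() {
    img_dir=$(cd "$1" && pwd) || return 1
    out_file=$2
    count=${3:-500}
    set -- "$img_dir"/*.png
    [ -e "$1" ] || { echo "no .png files in $img_dir" >&2; return 1; }
    : > "$out_file"
    written=0
    while [ "$written" -lt "$count" ]; do
        for img in "$@"; do
            [ "$written" -lt "$count" ] || break
            printf '%s\n' "$img" >> "$out_file"
            written=$((written + 1))
        done
    done
}

# Usage on the Nano (not run here):
#   make_image_list ~/Downloads ~/deepstream_reference_apps/yolo/data/test_images.txt
```

With a single image in the folder this reproduces the "copy that line 500 times" option; with 5 images it repeats each one 100 times.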

Now run the Tiny YOLO inference:

$ cd ~/deepstream_reference_apps/yolo
$ sudo trt-yolo-app --flagfile=config/yolov3-tiny.txt

luisma · April 18, 2019, 6:38pm

Hello, I followed the commands for SSD-Mobilenet-V2 and got a crash.
I think it’s because I only use a micro-USB charger (10 W) to power the Jetson Nano.
Since then it doesn’t boot up either.
Can you confirm that the Jetson Nano is unable to boot from a micro-USB charger after executing the jetson_clocks command?
Is it possible to modify some file on the SD card (from another device) to revert the changes made by jetson_clocks?
Thanks

dusty_nv · April 18, 2019, 6:42pm

Hi luisma, can you try re-flashing your SD card with the original image?

I’ll add a note to the post above about using one of the recommended power supplies to run the benchmarks, thanks.

luisma · April 18, 2019, 6:52pm

Sorry, I do not want to re-flash because I’ve worked so hard on it. I’d like to reconfigure it.

dusty_nv · April 18, 2019, 6:56pm

It’s possible that during the abrupt shutdown, the filesystem on the SD card got corrupted, which is why it may no longer boot. Do you have a second SD card that you could try flashing with the original image? Alternatively, I would recommend trying one of the DC barrel jack adapters or one of the recommended USB power supplies and seeing if that helps (although jetson_clocks behavior gets reset upon reboot, and the nvpmodel -m 0 profile is already the default).

You could also try plugging your SD card into a Linux PC (or another machine that can read ext4) and see if you can mount it to recover your files.

luisma · April 18, 2019, 6:59pm

Thanks, that’s great!
And which files must be restored?
Perhaps l4t_dfs.conf, plus a little script in etc/rc.local with something like:
jetson_clocks --restore

dusty_nv · April 18, 2019, 7:05pm

It is unclear which files are corrupt/damaged and would need to be restored. You could try using the fsck utility from a PC to check for errors.

Barring that, the purpose of mounting the SD card on a PC would be to back up your files before re-flashing the SD card.
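As a rough sketch of that check-then-back-up sequence on a Linux PC, the helper below only prints the commands so you can review them before running anything. The partition name /dev/sdX1, the mount point, and the helper name `print_recovery_steps` are all placeholders; identify the real partition with `lsblk` first.

```shell
# Hypothetical helper: print (do not run) the recovery commands for an
# SD card partition. Both arguments are placeholders by default.
print_recovery_steps() {
    part=${1:-/dev/sdX1}
    mnt=${2:-/mnt/nano_sd}
    echo "sudo fsck.ext4 -n $part   # read-only check, changes nothing"
    echo "sudo mkdir -p $mnt"
    echo "sudo mount -o ro $part $mnt   # mount read-only"
    echo "cp -a $mnt/home ~/nano_backup   # copy your files off the card"
    echo "sudo umount $mnt"
}

print_recovery_steps
```

The `-n` option makes fsck.ext4 open the filesystem read-only and answer "no" to every repair prompt, so it reports problems without touching the card.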

luisma · April 18, 2019, 7:11pm

Thank you very much. i will try.

hunterjm · April 24, 2019, 8:51pm

Do you have the benchmarking instructions for the SSD ResNet-18 model?

dusty_nv · April 25, 2019, 11:29am

Hi hunterjm, looking into what these are now. Stay tuned, thanks.

a7ypical · April 29, 2019, 8:57am

Hi,

Can you specify which OpenPose network you used, and can you also post the weights?

Thanks.

atyshka · April 29, 2019, 1:45pm

@a7ypical sadly I don’t think they have the weights. They used an open source model from here which does not post the weights: https://github.com/opencv/open_model_zoo/blob/master/intel_models/human-pose-estimation-0001/human-pose-estimation-0001.prototxt

Either you’ll have to train your own or try to convert another model to TensorRT.

Freemanix · May 3, 2019, 9:13am

Hi,

I tried the above mentioned mobilenet_v2 SSD example and the results are not encouraging, to be honest. It detects nothing on sample images. Are you sure the image data are being normalized correctly for this network?

What is the TF source model for the sample_unpruned_mobilenet_v2.uff? According to the sample source, it should have 37 classes, but MS COCO has many more classes.

I would like to be able to go through TF → UFF → TensorRT with mobilenet_v2 SSD and to try different dimensions, too. Can you share your code somewhere?

Thank you

dusty_nv · May 3, 2019, 1:17pm

Hi Freemanix, you would want to freeze the PB graph from TensorFlow and export it to UFF similar to these documents:

atyshka · May 3, 2019, 4:15pm

I’m having a strange issue now. The mobilenet sample code you posted works just fine. But now when I attempt to build regular sampleUffSSD instead of sampleUffSSD_rect, the executable is named sampleUffSSD but runs the code of sampleUffSSD_rect. So I now have two executables, sampleUffSSD and sampleUffSSD_rect, that both seem to run the code of sampleUffSSD_rect. Is something messed up with the makefiles?

Update: Renaming the files and running make clean fixed it

atyshka · May 3, 2019, 5:23pm

@Freemanix I noticed the same where nothing is detected in this network. It seems suspicious that the code that generated detections was commented out in the example program.

dusty_nv · May 4, 2019, 1:01am

The benchmark is for the network - that sample does post-processing which was commented out to get an accurate performance result of the network, as different applications and platforms apply pre/post-processing differently.

Freemanix · May 6, 2019, 2:07pm

Of course, I wrote my own result parsing. The problem is not in the commented-out code, but in the network inference results. I tried to work with a similar sample, sampleUffSSD in the TensorRT samples, but when I convert the frozen graph for ssd_inception_v2_coco_2017_11_17 to a .uff file, the sample fails with:

../data/ssd/sample_ssd_relu6.uff
Begin parsing model...
ERROR: UFFParser: Graph error: Cycle graph detected
ERROR: sample_uff_ssd: Fail to parse

As a result, I was unable to run reasonably fast valid SSD on Jetson Nano so far.

bl5218 · May 22, 2019, 6:00pm

Hello! It seems that the link to the Unet files is wrong. Can you fix that? Can you provide any details about the Unet architecture used, or other useful resources for segmentation? Thank you!

dusty_nv · May 22, 2019, 7:22pm

Hi bl5218, the UNet and pose estimation model share the same folder on Google Drive. The UNet model is output_graph.uff (and the prototxt from that folder is for the pose estimation benchmark). Sorry for the confusion — it should work ok though.

For other resources on semantic segmentation networks, see this tutorial:
https://github.com/dusty-nv/jetson-inference/blob/master/docs/segnet-dataset.md
