SSD Mobilenet V2 TensorRT optimization for Jetson TX2

austin.anderson · February 26, 2020, 5:22pm

I’ve been working to optimize an SSD Mobilenet V2 model to run in TensorRT on my Jetson. Some info on versioning:

Jetson TX2
CUDA - 10.0
TRT - 5.6.1
TF - 1.14.0
Distributor ID: Ubuntu
Description: Ubuntu 18.04.2 LTS
Release: 18.04
Codename: bionic

Originally I’d optimized using the standard TF-TRT flow. That works, and it increases speed on a 300x300 image from about 1 FPS (TF only) to 4 FPS (TF-TRT). That’s OK, but ideally I’d like to get the 10-20 FPS that’s widely reported with the full TRT optimization, which I can also call more easily from C++.
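To put those numbers in per-frame terms, here is a small arithmetic sketch. It only uses the FPS figures quoted above; the helper names are made up for illustration and nothing here measures a real model.

```python
def frame_latency_ms(fps: float) -> float:
    """Per-frame latency in milliseconds for a given throughput in FPS."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def speedup(baseline_fps: float, optimized_fps: float) -> float:
    """How many times faster the optimized pipeline is than the baseline."""
    if baseline_fps <= 0:
        raise ValueError("baseline_fps must be positive")
    return optimized_fps / baseline_fps


# Figures quoted in the post (approximate, 300x300 input on a Jetson TX2).
TF_ONLY_FPS = 1.0
TF_TRT_FPS = 4.0
TARGET_FPS_RANGE = (10.0, 20.0)

if __name__ == "__main__":
    print(f"TF only: {frame_latency_ms(TF_ONLY_FPS):.0f} ms/frame")
    print(f"TF-TRT:  {frame_latency_ms(TF_TRT_FPS):.0f} ms/frame "
          f"({speedup(TF_ONLY_FPS, TF_TRT_FPS):.1f}x over TF only)")
    for target in TARGET_FPS_RANGE:
        print(f"Target {target:.0f} FPS: {frame_latency_ms(target):.0f} ms/frame, "
              f"needs another {speedup(TF_TRT_FPS, target):.1f}x over TF-TRT")
```

In other words, TF-TRT brings the frame time from roughly 1000 ms to 250 ms, and the 10-20 FPS target corresponds to a budget of 50-100 ms per frame.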

So we’ve been wrestling with this flow for a while and following great tutorials from dusty-nv:

https://github.com/dusty-nv/jetson-inference/blob/master/docs/detectnet-console-2.md

and jkjung-avt:

https://github.com/jkjung-avt/tensorrt_demos#ssd

So I understand that the optimization flow people have mostly had success with is TF → UFF → TRT, with some plugin work to get to UFF because some TF operations are unsupported. We had initially chosen the UFF flow because there were a lot of examples, and people sounded like they had difficulty with the ONNX flow:

https://github.com/jkjung-avt/tensorrt_demos/issues/43#issuecomment-584534847

We’re now running into some significant issues with UFF and TF version support:

NOTE: UFF has been tested with TensorFlow 1.12.0. Other versions are not guaranteed to work
WARNING: The version of TensorFlow installed on this system is not guaranteed to work with UFF.
UFF Version 0.6.3

and we did end up running into issues specifically with TensorFlow 1.14.0 and CUDA 10.0. As best we could tell, the solution others had suggested was to revert to TF 1.12.0 and CUDA 9.0, which isn’t ideal, as that would require us to revert to an older version of the OS/BSP.
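As an aside, a pre-flight check along the lines of that warning could look like the sketch below. This is a hypothetical helper, not how the UFF converter actually implements its check; the tested version string is taken from the note above, and comparing only major.minor is an assumption made for illustration.

```python
import re

# Version named in the UFF note quoted above.
TESTED_TF_VERSION = "1.12.0"


def parse_version(text):
    """Return (major, minor, patch) from a string such as '1.14.0' or '1.14'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    # A missing patch component is treated as 0.
    return tuple(int(part) if part else 0 for part in match.groups())


def is_tested_tf_version(installed, tested=TESTED_TF_VERSION):
    """True if the installed TF matches the tested release on major.minor.

    Assumption: patch releases are treated as equivalent.
    """
    return parse_version(installed)[:2] == parse_version(tested)[:2]


if __name__ == "__main__":
    installed = "1.14.0"  # the TF version listed at the top of this post
    if not is_tested_tf_version(installed):
        print(f"TensorFlow {installed} differs from the tested "
              f"{TESTED_TF_VERSION}; conversion may or may not work.")
```

On a real system the installed string would come from `tensorflow.__version__` rather than being hard-coded.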

So I guess my questions are:

Has anyone had luck using UFF with these newer versions of TF/CUDA?
Has anyone had any luck with the ONNX flow for ssd-mobilenet-v2?

Again, all the official examples seem to be for UFF, but UFF is supposed to be deprecated soon, and ONNX is supposed to be the new supported format. Can we expect some ONNX demos like the official UFF demos anytime soon?

Thanks!

AastaLLL · March 2, 2020, 3:39am

Hi,

The log looks like a warning rather than an error.
Please reflash the device with JetPack 4.3 to get the latest TensorRT package.

1. Here is a sample that can work with TF v1.14.0 model and CUDA 10.0:
https://github.com/AastaNV/TRT_object_detection
We haven’t used CUDA 9.0 for a while, so reverting CUDA to v9.0 is not recommended.

2. Sorry, we don’t have an ONNX-based ssd-mobilenet-v2 sample, since the official format is TensorFlow.
But we do have an ONNX example here:
/usr/src/tensorrt/samples/python/yolov3_onnx

Thanks.

austin.anderson · March 4, 2020, 12:02am

Hey AastaLLL,

We were able to work with your sample to get our detector flow running at rate, which is awesome. Thanks for the quick help there!

With regard to the second part of my question, it was really in reference to this release note:

and specifically the line:

“Deprecation of Caffe Parser and UFF Parser - We are deprecating Caffe Parser and UFF Parser in TensorRT 7. They will be tested and functional in the next major release of TensorRT 8, but we plan to remove the support in the subsequent major release. Plan to migrate your workflow to use tf2onnx, keras2onnx or TensorFlow-TensorRT (TF-TRT) for deployment.”

So while ssd-mobilenet-v2 is indeed a TensorFlow model, it sounds like, based on this note, the process for optimizing it in the future would be:

TF → (via tf2onnx) → ONNX → TRT

as UFF is being deprecated and ONNX is the only format being supported going forward. And in my original post I meant to link this comment:

https://github.com/jkjung-avt/tensorrt_demos/issues/43#issuecomment-575872093

which seems to support what we’d found in our initial research: there are some examples of the UFF flow, but few examples of the new ONNX flow, which again seems to be the only option in the future.
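For reference, the TF → tf2onnx → ONNX step of that flow could be driven with something like the sketch below. It only builds the tf2onnx command line and does not run it. The file names are hypothetical, and the tensor names are the ones a TensorFlow Object Detection API export usually exposes, so treat them as assumptions to check against your own frozen graph. Whether TensorRT's ONNX parser then accepts the result is a separate question that this sketch does not address.

```python
import shlex
import sys


def build_tf2onnx_command(graphdef_path, onnx_path, inputs, outputs, opset=None):
    """Build the argv list for `python -m tf2onnx.convert` on a frozen graph."""
    cmd = [
        sys.executable, "-m", "tf2onnx.convert",
        "--graphdef", graphdef_path,
        "--output", onnx_path,
        "--inputs", ",".join(inputs),
        "--outputs", ",".join(outputs),
    ]
    if opset is not None:
        # Pick an opset that your TensorRT version's ONNX parser supports.
        cmd += ["--opset", str(opset)]
    return cmd


# Assumed tensor names (typical for TF Object Detection API exports).
INPUTS = ["image_tensor:0"]
OUTPUTS = [
    "detection_boxes:0",
    "detection_scores:0",
    "detection_classes:0",
    "num_detections:0",
]

if __name__ == "__main__":
    cmd = build_tf2onnx_command(
        "frozen_inference_graph.pb",  # hypothetical input path
        "ssd_mobilenet_v2.onnx",      # hypothetical output path
        INPUTS,
        OUTPUTS,
    )
    # Print the command; pass `cmd` to subprocess.run(cmd, check=True) to execute it.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Building the argument list separately from running it makes it easy to log or tweak the exact command before committing to a long conversion on the device.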

For now, we’re happy we have a working system, so thanks again for the help there. But I am curious: are there examples for this new flow coming soon?

Thanks.

AastaLLL · March 17, 2020, 9:16am

Hi,

Thanks for your update.

We don’t have an ssd-mobilenet-v2 ONNX sample right now.
But I think you can update the sample on your own.

First, you will need to convert the model into ONNX format with keras2onnx.
After that, you can switch the sample from nvuffparser to nvonnxparser.

The usage is pretty similar and here is the document for your reference:
https://docs.nvidia.com/deeplearning/sdk/tensorrt-api/c_api/classnvonnxparser_1_1_i_onnx_config.html

Thanks.

austin.anderson · March 17, 2020, 3:23pm

Awesome, yeah that should work, we’ll try it out, thanks so much for the great answers!

akulov.eugen · October 30, 2020, 2:39am

Hello, austin.anderson
So, did you get onnx2trt working for SSD Mobilenet V2?