TensorFlow object detection and image classification accelerated for NVIDIA Jetson

I’ve created a complete tutorial about how to train your custom model (a hand detector) and deploy the model with ‘tf_trt_models’ onto JTX2. Refer to the following link for details.

[url]https://devtalk.nvidia.com/default/topic/1042106/jetson-tx2/how-to-train-a-custom-object-detector-and-deploy-it-onto-jtx2-with-tf-trt-tensorrt-optimized-/[/url]

I tried to make ‘tf_trt_models’ work for ‘faster_rcnn’ and ‘rfcn’ models. It was not straightforward (see the reference below), and I had to put in quite a few hacks to be able to build TF-TRT optimized graphs for those models. For example, I reduced the number of region proposals in the ‘faster_rcnn’/‘rfcn’ configs from 300 to 32 (otherwise I kept running into Out-Of-Memory issues on JTX2); a sketch of that config change follows the repository link below. I was finally able to get the models to work to a certain degree (not completely working yet…), and I’ve put my latest code in my GitHub repository.

https://github.com/jkjung-avt/tf_trt_models
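
For reference, the proposal-count change amounts to roughly the following (a minimal sketch, assuming the TensorFlow Object Detection API is installed and ‘pipeline.config’ is the model’s pipeline config; adjust the paths and the target value to your own setup):

from google.protobuf import text_format
from object_detection.protos import pipeline_pb2

# Load the Faster R-CNN / R-FCN pipeline config.
pipeline = pipeline_pb2.TrainEvalPipelineConfig()
with open('pipeline.config', 'r') as f:
    text_format.Merge(f.read(), pipeline)

# Fewer region proposals means a much smaller second-stage workload,
# which is what kept the model within JTX2 memory for me.
pipeline.model.faster_rcnn.first_stage_max_proposals = 32

with open('pipeline_32_proposals.config', 'w') as f:
    f.write(text_format.MessageToString(pipeline))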

Then I measured and compared the average per-frame inference times of the following TF-TRT optimized models on JTX2 (in MAX-N mode); the timing loop is roughly the sketch after the list.

ssd_mobilenet_v1_coco           (90 classes):   43.5 ms
ssd_inception_v2_coco           (90 classes):   45.9 ms
ssd_mobilenet_v1_egohands          (1 class):   24.5 ms
ssd_mobilenet_v2_egohands          (1 class):   28.7 ms
ssdlite_mobilenet_v2_egohands      (1 class):   28.9 ms
ssd_inception_v2_egohands          (1 class):   25.9 ms
rfcn_resnet101_egohands            (1 class):    351 ms
faster_rcnn_resnet50_egohands      (1 class):    226 ms
faster_rcnn_resnet101_egohands     (1 class):    317 ms
faster_rcnn_inception_v2_egohands  (1 class):    117 ms
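
A minimal sketch of that timing loop, assuming a TF-TRT frozen graph saved as ‘trt_graph.pb’ with the standard object-detection API tensor names (one warm-up run, then the average sess.run() latency over 100 runs of the same image; adjust the input shape to the model):

import time
import numpy as np
import tensorflow as tf

# Load the TF-TRT optimized frozen graph.
graph_def = tf.GraphDef()
with open('trt_graph.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())
graph = tf.Graph()
with graph.as_default():
    tf.import_graph_def(graph_def, name='')

# Dummy input image (adjust shape to the model's expected input size).
image = np.random.randint(0, 255, (1, 300, 300, 3), dtype=np.uint8)
fetches = ['detection_boxes:0', 'detection_scores:0',
           'detection_classes:0', 'num_detections:0']

with tf.Session(graph=graph) as sess:
    sess.run(fetches, feed_dict={'image_tensor:0': image})  # warm-up run
    start = time.time()
    for _ in range(100):
        sess.run(fetches, feed_dict={'image_tensor:0': image})
    print('average inference time: %.1f ms' % ((time.time() - start) * 10))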

Reference: https://github.com/NVIDIA-Jetson/tf_trt_models/issues/6#issuecomment-425857759

Hello,

I am working on a Jetson TX2 with JetPack 3.3, which comes with TensorFlow 1.9 and TensorRT 4. Currently I am testing the SSD Inception V2 model and getting an average of 50 ms per frame with my frozen model, without any TensorRT optimization. I am stuck in the optimization process: I want to further reduce the inference time with the trt.create_inference_graph function.

import tensorflow.contrib.tensorrt as trt

trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=output_names,
    max_batch_size=1,
    max_workspace_size_bytes=1 << 25,
    precision_mode='FP16',
    minimum_segment_size=50
)

I get the following error:
2019-04-23 21:31:36.655950: I tensorflow/core/grappler/devices.cc:51] Number of eligible GPUs (core count >= 8): 0
2019-04-23 21:31:41.939334: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:515] subgraph conversion error for subgraph_index:0 due to: "Invalid argument: Output node 'FeatureExtractor/InceptionV2/InceptionV2/Mixed_3b/concat-4-LayoutOptimizer' is weights not tensor" SKIPPING...( 844 nodes)

I saw this issue mentioned in one of the previous comments; was there any solution for it?
Thank you!!

I have a solution to the “extremely long model loading time problem” of TF-TRT now. Please check out my blog post for details: [url]https://jkjung-avt.github.io/tf-trt-revisited/[/url].
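
(One quick thing worth checking, if loading is slow for you too, is which protobuf implementation your Python is using; the pure-Python implementation deserializes large GraphDefs far more slowly than the C++ one. A minimal check, not necessarily the whole story covered in the post:)

from google.protobuf.internal import api_implementation

# Prints 'cpp' for the fast C++ protobuf backend,
# 'python' for the much slower pure-Python fallback.
print(api_implementation.Type())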

Hi jkjung13

I have flashed JetPack 4.2 onto my TX2 board.

I trained an object detection model using a Feature Pyramid Network (the model size is 242 MB).

Whenever I try the inference code on the TX2, the process gets killed just before the session starts processing a frame (it is able to load the model, though).

I tried limiting/allocating the full GPU memory usage through TensorFlow (version 1.13), but it didn't work.
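
(For context, by limiting/allocating memory I mean the standard TF 1.x GPU options, roughly the sketch below; the exact values I used may have differed:)

import tensorflow as tf

# Standard TF 1.x options for capping TensorFlow's use of the TX2's shared memory.
gpu_options = tf.GPUOptions(
    per_process_gpu_memory_fraction=0.5,  # cap the fraction of memory TF may grab
    allow_growth=True)                    # and/or allocate memory only as needed
config = tf.ConfigProto(gpu_options=gpu_options)

with tf.Session(config=config) as sess:
    pass  # import the frozen graph and run inference here as usual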

Any solution for this?

Thanks in advance.

@varun365 There is a very similar issue report on GitHub: [url]https://github.com/NVIDIA-AI-IOT/tf_trt_models/issues/6#issuecomment-498207648[/url]

The problem is clearly an out-of-memory issue, and I'm not able to solve it. Hopefully TF-TRT will be improved over time so that such larger models work on a future release.