Examples for porting object detection inference from TensorFlow to TensorRT 4

I’m quite lost in the TensorRT docs, I hope this is the right forum for this question…

After reading the release notes on how to take a frozen TF graph and optimize it with TensorRT, I found that the rest of the documentation doesn’t explicitly explain how to use the optimized model compared to how it was used in TF.
This might be trivial to TensorRT users, but I’m unclear on how to actually use the optimized model. Perhaps something like this example (https://github.com/tensorflow/models/blob/master/research/object_detection/object_detection_tutorial.ipynb) but with TensorRT?

Thanks in advance!

Thanks for the suggestion. Once you have optimized the network graph you get a graph that can be used to run inference.

Are you wondering how to run a network graph for inference in TensorFlow, or asking something else?

That’s exactly my question - once I have the optimized graph, do I just run it with TF? All the TensorRT examples showed using its own API to do the inference, so that’s why I was worried that simply using the resulting graph won’t work…

eyalu,

Great. So there are several approaches to running inference on a TensorFlow graph in TensorRT.

  1. Using Python and a technology called UFF, following instructions like the ones in the GitHub repo you linked earlier: https://github.com/tensorflow/models/blob/master/research/object_detection/object_detection_tutorial.ipynb

The section under “detection” shows how to run the model for inference. There are further instructions for this technique at
https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html#tensorflowworkflow and a sample at https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html#mnist_uff_sample.

A common feature of these approaches is that you use the TensorFlow Python environment to drive the model, but the model is fundamentally being run in TensorRT. Once you’ve exported to UFF, the only thing you still need from that stack is the Python interpreter itself. This means you can use the same (or very similar) commands in a Python interpreter that isn’t also running the TensorFlow module.

  2. Once you’ve used the tools above to get the model running in TensorRT, you can also save the engine to a file using
    trt.utils.write_engine_to_file(). That file can be re-read using just the C++ interface (no Python required): you open the file using standard C/C++, deserialize the engine, and then call the enqueue() method. This can be done in a really lightweight standalone C++ application or, of course, integrated into a larger C++ application.

  3. All of the above techniques rely on TensorRT to perform all the inference. This works when all the model layers are supported in TensorRT. If you want to mix custom TensorFlow operators with TensorRT graph execution, there is a new technique available in TensorFlow 1.7.

This gives you the best of both worlds: the flexibility of TensorFlow and much of the performance of TensorRT. To learn more, check out these two blog posts:

https://devblogs.nvidia.com/tensorrt-integration-speeds-tensorflow-inference/

https://developers.googleblog.com/2018/03/tensorrt-integration-with-tensorflow.html

With this approach, running inference is just the same as running inference in TensorFlow … TensorFlow takes care of running TensorRT on the appropriate sub-graph for you.

Hope this helps!

We created a new “Deep Learning Training and Inference” section in Devtalk to improve the experience for deep learning, accelerated computing, and HPC users:
https://devtalk.nvidia.com/default/board/301/deep-learning-training-and-inference-/

We are moving active deep learning threads to the new section.

URLs for topics will not change with the re-categorization, so your bookmarks and links will continue to work as before.

-Siddharth