I’m quite lost in the TensorRT docs; I hope this is the right forum for this question…
After reading the release notes on how to take a frozen TF graph and optimize it with TensorRT, I found that the rest of the documentation never explicitly explains how the optimized model is used compared to how it was used in TF.
This might be trivial to TensorRT users, but I don’t see how to actually use the optimized model. Perhaps something like this example (https://github.com/tensorflow/models/blob/master/research/object_detection/object_detection_tutorial.ipynb) but with TensorRT?
That’s exactly my question: once I have the optimized graph, do I just run it with TF? All the TensorRT examples I’ve seen use its own API to do the inference, which is why I was worried that simply running the resulting graph wouldn’t work…
A common feature of all of these approaches is that you use the TensorFlow Python environment to drive the model, but the model is fundamentally being run by TensorRT. Once you’ve exported to UFF, the only part of TensorFlow you are using is the Python interpreter. This means you can use the same (or very similar) commands in a Python interpreter that isn’t also running the TensorFlow module.
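For a concrete picture, here is a rough sketch of that UFF path against the TensorRT 3.x Python API (the file name, the "Placeholder"/"fc2/Relu" node names, the input shape, and the MNIST-sized output are all placeholders you’d swap for your own model; treat this as a sketch, not verbatim from the docs):

```python
import numpy as np
import pycuda.autoinit  # creates a CUDA context on import
import pycuda.driver as cuda
import tensorrt as trt
import uff
from tensorrt.parsers import uffparser

G_LOGGER = trt.infer.ConsoleLogger(trt.infer.LogSeverity.ERROR)

# Convert the frozen TensorFlow graph to UFF (placeholder node names).
uff_model = uff.from_tensorflow_frozen_model("frozen_graph.pb", ["fc2/Relu"])

# Parse the UFF model and build an engine -- no TensorFlow involved.
parser = uffparser.create_uff_parser()
parser.register_input("Placeholder", (1, 28, 28), 0)  # CHW input shape
parser.register_output("fc2/Relu")
engine = trt.utils.uff_to_trt_engine(G_LOGGER, uff_model, parser, 1, 1 << 20)
parser.destroy()

# Run inference with PyCUDA: copy input in, enqueue, copy output out.
context = engine.create_execution_context()
img = np.random.rand(1, 28, 28).astype(np.float32)  # stand-in input
output = np.empty(10, dtype=np.float32)             # stand-in output size
d_input = cuda.mem_alloc(img.nbytes)
d_output = cuda.mem_alloc(output.nbytes)
stream = cuda.Stream()
cuda.memcpy_htod_async(d_input, img, stream)
context.enqueue(1, [int(d_input), int(d_output)], stream.handle, None)
cuda.memcpy_dtoh_async(output, d_output, stream)
stream.synchronize()
print(output)
```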
Once you’ve used the tools above to get the model running in TensorRT, you can also save the engine to a file using trt.utils.write_engine_to_file(). That file can be re-read using just the C++ interface (no Python required): you open the file with standard C/C++ I/O, deserialize the engine, and then call the enqueue() method. This can be done in a really lightweight standalone C++ application or, of course, integrated into a larger C++ application.
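In Python the save/reload round trip looks something like this (again sketched against the TensorRT 3.x API; I believe trt.utils.load_engine is the Python counterpart of the C++ deserialization step, which on the C++ side goes through IRuntime::deserializeCudaEngine()):

```python
import tensorrt as trt

G_LOGGER = trt.infer.ConsoleLogger(trt.infer.LogSeverity.ERROR)

# Serialize an engine (built as in the earlier sketch) to disk. The file
# can later be loaded by a pure C++ app via IRuntime::deserializeCudaEngine().
trt.utils.write_engine_to_file("model.engine", engine.serialize())

# Re-reading it from Python, for comparison; the C++ flow is analogous:
# read the bytes, deserialize, create a context, then call enqueue().
engine = trt.utils.load_engine(G_LOGGER, "model.engine")
context = engine.create_execution_context()
```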
All of the above techniques rely on TensorRT to perform all the inference. This works when all the model layers are supported in TensorRT. If you want to mix custom TensorFlow operators with TensorRT graph execution, there is a new technique available in TensorFlow 1.7.
This gives you the best of both worlds: the flexibility of TensorFlow and much of the performance of TensorRT. To learn more, check out these two blogs
With this approach, running inference is just the same as running inference in TensorFlow … TensorFlow takes care of running TensorRT on the appropriate sub-graphs for you.
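Concretely, with TF 1.7 the integrated path looks roughly like this (a sketch, assuming a frozen graph saved as frozen_graph.pb with placeholder node names "input" and "logits" and a dummy 28x28 input; swap in your own names, shapes, and settings):

```python
import numpy as np
import tensorflow as tf
import tensorflow.contrib.tensorrt as trt  # TF-TRT integration (TF 1.7+)

# Load the frozen TensorFlow graph.
with tf.gfile.GFile("frozen_graph.pb", "rb") as f:
    frozen_graph_def = tf.GraphDef()
    frozen_graph_def.ParseFromString(f.read())

# Rewrite the graph: TensorRT-compatible subgraphs become TRT engine ops,
# while unsupported ops are left as ordinary TensorFlow ops.
trt_graph_def = trt.create_inference_graph(
    input_graph_def=frozen_graph_def,
    outputs=["logits"],                  # placeholder output node name
    max_batch_size=1,
    max_workspace_size_bytes=1 << 25,
    precision_mode="FP32")

# Inference is then plain TensorFlow.
batch = np.random.rand(1, 28, 28).astype(np.float32)  # stand-in input
with tf.Graph().as_default():
    tf.import_graph_def(trt_graph_def, name="")
    with tf.Session() as sess:
        inp = sess.graph.get_tensor_by_name("input:0")
        logits = sess.graph.get_tensor_by_name("logits:0")
        print(sess.run(logits, feed_dict={inp: batch}))
```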