Run TF-TRT graph through TF C++ API

Hello,
I want to use TF-TRT Python API to optimize the graph,
and then use TF C++ API for deployment on NVIDIA Xavier.
Is the TF C++ API capable of running a TF-TRT optimized graph?

Also, what is the preferred way of deploying TRT optimized model on Jetson?
According to item 8.1 of https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html
“Note: The UFF Parser which is used to parse a network in UFF format will be deprecated in a future release. The recommended method of importing TensorFlow models to TensorRT is using TensorFlow with TensorRT (TF-TRT).”
Meanwhile, item 7.2 says to use the TRT C++ API with UFF as an intermediate format.
Thanks.

Hello,

You can convert the model to TRT using TF-TRT and serialize it to a .plan file. Then deserialize the .plan file using the C++ API.
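To make the round trip concrete, here is a minimal pure-Python sketch of what a .plan file is: nothing more than the raw serialized engine bytes written to disk. The byte string below is a placeholder, not a real engine; a real plan would come from TF-TRT or the TensorRT builder.

```python
# Minimal sketch: a .plan file is just the raw serialized engine bytes.
# The byte string here is a placeholder standing in for engine->serialize();
# a real plan would be produced by TF-TRT or the TensorRT builder.
import os
import tempfile

engine_bytes = b"\x00placeholder-serialized-engine"

plan_path = os.path.join(tempfile.mkdtemp(), "model.plan")

# Write the plan, as planFile.write(...) does on the C++ side.
with open(plan_path, "wb") as f:
    f.write(engine_bytes)

# Read it back, as runtime->deserializeCudaEngine(...) would consume it.
with open(plan_path, "rb") as f:
    restored = f.read()

print(restored == engine_bytes)  # the bytes round-trip unchanged
```

The point is that the plan format is opaque: whichever side writes it, the C++ runtime just needs the exact same bytes back (and, per the note below, on the same GPU model and TensorRT version).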
See:
https://docs.nvidia.com/deeplearning/dgx/tf-trt-user-guide/index.html#tensorrt-plan
and
https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html#serial_model_c

Thanks.

Hi,
Thanks for the links, but I still don't understand.
Until now, I had TF and TRT on my workstation with a Tesla V100.
The workflow was:

  1. Build the model and parse it to UFF on the workstation:
     uff.from_tensorflow()
  2. Convert to an engine and then serialize it on one Xavier with the TRT C++ API:
     ParseUFF...
     ICudaEngine *engine = builder->buildCudaEngine(*network);
     IHostMemory *serializedEngine = engine->serialize();
     planFile.write((char *)serializedEngine->data(), serializedEngine->size());
     planFile.close();
  3. Deploy and deserialize on multiple Xaviers with the TRT C++ API:
     IRuntime *runtime = createInferRuntime(gLogger);
     ICudaEngine *engine = runtime->deserializeCudaEngine(modelData, modelSize, nullptr);

Now, if I use the TF-TRT API on my workstation with

trt.create_inference_graph
  1. Is the model serialized?

  2. Can I deserialize on Xavier directly with TRT C++ API?
    You write:
    Note: Serialized engines are not portable across platforms or TensorRT versions. Engines are specific to the exact GPU model they were built on (in addition to platforms and the TensorRT version).

  3. If not, can you please indicate the steps?
    a. How/where to build the engine?
    b. How/where to serialize?

I’m a little bit confused here …

Thanks

https://docs.nvidia.com/deeplearning/dgx/tf-trt-user-guide/index.html#tensorrt-plan
states that “This feature requires that your entire model converts to TensorRT”.
What about nets that don’t convert entirely, like SSD+Mobilenet?
Can I just load and run a TF-TRT optimized graph through the TF C++ API in a standard way? (I know that I need to build the TF C++ API from source on Xavier.)
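One rough way to see how much of a graph TF-TRT actually converted is to count TRTEngineOp nodes versus remaining native TF ops. This is a hedged sketch: the graph is mocked here as a plain list of op-type strings; with a real TensorFlow GraphDef you would look at node.op for each node in graph_def.node.

```python
# Sketch: after TF-TRT conversion, a graph that did NOT convert entirely
# (e.g. SSD+MobileNet, whose NMS ops typically stay in TensorFlow) still
# contains native TF ops alongside TRTEngineOp nodes. The graph is mocked
# as a list of op-type strings for illustration only.
mock_graph_ops = ["TRTEngineOp", "NonMaxSuppressionV3", "TRTEngineOp", "Identity"]

trt_segments = sum(op == "TRTEngineOp" for op in mock_graph_ops)
native_tf_ops = len(mock_graph_ops) - trt_segments

# Only a fully converted graph (no native TF ops left) can be exported as
# a single standalone .plan and run with the pure TRT C++ API; otherwise
# the TF runtime is still needed to execute the unconverted ops.
fully_converted = native_tf_ops == 0

print(trt_segments, native_tf_ops, fully_converted)
```

In this mock, two TRT segments coexist with two native TF ops, so the graph is not fully converted and cannot be deployed as a standalone plan.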

Can I just load and run TF-TRT optimized graph through TF C++ API in a standard way

Hi NVESJJ
Could you answer this question? Is there C++ sample code to run TF-TRT?

Hello NVIDIA,

Would it be possible to have C++ samples, as requested above?

Also, when we do the following builds of TensorFlow:

  • bazel build --config=tensorrt //tensorflow:libtensorflow.so
  • bazel build --config=tensorrt //tensorflow/tools/lib_package:libtensorflow

Do the generated C/C++ libraries have TensorRT support as well, or is it only the pip package?

Thanks in advance!

We are working on a C++ sample for TF-TRT.
Please stay tuned.

It will be published soon at https://github.com/tensorflow/tensorrt

Thanks for the update!

Hi,

I want to ask the same question: "Can I just load and run a TF-TRT optimized graph through the TF C++ API in a standard way?" Also, is there any update on the C++ sample for TF-TRT?

Is this example already online?
Actually, I ask because my model did not work with pure TensorRT 5 on the Jetson Nano, and I wanted to try it with TensorFlow.

It has been 6 months since this post and there is still no C++ example in the TF-TRT repository. Is this ever going to happen?

Yeah, I am starting to look into TF-TRT too. I was able to convert/build an object detection (OD API v2) model, but I can't see any clear C++ sample for running inference on it.