[Still not solved] How to convert a TensorFlow model to a TensorRT engine for AGX Pegasus?

To keep this topic manageable, I try not to provide any specific source code or go too deep into implementation details.

What I am trying to accomplish:
I have a particular TensorFlow implementation (written in Python) and want to run it standalone on the AGX Pegasus, i.e. without having TensorFlow installed.

What I have found out so far by reading the official TensorRT documentation (https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html):

There are basically two ways.

  1. Convert the TensorFlow model/protobuf file into a UFF or ONNX file. Write a C++ program that imports the graph definition with a UFF or ONNX parser and uses a PluginFactory whenever an unsupported layer appears. Write an IPlugin class for each unsupported (custom) layer. Afterwards, serialize the optimized graph (a rough sketch of this flow follows the list).
  2. Use TF-TRT. According to https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html (chapter 2.9), you call the create_inference_graph() method, which automatically replaces TensorFlow nodes with TRTEngineOps. These optimized operations are then written to a plan file.
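
To make approach 1 concrete, this is roughly the build-and-serialize flow I have in mind, based on the TensorRT 5.x C++ API. The tensor names ("input"/"output"), dimensions and file names below are placeholders for whatever the actual model uses:

```cpp
#include <fstream>
#include <iostream>
#include "NvInfer.h"
#include "NvUffParser.h"

// Minimal logger required by the TensorRT builder.
class Logger : public nvinfer1::ILogger
{
    void log(Severity severity, const char* msg) override
    {
        if (severity != Severity::kINFO)
            std::cout << msg << std::endl;
    }
} gLogger;

int main()
{
    auto builder = nvinfer1::createInferBuilder(gLogger);
    auto network = builder->createNetwork();
    auto parser  = nvuffparser::createUffParser();

    // Placeholder tensor names/dimensions -- taken from the frozen graph.
    parser->registerInput("input", nvinfer1::DimsCHW(3, 224, 224),
                          nvuffparser::UffInputOrder::kNCHW);
    parser->registerOutput("output");

    // Unsupported ops surface here unless a plugin handles them.
    if (!parser->parse("model.uff", *network, nvinfer1::DataType::kFLOAT))
    {
        std::cerr << "UFF parsing failed" << std::endl;
        return 1;
    }

    builder->setMaxBatchSize(1);
    builder->setMaxWorkspaceSize(1 << 28);
    auto engine = builder->buildCudaEngine(*network);

    // Serialize the optimized engine to a plan file for standalone deployment.
    nvinfer1::IHostMemory* plan = engine->serialize();
    std::ofstream out("model.plan", std::ios::binary);
    out.write(static_cast<const char*>(plan->data()), plan->size());

    plan->destroy();
    engine->destroy();
    parser->destroy();
    network->destroy();
    builder->destroy();
    return 0;
}
```

If the parser encounters an unsupported op, parse() fails unless a matching plugin is provided, which is exactly where my problem starts.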

My understanding of the conversion mechanism:
I tried both of the approaches listed above, but neither worked properly. I am still not 100% sure whether I have understood the general mechanism of the conversion.

  1. With the UFF parser: As I see it, the Python API checks during the conversion to UFF whether each layer of the source model is in the list of supported layers. If it is not, it is marked as a custom layer. The unsupported layer is renamed to _Op, where Op is the kind of operation that is performed (e.g. _ExpandDims, _RandomUniform etc.). Finally, a UFF file is created.

    That UFF file is then parsed in a C++ program.
    During parsing, that "new" layer name (e.g. _ExpandDims) is analyzed and a layer is created accordingly. Since no other information about the unsupported layer is provided in the UFF file, it is impossible to tell where in the graph the operation takes place and what the desired output dimension is. So it might happen that two ExpandDims operations are parsed, one with a 3-dimensional output and one with a 4-dimensional output. The problem here is that the TensorRT IPlugin (i.e. the custom layer class) needs the output dimension to be defined manually. Since you write one custom layer per kind of operation and have to define the output dimension, I see no way to implement it.

    How is the UFF parsing supposed to work? Why do we define one custom layer with a manually defined output dimension per kind of operation, and not per particular layer? (See the sketch after this list.)

  2. With TF-TRT: Only supported subgraphs are converted to TRTEngineOps. To my understanding, a subgraph is a small part of the whole graph, consisting of multiple nodes/operations. So if one operation in the subgraph is not supported, the whole subgraph is not converted; that is at least my understanding. Furthermore, there seems to be no way to implement custom layers in TF-TRT.
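
Back to point 1: one thing I noticed while reading the plugin headers is that getOutputDimensions() is called once per plugin instance and receives that instance's input dimensions, so perhaps one plugin class per operation type can still compute a different output shape per instance. This is how I would sketch it for a hypothetical _ExpandDims plugin (TensorRT 5.x IPluginV2; my own simplification, not taken from an NVIDIA sample, and I assume the ExpandDims axis is known at construction time):

```cpp
#include <cstring>
#include <string>
#include <cuda_runtime_api.h>
#include "NvInfer.h"

// Hypothetical IPluginV2 for the unsupported _ExpandDims op.
// ExpandDims only changes the shape, so enqueue() is a plain device copy.
class ExpandDimsPlugin : public nvinfer1::IPluginV2
{
public:
    explicit ExpandDimsPlugin(int axis) : mAxis(axis) {}

    int getNbOutputs() const override { return 1; }

    // Called per plugin instance: inputs[0] holds the dimensions of this
    // particular instance's input tensor, so the same class yields a
    // 4-d output for a 3-d input and a 3-d output for a 2-d input.
    nvinfer1::Dims getOutputDimensions(int, const nvinfer1::Dims* inputs, int) override
    {
        nvinfer1::Dims out{};
        out.nbDims = inputs[0].nbDims + 1;  // ExpandDims adds exactly one axis
        for (int i = 0, j = 0; i < out.nbDims; ++i)
            out.d[i] = (i == mAxis) ? 1 : inputs[0].d[j++];
        return out;
    }

    bool supportsFormat(nvinfer1::DataType t, nvinfer1::PluginFormat f) const override
    {
        return t == nvinfer1::DataType::kFLOAT && f == nvinfer1::PluginFormat::kNCHW;
    }

    void configureWithFormat(const nvinfer1::Dims* inputDims, int, const nvinfer1::Dims*,
                             int, nvinfer1::DataType, nvinfer1::PluginFormat, int) override
    {
        mVolume = 1;  // remember the element count for the copy in enqueue()
        for (int i = 0; i < inputDims[0].nbDims; ++i)
            mVolume *= inputDims[0].d[i];
    }

    int initialize() override { return 0; }
    void terminate() override {}
    size_t getWorkspaceSize(int) const override { return 0; }

    int enqueue(int batchSize, const void* const* inputs, void** outputs,
                void*, cudaStream_t stream) override
    {
        // Same data, new shape: copy the input tensor to the output tensor.
        return cudaMemcpyAsync(outputs[0], inputs[0],
                               batchSize * mVolume * sizeof(float),
                               cudaMemcpyDeviceToDevice, stream) != cudaSuccess;
    }

    size_t getSerializationSize() const override { return sizeof(mAxis); }
    void serialize(void* buffer) const override { std::memcpy(buffer, &mAxis, sizeof(mAxis)); }

    const char* getPluginType() const override { return "_ExpandDims"; }
    const char* getPluginVersion() const override { return "1"; }
    void destroy() override { delete this; }
    nvinfer1::IPluginV2* clone() const override { return new ExpandDimsPlugin(mAxis); }
    void setPluginNamespace(const char* ns) override { mNamespace = ns; }
    const char* getPluginNamespace() const override { return mNamespace.c_str(); }

private:
    int mAxis;
    size_t mVolume{0};
    std::string mNamespace;
};
```

If that reading is correct, the "two ExpandDims with different output ranks" case from above would be handled automatically, because each instance computes its output shape from its own input dimensions.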

Question
Is my basic understanding of the conversion from TensorFlow to a TRT engine correct? Does a third option exist to make a TensorFlow model run as a TRT standalone? And how do I solve the problem with the different output dimensions that I explained earlier?
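
To make the "standalone" part concrete, this is how I picture running such a plan file without TensorFlow, using only the TensorRT runtime (again a sketch against the 5.x API; the buffer sizes and binding order are placeholders, and any IPluginV2 creators would have to be registered in the process before deserializing):

```cpp
#include <fstream>
#include <iostream>
#include <iterator>
#include <vector>
#include <cuda_runtime_api.h>
#include "NvInfer.h"

class Logger : public nvinfer1::ILogger
{
    void log(Severity severity, const char* msg) override
    {
        if (severity != Severity::kINFO)
            std::cout << msg << std::endl;
    }
} gLogger;

int main()
{
    // Read the serialized plan produced by the build step.
    std::ifstream in("model.plan", std::ios::binary);
    std::vector<char> plan((std::istreambuf_iterator<char>(in)),
                           std::istreambuf_iterator<char>());

    auto runtime = nvinfer1::createInferRuntime(gLogger);
    auto engine  = runtime->deserializeCudaEngine(plan.data(), plan.size(), nullptr);
    auto context = engine->createExecutionContext();

    // Device buffers ordered by binding index; sizes are placeholders.
    void* buffers[2];
    cudaMalloc(&buffers[0], 3 * 224 * 224 * sizeof(float));  // "input"
    cudaMalloc(&buffers[1], 1000 * sizeof(float));           // "output"

    // ... copy the input data to buffers[0] ...
    context->execute(/*batchSize=*/1, buffers);
    // ... copy the results back from buffers[1] ...

    cudaFree(buffers[0]);
    cudaFree(buffers[1]);
    context->destroy();
    engine->destroy();
    runtime->destroy();
    return 0;
}
```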

@any_Nvidia_developer What is the right way to get that conversion done?

Thanks in advance.

OK, I think I found something on the problem with the output dimension. It is possible to access the filters/kernels applied to a specific layer, because they are passed as additional information to the constructor. So it must be possible to access the number of these kernels as well. Since one kernel creates one feature map/channel, the number of kernels is equivalent to the output channel dimension.
(See the sample code https://github.com/NVIDIA/TensorRT/blob/release/5.1/plugin/gridAnchorPlugin/gridAnchorPlugin.cpp, getOutputDimensions().)
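
Sketched out, that idea would look roughly like this for a hypothetical convolution-like plugin (again my own simplification, not taken from the sample; NCHW layout, stride 1 and "same" padding assumed, remaining IPluginV2 methods omitted):

```cpp
#include "NvInfer.h"

// Hypothetical convolution-like plugin: the plugin factory receives the
// layer's weights from the parser and forwards them to the constructor,
// so the output channel count can be derived from the kernel count
// instead of being hard-coded per layer.
class MyConvPlugin : public nvinfer1::IPluginV2
{
public:
    MyConvPlugin(const nvinfer1::Weights* weights, int /*nbWeights*/,
                 int kernelH, int kernelW, int inChannels)
        : mKernels(weights[0]),
          // One kernel produces one feature map/channel, so:
          // number of kernels = weight count / (C_in * kH * kW)
          mNbKernels(static_cast<int>(weights[0].count)
                     / (inChannels * kernelH * kernelW))
    {
    }

    nvinfer1::Dims getOutputDimensions(int, const nvinfer1::Dims* inputs, int) override
    {
        // Channel dimension = number of kernels; H/W taken from the input
        // (stride 1 and "same" padding keep them unchanged).
        return nvinfer1::DimsCHW(mNbKernels, inputs[0].d[1], inputs[0].d[2]);
    }

    // ... remaining IPluginV2 methods as in the _ExpandDims sketch above ...

private:
    nvinfer1::Weights mKernels;
    int mNbKernels;
};
```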

I still do not know what to do if the number of dimensions differs for one operation type in different parts of the graph (unless the input dimensions passed to getOutputDimensions(), as in the _ExpandDims sketch above, already cover that case).