TensorRT doesn't correctly execute the TensorFlow concat and/or reshape operations

First, I tried to find out whether my issue had already been discussed, but I couldn't find any topic that answers it.

If such a topic exists, please point me to it.

My platform:
The Tensorflow pb (which was converted to the uff file) was generated under the following platform:
Linux distro and version - Linux-x86_64, Ubuntu, 16.04
GPU type – GTX-1070TI
nvidia driver version – 384.130
CUDA version – 8.0.44
CUDNN version – 6.0.21
Python version – 3.5.2
Tensorflow version – 1.4.1
TensorRT version – Not used

The TensorRT uff was generated under the following platform:
Linux distro and version - Linux-x86_64, Ubuntu, 16.04
GPU type - GeForce GTX 1080
nvidia driver version - 396.26
CUDA version - Release 9.0, V9.0.252
CUDNN version - 7.1.4
Python version – 3.5.2
Tensorflow version – 1.8
TensorRT version –

The TensorRT CUDA engine was executed under the following platform:
Jetson TX2 developer kit board:
Linux distro and version –
Ubuntu 16.04.5 LTS (Xenial Xerus)
L4T - #R28 (release), REVISION 2.1, GCID: 11272647, BOARD: t186ref, EABI: aarch64, DATE: Thu May 17 07:29:06 UTC 2018
GPU type - As part of the Jetson TX2 developer kit board
JetPack – 3.2.1 (But TensorRT and CUDNN were updated according to JetPack 3.3 versions)
nvidia driver version - As part of the JetPack
CUDA version - Release 9.0, V9.0.252
CUDNN version - 7.1.5
Python version – Not used
Tensorflow version – Not used
TensorRT version –

Problem description:
I’m using the TensorRT C++ API in order to run inference on a CNN model (YOLOv3) that was developed and trained using the TensorFlow and TensorRT Python APIs.
The model has three outputs, based on three different tile-size divisions of the input images.
The model was developed twice: first in NHWC format and then in NCHW format.
All model filters (layers) and tensors were updated according to the required format.
The dataset used is COCO.
For both formats, when inference is run using only the TensorFlow C++ API (on Windows; TF was built by me from sources), it works OK and all expected detections are produced.
When the NCHW format is used, in order to be able to use TensorRT on my Jetson, only the first model output is OK; the contents of the other two outputs are wrong.

The only difference between the first model output and the other two is an upsample operation that was added in order to support the required tile size.

This is the upsample implementation:

class upsample(BaseOp):
    def upsample(self, factor):
        # upsampling using concat and reshape
        # TODO: enable custom factor
        with tf.name_scope('upsample'):
            x = self.inp.out

            if self.channelOrder == 'NHWC':
                x = tf.transpose(x, perm=[0, 3, 1, 2])  # NHWC --> NCHW

            size = x.get_shape().as_list()
            c = size[1]
            h = size[2]
            w = size[3]
            x = tf.reshape(x, [-1, c, h, w, 1])
            x = tf.concat([x, x], axis=3)
            x = tf.concat([x, x], axis=4)
            x = tf.reshape(x, (-1, c, h * factor, w * factor))

            if self.channelOrder == 'NHWC':
                x = tf.transpose(x, perm=[0, 2, 3, 1])  # NCHW --> NHWC
        return x
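
For reference, the concat/reshape trick above is equivalent to 2x nearest-neighbor upsampling. A minimal pure-Python check of that equivalence for a single CHW tensor (the function names are mine, for illustration only):

```python
def upsample_concat_reshape(x, factor=2):
    # x: nested list [C][H][W]; mimics the reshape/concat sequence above
    assert factor == 2, "the concat trick above hard-codes factor 2"
    h, w = len(x[0]), len(x[0][0])
    # reshape to [C][H][W][1]
    x5 = [[[[v] for v in row] for row in ch] for ch in x]
    # concat([x, x], axis=3): duplicate along the width axis -> [C][H][2W][1]
    x5 = [[row + row for row in ch] for ch in x5]
    # concat([x, x], axis=4): duplicate the innermost axis -> [C][H][2W][2]
    x5 = [[[cell + cell for cell in row] for row in ch] for ch in x5]
    # reshape to [C][2H][2W]: reinterpret each channel's flat buffer row-major
    out = []
    for ch in x5:
        flat = [v for row in ch for cell in row for v in cell]
        out.append([flat[i * 2 * w:(i + 1) * 2 * w] for i in range(2 * h)])
    return out

def upsample_nearest(x, factor=2):
    # straightforward nearest-neighbor reference implementation
    out = []
    for ch in x:
        h, w = len(ch), len(ch[0])
        out.append([[ch[i // factor][j // factor] for j in range(w * factor)]
                    for i in range(h * factor)])
    return out
```

For a 2x2 channel [[1, 2], [3, 4]], both functions produce [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]], which is why the sequence can stand in for an upsample op.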

The suspect operations are the reshape and/or concat.

Please advise.

ForNvidia.zip (9.93 KB)

Did you find any solution to this problem? I also need to perform the reshape operation during the forward pass and am stuck here.

Unfortunately no.
I’m still struggling to find the source of the problem, but I’m fairly sure it is related to the reshape operation.

I know that TensorRT 3 didn’t support this operation at all, and when I tried to convert the TensorFlow pb file to a TensorRT uff file the parser ignored it (no failure or error was raised during the parsing process).
The following link says so:

Is it still true for TensorRT 4 and TensorRT 5?

I noticed that the following enum doesn’t support reshape layer (TensorRT 5):

enum class LayerType : int
{
    kCONVOLUTION = 0,      //!< Convolution layer.
    kFULLY_CONNECTED = 1,  //!< Fully connected layer.
    kACTIVATION = 2,       //!< Activation layer.
    kPOOLING = 3,          //!< Pooling layer.
    kLRN = 4,              //!< LRN layer.
    kSCALE = 5,            //!< Scale Layer.
    kSOFTMAX = 6,          //!< SoftMax layer.
    kDECONVOLUTION = 7,    //!< Deconvolution layer.
    kCONCATENATION = 8,    //!< Concatenation layer.
    kELEMENTWISE = 9,      //!< Elementwise layer.
    kPLUGIN = 10,          //!< Plugin layer.
    kRNN = 11,             //!< RNN Layer.
    kUNARY = 12,           //!< UnaryOp Operation Layer.
    kPADDING = 13,         //!< Padding Layer.
    kSHUFFLE = 14,         //!< Shuffle Layer.
    kREDUCE = 15,          //!< Reduce layer.
    kTOPK = 16,            //!< TopK Layer.
    kGATHER = 17,          //!< Gather Layer.
    kMATRIX_MULTIPLY = 18, //!< Matrix Multiply Layer.
    kRAGGED_SOFTMAX = 19,  //!< Ragged softmax Layer.
    kCONSTANT = 20,        //!< Constant Layer.
    kRNN_V2 = 21,          //!< RNNv2 layer.
    kIDENTITY = 22,        //!< Identity layer.
    kPLUGIN_V2 = 23        //!< PluginV2 Layer.
};

If yes:
Are there any alternatives I can use?
Is it possible to replace, at runtime (after the network creation and uff parsing), the TensorFlow reshape layer with a TensorRT layer or my own CUDA kernel?



TensorRT does support the reshape layer; it is implemented using the kSHUFFLE layer type. If the pb-to-UFF conversion completes without any warnings/errors, all layers should be supported. For any unsupported layers/operations, plugins can be used.
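
As a side note on why reshape maps onto the Shuffle layer: a row-major reshape never moves data, it only reinterprets the flat buffer, while a transpose does move data; kSHUFFLE covers both. A small pure-Python illustration of the reshape half (my own sketch, not TensorRT code):

```python
def flatten(t):
    # recursively flatten a nested list into row-major order
    if isinstance(t, list):
        return [v for sub in t for v in flatten(sub)]
    return [t]

def reshape(t, shape):
    # row-major reshape, in the spirit of tf.reshape / a Shuffle reshape:
    # the flat element order is preserved, only the nesting changes
    flat = flatten(t)
    for dim in reversed(shape[1:]):
        flat = [flat[i:i + dim] for i in range(0, len(flat), dim)]
    return flat
```

Reshaping [[1, 2, 3], [4, 5, 6]] to shape [3, 2] yields [[1, 2], [3, 4], [5, 6]], and flattening before and after gives the identical buffer, which is the invariant a correct Shuffle/reshape implementation must keep.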

If the final outputs still don’t match, can you provide the exact layer name where they start diverging from the native TF outputs?

Sorry for the delayed response.
I’m trying to create a small model that reproduces the problem, and then I will send you all the relevant material along with the execution results.

We are still working (not continuously) on creating the smallest model that reproduces the problem.
We added some probe (debug) outputs to the model in order to generate all of them both by running a session on the TF pb file and by executing the TRT context built from the uff file (which was generated from the pb file without any warnings/errors).

By comparing their results we will be able to tell you exactly where the source of the problem is.

Then I will send you the exact layer name where the outputs start diverging from the native TF outputs.
For now, we have only established that the divergence starts when we call the upsample function I included in my previous response.

In the meantime, regarding the plugin option:
I couldn’t find a C++-based example of how to replace an existing layer (one that was created through a TF API and was part of the converted uff file) with a new plugin layer.

All the examples I could find show how to build a new model using TRT plugin layers, but what I’m looking for is how to replace a TF layer with a TRT plugin layer.

Did I miss something?

Is there any example that demonstrates this replacement process?
Is there any other way to do it?

I managed to study the samplePlugin example (which is implemented for the NvCaffeParser) and adapt it to work with the NvUffParser, but then I learned that the NvUffParser isPlugin and createPlugin methods receive only the operation name, not the layer name. This is a problem for me because my model contains more than one upsample block (each containing the reshape and concat TensorFlow operations).

Can you please suggest a way to handle this issue?
Is there a correct way to find all upsample layers and replace them with plugin layers?
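
One direction I am considering (a sketch with hypothetical node names, not an official API): since each upsample block is built inside tf.name_scope('upsample'), every instance gets a unique prefix such as upsample, upsample_1, …, which can be recovered from the graph's node names and used to key the per-instance replacement:

```python
def find_upsample_scopes(node_names):
    # group graph nodes by their top-level name scope and keep the
    # scopes that look like upsample blocks (naming is hypothetical)
    scopes = set()
    for name in node_names:
        scope = name.split('/')[0]
        if scope.startswith('upsample'):
            scopes.add(scope)
    return sorted(scopes)
```

Each returned scope would then correspond to one upsample block whose reshape/concat nodes need to be swapped for a plugin instance.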

When will the TensorRT NvUffParser support layer names?


I created a small model which contains only the upsample logic described above.
I added probe (debug) outputs between these lines:

x = tf.reshape(x, [-1, c, h, w, 1])
x = tf.concat([x, x], axis=3)
x = tf.concat([x, x], axis=4)
x = tf.reshape(x, (-1, c, h * factor, w * factor))

The frozen graph model (pb file) was successfully converted to uff file without any errors or warnings.

Inference was run on the model twice:

  • TF C++ path
  • TRT C++ path

The input file was verified to be the same for both inference paths.
Out1 was compared and found to be OK.
Out2 was compared and found to be wrong.

After analyzing the differences between the Out2 files from both paths, I found that the concatenation is performed, but incorrectly.
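
For reference, the comparison was of this kind (a simplified pure-Python sketch with a hypothetical helper name; the actual comparison was done on the raw probe output files):

```python
def first_divergence(ref, test, tol=1e-5):
    # ref, test: flat output buffers of equal length; return the index of
    # the first element where they differ by more than tol, else None
    for i, (a, b) in enumerate(zip(ref, test)):
        if abs(a - b) > tol:
            return i
    return None
```

Running such a check per probe output pinpoints the first node whose TRT output diverges from the TF output.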

Attached is ForNvidia.zip file which contains the following:

  1. upsample.py - Model code
  2. Up.pb - Generated pb file
  3. onlyUp.pbtxt - Generated pb text file
  4. outputNames - Model probes outputs names
  5. Up_TF_Nodes_List.txt - List of all pb nodes identified during the TF C++ inference path
  6. Pb_To_Uff_Conversion_Nodes_List.txt - List of all nodes identified during the pb-to-uff conversion
  7. nvUffParser_Report.txt - List of all layers identified during the TRT C++ path uff parsing
  8. buildCudaEngine_Report.txt - List of all optimization steps performed by TRT while building the CUDA engine
  9. TF Dir - Includes all TF C++ probe output raw data
  10. TRT Dir - Includes all TRT C++ probe output raw data
  11. Up.uff - uff file
  12. Up.bin - CUDA engine serialized file

Please analyze the attached material and help me solve this issue.

Thanks a lot!

ForNvidia.zip (9.93 KB)

The problem was solved.

It turned out that the TensorRT concatenation works properly only when the concatenation axis is the channel axis, i.e. axis 1.

As I described before, the purpose of the following commands:

x = tf.reshape(x, [-1, c, h, w, 1])
x = tf.concat([x, x], axis=3)
x = tf.concat([x, x], axis=4)
x = tf.reshape(x, (-1, c, h * factor, w * factor))

was to replace the TensorFlow upsample operation, which isn’t supported by TensorRT.

By implementing a TensorRT plugin that performs the original TensorFlow upsample logic, and replacing all of the above TensorFlow operations with it (using the graphsurgeon tool), I bypassed this error.
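
For completeness, the per-element index mapping such an upsample plugin has to compute for an NCHW buffer is simply out[c][y][x] = in[c][y // factor][x // factor]. A pure-Python reference of the kernel logic only (the real plugin runs this as a CUDA kernel; function name and layout are illustrative):

```python
def upsample_plugin_reference(inp, c, h, w, factor):
    # inp: flat row-major CHW buffer of length c*h*w; returns the flat
    # upsampled buffer, mirroring what the plugin computes per output element
    oh, ow = h * factor, w * factor
    out = [0.0] * (c * oh * ow)
    for ci in range(c):
        for y in range(oh):
            for x in range(ow):
                src = ci * h * w + (y // factor) * w + (x // factor)
                out[ci * oh * ow + y * ow + x] = inp[src]
    return out
```

Because each output element reads exactly one input element, this mapping parallelizes trivially, which is what makes it a natural fit for a plugin kernel.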