TensorRT added layer before output

I have been working on TensorRT conversions from TensorFlow frozen graphs.

I wonder why this ILayer appears in the conversion from UFF with the official parser:

conv1_bn/FusedBatchNorm_1 (32, 112, 111)
<b>conv1_relu/Relu6_HL_1804289383 (32, 112, 111)</b>
conv1_relu/Relu6 (32, 112, 111)

or this

reshape_1/Reshape (1, 1, 1024)
(Unnamed Layer* 226) [Shuffle] (1, 1, 1024)
conv_preds/convolution (1024, 1, 1)
conv_preds/BiasAdd (1000, 1, 1)
(Unnamed Layer* 234) [Shuffle] (1000, 1, 1)
reshape_2/Reshape (1, 1, 1000)
<b>act_softmax/Softmax_HL_846930886 (1000, 1, 1)</b>
act_softmax/Softmax (1000, 1, 1)

In both fragments, the last line was marked as output for the parser and the tf-uff converter, and the bold one is created automatically.

Accuracy is fine if I execute the whole model; however, if I “cut” the model, these layers appear and break the results… What is the purpose of this layer? Is there a way to prevent the parser from generating it? I checked the file created by the tf-uff converter (pbtxt) and that layer is not present there, so I assume it is a matter of TensorRT optimization.

I’m using TensorRT 6 and TF 1.14.
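For context, layer dumps like the ones above can be produced by walking the parsed INetworkDefinition. A minimal sketch (assuming a TensorRT 6 Python environment and the input name/shape of my model, so treat it as illustrative only):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
with trt.Builder(logger) as builder, builder.create_network() as network, trt.UffParser() as parser:
    parser.register_input("input_1", (3, 224, 224))
    parser.parse("./my_model.uff", network)
    # print each layer's name and the shape of its first output
    for i in range(network.num_layers):
        layer = network.get_layer(i)
        print(layer.name, tuple(layer.get_output(0).shape))
```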



Hi Ignacio,

Can you put together a small repro package with (1) TF model, (2) uff-converter command, and (3) script/commands to parse UFF -> TensorRT?

Also the usual info just for completeness:

Provide details on the platforms you are using:
Linux distro and version
GPU type
Nvidia driver version
CUDA version
CUDNN version
Python version [if using python]
Tensorflow version
TensorRT version
If Jetson, OS, hw versions

Describe the problem


Include any logs, source, models (.uff, .pb, etc.) that would be helpful to diagnose the problem.

If relevant, please include the full traceback.


Please provide a minimal test case that reproduces your error.

Sorry for the late response.

Jetson nano image release 4.3
Nvidia driver version: Preinstalled driver
Tegra release: REVISION: 3.1, GCID: 18186506, BOARD: t210ref, EABI: aarch64, DATE: Tue Dec 10 06:58:34 UTC 2019
CUDA version: 10.0
CUDNN version: 7.6.3
Python version 3.6.9
Tensorflow version: 1.14
TensorRT version: 6.0.1

I have solved the “problem” I had by changing the output of the cut model to the layer added during the UFF-TensorRT parsing process. In short, the output of the original model at layer 20 is not the same as the output given by a model that only includes those first 20 layers, because the UffParser adds a new layer. It seems that in the sample case I provide, the output operation is performed twice, thus changing the outputs of the pre-trained network.
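To illustrate why applying the output operation twice breaks the results, here is a minimal NumPy sketch with made-up logits, assuming the final op is the Softmax from the second fragment: feeding a softmax its own output flattens the distribution, so the duplicated layer no longer matches the original.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D array
    e = np.exp(x - x.max())
    return e / e.sum()

logits = np.array([4.0, 1.0, 0.5])   # hypothetical pre-softmax activations
once = softmax(logits)               # the network's intended output
twice = softmax(once)                # what a duplicated Softmax layer would compute

print(np.allclose(once, twice))      # → False: the probabilities differ
```

Note that the argmax (the top-1 prediction) is preserved, which would explain why whole-model accuracy can still look reasonable even when the probability values are wrong.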

After some tests I realized that the added layer was actually providing the original output of the layer. I am providing two .pb models with their text descriptions, as well as the corresponding .uff model for each.

As I see it, there may be no problem, as the accuracy of inference on the complete model is only 1 or 2% below the original, which I consider fair enough considering the performance gain.

I cannot upload the models to the post, so you can find them here: https://drive.switch.ch/index.php/s/3BY3X0dEF3m2JKb

As the UFF-TensorRT parser I used your sample provided under /usr/src/tensorRT/samples/python/ with a couple of modifications to implement the functionality mentioned above, changing the output of the “half” model to the tensor after the “usual” output:

import numpy as np
import tensorrt as trt
# allocate_buffers, do_inference and load_normalized_test_case are the helper
# functions from the TensorRT Python sample this script is based on.

logger = trt.Logger(trt.Logger.WARNING)
with trt.Builder(logger) as builder, builder.create_network() as network, trt.UffParser() as parser:
    builder.max_workspace_size = 1 << 30  # 1 GiB
    parser.register_input("input_1", (3, 224, 224))
    parser.parse("./my_model.uff", network)
    # mark the second-to-last layer (the tensor after the "usual" output) as output
    network.mark_output(network.get_layer(network.num_layers - 2).get_output(0))
    engine = builder.build_cuda_engine(network)

    inputs, outputs, bindings, stream = allocate_buffers(engine)
    with engine.create_execution_context() as context:
        case_num = load_normalized_test_case(pagelocked_buffer=inputs[0].host)
        # For more information on performing inference, refer to the introductory samples.
        # The common.do_inference function will return a list of outputs - we only have one in this case.
        [output] = do_inference(context, bindings=bindings, inputs=inputs, outputs=outputs, stream=stream)
        pred = np.argmax(output)
        print("Prediction: " + str(pred))

As my problem is “solved”, I am sending this information just in case there is some kind of bug. I understand from your answer that the creation of these layers just before the output is not something expected, so I would like to understand why it happens.

If I can provide something else, do not hesitate to ask.

No worries, thanks for the detailed explanation, and I’m glad your issue was solved.

Just curious: I notice you call parser.register_input(input_name, input_shape), but not parser.register_output(output_name). Perhaps that could be the cause of the mismatched output layer? There is an example of registering both inputs and outputs for a ResNet50 model in /usr/src/tensorRT/samples/python/introductory_parser_samples/uff_resnet50.py for reference.
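A minimal sketch of what that would look like, using the node names from the logs earlier in the thread (this requires a TensorRT 6 environment, so treat it as illustrative rather than tested):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
with trt.Builder(logger) as builder, builder.create_network() as network, trt.UffParser() as parser:
    parser.register_input("input_1", (3, 224, 224))
    # register the graph's real output node so the parser does not have to infer it
    parser.register_output("act_softmax/Softmax")
    parser.parse("./my_model.uff", network)
```

With the output registered explicitly, the parser should not need to synthesize an output layer of its own, which may remove the extra `_HL_` layer you observed.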