TensorRT gives different output to original Tensorflow model (conv2d layer conversion)

Hi all,

I found similar topics in the forum, but none of them solved my problem. I have already tried reshaping and transposing the input according to the documentation and samples, but the output of the TensorRT model still differs from that of the original model.


In the attached file you can find the model and its conversions (pb, uff, TRT engine) as well as a pickle file containing some sample data. The Jupyter notebook ties all of that together, including the code that runs the transformations and inference.


Problem Description:

The following is a simple CNN model defined and trained via tf.keras, then converted into a TRT model with trt.lite.Engine.

The model predicts the shape of objects either BOX or CYLINDER (one hot encoded)

The input is a depth image, i.e. greyscale with only one channel: HWC (299,299,1) in TF, CHW in TRT.
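As a sanity check on the layout conversion: for a single-channel image, going from HWC to CHW only moves the size-1 axis, so a plain reshape and a proper transpose should produce the same buffer. A minimal NumPy sketch, where the array is a synthetic stand-in and not data from the attachment:

```python
import numpy as np

# Synthetic stand-in for one depth image in HWC layout (not the attached data).
img_hwc = np.arange(299 * 299, dtype=np.float32).reshape(299, 299, 1)

# General HWC -> CHW conversion: move the channel axis to the front.
img_chw = np.ascontiguousarray(img_hwc.transpose(2, 0, 1))

# Shortcut used in the test loop of this post: a plain reshape.
img_reshaped = img_hwc.reshape(1, 299, 299)

# With exactly one channel both routes give identical data.
input_layout_ok = np.array_equal(img_chw, img_reshaped)
print(img_chw.shape, input_layout_ok)
```

If this holds, the input reshape itself is unlikely to explain the mismatch.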

The TRT predictions differ from the TF predictions, which are considered ground truth.

convert-to-uff output:

(venv) ml@mltest2:~/tensorflow/venv/data/jupyter$ convert-to-uff tensorflow --input-file saved_model_small_freeze.pb -l
/home/ml/tensorflow/venv/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
return f(*args, **kwds)
Loading saved_model_small_freeze.pb
Automatically deduced output nodes: dense/Softmax
1 Placeholder: “conv2d_input”
2 Const: “conv2d/kernel”
3 Const: “conv2d/bias”
4 Conv2D: “conv2d/Conv2D”
5 BiasAdd: “conv2d/BiasAdd”
6 Relu: “conv2d/Relu”
7 MaxPool: “max_pooling2d/MaxPool”
8 Const: “conv2d_1/kernel”
9 Const: “conv2d_1/bias”
10 Conv2D: “conv2d_1/Conv2D”
11 BiasAdd: “conv2d_1/BiasAdd”
12 Relu: “conv2d_1/Relu”
13 MaxPool: “max_pooling2d_1/MaxPool”
14 Const: “conv2d_2/kernel”
15 Const: “conv2d_2/bias”
16 Conv2D: “conv2d_2/Conv2D”
17 BiasAdd: “conv2d_2/BiasAdd”
18 Relu: “conv2d_2/Relu”
19 MaxPool: “max_pooling2d_2/MaxPool”
20 Shape: “flatten/Shape”
21 Const: “flatten/strided_slice/stack”
22 Const: “flatten/strided_slice/stack_1”
23 Const: “flatten/strided_slice/stack_2”
24 StridedSlice: “flatten/strided_slice”
25 Const: “flatten/Reshape/shape/1”
26 Pack: “flatten/Reshape/shape”
27 Reshape: “flatten/Reshape”
28 Const: “dense/kernel”
29 Const: “dense/bias”
30 MatMul: “dense/MatMul”
31 BiasAdd: “dense/BiasAdd”
32 Softmax: “dense/Softmax”

Keras Model:

Layer (type)                    Output Shape            Param #
conv2d (Conv2D)                 (None, 295, 295, 32)    832
max_pooling2d (MaxPooling2D)    (None, 97, 97, 32)      0
conv2d_1 (Conv2D)               (None, 93, 93, 64)      51264
max_pooling2d_1 (MaxPooling2    (None, 46, 46, 64)      0
conv2d_2 (Conv2D)               (None, 44, 44, 64)      36928
max_pooling2d_2 (MaxPooling2    (None, 22, 22, 64)      0
flatten (Flatten)               (None, 30976)           0
dense (Dense)                   (None, 2)               61954

Total params: 150,978
Trainable params: 150,978
Non-trainable params: 0
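
The parameter counts in the summary are consistent with 5x5 kernels in the first two convolutions and a 3x3 kernel in the third. Those kernel sizes are inferred from the output shapes and counts, not stated in the post. A quick arithmetic check:

```python
def conv2d_params(kernel, in_channels, filters):
    """Weights plus one bias per filter for a square-kernel Conv2D."""
    return kernel * kernel * in_channels * filters + filters


def dense_params(inputs, units):
    """Weights plus one bias per unit for a Dense layer."""
    return inputs * units + units


# Kernel sizes (5, 5, 3) are assumptions inferred from the summary.
counts = {
    "conv2d": conv2d_params(5, 1, 32),
    "conv2d_1": conv2d_params(5, 32, 64),
    "conv2d_2": conv2d_params(3, 64, 64),
    "dense": dense_params(22 * 22 * 64, 2),
}
total = sum(counts.values())
print(counts, total)
```

The dense layer has 22 * 22 * 64 = 30976 inputs, so its weights depend on the order in which the last feature map is flattened.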

Here are the results for the attached pickle samples; as you can see, the predictions differ between TRT and TF.

Samples 0-4 are real samples (TF is considered ground truth), 5 is a white image, 6 a black image, and 7 a grey image.

# run test
for i in test_samples:
    result_trt = engine_loaded.infer(i.reshape(1,299,299))
    print("TRT: %s" % np.asarray(result_trt).reshape(2))
    result_tf = model_loaded.predict(i.reshape((1,299,299,1)))
    print("TF : %s" % result_tf.reshape(2))
    print(np.argmax(result_trt), np.argmax(result_tf))

TRT: [0.9007671 0.09923294]
TF : [0.99828416 0.00171584]
0 0

TRT: [0.04684561 0.9531544 ]
TF : [1.000000e+00 3.279911e-08]
1 0

TRT: [0.27094778 0.7290522 ]
TF : [9.9988842e-01 1.1151263e-04]
1 0

TRT: [0.4547093 0.5452907]
TF : [9.9962366e-01 3.7629422e-04]
1 0

TRT: [0.11016078 0.88983923]
TF : [0.9982955 0.00170457]
1 0

TRT: [0.49179485 0.50820524]
TF : [0.4918013 0.5081987]
1 1

TRT: [0.49184766 0.50815237]
TF : [0.49184763 0.50815237]
1 1

TRT: [0.49179485 0.50820524]
TF : [0.4918013 0.5081987]
1 1
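
To quantify the gap rather than eyeball it, the two outputs can be compared numerically. A small sketch using the first two result pairs above; the tolerance of 1e-3 is an arbitrary assumption, not a value recommended anywhere in this thread:

```python
import numpy as np

# First two result pairs copied from the output above.
trt_out = np.array([[0.9007671, 0.09923294],
                    [0.04684561, 0.9531544]])
tf_out = np.array([[0.99828416, 0.00171584],
                   [1.000000e+00, 3.279911e-08]])

# Largest per-sample deviation between the two frameworks.
max_abs_diff = np.abs(trt_out - tf_out).max(axis=1)

# Whether the predicted class matches, and whether the raw values match.
labels_agree = trt_out.argmax(axis=1) == tf_out.argmax(axis=1)
values_close = np.isclose(trt_out, tf_out, atol=1e-3).all(axis=1)

print(max_abs_diff, labels_agree, values_close)
```

Sample 0 shows why comparing only the argmax is not enough: the labels agree while the probabilities are still far apart.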

command for TRT lite engine transformation

engine_lite = trt.lite.Engine(framework="tf",path="saved_model_small_freeze.pb",max_batch_size=1,input_nodes={"conv2d_input":(1,299,299)},output_nodes=["dense/Softmax"])

The prediction in TensorFlow, obtained by loading the frozen graph and evaluating it via session.run, is still correct.

The ‘non-lite’ way to create the engine also gives the same wrong results.


Tensorflow 1.9
Cuda 9.2
Ubuntu 16.04


Try to use tf.reshape instead of Keras’ flatten layer.

Thanks for your answer. I have now checked the output for different layer configurations. In my opinion the flatten layer is not the issue; the difference in the results is caused by the convolutional layer(s).

I tried different setups and also removed the convolution; without it, the output was the same.
With the convolution, when the “filters” parameter is set to 0 the output is still the same; when “filters” is > 1 there is an error.


Of course it is still possible that the converted flatten layer does not work with more than one filter. But I have no idea how the flatten layer could be replaced with a reshape tensor.

Any ideas about that?
Is it worth trying to replace the flatten layer? If so, how? Do you have to replace the node via a graphdef conversion and import it back?
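
One possible explanation, offered as a hypothesis and not a confirmed diagnosis: flattening an HWC feature map in memory order yields H, W, C order, while flattening the same data stored as CHW yields C, H, W order. The two coincide for a single channel and diverge otherwise, so dense weights trained against one order would see permuted inputs in the other. A NumPy sketch (the helper name is hypothetical):

```python
import numpy as np


def flatten_orders_match(h, w, c):
    """Compare the flattened HWC tensor with the flattened CHW version."""
    feat_hwc = np.arange(h * w * c).reshape(h, w, c)
    feat_chw = feat_hwc.transpose(2, 0, 1)  # same data, CHW layout
    return np.array_equal(feat_hwc.reshape(-1), feat_chw.reshape(-1))


single_channel = flatten_orders_match(22, 22, 1)
many_channels = flatten_orders_match(22, 22, 64)

# Transposing CHW back to HWC before flattening restores the TF order.
feat_hwc = np.arange(22 * 22 * 64).reshape(22, 22, 64)
feat_chw = feat_hwc.transpose(2, 0, 1)
restored = np.array_equal(feat_chw.transpose(1, 2, 0).reshape(-1),
                          feat_hwc.reshape(-1))
print(single_channel, many_channels, restored)
```

This would be consistent with a mismatch that only shows up once the last convolution has more than one filter, and with an explicit reshape acting as a workaround.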


Hi dhingratul,

thank you, I just tried the same in tf.keras style:


Which seems to be a workaround.

@NVIDIA, could you still please look into this issue? Many saved models already have flatten layers, and TRT even has its own, so this seems to be a bug.

Confirming the issue.

My task was to run the Xception model from keras-applications on TensorRT. I confirmed that the original H5 model in Keras and the TensorFlow PB exported after graph.remove_training_nodes() give the same result on our test set (with some minor differences in confidence). But when it comes to TRT, the result is nowhere close.

I did some checks, defining each successive layer as the output and comparing the tensors produced. It narrowed down to MaxPool: it produced exact zeros in the places where the TF implementation outputs negative numbers.

Please advise,
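
The layer-by-layer check described above can be automated roughly as follows. This is a sketch: `first_divergent_layer` is a hypothetical helper, the toy dictionaries stand in for real per-layer outputs, and both sides are assumed to have been brought into the same layout before comparison.

```python
import numpy as np


def first_divergent_layer(layer_names, ref_outputs, test_outputs, atol=1e-4):
    """Return the first layer whose outputs differ, or None if all match.

    ref_outputs and test_outputs map layer name -> array-like.
    """
    for name in layer_names:
        ref = np.asarray(ref_outputs[name], dtype=np.float64).ravel()
        test = np.asarray(test_outputs[name], dtype=np.float64).ravel()
        if ref.shape != test.shape or not np.allclose(ref, test, atol=atol):
            return name
    return None


# Toy data: the pooling output has a zero where the reference is negative.
names = ["conv", "bn", "pool"]
tf_layers = {"conv": [1.0, -2.0], "bn": [0.5, -0.5], "pool": [-1.0, 2.0]}
trt_layers = {"conv": [1.0, -2.0], "bn": [0.5, -0.5], "pool": [0.0, 2.0]}

first_bad = first_divergent_layer(names, tf_layers, trt_layers)
print(first_bad)
```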

For me the mismatch was at the Conv2D layer. Check whether sorted(conv from TRT) == sorted(conv from TF); if so, then it is an issue with the CONV implementation from Keras.

Hi dhingratul,

Not sure I got your idea. Before converting the model to TensorRT, I got a correct result both from Keras directly and from TensorFlow using the “frozen” and exported model, i.e. the top 5 categories were the same and were consistent with the training set (the model classified the image correctly). The corresponding confidence values were almost equal as well, so the Keras implementation of the layers used is not so bad, and the issue seems to belong to the TRT compiler or runtime.
This is a fragment of Xception design (from https://github.com/keras-team/keras-applications/blob/master/keras_applications/xception.py)

x = layers.SeparableConv2D(256, (3, 3),
    x = layers.BatchNormalization(name='block3_sepconv2_bn')(x)

    x = layers.MaxPooling2D((3, 3), strides=(2, 2),

When I inspected the tensors layer by layer, MaxPooling/block3_pool was the first one where the TRT output was significantly different from the TF one.

For me the TF_CONV and TRT_CONV looked different, but actually they had the same values but the output structures were different. For me TF_CONV != TRT_CONV, but set(TF_CONV) == set(TRT_CONV), just make sure this isn’t the case. Make sure you compare the actual values, not the class labels.
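
The check suggested here can be sketched in NumPy. The arrays are synthetic, and the assumption is that the TF tensor is HWC while the TRT tensor is CHW:

```python
import numpy as np

# Synthetic conv output in HWC layout and the same data stored as CHW.
tf_conv = np.arange(2 * 3 * 4, dtype=np.float32).reshape(2, 3, 4)
trt_conv = np.ascontiguousarray(tf_conv.transpose(2, 0, 1))

# Same multiset of values: sort both after flattening.
same_values = np.array_equal(np.sort(tf_conv, axis=None),
                             np.sort(trt_conv, axis=None))

# A naive element-by-element comparison of the flat buffers fails.
same_flat = np.array_equal(tf_conv.ravel(), trt_conv.ravel())

# Comparing after bringing TRT back to HWC succeeds.
same_after_transpose = np.allclose(tf_conv, trt_conv.transpose(1, 2, 0))

print(same_values, same_flat, same_after_transpose)
```

Sorting is used here instead of set() so that duplicate values are counted as well.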

I compared flattened tensors (it worked for the first layers, so I just continued).

So far I have found a suspicious “Unnamed Layer [Padding]” which the UFF-to-plan compiler added just before MaxPool:

--------------- Timing block3_sepconv2/separable_conv2d(1)
Tactic 0 time 3.14368
Tactic 1 time 2.09795
Tactic 2 time 4.24944
--------------- Chose 14 (1363534230700867617)
--------------- Timing (Unnamed Layer* 25) [Padding](18)
Tactic 0 is the only option, timing skipped

--------------- Timing block3_pool/MaxPool(8)

Is there any way to get its result?
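
A possible reading of the zeros, stated as a hypothesis only: if the inserted padding layer pads with zeros and the pooling then runs over the padded tensor, every border window that contains only negative activations returns 0, whereas TensorFlow's SAME max pooling does not take padded positions into account. The sketch below illustrates the effect in NumPy; the pool size 3, stride 2 and padding of 1 are assumptions for illustration, and `max_pool_padded` is a hypothetical helper.

```python
import numpy as np


def max_pool_padded(x, pool=3, stride=2, pad=1, pad_value=0.0):
    """Max-pool a 2-D array after padding every border with pad_value."""
    padded = np.pad(x, pad, mode="constant", constant_values=pad_value)
    out_h = (padded.shape[0] - pool) // stride + 1
    out_w = (padded.shape[1] - pool) // stride + 1
    out = np.empty((out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            window = padded[i * stride:i * stride + pool,
                            j * stride:j * stride + pool]
            out[i, j] = window.max()
    return out


# All-negative feature map, e.g. batch-norm output without a ReLU.
x = -np.ones((5, 5), dtype=np.float32)

zero_padded = max_pool_padded(x, pad_value=0.0)        # explicit zero padding
ignore_padded = max_pool_padded(x, pad_value=-np.inf)  # padding never wins

print(zero_padded)
print(ignore_padded)
```

With zero padding only the centre window, which touches no padding, keeps its negative value; with -inf padding every output stays negative.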