TensorRT wrong output data (vs. TF/TFLite/TF-TRT)

Hi everyone! I’m currently working on machine learning inference on my Jetson Nano.
I trained a simple network with convolutional, MaxPooling2D and dense layers, and exported it to a Keras (.h5) file. The model takes a 64x48 image as input and has two output layers.

So, when I run inference with a black image ( np.zeros((64, 48, 3)) ) through the following libraries/frameworks: TensorFlow, TFLite and TF-TRT, I get the same output data, e.g., approximately the values -0.65 and 0.56. The problem is that when I run the very same input data on the same model with TensorRT, the predicted data doesn’t match the previous engines, giving approximately the values -0.62 and 0.74. I converted the .h5 file to a trt_inference_graph (.pb), and then generated a .uff file from the .pb by running python3 (...)python3.6/dist-packages/uff/bin/convert_to_uff.py model.h5
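For reference, a quick way to confirm that the discrepancy is a real bug rather than floating-point noise is to compare the engines’ outputs with a tolerance. This is a minimal sketch; the arrays hold the approximate values quoted above, standing in for real predictions:

```python
import numpy as np

# Approximate outputs quoted above (placeholders for the real predictions)
tf_out = np.array([-0.65, 0.56])   # TensorFlow / TFLite / TF-TRT agree on these
trt_out = np.array([-0.62, 0.74])  # TensorRT produces these instead

# A loose absolute tolerance: small fp differences pass, real bugs don't
same = np.allclose(tf_out, trt_out, atol=1e-2)
print(same)  # False: the 0.18 gap on the second output is far beyond fp noise
```

If `np.allclose` fails at `atol=1e-2`, the difference is too large to blame on precision and points at a conversion or parsing problem.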

I double checked the stage pipeline between the .h5 to the final .uff, and tried different implementations.
Method 1: .h5 -- trt.create_inference_graph --> .pb --> convert_to_uff.py --> .uff
Method 2: .h5 -- trt.create_inference_graph --> .pb --> import into TensorRT and run uff.from_tensorflow_frozen_model(...)
Both implementations gave the same wrong output.

I have also transposed the input matrix and registered the input and output layers with the parser. A weird phenomenon: if I run TensorRT inference without registering the input layer, it throws an error. But if I don’t register one or both output layers, or register them under arbitrary names, I get exactly the same (wrong) output and no error at all.
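In case it helps with reproducing: by transposing the input I mean permuting from the HWC order that Keras/TensorFlow use to the CHW order the UFF parser’s input binding expects, then flattening into a contiguous buffer. A sketch with a dummy image (the shape and dtype are assumptions matching my model):

```python
import numpy as np

# Dummy 64x48 RGB image in HWC order, as Keras/TensorFlow store it
img_hwc = np.zeros((64, 48, 3), dtype=np.float32)

# Permute to CHW order for the TensorRT/UFF input binding
img_chw = np.transpose(img_hwc, (2, 0, 1))
print(img_chw.shape)  # (3, 64, 48)

# Flatten into a contiguous 1-D buffer before copying into the host buffer
flat = np.ascontiguousarray(img_chw).ravel()
print(flat.size)  # 9216 = 3 * 64 * 48 floats
```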

Below is my source code; I hope I can get some feedback on it. Have I missed a flag, or am I making a mistake when sending the input data to the host?

pip3 list:

pycuda 2019.1.2
tensorflow-gpu 1.14.0+nv19.10
uff 0.5.5

2) Another doubt I have: if memory is shared between CPU and GPU (as on the Jetson Nano), is it still necessary to allocate the buffers in “device memory” (duplicating the data within the same physical memory)? Also, is it necessary to copy the input data (image) into another buffer (the host_input)?

Hope I can get some help :)
Thank you!
Andre Pereira


1. Input and output (at least one) bindings are essential for TensorRT.
For more detail, we need to check your implementation first.

2. TensorRT input/output tensors are all GPU buffers.
Device memory is required.
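To illustrate point 2: even on a Jetson, the usual pycuda flow stages the image in a pinned host buffer and then copies it to a separately allocated device buffer. The sketch below shows only the host-side staging with NumPy; the commented-out lines name the pycuda calls (cuda.pagelocked_empty, cuda.mem_alloc, cuda.memcpy_htod) that would complete the transfer in a typical UFF sample — this is an assumption about the common pattern, not the poster’s actual code:

```python
import numpy as np

# Input binding size for a 3x64x48 float32 tensor
input_size = 3 * 64 * 48

# In a real app this would be cuda.pagelocked_empty(input_size, np.float32),
# which pins the memory so the host->device copy can use DMA; plain NumPy
# stands in here so the sketch runs without pycuda.
host_input = np.empty(input_size, dtype=np.float32)

# Stage the CHW image into the flat host buffer (this is the "copy to
# host_input" step asked about above: the engine reads from a flat,
# contiguous, pinned buffer, not from the original image array).
img_chw = np.zeros((3, 64, 48), dtype=np.float32)
np.copyto(host_input, img_chw.ravel())

# device_input = cuda.mem_alloc(host_input.nbytes)  # separate device allocation
# cuda.memcpy_htod(device_input, host_input)        # explicit host->device copy
print(host_input.nbytes)  # 36864 bytes
```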

Let us check your implementation first; we will update you with more information later.



Could you also share your .pb file with us?


Sure, I will share my .pb files. The model I intend to run is trt_CNN_noaug_all.pb.
But in the previous implementation I instead used trt_CNN_noaug_conv1.pb (a simpler version of the original NN, used for debugging).

The original .pb file exported by tensorflow:

Next is the file I use in my implementation: an optimized frozen graph created by converting the previous original .pb file with trt.create_inference_graph:

Repository with models:

Thank you for your time spent checking my problem!

Thank you for the answers! :)


The .pb file I intend to use is this https://github.com/prtpereira/Jetson_inference/blob/master/CNN_noaug_all.pb

Same model with optimization trt.create_inference_graph:

Hope I can get some feedback :)
Thank you!


I’ve figured out the problem: it’s a bug in TensorRT 5.
I was working with the Jetson Nano, testing several models and tweaking the model architecture, and the results showed that every model that includes Conv2D layers produced errors in the output data.

Then I did the full NVIDIA CUDA/TensorRT setup on my personal computer and installed TensorRT 7. To my surprise, the predicted output matched, for example, the TensorFlow, TF-TRT and also ArmNN inference tests.

TL;DR: install TensorRT 7, since TensorRT 5 gives wrong output data when Conv2D layers are present.