Hi everyone! I’m currently working with machine learning inference on my Jetson Nano.
I trained a simple network with convolutional, MaxPooling2D and dense layers and exported it to a Keras (.h5) file. The model takes a 64x48 image as input and has two output layers.
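For context, the network is along these lines (a minimal sketch; the exact layer sizes and output names here are placeholders, not my real ones):

```python
from tensorflow import keras
from tensorflow.keras import layers

# 64x48 RGB input, conv + maxpool + dense, two output heads
inputs = keras.Input(shape=(64, 48, 3), name="input_1")
x = layers.Conv2D(16, 3, activation="relu")(inputs)
x = layers.MaxPooling2D()(x)
x = layers.Flatten()(x)
x = layers.Dense(32, activation="relu")(x)
out1 = layers.Dense(1, activation="sigmoid", name="out_1")(x)
out2 = layers.Dense(1, activation="sigmoid", name="out_2")(x)

model = keras.Model(inputs, [out1, out2])
model.save("model.h5")
```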
So, when I run inference with a black image ( np.zeros((64, 48, 3)) ) on TensorFlow, TFLite and TF-TRT, I get the same output from all three, approximately 0.56. The problem is that when I run the very same input through the same model on TensorRT, the prediction doesn't match the previous engines and comes out at approximately 0.74.
I converted the .h5 file to a TRT inference graph (.pb) and then generated a .uff file from the .pb by running:
python3 (...)python3.6/dist-packages/uff/bin/convert_to_uff.py model.pb
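The exact flags I passed are along these lines (the path is abbreviated and the output-node names are placeholders; the flags are from memory, so please check them against the converter's --help):

```
# list the graph nodes first to confirm the real input/output names
python3 (...)python3.6/dist-packages/uff/bin/convert_to_uff.py model.pb -l

# then convert, naming the two output nodes explicitly
python3 (...)python3.6/dist-packages/uff/bin/convert_to_uff.py model.pb -o model.uff -O out_1/Sigmoid -O out_2/Sigmoid
```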
I double-checked every stage of the pipeline from the .h5 to the final .uff, and tried two different implementations:
.h5 -- trt.create_inference_graph --> .pb --> convert_to_uff.py --> .uff
.h5 -- trt.create_inference_graph --> .pb --> import to tensorrt and run uff.from_tensorflow_frozen_model(...)
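The second variant looks roughly like this (a sketch, assuming I remember the signature correctly; the output node names are placeholders for the real ones in my graph):

```python
import uff

# Python-API variant of the .pb -> .uff conversion
uff_model = uff.from_tensorflow_frozen_model(
    frozen_file="model.pb",
    output_nodes=["out_1/Sigmoid", "out_2/Sigmoid"],
    output_filename="model.uff",
)
```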
Both implementations gave the same wrong output.
I also transposed the input matrix and registered the input and output layers on the parser. A weird phenomenon: if I run TensorRT inference without registering the input layer, I get an error. But if I don't register one or both output layers, or if I register them under arbitrary names, the output is exactly the same (wrong) value and no error is raised.
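To be concrete about the registration step (the full code follows below), it looks roughly like this; the layer names and the CHW input shape are placeholders matching my np.zeros((64, 48, 3)) layout:

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

with trt.Builder(TRT_LOGGER) as builder, \
        builder.create_network() as network, \
        trt.UffParser() as parser:
    # input registered in CHW order, hence the transpose of the image later
    parser.register_input("input_1", (3, 64, 48))
    parser.register_output("out_1/Sigmoid")
    parser.register_output("out_2/Sigmoid")
    parser.parse("model.uff", network)

    builder.max_batch_size = 1
    builder.max_workspace_size = 1 << 28
    engine = builder.build_cuda_engine(network)
```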
Below I give my source code, hoping to get some feedback on it. Have I missed some flag, or am I making a mistake when sending the input data to the host?
2) Another doubt I have: since memory is physically shared between the CPU and GPU on the Jetson Nano, is it still necessary to allocate the buffers in "device memory" (duplicating the data in the same physical memory)? And is it necessary to copy the input data (the image) to another buffer (the host_input)?
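This is the buffer setup my question refers to: a pagelocked host buffer plus a separate device allocation, with an explicit copy of the image into host_input and then host -> device (a sketch of the usual pycuda pattern, variable names are just illustrative):

```python
import numpy as np
import pycuda.driver as cuda
import pycuda.autoinit  # creates the CUDA context

# one host buffer + one device buffer for the input binding
host_input = cuda.pagelocked_empty(3 * 64 * 48, dtype=np.float32)
device_input = cuda.mem_alloc(host_input.nbytes)

image = np.zeros((64, 48, 3), dtype=np.float32)
# HWC -> CHW, flatten, and copy into the pagelocked host buffer
np.copyto(host_input, image.transpose(2, 0, 1).ravel())

stream = cuda.Stream()
cuda.memcpy_htod_async(device_input, host_input, stream)
# ...enqueue the engine execution and copy the outputs back the same way...
stream.synchronize()
```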
Hope I can get some help :)