TensorRT 3 results differ from TensorFlow (minimal example included)

Hi, could you please check this?

The code below contains a very simple network (2 conv + 1 fc) and its conversion to UFF. However, TensorFlow and TensorRT give different results. What am I doing wrong?

Note that, unlike the provided MNIST example, the input here has 3 channels (like an ordinary color image). Should I transpose the input somehow (e.g., to NCHW)?

from __future__ import print_function
import numpy as np
import tensorflow as tf
import tensorrt as trt
from tensorrt.parsers import uffparser
import uff
import pycuda.driver as cuda
import pycuda.autoinit
slim = tf.contrib.slim

# ---- define network
with tf.Graph().as_default():
    image_placeholder = tf.placeholder(tf.float32, [None, 100, 100, 3])
    net = slim.repeat(image_placeholder, 2, slim.conv2d, 64, [3, 3], scope='conv1')
    net = slim.flatten(net)
    net = slim.fully_connected(net, 5, scope='pred/label_fc1')
    init = tf.global_variables_initializer()

    sess = tf.Session()
    sess.run(init)  # initialize the variables before freezing them into constants

    output_names = [net.op.name]     # which is "pred/label_fc1/Relu"
    graphdef = tf.get_default_graph().as_graph_def()
    frozen_graph = tf.graph_util.convert_variables_to_constants(sess, graphdef, output_names)
    frozen_graph = tf.graph_util.remove_training_nodes(frozen_graph)

# ---- model to uff
uff_model = uff.from_tensorflow(frozen_graph, output_names)
G_LOGGER = trt.infer.ConsoleLogger(trt.infer.LogSeverity.ERROR)
parser = uffparser.create_uff_parser()
parser.register_input("Placeholder", (3,100,100), 0)
for output_name in output_names:
    print('register output:', output_name)
    parser.register_output(output_name)
engine = trt.utils.uff_to_trt_engine(G_LOGGER, uff_model, parser, 1, 1 << 20)

# ---- tensorflow inference
temp = np.random.rand(1, 100, 100, 3).astype(np.float32)  # random input
tf_results = sess.run(net, feed_dict={image_placeholder:temp})

# ---- tensorRT inference
runtime = trt.infer.create_infer_runtime(G_LOGGER)
context = engine.create_execution_context()
tr_result = np.empty(5, dtype=np.float32)

d_input = cuda.mem_alloc(1 * temp.size * temp.dtype.itemsize)
d_labels = cuda.mem_alloc(1 * tr_result.size * tr_result.dtype.itemsize)

bindings = [int(d_input), int(d_labels)]
stream = cuda.Stream()
cuda.memcpy_htod_async(d_input, temp, stream)
context.enqueue(1, bindings, stream.handle, None)
cuda.memcpy_dtoh_async(tr_result, d_labels, stream)
stream.synchronize()  # wait for the async copies and inference before reading tr_result

# ---- let's see
print("tensorflow result: ", tf_results[0])
print("tensorRT result: ", tr_result)


tensorflow result:  [0.         0.         0.         0.09752206 0.        ]
tensorRT result:  [0.         0.         0.00887266 0.11281744 0.        ]
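
To put a number on the mismatch, here is a small sketch that compares the two printed vectors with NumPy. The values are copied from the output above; the tolerances (`rtol=1e-3`, `atol=1e-5`) are my own assumption for what float32 rounding differences between the two frameworks might look like, not an official threshold.

```python
import numpy as np

# Values copied from the printed results above.
tf_result = np.array([0.0, 0.0, 0.0, 0.09752206, 0.0], dtype=np.float32)
trt_result = np.array([0.0, 0.0, 0.00887266, 0.11281744, 0.0], dtype=np.float32)

# Largest element-wise deviation between the two outputs.
max_abs_diff = float(np.max(np.abs(tf_result - trt_result)))

# Assumed tolerances; pure rounding noise would normally pass this check.
matches = bool(np.allclose(tf_result, trt_result, rtol=1e-3, atol=1e-5))

print("max abs diff:", max_abs_diff)
print("within tolerance:", matches)
```

A deviation of this size is far larger than rounding noise, which points to a real difference in the data or layout rather than numerical precision.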

I can confirm that running this test yields the same mismatch on TensorRT 4, CUDA 9.0, cuDNN 7.
I would like to know where the problem is.

By the way, thank you Paul for your example, it is much better than the official guide.
One question: why doesn't the dimension order match? See lines 13, 31 and 39.

I had the same problem with a fairly standard network that takes images as input. I have tried every possible combination of the input layout and still cannot get correct results.


It has to do with the number of filters used in conv2d and with the flattening layers. I still have no solution for the problem, but if you find one, please share it.
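
As a possible illustration of why a flatten layer is sensitive to layout (whether this is the actual cause in this thread is not confirmed), here is a toy NumPy sketch. The tiny 2x2x3 tensor is made up for the demo. The same values flattened from HWC and from CHW come out in a different order, so a fully connected layer would pair its weights with different elements.

```python
import numpy as np

# Toy feature map in HWC layout: 2 rows, 2 columns, 3 channels.
hwc = np.arange(2 * 2 * 3).reshape(2, 2, 3)

# The same data viewed in CHW layout.
chw = hwc.transpose(2, 0, 1)

# Flatten each layout in row-major order.
flat_hwc = hwc.reshape(-1)
flat_chw = chw.reshape(-1)

print(flat_hwc.tolist())  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
print(flat_chw.tolist())  # [0, 3, 6, 9, 1, 4, 7, 10, 2, 5, 8, 11]
```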

Here are two connected conversions:




I believe you should transpose the image to CHW format before feeding it into a TensorRT engine. I have just achieved consistent outputs between a TensorFlow model (trained in NHWC) and its converted UFF model (inference in NCHW) loaded through TensorRT's C++ API. It seems the conversion tool takes care of the dimension changes automatically, so you should assume everything is in CHW format in TensorRT, no matter which format you used when training the TensorFlow model.
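
A minimal sketch of that transpose for the Python example at the top of the thread, using only NumPy. The helper name `nhwc_to_chw` is hypothetical, not part of TensorRT. Note that `transpose` returns a strided view, so the sketch copies it into a contiguous buffer, on the assumption that the host-to-device copy needs contiguous memory. A plain `reshape` to `(3, 100, 100)` would not reorder the data.

```python
import numpy as np

def nhwc_to_chw(batch_nhwc, index=0):
    """Hypothetical helper: return one image of an NHWC batch as contiguous CHW."""
    image_hwc = batch_nhwc[index]              # shape (H, W, C)
    image_chw = image_hwc.transpose(2, 0, 1)   # shape (C, H, W), still a strided view
    # Copy into a C-contiguous float32 buffer so the raw bytes are really in CHW order.
    return np.ascontiguousarray(image_chw, dtype=np.float32)

# Same random input as in the original post.
temp = np.random.rand(1, 100, 100, 3).astype(np.float32)
temp_chw = nhwc_to_chw(temp)

print(temp_chw.shape)  # (3, 100, 100)
```

In the original script you would keep feeding `temp` to TensorFlow and copy `temp_chw` (instead of `temp`) to `d_input`. The buffer size stays the same because the element count does not change.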