Incorrect results during inference using the TensorRT 3.0 C++ UFF parser

Hello there,

I am having trouble getting correct results with my mobilenet.uff file. I edited the sampleUffMNIST code to run inference with my own .uff file. The engine appears to load properly, but the output I get is wrong. My output is below.

5 eltCount
--- OUTPUT ---
[c] f = 1.000000
[c] f = 1.000000
[c] f = 1.000000
[c] f = 1.000000
[c] f = 1.000000
0 => 1   : ***
1 => 1   :
2 => 1   :
3 => 1   :
4 => 1   :

Average over 1 runs is 467.3683472 ms.

As you can see, the network is predicting “1” for all classes. I understand this could be caused by the input. I wrote the pixel values of my preprocessed image to a text file (I know this isn’t the best option, but I wanted to get the code working end to end before adding preprocessing code to the C++ file, as I am not fluent in C++). This is the code block that writes the text file:

# k (channel) is the outermost loop, so values are written channel by channel (CHW order)
with open("/raid/nri/Classification_task/Data_files_txt/weeds/Mobilenet_v1_1.0/Preprocessed_array3.txt", "w") as outfile:
    for k in range(3):
        for i in range(224):
            for j in range(224):
                outfile.write("{:10.4f}".format(final_image[i][j][k]) + " ")

And this is how I read it into the input buffer:

std::ifstream file (filename);
    while(file >> number) {

I defined the input to my network like this:

parser->registerInput("input", DimsCHW(3, 224, 224));

The network was originally trained in NHWC format. Does this have any impact? Any help or advice is appreciated, and if more information is needed, please let me know. Thanks!
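
On the layout question: registering the input as DimsCHW(3, 224, 224) means the buffer is read channel-first, whereas an NHWC model holds each image as height, width, channel. The following is a minimal Python sketch of how the same pixels land at different flat offsets in the two orders. The tiny 2x2 image and the helper names flatten_hwc and flatten_chw are illustrative assumptions, not part of the original code. The k, i, j loop above writes values in the order that flatten_chw produces.

```python
# Sketch: how flattening order differs between HWC and CHW layouts.
# A tiny 2x2 image with 3 channels stands in for the 224x224x3 input.

H, W, C = 2, 2, 3

# image[h][w][c], the way an NHWC framework holds a single image
image = [[[100 * h + 10 * w + c for c in range(C)] for w in range(W)] for h in range(H)]

def flatten_hwc(img):
    """Flatten in height, width, channel order."""
    return [img[h][w][c] for h in range(H) for w in range(W) for c in range(C)]

def flatten_chw(img):
    """Flatten channel-first, matching a (C, H, W) input declaration."""
    return [img[h][w][c] for c in range(C) for h in range(H) for w in range(W)]

hwc = flatten_hwc(image)
chw = flatten_chw(image)

# Pixel (h=1, w=0, c=2) sits at different flat offsets in the two layouts.
h, w, c = 1, 0, 2
assert hwc[h * W * C + w * C + c] == image[h][w][c]
assert chw[c * H * W + h * W + w] == image[h][w][c]

print(hwc)
print(chw)
```

If the values in the text file and the declared input dimensions disagree on this order, the network sees scrambled pixels even though every number is individually correct.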

I had a similar issue. I couldn’t reproduce the result I got from the Python API.


Thanks for your feedback.

In summary:
The result from the Python API is correct, but the result from the C++ API is wrong.
Is this correct?

If yes, this is a significant issue. Could you attach or share the model with us for further investigation?



I will attach my .uff file here. In my case, I did not use the Python API; I have only used the .pb file, which gave good results, before converting it and trying the C++ API. Do you suggest I get it working with the Python API first? Thanks a lot for the help!


Thanks for your feedback.

Could you check whether the result from the Python API is correct?
This check will help determine whether the error comes from TensorRT or from preprocessing.



Sorry it has taken me a while to get back to you. I have tried the Python API. The results I get from it are far from what I get with my .pb file. They are not all 1’s, as with my C++ code, but they are still not good. Honestly, I am not sure what is causing these problems, but I’m fairly certain it is not the preprocessing. I will attach the .pb file that I converted to .uff format, along with my .uff file. This .uff file is a bit different from the one I shared earlier, because this time I used the -I parameter when running convert-to-uff. I also tried the new .uff file with my C++ code, but with no luck; it still only prints 1’s. Thanks a lot for any help. Please let me know if you require more information.


Could you share the log from converting the TensorFlow model into UFF format?
Please note that the UFF parser may silently skip an unsupported layer without raising an assertion.



This is the command I run:

convert-to-uff tensorflow -o /raid/nri/Classification_task/Exported_uff_files/log_mobilenet.uff --input-file /raid/nri/Classification_task/Optimized_graphs/Mobilenet_no_squeeze/optimized_mobilenet.pb -O  MobilenetV1/Predictions/Reshape_1 -I input,input,float32,3,224,224

And this is the log I get:

Loading /raid/nri/Classification_task/Optimized_graphs/Mobilenet_no_squeeze/optimized_mobilenet.pb
Using output node MobilenetV1/Predictions/Reshape_1
Using input node input
Converting to UFF graph
No. nodes: 633
UFF Output written to /raid/nri/Classification_task/Exported_uff_files/log_mobilenet.uff


Thanks for your feedback.
We are checking this issue and will update information with you later.



Thanks a lot!


Thanks for sharing your model and results.
Because the model contains many operations, could you help cut the model down and narrow down which layers produce the difference?



Please could you elaborate? I’m not really sure how to do that. Maybe a simple example if possible.



First, you can get the TensorRT layers by generating the .pbtxt file:


Then, compare the output layer by layer.

context.enqueue(1, bindings, stream.handle, None)
cuda.memcpy_dtoh_async(h_layer1, d_layer1, stream)
# Compare result between TensorRT and TensorFlow here
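
The comparison step left as a comment above could look something like the following. This is only a sketch in plain Python: it assumes the layer output from each framework has already been copied to the host and flattened into a sequence of floats. The name compare_outputs, the tolerances, and the sample numbers are placeholders rather than anything from the TensorRT or TensorFlow APIs.

```python
import math

def compare_outputs(trt_values, tf_values, rel_tol=1e-5, abs_tol=3e-5):
    """Return (all_close, first_bad_index, max_abs_diff) for two flat sequences."""
    if len(trt_values) != len(tf_values):
        raise ValueError("length mismatch: %d vs %d" % (len(trt_values), len(tf_values)))
    first_bad = None
    max_diff = 0.0
    for i, (a, b) in enumerate(zip(trt_values, tf_values)):
        max_diff = max(max_diff, abs(a - b))
        # Remember only the first element that falls outside the tolerance.
        if first_bad is None and not math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol):
            first_bad = i
    return first_bad is None, first_bad, max_diff

# Made-up numbers standing in for one layer's output from each framework.
ok, idx, diff = compare_outputs([0.10, 0.20, 0.70], [0.10, 0.25, 0.65])
print(ok, idx, diff)
```

Running this per layer, starting from the input side, gives the first layer at which the two frameworks diverge. A length mismatch is raised as an error because it usually means the wrong layer or buffer size was used.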



Thanks for the response; I am currently trying this out. If it isn’t too much trouble, could you provide or point to an example of inference on 3-channel images using UFF models and the C++ API? This would help me a lot in fixing my C++ code, because I used the MNIST UFF sample, which has only 1 channel, as my reference. I believe I changed the code correctly to handle 3-channel images, but I could be wrong.



Following your suggestion, I am trying to get the output of each layer to compare, but I am getting the following error when parsing my UFF model stream:

Using output node MobilenetV1/Predictions/Reshape_1
Converting to UFF graph
No. nodes: 633
UFF Output written to /raid/nri/Classification_task/TensorRt_text_files/mobilenet_uff
UFF Text Output written to /raid/nri/Classification_task/TensorRt_text_files/mobilenet_uff.pbtxt
  File "/usr/local/lib/python3.4/dist-packages/tensorrt/utils/", line 186, in uff_to_trt_engine
[TensorRT] ERROR: Failed to parse UFF model stream
Traceback (most recent call last):
  File "/usr/local/lib/python3.4/dist-packages/tensorrt/utils/", line 186, in uff_to_trt_engine

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/dami/TensorRt_test/", line 41, in <module>
    engine = trt.utils.uff_to_trt_engine(G_LOGGER,uff_model,parser,1,1 << 20)
  File "/usr/local/lib/python3.4/dist-packages/tensorrt/utils/", line 194, in uff_to_trt_engine
    raise AssertionError('UFF parsing failed on line {} in statement {}'.format(line, text))
AssertionError: UFF parsing failed on line 186 in statement assert(parser_result)

I am using the TensorRT 3.0 user guide to write this code. This is the line that gives the error:

engine = trt.utils.uff_to_trt_engine(G_LOGGER,uff_model,parser,1,1 << 20)

This is how I am defining the parser:

parser = uffparser.create_uff_parser()
parser.register_input("input", (3,224,224),0)

I also tried the provided lenet5_mnist_frozen.pb, but it gives the same error. If you are wondering how I got the Python API to work earlier without this approach, it was because I used “trt.lite.engine”. Thanks for any help provided!


I want to clarify first:

  • You hit the #15 error only with standard TensorRT.
  • TensorRT Lite works correctly.

Is this correct?

If yes, could you share the TensorFlow model definition with us (not the .pb file)?
It will help us narrow down the root cause.



Thanks for the response. I found out what was causing the #15 error. Apologies, it was not clear to me that I should read the .uff file as binary before passing it to the TensorRT engine. Once I did that, standard TensorRT started working. I then tried to compare the layers as you asked and got a bit confused. I originally thought you wanted me to compare the weights between TensorRT and TensorFlow at each layer until I noticed a difference. However, when I do the following:

context.enqueue(1, bindings, stream.handle, None)
cuda.memcpy_dtoh_async(h_layer1, d_layer1, stream)
# Compare result between TensorRT and TensorFlow here

No matter what I change the layer name to, the weights are the same in TensorRT. Is that the expected result? I wrote the comparison code for the provided LeNet model in TensorRT 3, and the same thing happens. My code is below:

import tensorrt as trt
import pycuda.driver as cuda
from tensorrt.parsers import uffparser
from PIL import Image
import numpy as np
import tensorflow as tf

FLAGS = tf.flags.FLAGS
def isclose(a, b, rel_tol=1e-05, abs_tol=0.00003):
    return abs(a-b) <= max(rel_tol * max(abs(a), abs(b)), abs_tol)

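def compare_arrays(array1, array2):
    # Arrays of different length cannot match
    if len(array1) != len(array2):
        return False
    status = True
    for i in range(len(array1)):
        # Element-wise check using the tolerance-based isclose() above
        if not isclose(array1[i], array2[i]):
            status = False
    return status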

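def load_graph(model_file):
  graph = tf.Graph()
  graph_def = tf.GraphDef()

  # Parse the frozen .pb file into a GraphDef
  with open(model_file, "rb") as f:
    graph_def.ParseFromString(f.read())
  # Import it; node names get the default "import/" prefix
  with graph.as_default():
    tf.import_graph_def(graph_def)

  return graph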

def normalize(data):
#each image is provided as a 3D numpy array (like how it’s provided to inference function)

    array_holder = np.arange(784).reshape(1, 28, 28)
    array_holder = array_holder.astype(np.float32)
    for i in range(len(data)):  # normalize
        holder1 = data[i] / 255.0
        array_holder[i] = 1.0 - holder1

    return array_holder

# frozen_graph=open("/usr/local/TensorRT-3.0.1/data/mnist/lenet5_mnist_frozen.pb",'rb').read()

# frozen_graph = tf.graph_util.remove_training_nodes(frozen_graph)
# uff_model=uff.from_tensorflow(frozen_graph,output_filename="/raid/nri/Classification_task/TensorRt_text_files/lenet_uff",output_nodes=["out"],text=True)
G_LOGGER = trt.infer.ConsoleLogger(trt.infer.LogSeverity.ERROR)

parser = uffparser.create_uff_parser()
parser.register_input("in", (1,28,28),0)

engine = trt.utils.uff_to_trt_engine(G_LOGGER,uff_model,parser,1,1 << 20)
#host_mem = parser.hidden_plugin_memory()

output_layer = 'wc2'

# Load frozen model (TF)
graph = load_graph("/usr/local/TensorRT-3.0.1/data/mnist/lenet5_mnist_frozen.pb")

input_name = "import/" + "in"
output_name = "import/" + output_layer
input_operation = graph.get_operation_by_name(input_name)
output_operation = graph.get_operation_by_name(output_name)

def main(_):
    with tf.Session(graph=graph) as sess:
        for c in range(10):

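            im = Image.open("/usr/local/TensorRT-3.0.1/data/mnist/" + str(c) + ".pgm")
            # Assumed: the line defining im_n is missing from the post; normalize() expects a (1, 28, 28) array
            im_n = normalize(np.array(im).reshape(1, 28, 28))
            normalized_im_o = im_n.reshape(28, 28, 1)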
            arr = tf.expand_dims(normalized_im_o, [0])
            arr_final = arr.eval()
            results_tf = sess.run(output_operation.outputs[0], {input_operation.outputs[0]: arr_final})
            results = np.squeeze(results_tf)

            runtime = trt.infer.create_infer_runtime(G_LOGGER)
            context = engine.create_execution_context()
            output = np.empty(10, dtype=np.float32)  # allocate device memory
            d_input = cuda.mem_alloc(1 * im_n.size * im_n.dtype.itemsize)
            d_output = cuda.mem_alloc(1 * output.size * output.dtype.itemsize)

            bindings = [int(d_input), int(d_output)]

            stream = cuda.Stream()

            # transfer input data to device
            cuda.memcpy_htod_async(d_input, im_n, stream)
            # execute model
            context.enqueue(1, bindings, stream.handle, None)
            # transfer predictions back
            cuda.memcpy_dtoh_async(output, d_output, stream)
            # synchronize threads so the async copy has finished before output is read
            stream.synchronize()
            # print("Test Case: " + str(label))
            # print("Prediction: " + str(np.argmax(output)))
            print("Tensorflow "+str(results))
            print("TensorRT "+str(output))


if __name__ == '__main__':
    tf.app.run()

“output” is always the same in TensorRT no matter what layer name is used, and it always corresponds to the result at the final “output_node” of the TensorFlow model. I am not sure whether that is correct. I chose the layer names based on the .pbtxt file I get from:


Please let me know whether the weights in the .uff files are meant to be the same for most layers. If not, can you tell me how to get the layer names that will give me the appropriate weights?

Thanks as always
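
The binary-read fix mentioned at the start of this post can be sketched as follows. This is a minimal illustration, assuming the .uff file is simply read as raw bytes and the resulting buffer is what gets passed on as uff_model. The helper read_uff_bytes and the placeholder file are hypothetical and exist only so the snippet runs on its own.

```python
import os
import tempfile

def read_uff_bytes(path):
    """Read a serialized .uff file as raw bytes (binary mode, no text decoding)."""
    with open(path, "rb") as f:
        return f.read()

# Stand-in file so the sketch runs without a real model.
with tempfile.NamedTemporaryFile(suffix=".uff", delete=False) as tmp:
    tmp.write(b"\x00\x01UFF-placeholder\xff")
    demo_path = tmp.name

uff_model = read_uff_bytes(demo_path)
os.remove(demo_path)
print(type(uff_model), len(uff_model))
```

Opening the file in text mode instead would try to decode the bytes as characters, which is not what a serialized model contains.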


Please change the name (e.g. ‘output’) to the layer name in your model.

If you have the TensorFlow model code, the layer name can be set with the name parameter:

output = tf.nn.relu(output, name='output')

If you only have the .pb file, you can get the layer names by printing the graphdef:

with tf.Session() as sess:
    graphdef = tf.get_default_graph().as_graph_def()
    print(graphdef)



I think there may be some miscommunication. I can already get the weights from the .pb file, and when I do, their dimensions correspond to the graphdef. For the .uff file, however, when I try to get the weights of the same layers, I always get an array of the same size ([number_of_classes,1]) with the same values, no matter what I change the output node to. For example, the final layer in both the .pb and .uff files is called “out”. What confuses me is that when I change parser.register(“out”) to parser.register(“wc2”), the array I get from both is the same. The only time I get something different is when I try to access something like a pooling layer, in which case the program gives an error and crashes. I hope this is a little clearer.

Thanks as always

Please ignore comment #20; the question was posted twice.