Don't see any speedups using TensorRT


I tried benchmarking inference time on the Jetson TX2 using pure Tensorflow (TF) vs. Tensorflow converted to TensorRT (TRT) using the tensorflow.contrib.tensorrt library.

Here is the link to my public benchmarking gist. To benchmark, I have used the MobileNetV2 model, whose frozen TF graph can be downloaded from here.

The average classification time for a single 224x224 image over 100 trials was:
TF: 27ms
TRT: 31ms
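For reference, a minimal timing harness along the lines of what the gist presumably does (hedged: `run_inference` is a placeholder for the actual `sess.run` call on the Jetson, and the warm-up count is an assumption; warm-up runs are excluded so one-time graph setup does not skew the average):

```python
import time

def benchmark(run_inference, image, trials=100, warmup=10):
    """Average per-image latency in milliseconds, excluding warm-up runs."""
    for _ in range(warmup):
        # Warm-up: the first few runs include one-time graph/engine setup cost.
        run_inference(image)
    start = time.perf_counter()
    for _ in range(trials):
        run_inference(image)
    elapsed = time.perf_counter() - start
    return elapsed / trials * 1000.0  # milliseconds per image

# Example with a cheap stand-in for sess.run so the harness itself is runnable:
avg_ms = benchmark(lambda img: sum(img), list(range(10)), trials=5, warmup=1)
print("avg latency: %.3f ms" % avg_ms)
```

On device, `run_inference` would be `lambda img: sess.run(output_tensor, feed_dict={input_tensor: img})`.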

My questions are:

  1. Why is there no TensorRT speedup (in fact, a slight slowdown)?
  2. I know there is also the option to convert to TensorRT in pure C++ without Python. Is this much faster than TensorRT in Python? (I don't know C++ well, so I would only want to pursue this if there were significant gains.)


We are attempting to repro now. What version of JetPack are you running? The performance difference between Python and C++ should be negligible.

Also, just so we have the complete repro, can you share the data/panda.jpg?

Thank you!

panda.jpg is from here:

I believe I have JetPack 3.3; however, I am not sure how to actually check that…
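For anyone else wondering, one common way to check is to read the L4T release file that JetPack installs on the device (hedged: the file path and the mapping are from memory; as far as I know JetPack 3.3 ships L4T 28.2.1, so the L4T version identifies the JetPack release):

```python
import os

def l4t_release(path="/etc/nv_tegra_release"):
    """Return the first line of the L4T release file, or None when off-device."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return f.readline().strip()

release = l4t_release()
print(release if release else "not running on a Jetson (release file not found)")
```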

I am using tensorflow-gpu==1.9.0+nv18.8


I have made some progress on the issue.

First, I tried another *.pb graph from here and had the exact same issue (TF=137ms, TRT=147ms).

I also found that in both cases, if I tried to run the TRT session before the TF session, there would be a bug during inference saying: 'No attr named 'identical_element_shapes' in NodeDef' (see below for the full error stack). This is the same bug recently reported here, where you said there was a bug with TF-TRT that was recently fixed in a new release. Do you know how I can access this fix?


NotFoundError                             Traceback (most recent call last)
~/Projects/SANATA/ in <module>
     73             outputs=output_names,
     74             max_batch_size=1,
---> 75             precision_mode='FP16',  # 'INT8'/'FP16'
     76     )

~/Projects/SANATA/.venv/lib/python3.5/site-packages/tensorflow/contrib/tensorrt/python/ in create_inference_graph(input_graph_def, outputs, max_batch_size, max_workspace_size_bytes, precision_mode, minimum_segment_size)
    113     # pylint: disable=protected-access
    114     raise _impl._make_specific_exception(None, None, ";".join(msg[1:]),
--> 115                                          int(msg[0]))
    116     # pylint: enable=protected-access
    117   output_graph_def = graph_pb2.GraphDef()

NotFoundError: No attr named 'identical_element_shapes' in NodeDef:
         [[Node: Preprocessor/map/TensorArray_2 = TensorArrayV3[clear_after_read=true, dtype=DT_INT32, dynamic_size=false, element_shape=<unknown>, tensor_array_name=""](Preprocessor/map/TensorArrayUnstack/strided_slice)]] for 'Preprocessor/map/TensorArray_2' (op: 'TensorArrayV3') with input shapes: [].


Performance would depend on how many ops were converted from TF to TRT. I recommend viewing the network in TensorBoard and comparing the two .pb files (native TF and TF-TRT).

In the case of the TF-TRT network, the portion of the graph being run on TRT will show up as a new node called my_trt_op_*. It looks like nothing was converted, hence no speedup.

You can try something like this to view the number of TRT ops in the converted TRT graph.

trt_engine_ops = len([1 for n in trt_graph.node if str(n.op) == 'TRTEngineOp'])

I’m seeing trt_engine_ops = 0 in your case.
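To make that counting snippet concrete, here is a runnable illustration with a stand-in for the GraphDef (hedged: `FakeNode` only mimics the `.op` attribute of real `NodeDef` protos; on device you would iterate over the actual `trt_graph.node` list returned by the converter):

```python
from collections import Counter, namedtuple

# Stand-in for a tf.NodeDef: only the .op attribute matters for this check.
FakeNode = namedtuple("FakeNode", ["op"])

# A graph where part of the network was fused into TRT engine nodes...
converted = [FakeNode("Placeholder"), FakeNode("TRTEngineOp"),
             FakeNode("TRTEngineOp"), FakeNode("Identity")]
# ...and one where nothing was converted (the situation in this thread).
unconverted = [FakeNode("Placeholder"), FakeNode("Conv2D"),
               FakeNode("Relu6"), FakeNode("Softmax")]

def count_trt_ops(nodes):
    return len([1 for n in nodes if str(n.op) == "TRTEngineOp"])

print(count_trt_ops(converted))    # 2
print(count_trt_ops(unconverted))  # 0 -> no fused segments, so no speedup
# An op-type histogram of each .pb is another quick way to compare the graphs:
print(Counter(n.op for n in unconverted))
```

A count of zero means TF-TRT left the whole graph running on plain TensorFlow, which matches the benchmark numbers above.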

To execute on native TRT, we also have Python APIs which can be used to load a TF network, convert it to UFF, and run natively on TRT.

Thank you @nves

Two followup questions:

  1. Do you know why nothing is being converted?
  2. Based on my previous comment here, it seems like there is a fundamental issue with the 'No attr named 'identical_element_shapes' in NodeDef' error, since if I try to convert to TRT without running the TF graph first, it gives me this error. Why do I get this error?



Regarding #2, I can't reproduce it in TRT 4 or TRT 5 RC. The fix for the bug I referenced in that post should be available in TRT 5 RC.

I have TensorRT, and am seeing the bug. Do you mean the bug exists only in TRT 5 RC (my case proves it doesn't), or that the bug fix is available in TRT 5 RC?


For the original case you referenced, the fix was actually upstreamed to TensorFlow. It's not a TRT bug.

Got it. Am I able to upgrade TensorFlow, given that the install instructions for TF-TRT provided a specific version of TensorFlow directly from NVIDIA (v1.9)?

I feel that this thread went off track from the original posting and my issue was not resolved, so I reposted here:

FYI, I was able to fix the problem using the NVIDIA-IOT code base, which makes very specific modifications to common networks to get them to run. See this answer: