Using TF-TRT to convert MobileNet / SSDLite model gives errors


I am trying to convert a TensorFlow MobileNet graph (*.pb file) to TensorRT via the TF-TRT module. However, I get the following error:

`NotFoundError: No attr named 'shape' in NodeDef:`


I am running the following:
Jetson TX2
Jetpack 3.3

Here is my code:

import tensorflow as tf
import tensorflow.contrib.tensorrt as trt
import numpy as np
import PIL.Image
from timeit import default_timer as timer
from tqdm import tqdm

# This script performs inference on a pure TensorFlow (TF) model vs. a converted TensorRT model.
# It assumes you have a frozen TF graph in the form of a *.pb file.
# The specific model is MobileNetV2.
# Frozen graph can be downloaded from:
#
# The image file 'panda.jpg' is downloaded from:

## File paths (needs to be set)
pb_path = "mobilenet_v2_1.0_224_frozen.pb"    # frozen TF graph
input_name = 'input'                                    # input layer of the graph
output_names = ['MobilenetV2/Predictions/Reshape_1']    # output layer of the graph. (Can be multiple)
img_path = 'data/panda.jpg'     # image to perform inference on (from

## Load image
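img_raw = np.array(PIL.Image.open(img_path).resize((224, 224))).astype(np.float32) / 128 - 1   # scale pixels to roughly [-1, 1)
img = img_raw.reshape(1, 224, 224, 3)   # add batch dimension (NHWC)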

##### Run w/ TensorRT
trt_graph = trt.create_inference_graph(
        precision_mode='FP16',  # 'INT8'/'FP16'

tf_config = tf.ConfigProto()
tf_config.gpu_options.allow_growth = True
tf_sess = tf.Session(config=tf_config)
tf.import_graph_def(trt_graph, name='')
tf_input = tf_sess.graph.get_tensor_by_name(input_name + ':0')
tf_output = tf_sess.graph.get_tensor_by_name(output_names[0] + ':0')

output = tf_sess.run(tf_output, feed_dict={tf_input: img + np.random.random(img.shape)/10})

And here is the full error trace:

2018-11-09 16:43:48.943803: I tensorflow/stream_executor/cuda/] ARM64 does not support NUMA - returning NUMA node zero
2018-11-09 16:43:48.943953: I tensorflow/core/grappler/] Number of eligible GPUs (core count >= 8): 0
NotFoundError                             Traceback (most recent call last)
/home/nvidia/Projects/SANATA/ in <module>()
     71             outputs=output_names,
     72             max_batch_size=1,
---> 73             precision_mode='FP16',  # 'INT8'/'FP16'
     74     )

/home/nvidia/Projects/SANATA/.venv/lib/python3.5/site-packages/tensorflow/contrib/tensorrt/python/ in create_inference_graph(input_graph_def, outputs, max_batch_size, max_workspace_size_bytes, precision_mode, minimum_segment_size)
    113     # pylint: disable=protected-access
    114     raise _impl._make_specific_exception(None, None, ";".join(msg[1:]),
--> 115                                          int(msg[0]))
    116     # pylint: enable=protected-access
    117   output_graph_def = graph_pb2.GraphDef()

NotFoundError: No attr named 'shape' in NodeDef:
         [[Node: input = Placeholder[dtype=DT_FLOAT]()]] for 'input' (op: 'Placeholder') with input shapes:

Note, I get essentially the same error even when using the SSDLite graph from:

How can I fix this?
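The error message says that the `input` Placeholder node carries only a `dtype` attribute and no `shape`. As a rough diagnostic, one could list every Placeholder in the graph that lacks a `shape` attribute before handing the graph to TF-TRT. The sketch below is only an illustration: the helper name `placeholders_missing_shape` is made up, and it uses stand-in objects (including a hypothetical `image_tensor` node) that mimic the `node`, `name`, `op` and `attr` fields of a GraphDef so that it runs without TensorFlow installed. With a real frozen graph you would pass the parsed `GraphDef` instead.

```python
from types import SimpleNamespace


def placeholders_missing_shape(graph_def):
    """Return the names of Placeholder nodes that have no 'shape' attribute."""
    return [
        node.name
        for node in graph_def.node
        if node.op == 'Placeholder' and 'shape' not in node.attr
    ]


# Stand-in for a parsed GraphDef (assumption: these objects only mimic the
# fields used above; they are not real TensorFlow protobuf messages).
demo_graph = SimpleNamespace(node=[
    SimpleNamespace(name='input', op='Placeholder',
                    attr={'dtype': 'DT_FLOAT'}),
    SimpleNamespace(name='image_tensor', op='Placeholder',   # hypothetical node
                    attr={'dtype': 'DT_UINT8', 'shape': [1, 300, 300, 3]}),
    SimpleNamespace(name='conv', op='Conv2D', attr={}),
])

missing = placeholders_missing_shape(demo_graph)
print(missing)
```

Any node reported here would be a candidate for the same `NotFoundError` as above.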



You can check our tutorial for converting ssd_mobilenet_v1_coco to a TF-TRT model:

The workflow should look like this:
Build TensorRT / Jetson compatible graph

from tf_trt_models.detection import build_detection_graph

frozen_graph, input_names, output_names = build_detection_graph(

Optimize with TensorRT

import tensorflow.contrib.tensorrt as trt

trt_graph = trt.create_inference_graph(
    max_workspace_size_bytes=1 << 25,


Thank you @aastall for the reference. That was exactly what I was looking for.

For the record, I compared inference speed between the pure TensorFlow and TF-TRT graphs on the MobileNetV1 and MobileNetV2 networks. I used a 640x480 image for both tests and ran

sudo ~/

prior to the tests.

Code for MobileNetV1 benchmark is here, and for MobileNetV2 benchmark is here.
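For anyone reproducing this kind of comparison, a timing loop along the following lines is one way to measure per-image latency. It is a minimal sketch and not the benchmark code linked above: the function name `benchmark` and the `n_warmup` / `n_runs` defaults are my own assumptions. The warm-up calls are excluded from the timing because the first few runs can include one-off setup costs.

```python
from timeit import default_timer as timer


def benchmark(run_once, n_warmup=5, n_runs=50):
    """Return the mean wall-clock time per call of run_once, in milliseconds."""
    # Warm-up calls are not timed.
    for _ in range(n_warmup):
        run_once()

    start = timer()
    for _ in range(n_runs):
        run_once()
    elapsed = timer() - start

    return elapsed / n_runs * 1000.0


# Example with a dummy workload; with a real session this could be something
# like: benchmark(lambda: tf_sess.run(tf_output, feed_dict={tf_input: img}))
mean_ms = benchmark(lambda: sum(range(10000)), n_warmup=2, n_runs=10)
print("mean latency: %.3f ms" % mean_ms)
```

The same function can be called once with the pure TensorFlow session and once with the TF-TRT session, so that both are measured in the same way.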

The results were:

MobileNetV1:
Pure TensorFlow: 119 ms
TF-TRT: 89 ms
(~33% speedup)

MobileNetV2:
Pure TensorFlow: 203 ms
TF-TRT: 203 ms
(0% speedup)
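To make the percentages explicit, here is the arithmetic as a small sketch. It assumes that "speedup" means the extra throughput relative to the optimized run, i.e. (baseline - optimized) / optimized; the function name `speedup_percent` is just for illustration. Under that definition a ~33% speedup corresponds to going from 119 ms to 89 ms per image, and identical timings give 0%.

```python
def speedup_percent(baseline_ms, optimized_ms):
    """Speedup of the optimized run over the baseline, in percent.

    Assumed definition: (baseline - optimized) / optimized * 100,
    which equals the relative gain in images per second.
    """
    return (baseline_ms - optimized_ms) / optimized_ms * 100.0


print(round(speedup_percent(119, 89), 1))   # 119 ms -> 89 ms
print(round(speedup_percent(203, 203), 1))  # no change in latency
```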

I am not sure why MobileNetV2 is so inefficient here and sees no speedup from TF-TRT. Any help would be welcome.

For the record, this Caffe implementation of MobileNetV2 took 86 ms on the Jetson using NVCaffe (not TRT optimized).