Slow inference on Jetson TX2 with TensorFlow

I am getting started on deploying deep learning models to a Jetson TX2 flashed with JetPack 3.3.

I have been using the following resources:

So far, I have successfully trained models with DIGITS and deployed them using the tutorial in the first repo. Performance matches the benchmarks (<100 ms for classification, <200 ms for detection).

However, most of our DL research so far has been done in Keras and TensorFlow, so it would be easier to use these frameworks and Python directly.

The tf_trt repo looked promising, but I am nowhere near the announced performance. My device has been flashed with JetPack 3.3, but otherwise I followed the instructions in the repo (tf 1.8.0 etc.).

The following code gives an inference time of over 1 second (I literally run the classification notebook with inception_v1), which is more than two orders of magnitude slower than the 7 ms announced.

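from time import time

# NOTE: the call on the run line was truncated in the original post;
# tf_sess.run(tf_output, ...) is an assumed reconstruction.
start = time()
output = tf_sess.run(tf_output, feed_dict={tf_input: image[None, ...]})
end = time()
print("inference time: {}s".format(end - start))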
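
One thing worth ruling out when timing a single call: the first run of a session usually includes one-time setup costs, so it can be much slower than later runs. Below is a minimal sketch that times warm-up calls separately from steady-state calls. The helper name `time_calls` and the stand-in workload are hypothetical (not from the notebook); on the device, the callable would wrap the session run call.

```python
from time import perf_counter


def time_calls(fn, warmup=1, runs=10):
    """Call fn() warmup + runs times.

    Returns (warmup_times, run_times), both lists of seconds per call.
    """
    def timed():
        start = perf_counter()
        fn()
        return perf_counter() - start

    warmup_times = [timed() for _ in range(warmup)]
    run_times = [timed() for _ in range(runs)]
    return warmup_times, run_times


if __name__ == "__main__":
    # Stand-in workload (assumption); replace with a callable that runs inference.
    warm, steady = time_calls(lambda: sum(range(100000)), warmup=1, runs=5)
    print("first call: {:.4f}s".format(warm[0]))
    print("mean of later calls: {:.4f}s".format(sum(steady) / len(steady)))
```

If the later calls are far faster than the first one, the 1 second figure mostly reflects setup rather than steady-state inference.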


Not all TensorFlow operations are supported by TensorRT.
For unsupported ops, TensorFlow-TRT falls back to the TensorFlow implementation.

To find where the bottleneck comes from, we recommend checking which implementation (TF or TRT) is used for each layer.
Could you generate this information first?
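
As a rough sketch of one way to produce this, you can count node types in the converted graph. It assumes the optimized graph is available as a GraphDef (called `trt_graph_def` here, a hypothetical name) and that converted subgraphs show up as nodes with op type `TRTEngineOp`; every other node is still executed by TensorFlow. The helper names are illustrative only.

```python
from collections import Counter


def summarize_ops(nodes, trt_op="TRTEngineOp"):
    """Return (trt_count, Counter of op types still run by TensorFlow).

    `nodes` is any iterable of objects with an `.op` attribute,
    such as the `node` field of a GraphDef.
    """
    trt_count = 0
    tf_ops = Counter()
    for node in nodes:
        if node.op == trt_op:
            trt_count += 1
        else:
            tf_ops[node.op] += 1
    return trt_count, tf_ops


def print_summary(nodes):
    """Print how many nodes are TensorRT engines and which ops remain in TF."""
    trt_count, tf_ops = summarize_ops(nodes)
    print("TensorRT engine nodes: {}".format(trt_count))
    print("TensorFlow nodes: {}".format(sum(tf_ops.values())))
    for op, count in tf_ops.most_common():
        print("  {:<20} {}".format(op, count))


# Usage on the device (assumption: trt_graph_def is the converted GraphDef):
# print_summary(trt_graph_def.node)
```

A single engine node can cover many original layers, so a low engine count is not a problem by itself. What matters is which compute-heavy ops (convolutions, for example) remain in the TensorFlow list.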