TRT-optimized graph not faster than unoptimized graph (nvidia/tensorrt:19.01-py3 image)

I am using the nvcr.io/nvidia/tensorrt:19.01-py3 image to optimize TensorFlow models with TensorRT.

I cloned the NVIDIA-AI-IOT/tf_trt_models repo (TensorFlow models accelerated with NVIDIA TensorRT) and ran the provided Jupyter notebooks to verify that TensorRT works properly.

Initially, I got “INFO: Tensorflow running against tensorrt version 0.0.0” when running trt.create_inference_graph(), and nothing was optimized.

I applied the steps suggested under https://devtalk.nvidia.com/default/topic/1047057/tensorrt/graph-conversion-to-fp16-not-working/ and now the optimization is running.

HOWEVER: when I now run inference on the TRT-optimized graph (whether it was optimized with FP32, FP16 or INT8), it is always exactly as fast as the unoptimized .pb graph.
What am I missing here?
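When comparing two graphs like this, it helps to time many runs after a warm-up, because the first session.run on a freshly imported graph includes one-time setup and a single measurement is noisy. A minimal sketch in plain Python, assuming the inference call is wrapped in a zero-argument callable such as lambda: tf_sess.run(tf_output, feed_dict={tf_input: image[None, ...]}); the benchmark helper and its parameters are hypothetical, not part of tf_trt_models:

```python
import statistics
import time


def benchmark(run_once, warmup=10, iterations=100):
    """Time a zero-argument callable and return the median latency in seconds.

    Warm-up calls are discarded so that one-time costs (graph initialization,
    memory allocation, engine setup) are not included in the measurement.
    """
    if iterations < 1:
        raise ValueError("iterations must be >= 1")
    for _ in range(warmup):
        run_once()
    timings = []
    for _ in range(iterations):
        start = time.perf_counter()
        run_once()
        timings.append(time.perf_counter() - start)
    # The median is less sensitive to occasional outliers than the mean.
    return statistics.median(timings)


if __name__ == "__main__":
    # Stand-in workload; in the notebook this would be something like
    # lambda: tf_sess.run(tf_output, feed_dict={tf_input: image[None, ...]})
    latency = benchmark(lambda: sum(range(10000)), warmup=5, iterations=50)
    print("median latency: %.3f ms" % (latency * 1e3))
```

Running the same helper once with the session built from frozen_graph and once with the session built from trt_graph gives two numbers measured the same way.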

using:
nvcr.io/nvidia/tensorrt:19.01-py3 image (started with nvidia-docker run …)
running classification.ipynb from NVIDIA-AI-IOT/tf_trt_models (master branch)
Graphics card: Volta V100 16GB

Hello, can you share what your un-optimized .pb inference code looks like?

I’m running the nvcr.io/nvidia/tensorflow:19.01-py3 container and I don't see “INFO: Tensorflow running against tensorrt version 0.0.0”:

root@bc610afb4f25:/mnt/tf_trt_models/examples/classification# python
Python 3.5.2 (default, Nov 12 2018, 13:43:14)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from PIL import Image
>>> import sys
>>> import os
>>> import urllib
>>> import tensorflow as tf
>>> import tensorflow.contrib.tensorrt as trt

>>> import matplotlib
>>> matplotlib.use('Agg')
>>> import matplotlib.pyplot as plt
>>> import numpy as np
>>> from tf_trt_models.classification import download_classification_checkpoint, build_classification_graph
>>>
>>> MODEL = 'inception_v1'
>>> CHECKPOINT_PATH = 'inception_v1.ckpt'
>>> NUM_CLASSES = 1001
>>> LABELS_PATH = './data/imagenet_labels_%d.txt' % NUM_CLASSES
>>> IMAGE_PATH = './data/dog-yawning.jpg'
>>> checkpoint_path = download_classification_checkpoint(MODEL, 'data')
--2019-02-25 19:37:15--  http://download.tensorflow.org/models/inception_v1_2016_08_28.tar.gz
Resolving download.tensorflow.org (download.tensorflow.org)... 172.217.6.48, 2607:f8b0:4005:809::2010
Connecting to download.tensorflow.org (download.tensorflow.org)|172.217.6.48|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 24642554 (24M) [application/x-tar]
Saving to: ‘data/inception_v1_2016_08_28.tar.gz’

data/inception_v1_2016_08_28.tar.gz                100%[==============================================================================================================>]  23.50M  8.11MB/s    in 2.9s

2019-02-25 19:37:18 (8.11 MB/s) - ‘data/inception_v1_2016_08_28.tar.gz’ saved [24642554/24642554]

tar: inception_v1.ckpt: Cannot change ownership to uid 77690, gid 5000: Permission denied
tar: Exiting with failure status due to previous errors
>>> frozen_graph, input_names, output_names = build_classification_graph(
...     model=MODEL,
...     checkpoint=checkpoint_path,
...     num_classes=NUM_CLASSES
... )
2019-02-25 19:37:23.542732: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: Tesla V100-SXM2-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0000:06:00.0
totalMemory: 31.74GiB freeMemory: 31.33GiB
2019-02-25 19:37:23.922839: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 1 with properties:
name: Tesla V100-SXM2-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0000:07:00.0
totalMemory: 31.74GiB freeMemory: 31.33GiB
2019-02-25 19:37:24.304349: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 2 with properties:
name: Tesla V100-SXM2-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0000:0a:00.0
totalMemory: 31.74GiB freeMemory: 31.33GiB
2019-02-25 19:37:24.695042: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 3 with properties:
name: Tesla V100-SXM2-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0000:0b:00.0
totalMemory: 31.74GiB freeMemory: 31.33GiB
2019-02-25 19:37:25.106197: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 4 with properties:
name: Tesla V100-SXM2-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0000:85:00.0
totalMemory: 31.74GiB freeMemory: 31.33GiB
2019-02-25 19:37:25.533935: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 5 with properties:
name: Tesla V100-SXM2-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0000:86:00.0
totalMemory: 31.74GiB freeMemory: 31.33GiB
2019-02-25 19:37:25.973775: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 6 with properties:
name: Tesla V100-SXM2-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0000:89:00.0
totalMemory: 31.74GiB freeMemory: 31.33GiB
2019-02-25 19:37:26.425154: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 7 with properties:
name: Tesla V100-SXM2-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0000:8a:00.0
totalMemory: 31.74GiB freeMemory: 31.33GiB
2019-02-25 19:37:26.425491: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0, 1, 2, 3, 4, 5, 6, 7
2019-02-25 19:37:30.383785: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-02-25 19:37:30.383835: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 1 2 3 4 5 6 7
2019-02-25 19:37:30.383845: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N Y Y Y Y N N N
2019-02-25 19:37:30.383851: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 1:   Y N Y Y N Y N N
2019-02-25 19:37:30.383858: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 2:   Y Y N Y N N Y N
2019-02-25 19:37:30.383865: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 3:   Y Y Y N N N N Y
2019-02-25 19:37:30.383890: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 4:   Y N N N N Y Y Y
2019-02-25 19:37:30.383914: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 5:   N Y N N Y N Y Y
2019-02-25 19:37:30.383920: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 6:   N N Y N Y Y N Y
2019-02-25 19:37:30.383926: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 7:   N N N Y Y Y Y N
2019-02-25 19:37:30.388181: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 30366 MB memory) -> physical GPU (device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:06:00.0, compute capability: 7.0)
2019-02-25 19:37:30.388917: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 30366 MB memory) -> physical GPU (device: 1, name: Tesla V100-SXM2-32GB, pci bus id: 0000:07:00.0, compute capability: 7.0)
2019-02-25 19:37:30.389631: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 30366 MB memory) -> physical GPU (device: 2, name: Tesla V100-SXM2-32GB, pci bus id: 0000:0a:00.0, compute capability: 7.0)
2019-02-25 19:37:30.390266: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 30366 MB memory) -> physical GPU (device: 3, name: Tesla V100-SXM2-32GB, pci bus id: 0000:0b:00.0, compute capability: 7.0)
2019-02-25 19:37:30.390903: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:4 with 30366 MB memory) -> physical GPU (device: 4, name: Tesla V100-SXM2-32GB, pci bus id: 0000:85:00.0, compute capability: 7.0)
2019-02-25 19:37:30.391487: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:5 with 30366 MB memory) -> physical GPU (device: 5, name: Tesla V100-SXM2-32GB, pci bus id: 0000:86:00.0, compute capability: 7.0)
2019-02-25 19:37:30.392121: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:6 with 30366 MB memory) -> physical GPU (device: 6, name: Tesla V100-SXM2-32GB, pci bus id: 0000:89:00.0, compute capability: 7.0)
2019-02-25 19:37:30.392594: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:7 with 30366 MB memory) -> physical GPU (device: 7, name: Tesla V100-SXM2-32GB, pci bus id: 0000:8a:00.0, compute capability: 7.0)
INFO:tensorflow:Restoring parameters from data/inception_v1/inception_v1.ckpt
INFO:tensorflow:Froze 230 variables.
INFO:tensorflow:Converted 230 variables to const ops.
>>> trt_graph = trt.create_inference_graph(
...     input_graph_def=frozen_graph,
...     outputs=output_names,
...     max_batch_size=1,
...     max_workspace_size_bytes=1 << 25,
...     precision_mode='FP16',
...     minimum_segment_size=50
... )
INFO:tensorflow:Running against TensorRT version 5.0.2
2019-02-25 19:37:53.537870: I tensorflow/core/grappler/devices.cc:51] Number of eligible GPUs (core count >= 8): 8
2019-02-25 19:37:53.539717: I tensorflow/core/grappler/clusters/single_machine.cc:359] Starting new session
2019-02-25 19:37:53.540339: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0, 1, 2, 3, 4, 5, 6, 7
2019-02-25 19:37:53.540809: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-02-25 19:37:53.540825: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 1 2 3 4 5 6 7
2019-02-25 19:37:53.540856: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N Y Y Y Y N N N
2019-02-25 19:37:53.540864: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 1:   Y N Y Y N Y N N
2019-02-25 19:37:53.540872: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 2:   Y Y N Y N N Y N
2019-02-25 19:37:53.540880: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 3:   Y Y Y N N N N Y
2019-02-25 19:37:53.540888: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 4:   Y N N N N Y Y Y
2019-02-25 19:37:53.540896: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 5:   N Y N N Y N Y Y
2019-02-25 19:37:53.540903: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 6:   N N Y N Y Y N Y
2019-02-25 19:37:53.540911: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 7:   N N N Y Y Y Y N
2019-02-25 19:37:53.545772: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 30366 MB memory) -> physical GPU (device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:06:00.0, compute capability: 7.0)
2019-02-25 19:37:53.546305: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 30366 MB memory) -> physical GPU (device: 1, name: Tesla V100-SXM2-32GB, pci bus id: 0000:07:00.0, compute capability: 7.0)
2019-02-25 19:37:53.546688: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 30366 MB memory) -> physical GPU (device: 2, name: Tesla V100-SXM2-32GB, pci bus id: 0000:0a:00.0, compute capability: 7.0)
2019-02-25 19:37:53.547139: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 30366 MB memory) -> physical GPU (device: 3, name: Tesla V100-SXM2-32GB, pci bus id: 0000:0b:00.0, compute capability: 7.0)
2019-02-25 19:37:53.547729: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:4 with 30366 MB memory) -> physical GPU (device: 4, name: Tesla V100-SXM2-32GB, pci bus id: 0000:85:00.0, compute capability: 7.0)
2019-02-25 19:37:53.548033: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:5 with 30366 MB memory) -> physical GPU (device: 5, name: Tesla V100-SXM2-32GB, pci bus id: 0000:86:00.0, compute capability: 7.0)
2019-02-25 19:37:53.548173: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:6 with 30366 MB memory) -> physical GPU (device: 6, name: Tesla V100-SXM2-32GB, pci bus id: 0000:89:00.0, compute capability: 7.0)
2019-02-25 19:37:53.548329: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:7 with 30366 MB memory) -> physical GPU (device: 7, name: Tesla V100-SXM2-32GB, pci bus id: 0000:8a:00.0, compute capability: 7.0)
2019-02-25 19:37:53.868189: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3058] Segment @scope 'InceptionV1/', converted to graph
2019-02-25 19:37:53.868257: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:433] Can't find a device placement for the op!
2019-02-25 19:38:22.509291: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:967] Engine InceptionV1/my_trt_op_0 creation for segment 0, composed of 493 nodes succeeded.
2019-02-25 19:38:22.886387: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:185] TensorRTOptimizer is probably called on funcdef! This optimizer must *NOT* be called on function objects.
2019-02-25 19:38:22.979403: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:185] TensorRTOptimizer is probably called on funcdef! This optimizer must *NOT* be called on function objects.
2019-02-25 19:38:23.057046: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:501] Optimization results for grappler item: tf_graph
2019-02-25 19:38:23.057120: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   constant folding: Graph size after: 502 nodes (-231), 528 edges (-230), time = 143.298ms.
2019-02-25 19:38:23.057129: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   layout: Graph size after: 515 nodes (13), 532 edges (4), time = 46.311ms.
2019-02-25 19:38:23.057136: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   TensorRTOptimizer: Graph size after: 23 nodes (-492), 13 edges (-519), time = 28741.6797ms.
2019-02-25 19:38:23.057159: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   constant folding: Graph size after: 14 nodes (-9), 13 edges (0), time = 52.414ms.
2019-02-25 19:38:23.057183: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   TensorRTOptimizer: Graph size after: 14 nodes (0), 13 edges (0), time = 80.394ms.
2019-02-25 19:38:23.057189: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:501] Optimization results for grappler item: InceptionV1/my_trt_op_0_native_segment
2019-02-25 19:38:23.057195: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   constant folding: Graph size after: 494 nodes (0), 520 edges (0), time = 94.843ms.
2019-02-25 19:38:23.057201: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   layout: Invalid argument: The graph is already optimized by layout optimizer.
2019-02-25 19:38:23.057207: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   TensorRTOptimizer: Graph size after: 494 nodes (0), 520 edges (0), time = 14.879ms.
2019-02-25 19:38:23.057230: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   constant folding: Graph size after: 494 nodes (0), 520 edges (0), time = 78.071ms.
2019-02-25 19:38:23.057236: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   TensorRTOptimizer: Graph size after: 494 nodes (0), 520 edges (0), time = 14.959ms.
>>> tf_config = tf.ConfigProto()
>>> tf_config.gpu_options.allow_growth = True
>>>
>>> tf_sess = tf.Session(config=tf_config)
2019-02-25 19:39:01.922633: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0, 1, 2, 3, 4, 5, 6, 7
2019-02-25 19:39:01.923168: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-02-25 19:39:01.923196: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 1 2 3 4 5 6 7
2019-02-25 19:39:01.923208: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N N Y Y Y N N N
2019-02-25 19:39:01.923219: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 1:   Y N Y Y N Y N N
2019-02-25 19:39:01.923230: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 2:   Y Y N Y N N Y N
2019-02-25 19:39:01.923240: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 3:   Y Y Y N N N N Y
2019-02-25 19:39:01.923250: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 4:   Y N N N N Y Y Y
2019-02-25 19:39:01.923259: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 5:   N Y N N Y N Y Y
2019-02-25 19:39:01.923269: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 6:   N N Y N Y Y N Y
2019-02-25 19:39:01.923280: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 7:   N N N Y Y Y Y N
2019-02-25 19:39:01.927397: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 30366 MB memory) -> physical GPU (device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:06:00.0, compute capability: 7.0)
2019-02-25 19:39:01.928051: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 30366 MB memory) -> physical GPU (device: 1, name: Tesla V100-SXM2-32GB, pci bus id: 0000:07:00.0, compute capability: 7.0)
2019-02-25 19:39:01.928602: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 30366 MB memory) -> physical GPU (device: 2, name: Tesla V100-SXM2-32GB, pci bus id: 0000:0a:00.0, compute capability: 7.0)
2019-02-25 19:39:01.929195: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 30366 MB memory) -> physical GPU (device: 3, name: Tesla V100-SXM2-32GB, pci bus id: 0000:0b:00.0, compute capability: 7.0)
2019-02-25 19:39:01.929759: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:4 with 30366 MB memory) -> physical GPU (device: 4, name: Tesla V100-SXM2-32GB, pci bus id: 0000:85:00.0, compute capability: 7.0)
2019-02-25 19:39:01.930484: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:5 with 30366 MB memory) -> physical GPU (device: 5, name: Tesla V100-SXM2-32GB, pci bus id: 0000:86:00.0, compute capability: 7.0)
2019-02-25 19:39:01.930979: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:6 with 30366 MB memory) -> physical GPU (device: 6, name: Tesla V100-SXM2-32GB, pci bus id: 0000:89:00.0, compute capability: 7.0)
2019-02-25 19:39:01.931506: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:7 with 30366 MB memory) -> physical GPU (device: 7, name: Tesla V100-SXM2-32GB, pci bus id: 0000:8a:00.0, compute capability: 7.0)
>>>
>>> tf.import_graph_def(trt_graph, name='')
>>>
>>> tf_input = tf_sess.graph.get_tensor_by_name(input_names[0] + ':0')
>>> tf_output = tf_sess.graph.get_tensor_by_name(output_names[0] + ':0')
>>>
>>> image = Image.open(IMAGE_PATH)
>>>
>>> plt.imshow(image)
<matplotlib.image.AxesImage object at 0x7f14b995eba8>
>>>
>>> width = int(tf_input.shape.as_list()[1])
>>> height = int(tf_input.shape.as_list()[2])
>>>
>>> image = np.array(image.resize((width, height)))
>>> output = tf_sess.run(tf_output, feed_dict={
...     tf_input: image[None, ...]
... })
>>>
>>> scores = output[0]
>>> with open(LABELS_PATH, 'r') as f:
...     labels = f.readlines()
...
>>> top5_idx = scores.argsort()[::-1][0:5]
>>>
>>> for i in top5_idx:
...     print('(%3f) %s' % (scores[i], labels[i]))
...
(0.338516) golden retriever

(0.056441) toy poodle

(0.049763) miniature poodle

(0.040646) cocker spaniel, English cocker spaniel, cocker

(0.017970) standard poodle

>>> tf_sess.close()

Hi, I am using the nvcr.io/nvidia/tensorrt:19.01-py3 image (you used nvcr.io/nvidia/tensorflow:19.01-py3) with tf-nightly-gpu 1.13 additionally installed, as suggested in https://devtalk.nvidia.com/default/topic/1047057/tensorrt/graph-conversion-to-fp16-not-working/. That solved the “running against TensorRT Version 0.0.0” problem.

However, I don't get logs like yours, and the optimization does not seem to work at all (even though I get no errors): inference over all the differently optimized graphs (FP16, FP32, or not optimized) always runs at the same speed.
I ran inference on the unoptimized graph by calling
tf.import_graph_def(frozen_graph, name='') instead of
tf.import_graph_def(trt_graph, name='')
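A quick way to check whether the conversion did anything is to count TensorRT engine nodes. TF-TRT replaces each converted segment with a node of op type TRTEngineOp (the log earlier in this thread reports one such engine, InceptionV1/my_trt_op_0). If trt_graph contains no TRTEngineOp nodes, it runs as plain TensorFlow, which would explain identical timings; segments smaller than minimum_segment_size are also left unconverted. A minimal sketch; the helper names are hypothetical and rely only on the GraphDef's node list and each node's op field:

```python
from collections import Counter


def summarize_ops(graph_def):
    """Count node op types in a GraphDef (or any object with a .node list)."""
    return Counter(node.op for node in graph_def.node)


def count_trt_engines(graph_def):
    """Number of TensorRT engine nodes; 0 means nothing was converted."""
    # Counter returns 0 for op types that never occur.
    return summarize_ops(graph_def)["TRTEngineOp"]


# Usage with the notebook's variables (requires TensorFlow 1.x with TF-TRT):
#   print("frozen graph:", len(frozen_graph.node), "nodes,",
#         count_trt_engines(frozen_graph), "TRT engines")
#   print("TRT graph:   ", len(trt_graph.node), "nodes,",
#         count_trt_engines(trt_graph), "TRT engines")
```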

So, should I try it with the nvcr.io/nvidia/tensorflow:19.01-py3 image as well?

  • Yes, try it with the TensorFlow container.

    Everything works well with the TensorFlow container, thanks!

    However, if I optimize with precision_mode='INT8', inference takes ~0.33 s, compared with ~5 ms for FP32 and ~2 ms for FP16.
    Am I still missing something?

    I think it takes so long because it is still running calibration.

    After calibration is done, you need to call calib_graph_to_infer_graph to get the INT8 inference graph.

    See the examples here: https://github.com/tensorflow/tensorrt/blob/master/tftrt/examples/image-classification/image_classification.py#L584
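The INT8 path has an extra step compared with FP32/FP16: create_inference_graph with precision_mode='INT8' returns a calibration graph, that graph has to be run on representative input, and only then does calib_graph_to_infer_graph produce the graph to benchmark. Timing the calibration graph directly measures calibration work, not INT8 inference. A minimal sketch of that ordering, assuming the TensorFlow 1.x contrib module used in this thread (tensorflow.contrib.tensorrt) is passed in as trt, and assuming, as in the TF 1.x examples, that calibration and calib_graph_to_infer_graph run in the same Python process. build_int8_graph and make_calibration_runner are hypothetical names:

```python
def make_calibration_runner(batches, input_name, output_name):
    """Hypothetical runner: feeds representative batches through a calib graph."""
    def run_calibration(calib_graph):
        import tensorflow as tf  # TF 1.x; imported lazily so the sketch loads without it
        with tf.Graph().as_default() as graph:
            tf.import_graph_def(calib_graph, name='')
            tf_input = graph.get_tensor_by_name(input_name + ':0')
            tf_output = graph.get_tensor_by_name(output_name + ':0')
            with tf.Session(graph=graph) as sess:
                for batch in batches:
                    sess.run(tf_output, feed_dict={tf_input: batch})
    return run_calibration


def build_int8_graph(trt, frozen_graph, output_names, run_calibration,
                     max_batch_size=1, max_workspace_size_bytes=1 << 25,
                     minimum_segment_size=50):
    """Sketch of the TF-TRT INT8 flow: convert, calibrate, then finalize."""
    # Step 1: INT8 conversion returns a *calibration* graph, not the final one.
    calib_graph = trt.create_inference_graph(
        input_graph_def=frozen_graph,
        outputs=output_names,
        max_batch_size=max_batch_size,
        max_workspace_size_bytes=max_workspace_size_bytes,
        precision_mode='INT8',
        minimum_segment_size=minimum_segment_size)
    # Step 2: run representative data so TensorRT can collect activation ranges.
    run_calibration(calib_graph)
    # Step 3: build the INT8 inference graph; benchmark this, not calib_graph.
    return trt.calib_graph_to_infer_graph(calib_graph)
```

In the classification notebook this might be called as build_int8_graph(trt, frozen_graph, output_names, make_calibration_runner(batches, input_names[0], output_names[0])), where batches is your own list of preprocessed images shaped like image[None, ...].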

    Yes, the optimization/calibration for INT8 takes a very long time.
    What I was referring to, however, was the inference speed, which is strangely much slower for the INT8-optimized graph than for the FP32- or FP16-optimized graphs.
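One way to tell whether the graph being timed is still the calibration graph: in the TF 1.x contrib implementation, the TRTEngineOp nodes of an INT8 calibration graph appear to carry an empty calibration_data attribute, which calib_graph_to_infer_graph fills in. A minimal sketch of that check, assuming the attribute names calibration_data and precision_mode (verify them against your TensorFlow version); uncalibrated_int8_engines is a hypothetical helper:

```python
def uncalibrated_int8_engines(graph_def):
    """Names of INT8 TRTEngineOp nodes that still lack calibration data.

    Assumes the TF 1.x contrib attribute names 'precision_mode' and
    'calibration_data'. A non-empty result suggests calib_graph_to_infer_graph
    has not been applied to this graph yet.
    """
    names = []
    for node in graph_def.node:
        if node.op != "TRTEngineOp":
            continue
        attrs = node.attr
        precision = attrs["precision_mode"].s if "precision_mode" in attrs else b""
        calib = attrs["calibration_data"].s if "calibration_data" in attrs else b""
        if precision == b"INT8" and not calib:
            names.append(node.name)
    return names
```

If this returns engine names for the graph being benchmarked, the ~0.33 s runs are likely still doing calibration work.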