TRT optimize graph not faster than unoptimized (nvidia/tensorrt:19.01-py3 image)

gabriel0xFF · February 25, 2019, 4:09pm

I am using the nvcr.io/nvidia/tensorrt:19.01-py3 image to optimize TensorFlow models with TensorRT.

I cloned the NVIDIA-AI-IOT/tf_trt_models repo (TensorFlow models accelerated with NVIDIA TensorRT) to verify that TensorRT works properly by running the provided Jupyter notebooks.

Initially, I got an “INFO: Tensorflow running against tensorrt version 0.0.0” when running trt.create_inference_graph() and nothing was optimized.

I applied the steps suggested under https://devtalk.nvidia.com/default/topic/1047057/tensorrt/graph-conversion-to-fp16-not-working/ and now the optimization is running.

HOWEVER: when I now run inference on the TRT-optimized graph (no matter whether I optimize with FP32, FP16 or INT8), it is always exactly as fast as the unoptimized .pb graph.
What am I missing here?

using:
nvcr.io/nvidia/tensorrt:19.01-py3 image (started with nvidia-docker run …)
running classification.ipynb from NVIDIA-AI-IOT/tf_trt_models (master)
Graphics card: Volta V100 16GB
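
A quick sanity check for this symptom is to confirm that the converted graph really contains TensorRT engine nodes. The sketch below is only an illustration, not part of tf_trt_models: it assumes the graphs are TensorFlow GraphDef-like objects with a `node` list whose entries have an `op` field, and that TF-TRT engines show up with the op type `TRTEngineOp`. The helper names `op_histogram` and `summarize_conversion` are made up for this example.

```python
from collections import Counter


def op_histogram(graph_def):
    """Count how many nodes of each op type a GraphDef-like object contains."""
    return Counter(node.op for node in graph_def.node)


def summarize_conversion(frozen_graph, trt_graph, engine_op="TRTEngineOp"):
    """Compare node counts before and after conversion.

    engine_op is assumed to be the op type TF-TRT uses for its engines.
    If engine_ops is 0, nothing was converted and no speedup can be expected.
    """
    before = op_histogram(frozen_graph)
    after = op_histogram(trt_graph)
    return {
        "nodes_before": sum(before.values()),
        "nodes_after": sum(after.values()),
        "engine_ops": after[engine_op],
    }
```

Calling `summarize_conversion(frozen_graph, trt_graph)` after `trt.create_inference_graph()` would then show whether the node count dropped and whether at least one engine node exists.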

NVES · February 25, 2019, 9:11pm

Hello, can you share what your un-optimized .pb inference code looks like?

I’m running the nvcr.io/nvidia/tensorflow:19.01-py3 container and do not see “INFO: Tensorflow running against tensorrt version 0.0.0”

root@bc610afb4f25:/mnt/tf_trt_models/examples/classification# python
Python 3.5.2 (default, Nov 12 2018, 13:43:14)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from PIL import Image
>>> import sys
>>> import os
>>> import urllib
>>> import tensorflow as tf
>>> import tensorflow.contrib.tensorrt as trt
>>> import matplotlib
>>> matplotlib.use('Agg')
>>> import matplotlib.pyplot as plt
>>> import numpy as np
>>> from tf_trt_models.classification import download_classification_checkpoint, build_classification_graph
>>>
>>> MODEL = 'inception_v1'
>>> CHECKPOINT_PATH = 'inception_v1.ckpt'
>>> NUM_CLASSES = 1001
>>> LABELS_PATH = './data/imagenet_labels_%d.txt' % NUM_CLASSES
>>> IMAGE_PATH = './data/dog-yawning.jpg'
>>> checkpoint_path = download_classification_checkpoint(MODEL, 'data')
--2019-02-25 19:37:15--  http://download.tensorflow.org/models/inception_v1_2016_08_28.tar.gz
Resolving download.tensorflow.org (download.tensorflow.org)... 172.217.6.48, 2607:f8b0:4005:809::2010
Connecting to download.tensorflow.org (download.tensorflow.org)|172.217.6.48|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 24642554 (24M) [application/x-tar]
Saving to: ‘data/inception_v1_2016_08_28.tar.gz’

data/inception_v1_2016_08_28.tar.gz                100%[==============================================================================================================>]  23.50M  8.11MB/s    in 2.9s

2019-02-25 19:37:18 (8.11 MB/s) - ‘data/inception_v1_2016_08_28.tar.gz’ saved [24642554/24642554]

tar: inception_v1.ckpt: Cannot change ownership to uid 77690, gid 5000: Permission denied
tar: Exiting with failure status due to previous errors
>>> frozen_graph, input_names, output_names = build_classification_graph(
...     model=MODEL,
...     checkpoint=checkpoint_path,
...     num_classes=NUM_CLASSES
... )
2019-02-25 19:37:23.542732: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: Tesla V100-SXM2-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0000:06:00.0
totalMemory: 31.74GiB freeMemory: 31.33GiB
2019-02-25 19:37:23.922839: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 1 with properties:
name: Tesla V100-SXM2-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0000:07:00.0
totalMemory: 31.74GiB freeMemory: 31.33GiB
2019-02-25 19:37:24.304349: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 2 with properties:
name: Tesla V100-SXM2-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0000:0a:00.0
totalMemory: 31.74GiB freeMemory: 31.33GiB
2019-02-25 19:37:24.695042: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 3 with properties:
name: Tesla V100-SXM2-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0000:0b:00.0
totalMemory: 31.74GiB freeMemory: 31.33GiB
2019-02-25 19:37:25.106197: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 4 with properties:
name: Tesla V100-SXM2-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0000:85:00.0
totalMemory: 31.74GiB freeMemory: 31.33GiB
2019-02-25 19:37:25.533935: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 5 with properties:
name: Tesla V100-SXM2-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0000:86:00.0
totalMemory: 31.74GiB freeMemory: 31.33GiB
2019-02-25 19:37:25.973775: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 6 with properties:
name: Tesla V100-SXM2-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0000:89:00.0
totalMemory: 31.74GiB freeMemory: 31.33GiB
2019-02-25 19:37:26.425154: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 7 with properties:
name: Tesla V100-SXM2-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0000:8a:00.0
totalMemory: 31.74GiB freeMemory: 31.33GiB
2019-02-25 19:37:26.425491: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0, 1, 2, 3, 4, 5, 6, 7
2019-02-25 19:37:30.383785: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-02-25 19:37:30.383835: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 1 2 3 4 5 6 7
2019-02-25 19:37:30.383845: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N Y Y Y Y N N N
2019-02-25 19:37:30.383851: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 1:   Y N Y Y N Y N N
2019-02-25 19:37:30.383858: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 2:   Y Y N Y N N Y N
2019-02-25 19:37:30.383865: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 3:   Y Y Y N N N N Y
2019-02-25 19:37:30.383890: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 4:   Y N N N N Y Y Y
2019-02-25 19:37:30.383914: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 5:   N Y N N Y N Y Y
2019-02-25 19:37:30.383920: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 6:   N N Y N Y Y N Y
2019-02-25 19:37:30.383926: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 7:   N N N Y Y Y Y N
2019-02-25 19:37:30.388181: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 30366 MB memory) -> physical GPU (device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:06:00.0, compute capability: 7.0)
2019-02-25 19:37:30.388917: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 30366 MB memory) -> physical GPU (device: 1, name: Tesla V100-SXM2-32GB, pci bus id: 0000:07:00.0, compute capability: 7.0)
2019-02-25 19:37:30.389631: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 30366 MB memory) -> physical GPU (device: 2, name: Tesla V100-SXM2-32GB, pci bus id: 0000:0a:00.0, compute capability: 7.0)
2019-02-25 19:37:30.390266: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 30366 MB memory) -> physical GPU (device: 3, name: Tesla V100-SXM2-32GB, pci bus id: 0000:0b:00.0, compute capability: 7.0)
2019-02-25 19:37:30.390903: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:4 with 30366 MB memory) -> physical GPU (device: 4, name: Tesla V100-SXM2-32GB, pci bus id: 0000:85:00.0, compute capability: 7.0)
2019-02-25 19:37:30.391487: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:5 with 30366 MB memory) -> physical GPU (device: 5, name: Tesla V100-SXM2-32GB, pci bus id: 0000:86:00.0, compute capability: 7.0)
2019-02-25 19:37:30.392121: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:6 with 30366 MB memory) -> physical GPU (device: 6, name: Tesla V100-SXM2-32GB, pci bus id: 0000:89:00.0, compute capability: 7.0)
2019-02-25 19:37:30.392594: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:7 with 30366 MB memory) -> physical GPU (device: 7, name: Tesla V100-SXM2-32GB, pci bus id: 0000:8a:00.0, compute capability: 7.0)
INFO:tensorflow:Restoring parameters from data/inception_v1/inception_v1.ckpt
INFO:tensorflow:Froze 230 variables.
INFO:tensorflow:Converted 230 variables to const ops.
>>> trt_graph = trt.create_inference_graph(
...     input_graph_def=frozen_graph,
...     outputs=output_names,
...     max_batch_size=1,
...     max_workspace_size_bytes=1 << 25,
...     precision_mode='FP16',
...     minimum_segment_size=50
... )
INFO:tensorflow:Running against TensorRT version 5.0.2
2019-02-25 19:37:53.537870: I tensorflow/core/grappler/devices.cc:51] Number of eligible GPUs (core count >= 8): 8
2019-02-25 19:37:53.539717: I tensorflow/core/grappler/clusters/single_machine.cc:359] Starting new session
2019-02-25 19:37:53.540339: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0, 1, 2, 3, 4, 5, 6, 7
2019-02-25 19:37:53.540809: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-02-25 19:37:53.540825: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 1 2 3 4 5 6 7
2019-02-25 19:37:53.540856: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N Y Y Y Y N N N
2019-02-25 19:37:53.540864: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 1:   Y N Y Y N Y N N
2019-02-25 19:37:53.540872: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 2:   Y Y N Y N N Y N
2019-02-25 19:37:53.540880: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 3:   Y Y Y N N N N Y
2019-02-25 19:37:53.540888: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 4:   Y N N N N Y Y Y
2019-02-25 19:37:53.540896: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 5:   N Y N N Y N Y Y
2019-02-25 19:37:53.540903: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 6:   N N Y N Y Y N Y
2019-02-25 19:37:53.540911: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 7:   N N N Y Y Y Y N
2019-02-25 19:37:53.545772: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 30366 MB memory) -> physical GPU (device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:06:00.0, compute capability: 7.0)
2019-02-25 19:37:53.546305: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 30366 MB memory) -> physical GPU (device: 1, name: Tesla V100-SXM2-32GB, pci bus id: 0000:07:00.0, compute capability: 7.0)
2019-02-25 19:37:53.546688: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 30366 MB memory) -> physical GPU (device: 2, name: Tesla V100-SXM2-32GB, pci bus id: 0000:0a:00.0, compute capability: 7.0)
2019-02-25 19:37:53.547139: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 30366 MB memory) -> physical GPU (device: 3, name: Tesla V100-SXM2-32GB, pci bus id: 0000:0b:00.0, compute capability: 7.0)
2019-02-25 19:37:53.547729: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:4 with 30366 MB memory) -> physical GPU (device: 4, name: Tesla V100-SXM2-32GB, pci bus id: 0000:85:00.0, compute capability: 7.0)
2019-02-25 19:37:53.548033: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:5 with 30366 MB memory) -> physical GPU (device: 5, name: Tesla V100-SXM2-32GB, pci bus id: 0000:86:00.0, compute capability: 7.0)
2019-02-25 19:37:53.548173: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:6 with 30366 MB memory) -> physical GPU (device: 6, name: Tesla V100-SXM2-32GB, pci bus id: 0000:89:00.0, compute capability: 7.0)
2019-02-25 19:37:53.548329: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:7 with 30366 MB memory) -> physical GPU (device: 7, name: Tesla V100-SXM2-32GB, pci bus id: 0000:8a:00.0, compute capability: 7.0)
2019-02-25 19:37:53.868189: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3058] Segment @scope 'InceptionV1/', converted to graph
2019-02-25 19:37:53.868257: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:433] Can't find a device placement for the op!
2019-02-25 19:38:22.509291: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:967] Engine InceptionV1/my_trt_op_0 creation for segment 0, composed of 493 nodes succeeded.
2019-02-25 19:38:22.886387: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:185] TensorRTOptimizer is probably called on funcdef! This optimizer must *NOT* be called on function objects.
2019-02-25 19:38:22.979403: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:185] TensorRTOptimizer is probably called on funcdef! This optimizer must *NOT* be called on function objects.
2019-02-25 19:38:23.057046: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:501] Optimization results for grappler item: tf_graph
2019-02-25 19:38:23.057120: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   constant folding: Graph size after: 502 nodes (-231), 528 edges (-230), time = 143.298ms.
2019-02-25 19:38:23.057129: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   layout: Graph size after: 515 nodes (13), 532 edges (4), time = 46.311ms.
2019-02-25 19:38:23.057136: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   TensorRTOptimizer: Graph size after: 23 nodes (-492), 13 edges (-519), time = 28741.6797ms.
2019-02-25 19:38:23.057159: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   constant folding: Graph size after: 14 nodes (-9), 13 edges (0), time = 52.414ms.
2019-02-25 19:38:23.057183: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   TensorRTOptimizer: Graph size after: 14 nodes (0), 13 edges (0), time = 80.394ms.
2019-02-25 19:38:23.057189: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:501] Optimization results for grappler item: InceptionV1/my_trt_op_0_native_segment
2019-02-25 19:38:23.057195: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   constant folding: Graph size after: 494 nodes (0), 520 edges (0), time = 94.843ms.
2019-02-25 19:38:23.057201: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   layout: Invalid argument: The graph is already optimized by layout optimizer.
2019-02-25 19:38:23.057207: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   TensorRTOptimizer: Graph size after: 494 nodes (0), 520 edges (0), time = 14.879ms.
2019-02-25 19:38:23.057230: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   constant folding: Graph size after: 494 nodes (0), 520 edges (0), time = 78.071ms.
2019-02-25 19:38:23.057236: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   TensorRTOptimizer: Graph size after: 494 nodes (0), 520 edges (0), time = 14.959ms.
>>> tf_config = tf.ConfigProto()
>>> tf_config.gpu_options.allow_growth = True
>>>
>>> tf_sess = tf.Session(config=tf_config)
2019-02-25 19:39:01.922633: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0, 1, 2, 3, 4, 5, 6, 7
2019-02-25 19:39:01.923168: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-02-25 19:39:01.923196: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 1 2 3 4 5 6 7
2019-02-25 19:39:01.923208: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N Y Y Y Y N N N
2019-02-25 19:39:01.923219: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 1:   Y N Y Y N Y N N
2019-02-25 19:39:01.923230: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 2:   Y Y N Y N N Y N
2019-02-25 19:39:01.923240: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 3:   Y Y Y N N N N Y
2019-02-25 19:39:01.923250: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 4:   Y N N N N Y Y Y
2019-02-25 19:39:01.923259: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 5:   N Y N N Y N Y Y
2019-02-25 19:39:01.923269: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 6:   N N Y N Y Y N Y
2019-02-25 19:39:01.923280: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 7:   N N N Y Y Y Y N
2019-02-25 19:39:01.927397: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 30366 MB memory) -> physical GPU (device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:06:00.0, compute capability: 7.0)
2019-02-25 19:39:01.928051: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 30366 MB memory) -> physical GPU (device: 1, name: Tesla V100-SXM2-32GB, pci bus id: 0000:07:00.0, compute capability: 7.0)
2019-02-25 19:39:01.928602: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 30366 MB memory) -> physical GPU (device: 2, name: Tesla V100-SXM2-32GB, pci bus id: 0000:0a:00.0, compute capability: 7.0)
2019-02-25 19:39:01.929195: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 30366 MB memory) -> physical GPU (device: 3, name: Tesla V100-SXM2-32GB, pci bus id: 0000:0b:00.0, compute capability: 7.0)
2019-02-25 19:39:01.929759: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:4 with 30366 MB memory) -> physical GPU (device: 4, name: Tesla V100-SXM2-32GB, pci bus id: 0000:85:00.0, compute capability: 7.0)
2019-02-25 19:39:01.930484: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:5 with 30366 MB memory) -> physical GPU (device: 5, name: Tesla V100-SXM2-32GB, pci bus id: 0000:86:00.0, compute capability: 7.0)
2019-02-25 19:39:01.930979: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:6 with 30366 MB memory) -> physical GPU (device: 6, name: Tesla V100-SXM2-32GB, pci bus id: 0000:89:00.0, compute capability: 7.0)
2019-02-25 19:39:01.931506: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:7 with 30366 MB memory) -> physical GPU (device: 7, name: Tesla V100-SXM2-32GB, pci bus id: 0000:8a:00.0, compute capability: 7.0)
>>>
>>> tf.import_graph_def(trt_graph, name='')
>>>
>>> tf_input = tf_sess.graph.get_tensor_by_name(input_names[0] + ':0')
>>> tf_output = tf_sess.graph.get_tensor_by_name(output_names[0] + ':0')
>>>
>>>
>>>
>>> image = Image.open(IMAGE_PATH)
>>>
>>> plt.imshow(image)
<matplotlib.image.AxesImage object at 0x7f14b995eba8>
>>>
>>> width = int(tf_input.shape.as_list()[1])
>>> height = int(tf_input.shape.as_list()[2])
>>>
>>> image = np.array(image.resize((width, height)))
>>> output = tf_sess.run(tf_output, feed_dict={
...     tf_input: image[None, ...]
... })
>>>
>>> scores = output[0]
>>> with open(LABELS_PATH, 'r') as f:
...     labels = f.readlines()
...
>>> top5_idx = scores.argsort()[::-1][0:5]
>>>
>>> for i in top5_idx:
...     print('(%3f) %s' % (scores[i], labels[i]))
...
(0.338516) golden retriever

(0.056441) toy poodle

(0.049763) miniature poodle

(0.040646) cocker spaniel, English cocker spaniel, cocker

(0.017970) standard poodle

>>>
>>>
>>>
>>> tf_sess.close()
>>>
>>>
>>>

gabriel0xFF · February 26, 2019, 5:32pm

Hi, I am using the nvcr.io/nvidia/tensorrt:19.01-py3 image (you used nvcr.io/nvidia/tensorflow:19.01-py3) with tf-nightly-gpu 1.13 additionally installed (this was suggested in https://devtalk.nvidia.com/default/topic/1047057/tensorrt/graph-conversion-to-fp16-not-working/). This solved the “running against TensorRT Version 0.0.0” problem.

However, I do not get logs like yours, and the optimization does not seem to work at all (even though I get no errors): inference is always the same speed, regardless of whether the graph was optimized with FP16, with FP32, or not at all.
I ran inference on the unoptimized graph by calling
tf.import_graph_def(frozen_graph, name='') instead of
tf.import_graph_def(trt_graph, name='')

So, should I try it with nvcr.io/nvidia/tensorflow:19.01-py3 image as well ?
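
For comparing the two graphs, it can help to time them with the same harness and to keep the very first call separate from the steady-state runs, since one-time setup costs can land on that first call. This is a minimal sketch using only the standard library; `time_inference` and its parameters are hypothetical names, and `run_once` stands for whatever callable performs one inference.

```python
import time


def time_inference(run_once, warmup=10, iterations=50):
    """Return (first_call_seconds, mean_steady_state_seconds) for run_once.

    The first call is timed on its own, `warmup` further calls are
    discarded, and the mean is taken over `iterations` timed calls.
    """
    if iterations <= 0:
        raise ValueError("iterations must be positive")

    start = time.perf_counter()
    run_once()
    first_call = time.perf_counter() - start

    for _ in range(warmup):
        run_once()

    start = time.perf_counter()
    for _ in range(iterations):
        run_once()
    mean = (time.perf_counter() - start) / iterations
    return first_call, mean


# Hypothetical usage with the session from the transcript above:
# first, mean = time_inference(
#     lambda: tf_sess.run(tf_output, feed_dict={tf_input: image[None, ...]}))
```

Running this once with `frozen_graph` imported and once with `trt_graph` imported gives two directly comparable steady-state numbers.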

NVES · February 26, 2019, 5:35pm

Yes, try it with the TensorFlow container.

gabriel0xFF · February 26, 2019, 7:11pm

Everything works well with the TensorFlow container, thanks!

However, if I optimize with precision_mode='INT8', inference time is ~0.33 s, whereas it is ~5 ms for FP32 and ~2 ms for FP16.
Am I still missing something?

Pooya-Davoodi · March 11, 2019, 2:06am

I think it takes so long because it is running calibration.

After calibration is done, you need to call calib_graph_to_infer_graph.

See the examples here: https://github.com/tensorflow/tensorrt/blob/master/tftrt/examples/image-classification/image_classification.py#L584
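
The two-step INT8 flow described here can be sketched roughly as follows. This is an outline under assumptions, not the linked example: `trt` is expected to be the `tensorflow.contrib.tensorrt` module as imported in the session earlier in this thread, the conversion arguments are copied from that session, and `run_calibration` is a hypothetical callback that imports the calibration graph into a session and feeds it representative input batches.

```python
def build_int8_inference_graph(trt, frozen_graph, output_names, run_calibration,
                               max_batch_size=1,
                               max_workspace_size_bytes=1 << 25,
                               minimum_segment_size=50):
    """Convert to INT8 in two steps: calibrate first, then finalize.

    trt             -- the tensorflow.contrib.tensorrt module (passed in)
    run_calibration -- hypothetical callable that runs the calibration
                       graph on representative data
    """
    # Step 1: with precision_mode='INT8' this returns a calibration graph,
    # which is slow to run because it collects statistics.
    calib_graph = trt.create_inference_graph(
        input_graph_def=frozen_graph,
        outputs=output_names,
        max_batch_size=max_batch_size,
        max_workspace_size_bytes=max_workspace_size_bytes,
        precision_mode='INT8',
        minimum_segment_size=minimum_segment_size,
    )

    # Step 2: feed representative inputs through the calibration graph.
    run_calibration(calib_graph)

    # Step 3: turn the calibrated graph into the graph used for inference.
    return trt.calib_graph_to_infer_graph(calib_graph)
```

Timing should then be done on the graph returned by the last step, not on the calibration graph.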

gabriel0xFF · March 11, 2019, 9:04am

Yes, the optimization/calibration for INT8 takes a very long time.
What I was referring to, however, was the speed of the inference, which is strangely much slower for the INT8-optimized graph than for the FP32- or FP16-optimized graph.
