TF_TRT unsupported Constant Type

Hi,

I’m using TF-Slim to build an InceptionV3 model and want to convert it into a TF-TRT graph.

If I set the model to training mode (is_training=True), the conversion to a TF-TRT graph fails with an "unsupported constant type" error.

However, if I set is_training=False, I get no error message and the conversion goes through smoothly.

PC: Ubuntu 16.04
GPU: 1080 Ti
TF: 1.7.0
CUDA: 9.0
TensorRT: 4.1.2

import sys
import os
import tensorflow as tf
import tensorflow.contrib.tensorrt as trt
import numpy as np
import tensorflow.contrib.slim as slim

sys.path.append('/home/user/Desktop/python_code/model_test/models/research/slim')
from nets import inception,inception_v3

###critical setting
is_training=True

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
input_name='input'
num_classes=1001
output_name='prediction'
checkpoint='inception_v3.ckpt'
with tf.Graph().as_default() as tf_graph:
    with tf.Session(config=config) as tf_sess:
        tf_input = tf.placeholder(tf.float32, [None, 299, 299, 3],
                name=input_name)
        with slim.arg_scope(inception.inception_v3_arg_scope()):
            with slim.arg_scope([slim.batch_norm],is_training=is_training):
                tf_net, tf_end_points = inception.inception_v3(tf_input, is_training=is_training,
                    num_classes=num_classes)
        tf_output =  tf.nn.softmax(tf_net, name=output_name)
        # load checkpoint
        tf_saver = tf.train.Saver()
        tf_saver.restore(save_path=checkpoint, sess=tf_sess)
        # freeze graph
        frozen_graph = tf.graph_util.convert_variables_to_constants(
            tf_sess,
            tf_sess.graph_def,
            output_node_names=[output_name]
        )

trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=[output_name],
    max_batch_size=1,
    max_workspace_size_bytes=1<<25,
    precision_mode='FP32',
    minimum_segment_size=50
)

Error message:

2019-01-17 17:41:31.726424: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-01-17 17:41:31.804534: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-01-17 17:41:31.804836: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1344] Found device 0 with properties: 
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.6705
pciBusID: 0000:01:00.0
totalMemory: 10.91GiB freeMemory: 10.55GiB
2019-01-17 17:41:31.804864: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0
2019-01-17 17:41:32.063939: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-01-17 17:41:32.063971: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917]      0 
2019-01-17 17:41:32.063977: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0:   N 
2019-01-17 17:41:32.064162: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10187 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
2019-01-17 17:41:34.839673: I tensorflow/core/grappler/devices.cc:51] Number of eligible GPUs (core count >= 8): 1
2019-01-17 17:41:35.341416: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2624] Max batch size= 1 max workspace size= 33554432
2019-01-17 17:41:35.341461: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2630] starting build engine
2019-01-17 17:42:06.791967: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2635] Built network
2019-01-17 17:42:07.663609: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2640] Serialized engine
2019-01-17 17:42:07.703502: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2648] finished engine InceptionV3/my_trt_op0 containing 801 nodes
2019-01-17 17:42:07.703563: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2668] Finished op preparation
2019-01-17 17:42:07.738476: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2676] OK finished op building
2019-01-17 17:42:53.465747: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0
2019-01-17 17:42:53.465796: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-01-17 17:42:53.465816: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917]      0 
2019-01-17 17:42:53.465820: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0:   N 
2019-01-17 17:42:53.465920: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10187 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
2019-01-17 17:42:57.789125: I tensorflow/core/grappler/devices.cc:51] Number of eligible GPUs (core count >= 8): 1
2019-01-17 17:42:58.150488: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:412] subgraph conversion error for subgraph_index:0 due to: "Unimplemented: Not supported constant type, at InceptionV3/InceptionV3/Mixed_7c/Branch_3/Conv2d_0b_1x1/BatchNorm/Const_2" SKIPPING......( 796 nodes)

Hello,

You might be using a precompiled pip package built against TRT 4.0. If TensorFlow is compiled with a different TRT version, some features are disabled.

Please try again with a more recent TensorFlow, 1.12 for example.
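
As a quick sanity check, the following minimal sketch only confirms the TensorFlow side of the setup (which release is installed and whether it is the CUDA build); the TensorRT version the wheel was linked against is usually only visible in the TF-TRT conversion logs or through your package manager:

import tensorflow as tf

# Which TF release is actually being imported (e.g. '1.7.0' vs. the suggested '1.12.0')
print(tf.__version__)
# True if this is the GPU (CUDA) build of the pip package
print(tf.test.is_built_with_cuda())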

Hi,

Which TRT version should I use to compile TF 1.12 in order to solve this problem?

Some updates:

Nvidia Docker: nvcr.io/nvidia/tensorflow:18.12-py3

TF: 1.12
TensorRT: 5.0.2

I tried using the latest versions of TF and TRT.

By default, the slim Inception implementation uses fused batch norm.
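
For reference, fused batch norm can be switched off through the slim arg_scope, since slim.batch_norm (tf.contrib.layers.batch_norm) accepts a fused argument. A minimal sketch of the graph-construction block used for the "fused off" runs (the surrounding names come from the full script below; the script itself only shows the default, fused case):

# Hedged sketch: disabling fused batch norm via the slim arg_scope.
# is_training, tf_input and num_classes are defined as in the script below.
with slim.arg_scope(inception.inception_v3_arg_scope()):
    with slim.arg_scope([slim.batch_norm], is_training=is_training, fused=False):
        tf_net, tf_end_points = inception.inception_v3(
            tf_input, is_training=is_training, num_classes=num_classes)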

I got the following warnings and results in the terminal:

With fused batch norm:

  1. is_training=True
     Engine InceptionV3/my_trt_op_0 creation for segment 0, composed of 788 nodes failed: Unimplemented: only is_training=false is supported, at InceptionV3/InceptionV3/Conv2d_1a_3x3/BatchNorm/FusedBatchNorm. Skipping…
     Inference time per image: 14 ms

  2. is_training=False
     No skipping error.
     Inference time per image: 8 ms

With fused batch norm turned off:

  1. is_training=True
     No skipping error.
     Inference time per image: 15 ms

  2. is_training=False
     No skipping error.
     Inference time per image: 9 ms

I was running the following code (is_training is passed as the first command-line argument):

import sys
import os
import tensorflow as tf
import tensorflow.contrib.tensorrt as trt
import numpy as np
import tensorflow.contrib.slim as slim
import time
sys.path.append('./slim')
import nets
from nets import inception,inception_v3

def build_frozen_graph(is_training):
    tf_config = tf.ConfigProto()
    tf_config.gpu_options.allow_growth = True
    input_name='input'
    output_name='prediction'
    num_classes=1001
    checkpoint='inception_v3.ckpt'
    with tf.Graph().as_default() as tf_graph:

        with tf.Session(config=tf_config) as tf_sess:
            tf_input = tf.placeholder(tf.float32, [None, 299, 299, 3],
                    name=input_name)

            with slim.arg_scope(inception.inception_v3_arg_scope()):
                with slim.arg_scope([slim.batch_norm],is_training=is_training):
                    tf_net, tf_end_points = inception.inception_v3(tf_input, is_training=is_training,
                        num_classes=num_classes)
            tf_output =  tf.nn.softmax(tf_net, name=output_name)
            # load checkpoint
            tf_saver = tf.train.Saver()
            tf_saver.restore(save_path=checkpoint, sess=tf_sess)
            # freeze graph
            frozen_graph = tf.graph_util.convert_variables_to_constants(
                tf_sess,
                tf_sess.graph_def,
                output_node_names=[output_name]
            )
            # remove training-only nodes (Identity / CheckNumerics)
            frozen_graph = tf.graph_util.remove_training_nodes(frozen_graph)

    return frozen_graph

def convert_to_tftrt(frozen_graph,FP):
    input_name='input'
    output_name='prediction'
    trt_graph = trt.create_inference_graph(
        input_graph_def=frozen_graph,
        outputs=[output_name],
        max_batch_size=1,
        max_workspace_size_bytes=3221225472,
        precision_mode=FP,
        minimum_segment_size=50
    )
    return trt_graph

def import_graph(imported_graph):
    input_name='input'
    output_name='prediction'
    tf.reset_default_graph()
    with tf.get_default_graph().as_default() as graph:
        tf_config = tf.ConfigProto()
        tf_config.gpu_options.allow_growth = True

        with tf.Session(config=tf_config).as_default() as sess:
            tf.import_graph_def(imported_graph, name='')
            tf_input = sess.graph.get_tensor_by_name(input_name + ':0')
            tf_output = sess.graph.get_tensor_by_name(output_name + ':0')
            return graph,sess,tf_input,tf_output

def inference_test(graph,sess,tf_input,tf_output):
    image=np.ones((1,299,299,3))
    st = time.time()
    with graph.as_default():
        with sess.as_default():
            for i in range(1000):
                output = sess.run(tf_output, feed_dict={
                    tf_input:image
                })
    print((time.time()-st)/1000)

def main(is_training):
    frozen_graph = build_frozen_graph(is_training)
    FP="FP32"
    trt_graph = convert_to_tftrt(frozen_graph,FP)
    graph,sess,tf_input,tf_output = import_graph(trt_graph)
    inference_test(graph,sess,tf_input,tf_output)

if __name__ =='__main__':
    is_training = bool(int(sys.argv[1]))
    print('is_training:',is_training)
    main(is_training)

With is_training=True and fused batch norm turned off, the terminal output is as follows:

is_training: True
2019-01-18 06:59:03.872197: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:957] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-01-18 06:59:03.872651: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties: 
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.6705
pciBusID: 0000:01:00.0
totalMemory: 10.92GiB freeMemory: 10.50GiB
2019-01-18 06:59:03.872668: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-01-18 06:59:04.141085: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-01-18 06:59:04.141132: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 
2019-01-18 06:59:04.141139: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N 
2019-01-18 06:59:04.141345: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10137 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
2019-01-18 06:59:08.747584: I tensorflow/core/grappler/devices.cc:51] Number of eligible GPUs (core count >= 8): 1
2019-01-18 06:59:08.754120: I tensorflow/core/grappler/clusters/single_machine.cc:359] Starting new session
2019-01-18 06:59:08.754447: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-01-18 06:59:08.754478: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-01-18 06:59:08.754498: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 
2019-01-18 06:59:08.754503: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N 
2019-01-18 06:59:08.754683: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10137 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
2019-01-18 06:59:10.087833: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:501] Optimization results for grappler item: tf_graph
2019-01-18 06:59:10.087863: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   constant folding: Graph size after: 1841 nodes (-8), 2252 edges (-10), time = 123ms.
2019-01-18 06:59:10.087885: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   layout: Graph size after: 2422 nodes (581), 2635 edges (383), time = 98.311ms.
2019-01-18 06:59:10.087890: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   TensorRTOptimizer: Graph size after: 2422 nodes (0), 2635 edges (0), time = 408.997ms.
2019-01-18 06:59:10.087895: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   constant folding: Graph size after: 2223 nodes (-199), 2635 edges (0), time = 125.847ms.
2019-01-18 06:59:10.087900: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   TensorRTOptimizer: Graph size after: 2223 nodes (0), 2635 edges (0), time = 409.705ms.
2019-01-18 06:59:10.207559: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-01-18 06:59:10.207595: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-01-18 06:59:10.207602: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 
2019-01-18 06:59:10.207607: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N 
2019-01-18 06:59:10.207766: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10137 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
0.01595612072944641