TF_TRT unsupported Constant Type

Hi,

I’m using TF-Slim to build an InceptionV3 model and want to convert it into a TF-TRT graph.

If I set the model to training mode (is_training=True), the conversion to a TF-TRT graph fails with an "unsupported constant type" error.

However, if I set is_training=False, I get no error message and the conversion goes through smoothly.

PC: Ubuntu 16.04
GPU: 1080 Ti
TF: 1.7.0
CUDA: 9.0
TensorRT: 4.1.2

import sys
import os
import tensorflow as tf
import tensorflow.contrib.tensorrt as trt
import numpy as np
import tensorflow.contrib.slim as slim

sys.path.append('/home/user/Desktop/python_code/model_test/models/research/slim')
from nets import inception,inception_v3

###critical setting
is_training=True

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
input_name='input'
num_classes=1001
output_name='prediction'
checkpoint='inception_v3.ckpt'
with tf.Graph().as_default() as tf_graph:
    with tf.Session(config=config) as tf_sess:
        tf_input = tf.placeholder(tf.float32, [None, 299, 299, 3],
                name=input_name)
        with slim.arg_scope(inception.inception_v3_arg_scope()):
            with slim.arg_scope([slim.batch_norm],is_training=is_training):
                tf_net, tf_end_points = inception.inception_v3(tf_input, is_training=is_training,
                    num_classes=num_classes)
        tf_output =  tf.nn.softmax(tf_net, name=output_name)
        # load checkpoint
        tf_saver = tf.train.Saver()
        tf_saver.restore(save_path=checkpoint, sess=tf_sess)
        # freeze graph
        frozen_graph = tf.graph_util.convert_variables_to_constants(
            tf_sess,
            tf_sess.graph_def,
            output_node_names=[output_name]
        )

trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,
    outputs=[output_name],
    max_batch_size=1,
    max_workspace_size_bytes=1<<25,
    precision_mode='FP32',
    minimum_segment_size=50
)

Error message:

2019-01-17 17:41:31.726424: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-01-17 17:41:31.804534: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-01-17 17:41:31.804836: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1344] Found device 0 with properties: 
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.6705
pciBusID: 0000:01:00.0
totalMemory: 10.91GiB freeMemory: 10.55GiB
2019-01-17 17:41:31.804864: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0
2019-01-17 17:41:32.063939: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-01-17 17:41:32.063971: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917]      0 
2019-01-17 17:41:32.063977: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0:   N 
2019-01-17 17:41:32.064162: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10187 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
2019-01-17 17:41:34.839673: I tensorflow/core/grappler/devices.cc:51] Number of eligible GPUs (core count >= 8): 1
2019-01-17 17:41:35.341416: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2624] Max batch size= 1 max workspace size= 33554432
2019-01-17 17:41:35.341461: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2630] starting build engine
2019-01-17 17:42:06.791967: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2635] Built network
2019-01-17 17:42:07.663609: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2640] Serialized engine
2019-01-17 17:42:07.703502: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2648] finished engine InceptionV3/my_trt_op0 containing 801 nodes
2019-01-17 17:42:07.703563: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2668] Finished op preparation
2019-01-17 17:42:07.738476: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2676] OK finished op building
2019-01-17 17:42:53.465747: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0
2019-01-17 17:42:53.465796: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-01-17 17:42:53.465816: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917]      0 
2019-01-17 17:42:53.465820: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0:   N 
2019-01-17 17:42:53.465920: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10187 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
2019-01-17 17:42:57.789125: I tensorflow/core/grappler/devices.cc:51] Number of eligible GPUs (core count >= 8): 1
2019-01-17 17:42:58.150488: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:412] subgraph conversion error for subgraph_index:0 due to: "Unimplemented: Not supported constant type, at InceptionV3/InceptionV3/Mixed_7c/Branch_3/Conv2d_0b_1x1/BatchNorm/Const_2" SKIPPING......( 796 nodes)

Hello,

You might be using a precompiled pip package built against TRT 4.0. If TensorFlow is compiled with a different TRT version, some features are disabled.

Please try again with a more recent TensorFlow, 1.12 for example.
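
As a quick sanity check, the following minimal sketch only confirms the TensorFlow side of the setup (which release is installed and whether it is the CUDA build); the TensorRT version the wheel was linked against is usually only visible in the TF-TRT conversion logs or through your package manager:

import tensorflow as tf

# Which TF release is actually being imported (e.g. '1.7.0' vs. the suggested '1.12.0')
print(tf.__version__)
# True if this is the GPU (CUDA) build of the pip package
print(tf.test.is_built_with_cuda())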

Hi,

Which TRT version should I use to compile TF 1.12 in order to solve this problem?

Some updates:

Nvidia Docker: nvcr.io/nvidia/tensorflow:18.12-py3

TF: 1.12
TensorRT: 5.0.2

I tried using the latest versions of TF and TRT.

By default, the slim Inception implementation uses fused batch norm.
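
For reference, fused batch norm can be switched off through the slim arg_scope, since slim.batch_norm (tf.contrib.layers.batch_norm) accepts a fused argument. A minimal sketch of the graph-construction block used for the "fused off" runs (the surrounding names come from the full script below; the script itself only shows the default, fused case):

# Hedged sketch: disabling fused batch norm via the slim arg_scope.
# is_training, tf_input and num_classes are defined as in the script below.
with slim.arg_scope(inception.inception_v3_arg_scope()):
    with slim.arg_scope([slim.batch_norm], is_training=is_training, fused=False):
        tf_net, tf_end_points = inception.inception_v3(
            tf_input, is_training=is_training, num_classes=num_classes)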

I got the following warnings and results in the terminal:

With fused batch norm:

  1. is_training=True
     Engine InceptionV3/my_trt_op_0 creation for segment 0, composed of 788 nodes failed: Unimplemented: only is_training=false is supported, at InceptionV3/InceptionV3/Conv2d_1a_3x3/BatchNorm/FusedBatchNorm. Skipping…
     Inference time per image: 14 ms

  2. is_training=False
     No skipping error.
     Inference time per image: 8 ms

With fused batch norm turned off:

  1. is_training=True
     No skipping error.
     Inference time per image: 15 ms

  2. is_training=False
     No skipping error.
     Inference time per image: 9 ms

I was running the following code (is_training is passed as the first command-line argument):

import sys
import os
import tensorflow as tf
import tensorflow.contrib.tensorrt as trt
import numpy as np
import tensorflow.contrib.slim as slim
import time
sys.path.append('./slim')
import nets
from nets import inception,inception_v3

def build_frozen_graph(is_training):
    tf_config = tf.ConfigProto()
    tf_config.gpu_options.allow_growth = True
    input_name='input'
    output_name='prediction'
    num_classes=1001
    checkpoint='inception_v3.ckpt'
    with tf.Graph().as_default() as tf_graph:

        with tf.Session(config=tf_config) as tf_sess:
            tf_input = tf.placeholder(tf.float32, [None, 299, 299, 3],
                    name=input_name)

            with slim.arg_scope(inception.inception_v3_arg_scope()):
                with slim.arg_scope([slim.batch_norm],is_training=is_training):
                    tf_net, tf_end_points = inception.inception_v3(tf_input, is_training=is_training,
                        num_classes=num_classes)
            tf_output =  tf.nn.softmax(tf_net, name=output_name)
            # load checkpoint
            tf_saver = tf.train.Saver()
            tf_saver.restore(save_path=checkpoint, sess=tf_sess)
            # freeze graph
            frozen_graph = tf.graph_util.convert_variables_to_constants(
                tf_sess,
                tf_sess.graph_def,
                output_node_names=[output_name]
            )
            # remove training-only nodes (Identity / CheckNumerics)
            frozen_graph = tf.graph_util.remove_training_nodes(frozen_graph)

    return frozen_graph

def convert_to_tftrt(frozen_graph,FP):
    input_name='input'
    output_name='prediction'
    trt_graph = trt.create_inference_graph(
        input_graph_def=frozen_graph,
        outputs=[output_name],
        max_batch_size=1,
        max_workspace_size_bytes=3221225472,
        precision_mode=FP,
        minimum_segment_size=50
    )
    return trt_graph

def import_graph(imported_graph):
    input_name='input'
    output_name='prediction'
    tf.reset_default_graph()
    with tf.get_default_graph().as_default() as graph:
        tf_config = tf.ConfigProto()
        tf_config.gpu_options.allow_growth = True

        with tf.Session(config=tf_config).as_default() as sess:
            tf.import_graph_def(imported_graph, name='')
            tf_input = sess.graph.get_tensor_by_name(input_name + ':0')
            tf_output = sess.graph.get_tensor_by_name(output_name + ':0')
            return graph,sess,tf_input,tf_output

def inference_test(graph,sess,tf_input,tf_output):
    image=np.ones((1,299,299,3))
    st = time.time()
    with graph.as_default():
        with sess.as_default():
            for i in range(1000):
                output = sess.run(tf_output, feed_dict={
                    tf_input:image
                })
    print((time.time()-st)/1000)

def main(is_training):
    frozen_graph = build_frozen_graph(is_training)
    FP="FP32"
    trt_graph = convert_to_tftrt(frozen_graph,FP)
    graph,sess,tf_input,tf_output = import_graph(trt_graph)
    inference_test(graph,sess,tf_input,tf_output)

if __name__ =='__main__':
    is_training = bool(int(sys.argv[1]))
    print('is_training:',is_training)
    main(is_training)

With is_training=True and fused batch norm turned off, the terminal output is as follows:

is_training: True
2019-01-18 06:59:03.872197: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:957] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-01-18 06:59:03.872651: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties: 
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.6705
pciBusID: 0000:01:00.0
totalMemory: 10.92GiB freeMemory: 10.50GiB
2019-01-18 06:59:03.872668: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-01-18 06:59:04.141085: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-01-18 06:59:04.141132: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 
2019-01-18 06:59:04.141139: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N 
2019-01-18 06:59:04.141345: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10137 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
2019-01-18 06:59:08.747584: I tensorflow/core/grappler/devices.cc:51] Number of eligible GPUs (core count >= 8): 1
2019-01-18 06:59:08.754120: I tensorflow/core/grappler/clusters/single_machine.cc:359] Starting new session
2019-01-18 06:59:08.754447: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-01-18 06:59:08.754478: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-01-18 06:59:08.754498: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 
2019-01-18 06:59:08.754503: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N 
2019-01-18 06:59:08.754683: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10137 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
2019-01-18 06:59:10.087833: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:501] Optimization results for grappler item: tf_graph
2019-01-18 06:59:10.087863: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   constant folding: Graph size after: 1841 nodes (-8), 2252 edges (-10), time = 123ms.
2019-01-18 06:59:10.087885: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   layout: Graph size after: 2422 nodes (581), 2635 edges (383), time = 98.311ms.
2019-01-18 06:59:10.087890: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   TensorRTOptimizer: Graph size after: 2422 nodes (0), 2635 edges (0), time = 408.997ms.
2019-01-18 06:59:10.087895: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   constant folding: Graph size after: 2223 nodes (-199), 2635 edges (0), time = 125.847ms.
2019-01-18 06:59:10.087900: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   TensorRTOptimizer: Graph size after: 2223 nodes (0), 2635 edges (0), time = 409.705ms.
2019-01-18 06:59:10.207559: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-01-18 06:59:10.207595: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-01-18 06:59:10.207602: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 
2019-01-18 06:59:10.207607: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N 
2019-01-18 06:59:10.207766: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10137 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
0.01595612072944641