Graph conversion to FP16 not working

jamesc46fj8 · February 5, 2019, 3:56pm

I am trying to convert a TensorFlow .pb file with the following script:

import tensorflow as tf
from tensorflow.contrib import tensorrt as trt

def load_graph(file):
    with tf.gfile.GFile(file, 'rb') as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
    with tf.Graph().as_default() as graph:
        tf.import_graph_def(graph_def)
    return graph, graph_def

graph, graph_def = load_graph('/project/model/mask_rcnn_tf_model.pb')
tensorrt_graph = trt.create_inference_graph(graph_def, outputs=output_names, max_batch_size=1, precision_mode='FP16')

with tf.gfile.GFile('/project/output/tensorrt_model.pb', 'wb') as f:
    f.write(tensorrt_graph.SerializeToString())

I get the following error: Check failed: (int)tensor_l->getType() == (int)dtype (0 vs. 3)
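
One possible reading of the `(0 vs. 3)` in that check, assuming both numbers are values of TensorRT's `nvinfer1::DataType` enum (the log does not say so explicitly): 0 is `kFLOAT` and 3 is `kINT32`, which would mean a float tensor reached an operation that expected int32. The helper name below is hypothetical and only decodes the two numbers.

```python
# Numeric values of TensorRT's nvinfer1::DataType enum.
TRT_DATA_TYPES = {0: "kFLOAT", 1: "kHALF", 2: "kINT8", 3: "kINT32"}


def describe_check_failure(tensor_type, expected_type):
    """Turn the two integers from the 'Check failed' line into enum names.

    Assumption: the left value is the tensor's type and the right value is
    the dtype the converter expected, matching the order in the check.
    """
    actual = TRT_DATA_TYPES.get(tensor_type, "unknown")
    expected = TRT_DATA_TYPES.get(expected_type, "unknown")
    return "tensor type is {}, expected {}".format(actual, expected)


print(describe_check_failure(0, 3))
```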

I am using the tensorflow:19.01-py3 docker image provided by Nvidia.

The full stack trace is:
2019-02-05 15:46:08.585743: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:957] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-02-05 15:46:08.586184: I tensorflow/core/grappler/devices.cc:51] Number of eligible GPUs (core count >= 8): 1
2019-02-05 15:46:08.586304: I tensorflow/core/grappler/clusters/single_machine.cc:359] Starting new session
2019-02-05 15:46:08.595975: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: Tesla V100-SXM2-16GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0000:00:1e.0
totalMemory: 15.75GiB freeMemory: 15.34GiB
2019-02-05 15:46:08.596009: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-02-05 15:46:09.170920: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-02-05 15:46:09.170977: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2019-02-05 15:46:09.170987: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2019-02-05 15:46:09.171326: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14813 MB memory) -> physical GPU (device: 0, name: Tesla V100-SXM2-16GB, pci bus id: 0000:00:1e.0, compute capability: 7.0)
2019-02-05 15:46:13.494251: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:853] MULTIPLE tensorrt candidate conversion: 19
2019-02-05 15:46:13.495752: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3056] Segment @scope '', converted to graph
2019-02-05 15:46:13.495778: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can't find a device placement for the op!
2019-02-05 15:46:13.501540: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3056] Segment @scope '', converted to graph
2019-02-05 15:46:13.501572: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can't find a device placement for the op!
2019-02-05 15:46:13.502830: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3056] Segment @scope 'mrcnn_bbox_fc/', converted to graph
2019-02-05 15:46:13.502859: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can't find a device placement for the op!
2019-02-05 15:46:13.503485: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3056] Segment @scope 'mrcnn_class_bn1/', converted to graph
2019-02-05 15:46:13.503513: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can't find a device placement for the op!
2019-02-05 15:46:13.504255: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3056] Segment @scope 'mrcnn_class_bn2/', converted to graph
2019-02-05 15:46:13.504282: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can't find a device placement for the op!
2019-02-05 15:46:13.505028: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3056] Segment @scope 'mrcnn_class_conv1/', converted to graph
2019-02-05 15:46:13.505056: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can't find a device placement for the op!
2019-02-05 15:46:13.505832: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3056] Segment @scope 'mrcnn_class_conv2/', converted to graph
2019-02-05 15:46:13.505859: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can't find a device placement for the op!
2019-02-05 15:46:13.506673: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3056] Segment @scope 'mrcnn_class_logits/', converted to graph
2019-02-05 15:46:13.506700: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can't find a device placement for the op!
2019-02-05 15:46:13.507570: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3056] Segment @scope 'mrcnn_mask/', converted to graph
2019-02-05 15:46:13.507598: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can't find a device placement for the op!
2019-02-05 15:46:13.508540: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3056] Segment @scope 'mrcnn_mask_bn1/', converted to graph
2019-02-05 15:46:13.508568: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can't find a device placement for the op!
2019-02-05 15:46:13.509561: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3056] Segment @scope 'mrcnn_mask_bn2/', converted to graph
2019-02-05 15:46:13.509589: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can't find a device placement for the op!
2019-02-05 15:46:13.510600: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3056] Segment @scope 'mrcnn_mask_bn3/', converted to graph
2019-02-05 15:46:13.510630: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can't find a device placement for the op!
2019-02-05 15:46:13.511706: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3056] Segment @scope 'mrcnn_mask_bn4/', converted to graph
2019-02-05 15:46:13.511734: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can't find a device placement for the op!
2019-02-05 15:46:13.512889: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3056] Segment @scope 'mrcnn_mask_conv1/', converted to graph
2019-02-05 15:46:13.512916: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can't find a device placement for the op!
2019-02-05 15:46:13.514072: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3056] Segment @scope 'mrcnn_mask_conv2/', converted to graph
2019-02-05 15:46:13.514099: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can't find a device placement for the op!
2019-02-05 15:46:13.515294: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3056] Segment @scope 'mrcnn_mask_conv3/', converted to graph
2019-02-05 15:46:13.515322: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can't find a device placement for the op!
2019-02-05 15:46:13.516579: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3056] Segment @scope 'mrcnn_mask_conv4/', converted to graph
2019-02-05 15:46:13.516608: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can't find a device placement for the op!
2019-02-05 15:46:13.517872: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3056] Segment @scope 'mrcnn_mask_deconv/', converted to graph
2019-02-05 15:46:13.517900: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can't find a device placement for the op!
2019-02-05 15:46:13.519205: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3056] Segment @scope '', converted to graph
2019-02-05 15:46:13.519232: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can't find a device placement for the op!
2019-02-05 15:46:15.310960: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:952] Engine my_trt_op_0 creation for segment 0, composed of 3 nodes succeeded.
2019-02-05 15:46:15.337865: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:952] Engine my_trt_op_1 creation for segment 1, composed of 13 nodes succeeded.
2019-02-05 15:46:15.365558: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:952] Engine mrcnn_bbox_fc/my_trt_op_2 creation for segment 2, composed of 4 nodes succeeded.
2019-02-05 15:46:15.373029: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:952] Engine mrcnn_class_bn1/my_trt_op_3 creation for segment 3, composed of 5 nodes succeeded.
2019-02-05 15:46:15.380542: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:952] Engine mrcnn_class_bn2/my_trt_op_4 creation for segment 4, composed of 5 nodes succeeded.
2019-02-05 15:46:17.363690: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:952] Engine mrcnn_class_conv1/my_trt_op_5 creation for segment 5, composed of 4 nodes succeeded.
2019-02-05 15:46:18.342739: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:952] Engine mrcnn_class_conv2/my_trt_op_6 creation for segment 6, composed of 4 nodes succeeded.
2019-02-05 15:46:18.369823: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:952] Engine mrcnn_class_logits/my_trt_op_7 creation for segment 7, composed of 4 nodes succeeded.
2019-02-05 15:46:18.727430: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:952] Engine mrcnn_mask/my_trt_op_8 creation for segment 8, composed of 4 nodes succeeded.
2019-02-05 15:46:18.735034: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:952] Engine mrcnn_mask_bn1/my_trt_op_9 creation for segment 9, composed of 5 nodes succeeded.
2019-02-05 15:46:18.742589: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:952] Engine mrcnn_mask_bn2/my_trt_op_10 creation for segment 10, composed of 5 nodes succeeded.
2019-02-05 15:46:18.750114: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:952] Engine mrcnn_mask_bn3/my_trt_op_11 creation for segment 11, composed of 5 nodes succeeded.
2019-02-05 15:46:18.757590: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:952] Engine mrcnn_mask_bn4/my_trt_op_12 creation for segment 12, composed of 5 nodes succeeded.
2019-02-05 15:46:19.194619: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:952] Engine mrcnn_mask_conv1/my_trt_op_13 creation for segment 13, composed of 4 nodes succeeded.
2019-02-05 15:46:19.633464: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:952] Engine mrcnn_mask_conv2/my_trt_op_14 creation for segment 14, composed of 4 nodes succeeded.
2019-02-05 15:46:20.071144: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:952] Engine mrcnn_mask_conv3/my_trt_op_15 creation for segment 15, composed of 4 nodes succeeded.
2019-02-05 15:46:20.507673: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:952] Engine mrcnn_mask_conv4/my_trt_op_16 creation for segment 16, composed of 4 nodes succeeded.
2019-02-05 15:46:20.515332: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:952] Engine mrcnn_mask_deconv/my_trt_op_17 creation for segment 17, composed of 3 nodes succeeded.
2019-02-05 15:46:20.515680: F tensorflow/contrib/tensorrt/convert/convert_nodes.cc:1430] Check failed: (int)tensor_l->getType() == (int)dtype (0 vs. 3)

NVES · February 5, 2019, 4:04pm

Hello,

To help us debug, can you share a small repro containing the .pb and the full conversion code (you posted a snippet above) that demonstrates the error you are seeing?

Thanks,
NVES

jamesc46fj8 · February 5, 2019, 4:14pm

Sure. Full script:

import tensorflow as tf
from tensorflow.contrib import tensorrt as trt

import uff

def load_graph(file):
    with tf.gfile.GFile(file, 'rb') as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
    with tf.Graph().as_default() as graph:
        tf.import_graph_def(graph_def)
    return graph, graph_def

output_names = [
    'output_detections:0',
    'output_mrcnn_class:0',
    'output_mrcnn_bbox:0',
    'output_mrcnn_mask:0',
    'output_rois:0'
]

graph, graph_def = load_graph('/project/model/mask_rcnn_tf_model.pb')
tensorrt_graph = trt.create_inference_graph(graph_def, outputs=output_names, max_batch_size=1, precision_mode='FP16')

with tf.gfile.GFile('/project/output/tensorrt_model.pb', 'wb') as f:
    f.write(tensorrt_graph.SerializeToString())

Link to .pb:
https://drive.google.com/file/d/1FhCei-ODiRTm9dINHaXQxmRGJVq93U4e/view?usp=sharing

NVES · February 5, 2019, 5:17pm

Thanks for the additional info. We are triaging and will keep you updated.

NVES · February 7, 2019, 11:21pm

Hello,

Engineering was able to successfully convert the model using a newer version of TensorFlow. Could you please try again using tf-nightly-gpu? It can be installed using

$ pip install tf-nightly-gpu

You will also need to add

is_dynamic_op=True

to the args in create_inference_graph().
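
A minimal sketch of what the adjusted call might look like, reusing the argument values from the script earlier in this thread. The helper names `build_trt_kwargs` and `convert_to_trt` are hypothetical, and the import is deferred because `tensorflow.contrib` is only available in TensorFlow 1.x builds.

```python
def build_trt_kwargs(graph_def, output_names):
    """Collect the keyword arguments for trt.create_inference_graph()."""
    return {
        "input_graph_def": graph_def,
        "outputs": output_names,
        "max_batch_size": 1,
        "precision_mode": "FP16",
        # Defer TensorRT engine construction to run time.
        "is_dynamic_op": True,
    }


def convert_to_trt(graph_def, output_names):
    """Run the TF-TRT conversion with is_dynamic_op enabled."""
    # Imported lazily: tensorflow.contrib exists only in TensorFlow 1.x.
    from tensorflow.contrib import tensorrt as trt
    return trt.create_inference_graph(**build_trt_kwargs(graph_def, output_names))
```

With the original script, this would be called as `tensorrt_graph = convert_to_trt(graph_def, output_names)` before the graph is serialized.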

jamesc46fj8 · February 13, 2019, 7:23pm

Thanks! That solution worked. I've confirmed that there are now TensorRT engines in my graph. However, my overall speedup seems to be less than 10%. I am using batch inference on a single V100 on AWS. Is there any document or guide you can point me to that would help improve speeds?
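
One way to check for engines in a converted graph, sketched under the assumption that TF-TRT represents each engine as a node whose op type is `TRTEngineOp`. The function name `count_ops` is hypothetical, and the demo uses stand-in objects rather than a real GraphDef so that it runs without TensorFlow.

```python
from collections import Counter
from types import SimpleNamespace


def count_ops(nodes):
    """Count how many nodes of each op type appear in a sequence of nodes."""
    return Counter(node.op for node in nodes)


# Stand-in nodes for illustration; a real call would pass tensorrt_graph.node.
demo_nodes = [
    SimpleNamespace(op="TRTEngineOp"),
    SimpleNamespace(op="Conv2D"),
    SimpleNamespace(op="TRTEngineOp"),
]
op_counts = count_ops(demo_nodes)
print("TRTEngineOp nodes:", op_counts["TRTEngineOp"])
print("other nodes:", sum(op_counts.values()) - op_counts["TRTEngineOp"])
```

If only a small share of the graph's nodes ends up inside engines, most of the work still runs in plain TensorFlow, which may be one reason for a modest speedup.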
