Graph conversion to FP16 not working

I am trying to convert a TensorFlow .pb file with the following script:

import tensorflow as tf
from tensorflow.contrib import tensorrt as trt

def load_graph(file):
    with tf.gfile.GFile(file, 'rb') as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
    with tf.Graph().as_default() as graph:
        tf.import_graph_def(graph_def)
    return graph, graph_def

graph, graph_def = load_graph('/project/model/mask_rcnn_tf_model.pb')
tensorrt_graph = trt.create_inference_graph(graph_def, outputs=output_names, max_batch_size=1, precision_mode='FP16')

with tf.gfile.GFile('/project/output/tensorrt_model.pb', 'wb') as f:
    f.write(tensorrt_graph.SerializeToString())

I get the following error: Check failed: (int)tensor_l->getType() == (int)dtype (0 vs. 3)

I am using the tensorflow:19.01-py3 Docker image provided by NVIDIA.

The full stack trace is:
2019-02-05 15:46:08.585743: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:957] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-02-05 15:46:08.586184: I tensorflow/core/grappler/devices.cc:51] Number of eligible GPUs (core count >= 8): 1
2019-02-05 15:46:08.586304: I tensorflow/core/grappler/clusters/single_machine.cc:359] Starting new session
2019-02-05 15:46:08.595975: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: Tesla V100-SXM2-16GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0000:00:1e.0
totalMemory: 15.75GiB freeMemory: 15.34GiB
2019-02-05 15:46:08.596009: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-02-05 15:46:09.170920: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-02-05 15:46:09.170977: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2019-02-05 15:46:09.170987: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2019-02-05 15:46:09.171326: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14813 MB memory) -> physical GPU (device: 0, name: Tesla V100-SXM2-16GB, pci bus id: 0000:00:1e.0, compute capability: 7.0)
2019-02-05 15:46:13.494251: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:853] MULTIPLE tensorrt candidate conversion: 19
2019-02-05 15:46:13.495752: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3056] Segment @scope '', converted to graph
2019-02-05 15:46:13.495778: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can't find a device placement for the op!
2019-02-05 15:46:13.501540: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3056] Segment @scope '', converted to graph
2019-02-05 15:46:13.501572: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can't find a device placement for the op!
2019-02-05 15:46:13.502830: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3056] Segment @scope 'mrcnn_bbox_fc/', converted to graph
2019-02-05 15:46:13.502859: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can't find a device placement for the op!
2019-02-05 15:46:13.503485: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3056] Segment @scope 'mrcnn_class_bn1/', converted to graph
2019-02-05 15:46:13.503513: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can't find a device placement for the op!
2019-02-05 15:46:13.504255: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3056] Segment @scope 'mrcnn_class_bn2/', converted to graph
2019-02-05 15:46:13.504282: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can't find a device placement for the op!
2019-02-05 15:46:13.505028: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3056] Segment @scope 'mrcnn_class_conv1/', converted to graph
2019-02-05 15:46:13.505056: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can't find a device placement for the op!
2019-02-05 15:46:13.505832: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3056] Segment @scope 'mrcnn_class_conv2/', converted to graph
2019-02-05 15:46:13.505859: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can't find a device placement for the op!
2019-02-05 15:46:13.506673: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3056] Segment @scope 'mrcnn_class_logits/', converted to graph
2019-02-05 15:46:13.506700: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can't find a device placement for the op!
2019-02-05 15:46:13.507570: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3056] Segment @scope 'mrcnn_mask/', converted to graph
2019-02-05 15:46:13.507598: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can't find a device placement for the op!
2019-02-05 15:46:13.508540: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3056] Segment @scope 'mrcnn_mask_bn1/', converted to graph
2019-02-05 15:46:13.508568: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can't find a device placement for the op!
2019-02-05 15:46:13.509561: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3056] Segment @scope 'mrcnn_mask_bn2/', converted to graph
2019-02-05 15:46:13.509589: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can't find a device placement for the op!
2019-02-05 15:46:13.510600: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3056] Segment @scope 'mrcnn_mask_bn3/', converted to graph
2019-02-05 15:46:13.510630: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can't find a device placement for the op!
2019-02-05 15:46:13.511706: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3056] Segment @scope 'mrcnn_mask_bn4/', converted to graph
2019-02-05 15:46:13.511734: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can't find a device placement for the op!
2019-02-05 15:46:13.512889: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3056] Segment @scope 'mrcnn_mask_conv1/', converted to graph
2019-02-05 15:46:13.512916: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can't find a device placement for the op!
2019-02-05 15:46:13.514072: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3056] Segment @scope 'mrcnn_mask_conv2/', converted to graph
2019-02-05 15:46:13.514099: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can't find a device placement for the op!
2019-02-05 15:46:13.515294: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3056] Segment @scope 'mrcnn_mask_conv3/', converted to graph
2019-02-05 15:46:13.515322: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can't find a device placement for the op!
2019-02-05 15:46:13.516579: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3056] Segment @scope 'mrcnn_mask_conv4/', converted to graph
2019-02-05 15:46:13.516608: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can't find a device placement for the op!
2019-02-05 15:46:13.517872: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3056] Segment @scope 'mrcnn_mask_deconv/', converted to graph
2019-02-05 15:46:13.517900: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can't find a device placement for the op!
2019-02-05 15:46:13.519205: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3056] Segment @scope '', converted to graph
2019-02-05 15:46:13.519232: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can't find a device placement for the op!
2019-02-05 15:46:15.310960: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:952] Engine my_trt_op_0 creation for segment 0, composed of 3 nodes succeeded.
2019-02-05 15:46:15.337865: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:952] Engine my_trt_op_1 creation for segment 1, composed of 13 nodes succeeded.
2019-02-05 15:46:15.365558: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:952] Engine mrcnn_bbox_fc/my_trt_op_2 creation for segment 2, composed of 4 nodes succeeded.
2019-02-05 15:46:15.373029: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:952] Engine mrcnn_class_bn1/my_trt_op_3 creation for segment 3, composed of 5 nodes succeeded.
2019-02-05 15:46:15.380542: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:952] Engine mrcnn_class_bn2/my_trt_op_4 creation for segment 4, composed of 5 nodes succeeded.
2019-02-05 15:46:17.363690: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:952] Engine mrcnn_class_conv1/my_trt_op_5 creation for segment 5, composed of 4 nodes succeeded.
2019-02-05 15:46:18.342739: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:952] Engine mrcnn_class_conv2/my_trt_op_6 creation for segment 6, composed of 4 nodes succeeded.
2019-02-05 15:46:18.369823: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:952] Engine mrcnn_class_logits/my_trt_op_7 creation for segment 7, composed of 4 nodes succeeded.
2019-02-05 15:46:18.727430: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:952] Engine mrcnn_mask/my_trt_op_8 creation for segment 8, composed of 4 nodes succeeded.
2019-02-05 15:46:18.735034: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:952] Engine mrcnn_mask_bn1/my_trt_op_9 creation for segment 9, composed of 5 nodes succeeded.
2019-02-05 15:46:18.742589: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:952] Engine mrcnn_mask_bn2/my_trt_op_10 creation for segment 10, composed of 5 nodes succeeded.
2019-02-05 15:46:18.750114: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:952] Engine mrcnn_mask_bn3/my_trt_op_11 creation for segment 11, composed of 5 nodes succeeded.
2019-02-05 15:46:18.757590: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:952] Engine mrcnn_mask_bn4/my_trt_op_12 creation for segment 12, composed of 5 nodes succeeded.
2019-02-05 15:46:19.194619: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:952] Engine mrcnn_mask_conv1/my_trt_op_13 creation for segment 13, composed of 4 nodes succeeded.
2019-02-05 15:46:19.633464: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:952] Engine mrcnn_mask_conv2/my_trt_op_14 creation for segment 14, composed of 4 nodes succeeded.
2019-02-05 15:46:20.071144: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:952] Engine mrcnn_mask_conv3/my_trt_op_15 creation for segment 15, composed of 4 nodes succeeded.
2019-02-05 15:46:20.507673: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:952] Engine mrcnn_mask_conv4/my_trt_op_16 creation for segment 16, composed of 4 nodes succeeded.
2019-02-05 15:46:20.515332: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:952] Engine mrcnn_mask_deconv/my_trt_op_17 creation for segment 17, composed of 3 nodes succeeded.
2019-02-05 15:46:20.515680: F tensorflow/contrib/tensorrt/convert/convert_nodes.cc:1430] Check failed: (int)tensor_l->getType() == (int)dtype (0 vs. 3)
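
A note on reading the final check: the two numbers are most likely values of TensorRT's nvinfer1::DataType enum, where 0 is kFLOAT and 3 is kINT32. If so, the check reports a float tensor where the node's declared type was int32. The sketch below only decodes such a message. The helper name decode_type_check is hypothetical, and the enum table is an assumption copied from the TensorRT 5 headers rather than queried from the installed library.

```python
import re

# nvinfer1::DataType values as declared in the TensorRT 5 headers
# (an assumption here; not read from the installed library).
TRT_DATA_TYPES = {0: "kFLOAT", 1: "kHALF", 2: "kINT8", 3: "kINT32"}


def decode_type_check(message):
    """Return (left, right) type names from a 'Check failed ... (a vs. b)' line."""
    match = re.search(r"\((\d+) vs\. (\d+)\)", message)
    if match is None:
        raise ValueError("no '(a vs. b)' pair found in message")
    left, right = (int(group) for group in match.groups())
    return (TRT_DATA_TYPES.get(left, "unknown"),
            TRT_DATA_TYPES.get(right, "unknown"))


line = "Check failed: (int)tensor_l->getType() == (int)dtype (0 vs. 3)"
print(decode_type_check(line))
```

The left value belongs to tensor_l->getType() and the right one to dtype, following the order in the check expression.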

Hello,

To help us debug, can you share a small repro containing the .pb and the full conversion code (you posted only a snippet above) that demonstrates the error you are seeing?

Thanks,
NVES

Sure. Full script:

import tensorflow as tf
from tensorflow.contrib import tensorrt as trt

import uff

def load_graph(file):
    with tf.gfile.GFile(file, 'rb') as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
    with tf.Graph().as_default() as graph:
        tf.import_graph_def(graph_def)
    return graph, graph_def

output_names = [
    'output_detections:0',
    'output_mrcnn_class:0',
    'output_mrcnn_bbox:0',
    'output_mrcnn_mask:0',
    'output_rois:0'
]

graph, graph_def = load_graph('/project/model/mask_rcnn_tf_model.pb')
tensorrt_graph = trt.create_inference_graph(graph_def, outputs=output_names, max_batch_size=1, precision_mode='FP16')

with tf.gfile.GFile('/project/output/tensorrt_model.pb', 'wb') as f:
    f.write(tensorrt_graph.SerializeToString())

Link to .pb:
https://drive.google.com/file/d/1FhCei-ODiRTm9dINHaXQxmRGJVq93U4e/view?usp=sharing

Thanks for the additional info. We are triaging and will keep you updated.

Hello,

Per engineering, we were able to successfully convert the model using a newer version of TensorFlow. Could you please try again using tf-nightly-gpu? It can be installed using

$ pip install tf-nightly-gpu

You will also need to add

is_dynamic_op=True

to the args in create_inference_graph().
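
For illustration, the suggested change could look roughly like the sketch below. It assumes a TensorFlow 1.x build in which tensorflow.contrib.tensorrt is still available. The helper names build_trt_kwargs and convert are hypothetical, and the remaining arguments are simply carried over from the script posted earlier in this thread.

```python
def build_trt_kwargs(output_names, precision_mode="FP16", max_batch_size=1):
    """Collect the keyword arguments for create_inference_graph in one place."""
    return {
        "outputs": list(output_names),
        "max_batch_size": max_batch_size,
        "precision_mode": precision_mode,
        # The extra argument suggested above: build the TensorRT engines
        # at run time instead of during conversion.
        "is_dynamic_op": True,
    }


def convert(graph_def, output_names):
    """Run the TF-TRT conversion on an already loaded GraphDef."""
    # Imported lazily: tensorflow.contrib only exists in TensorFlow 1.x.
    from tensorflow.contrib import tensorrt as trt
    return trt.create_inference_graph(
        input_graph_def=graph_def, **build_trt_kwargs(output_names))
```

With the names from the full script, the call would be tensorrt_graph = convert(graph_def, output_names).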

Thanks! That solution worked. I've confirmed that there are now TensorRT engines in my graph. However, my overall speedup seems to be less than 10%. I am using batch inference on a single V100 on AWS. Is there any document or guide you can point me to that would help improve speeds?
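
One way to check for engines in a converted graph is to count the nodes whose op type is TRTEngineOp, the op TF-TRT inserts for each converted segment. The sketch below is a minimal version of that check. The helper name count_trt_engines is hypothetical, and a small stand-in object replaces the real GraphDef so the example runs without TensorFlow installed.

```python
from collections import Counter
from types import SimpleNamespace


def count_trt_engines(graph_def):
    """Return (number of TRTEngineOp nodes, total number of nodes)."""
    counts = Counter(node.op for node in graph_def.node)
    return counts.get("TRTEngineOp", 0), sum(counts.values())


# Stand-in for a real GraphDef: any object whose .node items have an .op field.
fake_graph = SimpleNamespace(node=[
    SimpleNamespace(op="TRTEngineOp"),
    SimpleNamespace(op="Conv2D"),
    SimpleNamespace(op="TRTEngineOp"),
])
print(count_trt_engines(fake_graph))
```

On the real model this would be called as count_trt_engines(tensorrt_graph). A low engine count relative to the total number of nodes is one possible sign that much of the graph still runs in native TensorFlow.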