TF_TRT Can't find a device placement for the op!

tom.huang · November 27, 2018, 6:06am

Platform: NVIDIA Xavier
Used JetPack 4.1 to install the Xavier OS, CUDA, cuDNN, and TensorRT
JetPack: 4.1
TensorRT: tensorrt_5.0.3.2-1+cuda10.0_arm64
TensorFlow: tensorflow_gpu-1.12.0rc2+nv18.11-cp36-cp36m-linux_aarch64.whl

Problem:
Can’t find a device placement for the op!

Link to inv3.pb:
https://smasoft-my.sharepoint.com/:u:/g/personal/tom_huang_smasoft_com_tw/ERg5hC2AVdpHgHLUIkgBQQUBghXf2zrP5es3ZVoQqo5Gvg?e=4hbQSf

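import tensorflow as tf
import tensorflow.contrib.tensorrt as trt

FP = '16'  # assumed value: the original snippet never defines FP ('32' or '16')

fn = 'trt_graph10_' + FP          # base name for the converted graph file
frozen_graph_fn = 'inv3.pb'
output_name = 'prediction'
input_name = 'input'

# Load the frozen Inception v3 graph
with tf.gfile.GFile(frozen_graph_fn, 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

# Replace supported subgraphs with TensorRT engines
trt_graph = trt.create_inference_graph(
    input_graph_def=graph_def,
    outputs=[output_name],
    max_batch_size=1,
    max_workspace_size_bytes=1342177280,  # 1.25 GiB
    precision_mode='FP' + FP,
    minimum_segment_size=50
)

# Assumed use of fn (not shown in the original post): save the converted graph
with tf.gfile.GFile(fn + '.pb', 'wb') as f:
    f.write(trt_graph.SerializeToString())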

Terminal output:

2018-11-27 13:53:16.707869: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:924] ARM64 does not support NUMA - returning NUMA node zero
2018-11-27 13:53:16.708342: I tensorflow/core/grappler/devices.cc:51] Number of eligible GPUs (core count >= 8): 1
2018-11-27 13:53:16.709143: I tensorflow/core/grappler/clusters/single_machine.cc:359] Starting new session
2018-11-27 13:53:16.739946: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties: 
name: Xavier major: 7 minor: 2 memoryClockRate(GHz): 1.5
pciBusID: 0000:00:00.0
totalMemory: 15.45GiB freeMemory: 9.48GiB
2018-11-27 13:53:16.740300: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2018-11-27 13:53:20.110821: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-11-27 13:53:20.111248: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 
2018-11-27 13:53:20.111336: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N 
2018-11-27 13:53:20.111848: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8494 MB memory) -> physical GPU (device: 0, name: Xavier, pci bus id: 0000:00:00.0, compute capability: 7.2)
2018-11-27 13:53:25.941717: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3056] Segment @scope 'InceptionV3/', converted to graph
2018-11-27 13:53:25.942199: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can't find a device placement for the op!
2018-11-27 13:54:52.317595: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:952] Engine InceptionV3/my_trt_op_0 creation for segment 0, composed of 793 nodes succeeded.
2018-11-27 13:54:54.331525: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:185] TensorRTOptimizer is probably called on funcdef! This optimizer must *NOT* be called on function objects.
2018-11-27 13:54:54.962589: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:185] TensorRTOptimizer is probably called on funcdef! This optimizer must *NOT* be called on function objects.
2018-11-27 13:54:55.138503: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:501] Optimization results for grappler item: tf_graph
2018-11-27 13:54:55.138813: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   constant folding: Graph size after: 802 nodes (-386), 836 edges (-386), time = 1548.64197ms.
2018-11-27 13:54:55.138878: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   layout: Graph size after: 817 nodes (15), 840 edges (4), time = 386.45ms.
2018-11-27 13:54:55.138922: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   TensorRTOptimizer: Graph size after: 25 nodes (-792), 13 edges (-827), time = 87273.8125ms.
2018-11-27 13:54:55.139067: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   constant folding: Graph size after: 14 nodes (-11), 13 edges (0), time = 161.317ms.
2018-11-27 13:54:55.139120: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   TensorRTOptimizer: Graph size after: 14 nodes (0), 13 edges (0), time = 393.6ms.
2018-11-27 13:54:55.139162: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:501] Optimization results for grappler item: InceptionV3/my_trt_op_0_native_segment
2018-11-27 13:54:55.139204: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   constant folding: Graph size after: 794 nodes (0), 828 edges (0), time = 868.905ms.
2018-11-27 13:54:55.139244: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   layout: Invalid argument: The graph is already optimized by layout optimizer.
2018-11-27 13:54:55.139295: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   TensorRTOptimizer: Graph size after: 794 nodes (0), 828 edges (0), time = 56.107ms.
2018-11-27 13:54:55.139341: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   constant folding: Graph size after: 794 nodes (0), 828 edges (0), time = 576.317ms.
2018-11-27 13:54:55.139439: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503]   TensorRTOptimizer: Graph size after: 794 nodes (0), 828 edges (0), time = 56.435ms.
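The log reports that the engine for segment 0 was created and that the graph shrank to 14 nodes, so the conversion itself appears to have worked despite the error line. A minimal sketch for confirming this on the returned GraphDef, assuming TRTEngineOp is the op type that tf.contrib.tensorrt in TF 1.x uses for converted segments; summarize_trt_ops is a hypothetical helper name:

```python
import collections


def summarize_trt_ops(graph_def, engine_op_type='TRTEngineOp'):
    """Summarize a (converted) GraphDef.

    Returns a Counter of op types and a dict mapping each TensorRT engine
    node name to its requested device ('<unplaced>' when the device field
    is empty). engine_op_type is an assumption: the op name used by
    tf.contrib.tensorrt in TF 1.x.
    """
    op_counts = collections.Counter(node.op for node in graph_def.node)
    engines = {
        node.name: node.device or '<unplaced>'
        for node in graph_def.node
        if node.op == engine_op_type
    }
    return op_counts, engines
```

Called as op_counts, engines = summarize_trt_ops(trt_graph) right after create_inference_graph, engines would be expected to contain InceptionV3/my_trt_op_0, and an '<unplaced>' device on it would match the "Can't find a device placement" message.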

377970563 · November 28, 2018, 7:56am

I had the same problem and solved it with the help of this page: https://github.com/tensorflow/tensorflow/issues/21487

As a workaround, it should work if you build your model for training inside a with graph.device('gpu:0') block. Alternatively, you can read ./log/freeze_graph.pb, import it inside a with graph.device('gpu:0') context, write it out as a new ./log/freeze_graph.pb, and use the new file for the conversion.
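The second option above (re-importing an already frozen graph under a device scope) can be sketched as follows. This assumes the TF 1.x graph-mode API (GraphDef, gfile.GFile, Graph.device, import_graph_def); pin_frozen_graph is a hypothetical helper name, and the approach follows the workaround from the linked issue rather than an official NVIDIA procedure.

```python
import tensorflow as tf

# TF 1.12 exposes the 1.x API at the top level; from TF 1.13 on (including
# 2.x) the same calls are also available under tf.compat.v1.
tf1 = getattr(getattr(tf, 'compat', None), 'v1', tf)


def pin_frozen_graph(in_path, out_path, device='/gpu:0'):
    """Re-import a frozen graph under a device scope and write it back out."""
    graph_def = tf1.GraphDef()
    with tf1.gfile.GFile(in_path, 'rb') as f:
        graph_def.ParseFromString(f.read())

    graph = tf1.Graph()
    with graph.as_default(), graph.device(device):
        # Nodes that did not already request a device pick up `device`
        # from this scope. name='' keeps the original node names (no
        # 'import/' prefix), so output names such as 'prediction' stay valid.
        tf1.import_graph_def(graph_def, name='')

    pinned = graph.as_graph_def()
    with tf1.gfile.GFile(out_path, 'wb') as f:
        f.write(pinned.SerializeToString())
    return pinned
```

For example, pin_frozen_graph('./log/freeze_graph.pb', './log/freeze_graph_gpu.pb') and then pass the new file to create_inference_graph. The output file name is hypothetical; writing to a separate file rather than overwriting keeps the original graph around in case the pinned one misbehaves.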

NVES · November 28, 2018, 5:30pm

Hello,

“Can’t find a device placement for the op!” means that TF-TRT does not know which device the op is going to run on. It is not really an error.

It means that the user ran an offline conversion, in which ops have not yet been assigned to any device. Because of this, TF-TRT does not know which device to pick for TensorRT and allocate memory from. It is safe to ignore on Xavier. We will update the error message upstream. This is not a bug.
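Whether a frozen graph carries any device assignments can be checked directly, since each NodeDef has a device field that is empty when no placement was requested. A minimal sketch, with device_histogram as a hypothetical helper name:

```python
import collections


def device_histogram(graph_def):
    """Count nodes per requested device in a GraphDef.

    Nodes whose device field is empty (no placement requested, as in a
    graph frozen without explicit device scopes) are counted under
    '<unplaced>'.
    """
    return collections.Counter(
        node.device or '<unplaced>' for node in graph_def.node
    )
```

Applied to the graph_def parsed from inv3.pb in the original post, e.g. print(device_histogram(graph_def)), a result consisting only of '<unplaced>' would be consistent with the explanation above.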

1484601833 · September 18, 2019, 7:34am

So how can we specify which device the op runs on?

325804824 · September 18, 2019, 1:45pm

So how can we specify which device the op runs on?