I converted my TensorFlow model to TensorRT (TF-TRT) with FP16 precision on a Xavier, as shown below. When I run inference, I get the same speed as without the optimization. I used this code:
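# Imports assumed for TF 1.14/1.15 (they are not shown in the original post)
import tensorflow as tf
from tensorflow.python.compiler.tensorrt import trt_convert as trt

with graph.as_default():
    with tf.Session() as sess:
        # Convert the frozen GraphDef (gdef) into a TF-TRT optimized graph
        trt_graph = trt.create_inference_graph(
            input_graph_def=gdef,
            outputs=['features'],
            max_batch_size=32,
            max_workspace_size_bytes=7000000000,
            is_dynamic_op=True,
            precision_mode='FP16')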
What is wrong? This is the log from the conversion:
Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6960 MB memory) -> physical GPU (device: 0, name: Xavier, pci bus id: 0000:00:00.0, compute capability: 7.2)
2020-04-16 11:33:36.026142: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:460] There are 50 ops of 23 different types in the graph that are not converted to TensorRT: TensorArrayGatherV3, Exit, NextIteration, TensorArrayReadV3, Switch, StridedSlice, Shape, Cast, Reshape, TensorArrayScatterV3, TensorArraySizeV3, TensorArrayWriteV3, NoOp, TensorArrayV3, Placeholder, Range, Enter, Less, Merge, LogicalAnd, LoopCond, Identity, Add, (For more information see Accelerating Inference In TF-TRT User Guide :: NVIDIA Deep Learning Frameworks Documentation).
2020-04-16 11:33:36.041563: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:735] Number of TensorRT candidate segments: 4
2020-04-16 11:33:36.079573: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2020-04-16 11:33:36.080101: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:837] TensorRT node TRTEngineOp_0 added for segment 0 consisting of 7 nodes succeeded.
2020-04-16 11:33:36.083233: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:837] TensorRT node TRTEngineOp_1 added for segment 1 consisting of 137 nodes succeeded.
2020-04-16 11:33:36.084416: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:837] TensorRT node TRTEngineOp_2 added for segment 2 consisting of 10 nodes succeeded.
2020-04-16 11:33:36.097784: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:837] TensorRT node TRTEngineOp_3 added for segment 3 consisting of 12 nodes succeeded.
2020-04-16 11:33:36.109449: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:752] Optimization results for grappler item: tf_graph
2020-04-16 11:33:36.109576: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:754] constant folding: Graph size after: 226 nodes (-68), 257 edges (-68), time = 95.428ms.
2020-04-16 11:33:36.109630: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:754] layout: Graph size after: 234 nodes (8), 269 edges (12), time = 25.178ms.
2020-04-16 11:33:36.109690: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:754] constant folding: Graph size after: 234 nodes (0), 269 edges (0), time = 26.862ms.
2020-04-16 11:33:36.109718: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:754] TensorRTOptimizer: Graph size after: 72 nodes (-162), 100 edges (-169), time = 104.099ms.
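
To make the log above easier to read, here is a minimal sketch that tallies how many nodes went into each TRTEngineOp. It is plain Python with no TensorFlow dependency. The names summarize_trt_log, SEGMENT_RE and SAMPLE_LOG are made up for this illustration, and the regular expression assumes the exact message wording shown in this log, which may differ in other TensorFlow versions.

```python
import re

# Matches lines such as:
#   "... TensorRT node TRTEngineOp_1 added for segment 1 consisting of 137 nodes succeeded."
# The wording is taken from the log in this post (an assumption for other TF versions).
SEGMENT_RE = re.compile(
    r"TensorRT node (\S+) added for segment (\d+) consisting of (\d+) nodes succeeded"
)

# The four segment lines from the log above, with timestamps shortened.
SAMPLE_LOG = """
convert_graph.cc:837] TensorRT node TRTEngineOp_0 added for segment 0 consisting of 7 nodes succeeded.
convert_graph.cc:837] TensorRT node TRTEngineOp_1 added for segment 1 consisting of 137 nodes succeeded.
convert_graph.cc:837] TensorRT node TRTEngineOp_2 added for segment 2 consisting of 10 nodes succeeded.
convert_graph.cc:837] TensorRT node TRTEngineOp_3 added for segment 3 consisting of 12 nodes succeeded.
"""


def summarize_trt_log(log_text):
    """Return {engine_name: node_count} for every converted segment in the log."""
    return {m.group(1): int(m.group(3)) for m in SEGMENT_RE.finditer(log_text)}


if __name__ == "__main__":
    engines = summarize_trt_log(SAMPLE_LOG)
    for name, count in engines.items():
        print("{}: {} nodes".format(name, count))
    print("total nodes inside TensorRT engines:", sum(engines.values()))
```

For this log the four engines hold 7 + 137 + 10 + 12 = 166 nodes in total, so most of the graph was converted. The ops listed as not converted (TensorArray*, Enter, Exit, Switch, Merge, LoopCond and so on) are the ones that stay in TensorFlow.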