TensorRT has no effect on the ssd_mobilenet_v1_fpn_coco model

When I use TensorRT to accelerate the ssd_mobilenet_v1_fpn_coco model, it has no effect; inference is actually slower than without TensorRT.

RetinaNet mobile, without TensorRT:
Iteration: 0.430 sec
Iteration: 0.421 sec
Iteration: 0.420 sec
Iteration: 0.427 sec
Iteration: 0.439 sec
Iteration: 0.427 sec
Iteration: 0.411 sec
Iteration: 0.424 sec
Iteration: 0.432 sec
Iteration: 0.429 sec
Iteration: 0.413 sec
Iteration: 0.424 sec
Iteration: 0.424 sec
Iteration: 0.428 sec
Iteration: 0.427 sec
Iteration: 0.431 sec
Iteration: 0.417 sec
Iteration: 0.418 sec
With TensorRT (seconds per iteration):
0.505087852478
0.504916906357
0.501970052719
0.505352973938
0.494786024094
0.498456954956
0.504287004471
0.50328707695
0.507141113281
0.499255895615
0.487679004669
0.489063978195
0.492527008057
0.503779172897
0.514405965805
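Averaging the two runs above quantifies the regression. A small script (values copied verbatim from the timings listed above):

```python
# Per-iteration times copied from the two benchmark runs above (seconds).
no_trt = [0.430, 0.421, 0.420, 0.427, 0.439, 0.427, 0.411, 0.424, 0.432,
          0.429, 0.413, 0.424, 0.424, 0.428, 0.427, 0.431, 0.417, 0.418]
with_trt = [0.505087852478, 0.504916906357, 0.501970052719, 0.505352973938,
            0.494786024094, 0.498456954956, 0.504287004471, 0.50328707695,
            0.507141113281, 0.499255895615, 0.487679004669, 0.489063978195,
            0.492527008057, 0.503779172897, 0.514405965805]

mean_no_trt = sum(no_trt) / len(no_trt)        # ~0.425 s/iter
mean_with_trt = sum(with_trt) / len(with_trt)  # ~0.501 s/iter
print("no TensorRT:   %.3f s/iter" % mean_no_trt)
print("with TensorRT: %.3f s/iter" % mean_with_trt)
```

So the "accelerated" graph is roughly 18% slower per iteration than plain TensorFlow.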

log:
retinanet v1
('config_path', './data/ssd_mobilenet_v1_fpn_shared_box_predictor_640x640_coco14_sync_2018_07_03/pipeline.config')
('checkpoint_path', './data/ssd_mobilenet_v1_fpn_shared_box_predictor_640x640_coco14_sync_2018_07_03/model.ckpt')
2018-09-03 09:24:44.137510: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:865] ARM64 does not support NUMA - returning NUMA node zero
2018-09-03 09:24:44.137784: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: NVIDIA Tegra X2 major: 6 minor: 2 memoryClockRate(GHz): 1.3005
pciBusID: 0000:00:00.0
totalMemory: 7.67GiB freeMemory: 4.45GiB
2018-09-03 09:24:44.137850: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-09-03 09:24:47.792908: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-09-03 09:24:47.793169: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
2018-09-03 09:24:47.793277: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
2018-09-03 09:24:47.793573: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2913 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
Converted 333 variables to const ops.
2018-09-03 09:26:07.919932: I tensorflow/core/grappler/devices.cc:51] Number of eligible GPUs (core count >= 8): 0
2018-09-03 09:26:16.036617: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:383] MULTIPLE tensorrt candidate conversion: 4
2018-09-03 09:26:16.057791: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] subgraph conversion error for subgraph_index:0 due to: "Unimplemented: Require 4 dimensional input. Got 0 const6" SKIPPING…( 108 nodes)
2018-09-03 09:26:16.064689: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] subgraph conversion error for subgraph_index:1 due to: "Unimplemented: Require 4 dimensional input. Got 0 const6" SKIPPING…( 108 nodes)
2018-09-03 09:26:16.837392: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] subgraph conversion error for subgraph_index:2 due to: "Invalid argument: Output node 'const6' is weights not tensor" SKIPPING…( 612 nodes)
2018-09-03 09:26:16.842941: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] subgraph conversion error for subgraph_index:3 due to: "Unimplemented: Require 4 dimensional input. Got 1 Postprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/zeros_like_47" SKIPPING…( 181 nodes)
['boxes', 'classes', 'scores']
2018-09-03 09:27:47.320719: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-09-03 09:27:47.320894: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-09-03 09:27:47.320933: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
2018-09-03 09:27:47.320967: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
2018-09-03 09:27:47.321106: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2913

Thanks!

Hi,

Not all TensorFlow layers are supported by TensorRT.
If a layer is not supported, it falls back to the TensorFlow implementation, and the data transfer between the TensorFlow and TensorRT segments can add overhead.

Could you first check whether each layer is executed by TensorFlow or by TensorRT?
Thanks.
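One way to check is to count the op types in the converted GraphDef: subgraphs that TensorRT converted show up as TRTEngineOp nodes, while skipped layers keep their original TensorFlow op types. A minimal sketch; the counting helper is generic Python, and the graph-loading part (commented out, with a hypothetical file name) assumes a TF 1.x frozen graph:

```python
from collections import Counter

def count_op_types(op_names):
    """Count how many nodes of each op type a graph contains.

    `op_names` is a list of op-type strings, e.g. the result of
    [node.op for node in graph_def.node] on a frozen TF 1.x GraphDef.
    """
    return Counter(op_names)

# With a real converted graph (TF 1.x; "trt_frozen_graph.pb" is a
# hypothetical file name):
#
#   import tensorflow as tf
#   graph_def = tf.GraphDef()
#   with open("trt_frozen_graph.pb", "rb") as f:
#       graph_def.ParseFromString(f.read())
#   counts = count_op_types([node.op for node in graph_def.node])
#
# If TensorRT converted anything, counts["TRTEngineOp"] > 0. In the log
# above all four candidate subgraphs were SKIPPED, so the converted graph
# would contain no TRTEngineOp nodes at all, which explains why there is
# no speedup.

# Illustrative stand-in op list (not from a real graph):
counts = count_op_types(["Conv2D", "Conv2D", "TRTEngineOp",
                         "NonMaxSuppressionV3"])
print(counts["TRTEngineOp"])  # number of TensorRT-converted segments
```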