Hello,
I’m not seeing the performance issue described above. Here is the output from running load.py and build.py on my side:
root@67ad5eeeaa9d:/home/scratch.zhenyi_sw/repro2490943/trt_test# python load.py
------------- Load the TF graph from the pre-build pb file: ./ssd_mobilenet_v2_coco.pb -------------
------------- Load time: 0.17 sec
------------- Load the TF graph from the pre-build pb file: ./ssd_mobilenet_v2_coco_fp16_trt.pb -------------
------------- Load time: 0.26 sec
root@67ad5eeeaa9d:/home/scratch.zhenyi_sw/repro2490943/trt_test# python build.py
------------- Load frozen graph from disk -------------
------------- Optimize the model with TensorRT -------------
2019-02-04 17:34:11.776013: I tensorflow/core/grappler/devices.cc:51] Number of eligible GPUs (core count >= 8): 8
2019-02-04 17:34:11.776257: I tensorflow/core/grappler/clusters/single_machine.cc:359] Starting new session
2019-02-04 17:34:11.793709: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: Tesla V100-SXM2-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0000:06:00.0
totalMemory: 31.72GiB freeMemory: 31.31GiB
2019-02-04 17:34:11.794382: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 1 with properties:
name: Tesla V100-SXM2-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0000:07:00.0
totalMemory: 31.72GiB freeMemory: 31.31GiB
2019-02-04 17:34:11.795045: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 2 with properties:
name: Tesla V100-SXM2-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0000:0a:00.0
totalMemory: 31.72GiB freeMemory: 31.31GiB
2019-02-04 17:34:11.795682: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 3 with properties:
name: Tesla V100-SXM2-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0000:0b:00.0
totalMemory: 31.72GiB freeMemory: 31.31GiB
2019-02-04 17:34:11.796312: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 4 with properties:
name: Tesla V100-SXM2-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0000:85:00.0
totalMemory: 31.72GiB freeMemory: 31.31GiB
2019-02-04 17:34:11.796945: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 5 with properties:
name: Tesla V100-SXM2-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0000:86:00.0
totalMemory: 31.72GiB freeMemory: 31.31GiB
2019-02-04 17:34:11.797599: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 6 with properties:
name: Tesla V100-SXM2-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0000:89:00.0
totalMemory: 31.72GiB freeMemory: 31.31GiB
2019-02-04 17:34:11.798238: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 7 with properties:
name: Tesla V100-SXM2-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0000:8a:00.0
totalMemory: 31.72GiB freeMemory: 31.31GiB
2019-02-04 17:34:11.798534: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0, 1, 2, 3, 4, 5, 6, 7
2019-02-04 17:34:15.868392: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-02-04 17:34:15.868455: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0 1 2 3 4 5 6 7
2019-02-04 17:34:15.868466: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N Y Y Y Y N N N
2019-02-04 17:34:15.868473: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 1: Y N Y Y N Y N N
2019-02-04 17:34:15.868484: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 2: Y Y N Y N N Y N
2019-02-04 17:34:15.868521: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 3: Y Y Y N N N N Y
2019-02-04 17:34:15.868529: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 4: Y N N N N Y Y Y
2019-02-04 17:34:15.868535: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 5: N Y N N Y N Y Y
2019-02-04 17:34:15.868558: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 6: N N Y N Y Y N Y
2019-02-04 17:34:15.868565: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 7: N N N Y Y Y Y N
2019-02-04 17:34:15.871958: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 30342 MB memory) -> physical GPU (device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:06:00.0, compute capability: 7.0)
2019-02-04 17:34:15.872761: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 30342 MB memory) -> physical GPU (device: 1, name: Tesla V100-SXM2-32GB, pci bus id: 0000:07:00.0, compute capability: 7.0)
2019-02-04 17:34:15.873333: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 30342 MB memory) -> physical GPU (device: 2, name: Tesla V100-SXM2-32GB, pci bus id: 0000:0a:00.0, compute capability: 7.0)
2019-02-04 17:34:15.873934: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 30342 MB memory) -> physical GPU (device: 3, name: Tesla V100-SXM2-32GB, pci bus id: 0000:0b:00.0, compute capability: 7.0)
2019-02-04 17:34:15.874547: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:4 with 30342 MB memory) -> physical GPU (device: 4, name: Tesla V100-SXM2-32GB, pci bus id: 0000:85:00.0, compute capability: 7.0)
2019-02-04 17:34:15.875042: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:5 with 30342 MB memory) -> physical GPU (device: 5, name: Tesla V100-SXM2-32GB, pci bus id: 0000:86:00.0, compute capability: 7.0)
2019-02-04 17:34:15.875513: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:6 with 30342 MB memory) -> physical GPU (device: 6, name: Tesla V100-SXM2-32GB, pci bus id: 0000:89:00.0, compute capability: 7.0)
2019-02-04 17:34:15.875966: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:7 with 30342 MB memory) -> physical GPU (device: 7, name: Tesla V100-SXM2-32GB, pci bus id: 0000:8a:00.0, compute capability: 7.0)
2019-02-04 17:34:17.998323: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:868] MULTIPLE tensorrt candidate conversion: 2
2019-02-04 17:34:18.001120: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3058] Segment @scope '', converted to graph
2019-02-04 17:34:18.001139: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:433] Can't find a device placement for the op!
2019-02-04 17:34:18.007434: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3058] Segment @scope '', converted to graph
2019-02-04 17:34:18.007455: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:433] Can't find a device placement for the op!
2019-02-04 17:34:20.488707: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:967] Engine my_trt_op_0 creation for segment 0, composed of 2 nodes succeeded.
2019-02-04 17:34:20.548541: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:967] Engine my_trt_op_1 creation for segment 1, composed of 3 nodes succeeded.
2019-02-04 17:34:21.991633: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:185] TensorRTOptimizer is probably called on funcdef! This optimizer must *NOT* be called on function objects.
2019-02-04 17:34:21.992225: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:185] TensorRTOptimizer is probably called on funcdef! This optimizer must *NOT* be called on function objects.
2019-02-04 17:34:22.000877: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:185] TensorRTOptimizer is probably called on funcdef! This optimizer must *NOT* be called on function objects.
2019-02-04 17:34:22.001457: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:185] TensorRTOptimizer is probably called on funcdef! This optimizer must *NOT* be called on function objects.
2019-02-04 17:34:22.006006: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:501] Optimization results for grappler item: tf_graph
2019-02-04 17:34:22.006029: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] constant folding: Graph size after: 6035 nodes (-1940), 10082 edges (-2174), time = 832.385ms.
2019-02-04 17:34:22.006037: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] layout: Graph size after: 6226 nodes (191), 10284 edges (202), time = 233.816ms.
2019-02-04 17:34:22.006044: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] TensorRTOptimizer: Graph size after: 6223 nodes (-3), 10281 edges (-3), time = 3354.4ms.
2019-02-04 17:34:22.006050: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] constant folding: Graph size after: 6053 nodes (-170), 10111 edges (-170), time = 562.532ms.
2019-02-04 17:34:22.006058: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] TensorRTOptimizer: Graph size after: 6053 nodes (0), 10111 edges (0), time = 774.5ms.
2019-02-04 17:34:22.006092: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:501] Optimization results for grappler item: my_trt_op_0_native_segment
2019-02-04 17:34:22.006099: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] constant folding: Graph size after: 8 nodes (0), 7 edges (0), time = 3.896ms.
2019-02-04 17:34:22.006120: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] layout: Graph size after: 8 nodes (0), 7 edges (0), time = 0.283ms.
2019-02-04 17:34:22.006127: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] TensorRTOptimizer: Graph size after: 8 nodes (0), 7 edges (0), time = 0.094ms.
2019-02-04 17:34:22.006134: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] constant folding: Graph size after: 8 nodes (0), 7 edges (0), time = 0.461ms.
2019-02-04 17:34:22.006141: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] TensorRTOptimizer: Graph size after: 8 nodes (0), 7 edges (0), time = 0.07ms.
2019-02-04 17:34:22.006164: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:501] Optimization results for grappler item: my_trt_op_1_native_segment
2019-02-04 17:34:22.006171: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] constant folding: Graph size after: 9 nodes (0), 8 edges (0), time = 3.521ms.
2019-02-04 17:34:22.006178: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] layout: Graph size after: 9 nodes (0), 8 edges (0), time = 0.284ms.
2019-02-04 17:34:22.006185: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] TensorRTOptimizer: Graph size after: 9 nodes (0), 8 edges (0), time = 0.077ms.
2019-02-04 17:34:22.006199: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] constant folding: Graph size after: 9 nodes (0), 8 edges (0), time = 0.495ms.
2019-02-04 17:34:22.006207: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] TensorRTOptimizer: Graph size after: 9 nodes (0), 8 edges (0), time = 0.07ms.
------------- Write optimized model to the file -------------
------------- DONE! -------------
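
For anyone comparing their own build.py output against the log above, here is a minimal sketch that sums the time grappler reports per optimizer pass. It assumes the "Graph size after: ... time = ...ms" line format shown in this log. The names `PASS_LINE` and `total_time_per_pass` are hypothetical helpers made up for illustration; they are not part of TensorFlow, load.py, or build.py.

```python
import re
from collections import defaultdict

# Assumed line format, taken from the meta_optimizer.cc lines in the log above:
#   "... meta_optimizer.cc:503] layout: Graph size after: 6226 nodes (191), 10284 edges (202), time = 233.816ms."
PASS_LINE = re.compile(
    r"meta_optimizer\.cc:\d+\]\s+(?P<name>[\w ]+?): Graph size after: "
    r"\d+ nodes \(-?\d+\), \d+ edges \(-?\d+\), "
    r"time = (?P<ms>[\d.]+)ms"
)


def total_time_per_pass(log_text):
    """Sum the reported time in milliseconds for each optimizer pass name."""
    totals = defaultdict(float)
    for match in PASS_LINE.finditer(log_text):
        totals[match.group("name")] += float(match.group("ms"))
    return dict(totals)


# The five tf_graph summary lines copied from the log above.
sample = """\
2019-02-04 17:34:22.006029: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] constant folding: Graph size after: 6035 nodes (-1940), 10082 edges (-2174), time = 832.385ms.
2019-02-04 17:34:22.006037: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] layout: Graph size after: 6226 nodes (191), 10284 edges (202), time = 233.816ms.
2019-02-04 17:34:22.006044: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] TensorRTOptimizer: Graph size after: 6223 nodes (-3), 10281 edges (-3), time = 3354.4ms.
2019-02-04 17:34:22.006050: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] constant folding: Graph size after: 6053 nodes (-170), 10111 edges (-170), time = 562.532ms.
2019-02-04 17:34:22.006058: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] TensorRTOptimizer: Graph size after: 6053 nodes (0), 10111 edges (0), time = 774.5ms.
"""

if __name__ == "__main__":
    # Print passes from slowest to fastest.
    for name, ms in sorted(total_time_per_pass(sample).items(), key=lambda kv: -kv[1]):
        print(f"{name}: {ms:.1f} ms")
```

To use it on a full run, capture the build.py output to a file (TensorFlow writes these messages to stderr) and pass the file contents to `total_time_per_pass`. Header lines such as "Optimization results for grappler item: tf_graph" do not match the pattern and are skipped.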