** UPDATE: SOLUTION FOUND **
Hi, I want to share that I have found a solution for my case. The modified code can be found here: https://drive.google.com/file/d/1GX-zmP-OQP3mtbAQWzOFQeyOhterZLcV/view?usp=sharing. The modifications are:
- Create the tf.Session with config=tf.ConfigProto(gpu_options=tf.GPUOptions(per_process_gpu_memory_fraction=0.50)) before performing the TensorRT graph optimization
- Isolate the inference session, e.g., by moving the inference code into a separate function
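A rough sketch of these two changes, assuming TensorFlow 1.x with tensorflow.contrib.tensorrt. This is not the exact code from the link: the function names, the tensor names passed in, and the max_batch_size, max_workspace_size_bytes, and precision_mode values are placeholders for illustration.

```python
# Sketch for TensorFlow 1.x with tensorflow.contrib.tensorrt.
# Function names and parameter values are hypothetical placeholders.


def make_session_config(fraction=0.50):
    """Build a session config that caps TensorFlow's share of GPU memory."""
    import tensorflow as tf  # imported lazily; requires TensorFlow 1.x
    gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=fraction)
    return tf.ConfigProto(gpu_options=gpu_options)


def optimize_with_tensorrt(frozen_graph_path, output_names):
    """Change 1: open a memory-capped session before the TensorRT conversion."""
    import tensorflow as tf
    import tensorflow.contrib.tensorrt as trt

    # Load the frozen graph from disk.
    with tf.gfile.GFile(frozen_graph_path, "rb") as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())

    # The session exists only to claim the capped GPU memory first.
    with tf.Session(config=make_session_config()):
        return trt.create_inference_graph(
            input_graph_def=graph_def,
            outputs=output_names,
            max_batch_size=1,                  # assumed value
            max_workspace_size_bytes=1 << 30,  # assumed value
            precision_mode="FP32",             # assumed value
        )


def run_inference(trt_graph_def, input_name, output_name, batch):
    """Change 2: run inference in its own function, graph, and session."""
    import tensorflow as tf

    graph = tf.Graph()
    with graph.as_default():
        tf.import_graph_def(trt_graph_def, name="")
        with tf.Session(graph=graph, config=make_session_config()) as sess:
            # Tensor names such as "input_1:0" are accepted as feeds and fetches.
            return sess.run(output_name, feed_dict={input_name: batch})
```

Keeping the inference in its own function means its graph and session are created and released separately from the conversion step.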
Here is the output:
2018-12-25 10:15:13.230369: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-12-25 10:15:13.311395: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-12-25 10:15:13.311791: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: GeForce GTX 1060 6GB major: 6 minor: 1 memoryClockRate(GHz): 1.759
pciBusID: 0000:01:00.0
totalMemory: 5.92GiB freeMemory: 5.14GiB
2018-12-25 10:15:13.311806: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2018-12-25 10:15:13.689522: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-12-25 10:15:13.689552: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2018-12-25 10:15:13.689558: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2018-12-25 10:15:13.689727: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3032 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1)
WARNING:tensorflow:From /home/cvrc/development_dir/Keras2TRT/4_inference_using_TensorRT_model_modif.py:55: FastGFile.__init__ (from tensorflow.python.platform.gfile) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.gfile.GFile.
2018-12-25 10:15:21.693342: I tensorflow/core/grappler/devices.cc:51] Number of eligible GPUs (core count >= 8): 1
2018-12-25 10:15:21.693530: I tensorflow/core/grappler/clusters/single_machine.cc:359] Starting new session
2018-12-25 10:15:21.693993: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2018-12-25 10:15:21.694018: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-12-25 10:15:21.694024: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2018-12-25 10:15:21.694029: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2018-12-25 10:15:21.694141: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3032 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1)
2018-12-25 10:15:26.100677: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:853] MULTIPLE tensorrt candidate conversion: 2
2018-12-25 10:15:26.100850: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2957] Segment @scope '', converted to graph
2018-12-25 10:15:26.100861: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can't find a device placement for the op!
2018-12-25 10:15:26.108500: I tensorflow/contrib/tensorrt/convert/convert_nodes.cc:2957] Segment @scope '', converted to graph
2018-12-25 10:15:26.108525: E tensorflow/contrib/tensorrt/convert/convert_graph.cc:418] Can't find a device placement for the op!
2018-12-25 10:17:04.664107: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:952] Engine my_trt_op_0 creation for segment 0, composed of 19 nodes succeeded.
2018-12-25 10:17:07.969592: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:952] Engine my_trt_op_1 creation for segment 1, composed of 10 nodes succeeded.
2018-12-25 10:17:11.384931: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:185] TensorRTOptimizer is probably called on funcdef! This optimizer must *NOT* be called on function objects.
2018-12-25 10:17:12.694581: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:185] TensorRTOptimizer is probably called on funcdef! This optimizer must *NOT* be called on function objects.
2018-12-25 10:17:13.012427: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:185] TensorRTOptimizer is probably called on funcdef! This optimizer must *NOT* be called on function objects.
2018-12-25 10:17:13.264191: W tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:185] TensorRTOptimizer is probably called on funcdef! This optimizer must *NOT* be called on function objects.
2018-12-25 10:17:13.265261: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:501] Optimization results for grappler item: tf_graph
2018-12-25 10:17:13.265282: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] constant folding: Graph size after: 39 nodes (-7), 39 edges (-7), time = 2824.71704ms.
2018-12-25 10:17:13.265290: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] layout: Graph size after: 44 nodes (5), 44 edges (5), time = 278.095ms.
2018-12-25 10:17:13.265295: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] TensorRTOptimizer: Graph size after: 17 nodes (-27), 17 edges (-27), time = 102135.352ms.
2018-12-25 10:17:13.265299: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] constant folding: Graph size after: 17 nodes (0), 17 edges (0), time = 1.116ms.
2018-12-25 10:17:13.265302: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] TensorRTOptimizer: Graph size after: 17 nodes (0), 17 edges (0), time = 1950.57495ms.
2018-12-25 10:17:13.265306: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:501] Optimization results for grappler item: my_trt_op_0_native_segment
2018-12-25 10:17:13.265310: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] constant folding: Graph size after: 20 nodes (0), 19 edges (0), time = 1224.9ms.
2018-12-25 10:17:13.265313: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] layout: Graph size after: 20 nodes (0), 19 edges (0), time = 236.859ms.
2018-12-25 10:17:13.265317: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] TensorRTOptimizer: Graph size after: 20 nodes (0), 19 edges (0), time = 0.357ms.
2018-12-25 10:17:13.265321: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] constant folding: Graph size after: 20 nodes (0), 19 edges (0), time = 1309.24304ms.
2018-12-25 10:17:13.265324: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] TensorRTOptimizer: Graph size after: 20 nodes (0), 19 edges (0), time = 0.375ms.
2018-12-25 10:17:13.265328: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:501] Optimization results for grappler item: my_trt_op_1_native_segment
2018-12-25 10:17:13.265331: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] constant folding: Graph size after: 11 nodes (0), 10 edges (0), time = 267.426ms.
2018-12-25 10:17:13.265335: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] layout: Graph size after: 11 nodes (0), 10 edges (0), time = 48.771ms.
2018-12-25 10:17:13.265339: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] TensorRTOptimizer: Graph size after: 11 nodes (0), 10 edges (0), time = 0.161ms.
2018-12-25 10:17:13.265342: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] constant folding: Graph size after: 11 nodes (0), 10 edges (0), time = 251.584ms.
2018-12-25 10:17:13.265346: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:503] TensorRTOptimizer: Graph size after: 11 nodes (0), 10 edges (0), time = 0.166ms.
2018-12-25 10:17:15.751519: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2018-12-25 10:17:15.751561: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-12-25 10:17:15.751568: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2018-12-25 10:17:15.751573: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2018-12-25 10:17:15.751689: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3032 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1)
needed time in inference-0: 15.006317853927612
needed time in inference-1: 0.041179656982421875
needed time in inference-2: 0.03866219520568848
needed time in inference-3: 0.03570389747619629
needed time in inference-4: 0.03510284423828125
needed time in inference-5: 0.03611278533935547
needed time in inference-6: 0.04064226150512695
needed time in inference-7: 0.04653620719909668
needed time in inference-8: 0.03874993324279785
needed time in inference-9: 0.034958839416503906
needed time in inference-10: 0.03500032424926758
needed time in inference-11: 0.03582310676574707
needed time in inference-12: 0.04033660888671875
needed time in inference-13: 0.040354251861572266
needed time in inference-14: 0.03400826454162598
needed time in inference-15: 0.033091068267822266
needed time in inference-16: 0.03689217567443848
needed time in inference-17: 0.03311920166015625
needed time in inference-18: 0.03401374816894531
needed time in inference-19: 0.03716325759887695
needed time in inference-20: 0.03750157356262207
needed time in inference-21: 0.03295445442199707
needed time in inference-22: 0.03301501274108887
needed time in inference-23: 0.03294515609741211
needed time in inference-24: 0.03882884979248047
needed time in inference-25: 0.03882479667663574
needed time in inference-26: 0.03724384307861328
needed time in inference-27: 0.03335261344909668
needed time in inference-28: 0.033097267150878906
needed time in inference-29: 0.03336834907531738
needed time in inference-30: 0.03383231163024902
needed time in inference-31: 0.03699827194213867
needed time in inference-32: 0.03634238243103027
needed time in inference-33: 0.034226417541503906
needed time in inference-34: 0.03304028511047363
needed time in inference-35: 0.032994747161865234
needed time in inference-36: 0.03297996520996094
needed time in inference-37: 0.04032158851623535
needed time in inference-38: 0.03634953498840332
needed time in inference-39: 0.033815622329711914
needed time in inference-40: 0.036960601806640625
needed time in inference-41: 0.033074140548706055
needed time in inference-42: 0.032826900482177734
needed time in inference-43: 0.036762237548828125
needed time in inference-44: 0.03467607498168945
needed time in inference-45: 0.0336000919342041
needed time in inference-46: 0.03297138214111328
needed time in inference-47: 0.032752037048339844
needed time in inference-48: 0.0335540771484375
needed time in inference-49: 0.036481618881225586
average inference time: 0.3351892137527466
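Note that the reported average (about 0.335 s) includes the first call, which took about 15 s; the other 49 calls each took roughly 0.033 to 0.047 s. A small sketch of reporting both figures, using a few rounded values from the log above (the helper name summarize and the warmup parameter are my own, not part of the linked script):

```python
from statistics import mean


def summarize(times, warmup=1):
    """Return (mean over all runs, mean excluding the first `warmup` runs)."""
    return mean(times), mean(times[warmup:])


# A few values rounded from the log above: one slow first call, then steady state.
times = [15.006, 0.041, 0.039, 0.036, 0.035]
overall, steady = summarize(times)
print("average incl. warm-up: %.4f s" % overall)
print("average excl. warm-up: %.4f s" % steady)
```

Excluding the first call gives a number closer to the per-image latency seen once the engines are loaded.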
To be honest, I found this solution purely by trial and error, without knowing why it works, lol. If you have an explanation, I would be glad to hear it.
Thanks.