No improvement in inference performance after optimization with TensorRT

I am trying to optimize a frozen graph (FasterRCNN-Resnet101) with TensorRT and then run inference on it. However, with TF 1.14.0 there is no visible improvement in inference time after optimization (it stays at about 250 ms). The number of trt_engine_ops is 0 after conversion, and I don’t understand why no TensorRT op was created.

import time

import numpy as np
import tensorflow as tf
from PIL import Image
from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Load the test image and add a batch dimension: (H, W, C) -> (1, H, W, C).
im = Image.open("image_test_resized.png")
np_image = np.array(im)
im = np.expand_dims(np_image, axis=0)

output_names = ["detection_boxes", "detection_scores", "detection_classes", "num_detections"]

with tf.Session() as sess:
    # Read the frozen graph from disk.
    with tf.gfile.GFile("frozen_inference_graph.pb", "rb") as f:
        frozen_graph = tf.GraphDef()
        frozen_graph.ParseFromString(f.read())

    # Ask TF-TRT to replace supported subgraphs with TRTEngineOp nodes (FP16).
    converter = trt.TrtGraphConverter(
        input_graph_def=frozen_graph,
        nodes_blacklist=output_names,
        is_dynamic_op=True,
        precision_mode='FP16')
    trt_graph = converter.convert()

    # Import the converted graph, feeding the image as the input tensor.
    output_node = tf.import_graph_def(
        trt_graph,
        input_map={'image_tensor:0': im},
        return_elements=output_names)

    # Count how many TensorRT engine nodes the conversion produced.
    trt_engine_ops = len([1 for n in trt_graph.node if str(n.op) == 'TRTEngineOp'])
    print("numb. of trt_engine_ops in trt_graph", trt_engine_ops)

    # Time several runs; the first one includes one-time initialization.
    for _ in range(4):
        start = time.time()
        sess.run(output_node)
        end = time.time()
        print("Executed TF Detection on image in {0} seconds".format(end - start))

This is the output I get with TF 1.14.0:

WARNING:tensorflow:From pb_to_TRT2.py:18: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

2020-01-28 22:11:40.055861: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2020-01-28 22:11:40.089333: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: 
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:19:00.0
2020-01-28 22:11:40.089432: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/compat/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
2020-01-28 22:11:40.089469: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/compat/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
2020-01-28 22:11:40.089502: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/compat/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
2020-01-28 22:11:40.089552: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/compat/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
2020-01-28 22:11:40.089586: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/compat/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
2020-01-28 22:11:40.089619: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/compat/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
2020-01-28 22:11:40.092587: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2020-01-28 22:11:40.092607: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1663] Cannot dlopen some GPU libraries. Skipping registering GPU devices...
2020-01-28 22:11:40.092862: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2020-01-28 22:11:40.209303: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x51d4eb0 executing computations on platform CUDA. Devices:
2020-01-28 22:11:40.209360: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): GeForce RTX 2080 Ti, Compute Capability 7.5
2020-01-28 22:11:40.232089: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3300000000 Hz
2020-01-28 22:11:40.235084: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5887d80 executing computations on platform Host. Devices:
2020-01-28 22:11:40.235130: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
2020-01-28 22:11:40.235236: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-01-28 22:11:40.235257: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      
WARNING:tensorflow:From pb_to_TRT2.py:22: The name tf.gfile.GFile is deprecated. Please use tf.io.gfile.GFile instead.

WARNING:tensorflow:From pb_to_TRT2.py:24: The name tf.GraphDef is deprecated. Please use tf.compat.v1.GraphDef instead.

2020-01-28 22:11:42.626568: I tensorflow/core/grappler/devices.cc:55] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 1
2020-01-28 22:11:42.627180: I tensorflow/core/grappler/clusters/single_machine.cc:359] Starting new session
2020-01-28 22:11:42.631030: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: 
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:19:00.0
2020-01-28 22:11:42.631139: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/compat/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
2020-01-28 22:11:42.631174: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/compat/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
2020-01-28 22:11:42.631221: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/compat/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
2020-01-28 22:11:42.631252: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/compat/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
2020-01-28 22:11:42.631282: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/compat/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
2020-01-28 22:11:42.631312: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/compat/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
2020-01-28 22:11:42.631320: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2020-01-28 22:11:42.631327: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1663] Cannot dlopen some GPU libraries. Skipping registering GPU devices...
2020-01-28 22:11:42.631340: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-01-28 22:11:42.631346: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0 
2020-01-28 22:11:42.631353: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N 
2020-01-28 22:11:44.567833: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:716] Optimization results for grappler item: tf_graph
2020-01-28 22:11:44.567868: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:718]   constant folding: Graph size after: 11171 nodes (0), 19239 edges (1), time = 939.437ms.
2020-01-28 22:11:44.567873: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:718]   layout: layout did nothing. time = 12.066ms.
2020-01-28 22:11:44.567879: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:718]   constant folding: Graph size after: 11171 nodes (0), 19239 edges (0), time = 365.909ms.
numb. of trt_engine_ops in trt_graph 0
2020-01-28 22:11:49.478312: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set.  If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU.  To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
Executed TF Detection on image in 3.685523271560669 seconds
Executed TF Detection on image in 0.23265981674194336 seconds
Executed TF Detection on image in 0.2277967929840088 seconds
Executed TF Detection on image in 0.22919106483459473 seconds
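
The log above repeats the same "Could not dlopen library" message for several CUDA libraries before "Skipping registering GPU devices". A minimal sketch for pulling the distinct library names out of such a log follows. It assumes the log has been captured as a string; `missing_libraries` and `sample_log` are names made up for this illustration, and the sample lines are abbreviated copies of the messages above.

```python
import re

# Matches the dso_loader failure message format seen in the log above.
_DLOPEN_FAILURE = re.compile(r"Could not dlopen library '([^']+)'")


def missing_libraries(log_text):
    """Return the distinct library names that failed to load, in first-seen order."""
    seen = []
    for name in _DLOPEN_FAILURE.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen


# Abbreviated excerpt of the log above (hypothetical capture, for illustration only).
sample_log = """\
dso_loader.cc:53] Could not dlopen library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file
dso_loader.cc:53] Could not dlopen library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file
dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
dso_loader.cc:53] Could not dlopen library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file
"""

print(missing_libraries(sample_log))
# ['libcudart.so.10.0', 'libcublas.so.10.0']
```

Run over the full log, this should list the six `.so.10.0` libraries that the loader could not find on the `LD_LIBRARY_PATH` shown in the messages.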

I also tried TF 1.15.0. There the conversion appears to work, since the converted graph contains 13 trt_engine_ops:

2020-01-28 22:14:16.341746: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:460] There are 5772 ops of 53 different types in the graph that are not converted to TensorRT: TopKV2, NonMaxSuppressionV2, CropAndResize, Fill, Split, Transpose, Gather, Where, Equal, Tile, Reshape, Assert, Const, Exit, NoOp, Pack, LoopCond, Merge, ZerosLike, Range, Less, TensorArraySizeV3, Placeholder, TensorArrayV3, TensorArrayScatterV3, Cast, Shape, Minimum, Switch, TensorArrayReadV3, StridedSlice, Maximum, RealDiv, Slice, LogicalAnd, Mul, Round, TensorArrayWriteV3, GreaterEqual, Max, Size, Greater, Sub, ConcatV2, Unpack, NextIteration, Identity, ExpandDims, ResizeBilinear, Enter, Squeeze, Add, TensorArrayGatherV3, (For more information see https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#supported-ops).
2020-01-28 22:14:16.443724: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:633] Number of TensorRT candidate segments: 13
2020-01-28 22:14:16.514867: E tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:101] Could not find any TF GPUs
2020-01-28 22:14:16.514898: W tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:723] Can't identify the cuda device. Running on device 0 
2020-01-28 22:14:16.514969: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:734] TensorRT node ClipToWindow/TRTEngineOp_0 added for segment 0 consisting of 8 nodes succeeded.
2020-01-28 22:14:16.515007: E tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:101] Could not find any TF GPUs
2020-01-28 22:14:16.515013: W tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:723] Can't identify the cuda device. Running on device 0 
2020-01-28 22:14:16.515042: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:734] TensorRT node TRTEngineOp_1 added for segment 1 consisting of 18 nodes succeeded.
2020-01-28 22:14:16.515089: E tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:101] Could not find any TF GPUs
2020-01-28 22:14:16.515095: W tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:723] Can't identify the cuda device. Running on device 0 
2020-01-28 22:14:16.515121: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:734] TensorRT node TRTEngineOp_2 added for segment 2 consisting of 18 nodes succeeded.
2020-01-28 22:14:16.515165: E tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:101] Could not find any TF GPUs
2020-01-28 22:14:16.515170: W tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:723] Can't identify the cuda device. Running on device 0 
2020-01-28 22:14:16.515197: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:734] TensorRT node TRTEngineOp_3 added for segment 3 consisting of 18 nodes succeeded.
2020-01-28 22:14:16.515241: E tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:101] Could not find any TF GPUs
2020-01-28 22:14:16.515246: W tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:723] Can't identify the cuda device. Running on device 0 
2020-01-28 22:14:16.515271: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:734] TensorRT node TRTEngineOp_4 added for segment 4 consisting of 18 nodes succeeded.
2020-01-28 22:14:16.515307: E tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:101] Could not find any TF GPUs
2020-01-28 22:14:16.515312: W tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:723] Can't identify the cuda device. Running on device 0 
2020-01-28 22:14:16.515347: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:734] TensorRT node TRTEngineOp_5 added for segment 5 consisting of 519 nodes succeeded.
2020-01-28 22:14:16.516951: E tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:101] Could not find any TF GPUs
2020-01-28 22:14:16.516972: W tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:723] Can't identify the cuda device. Running on device 0 
2020-01-28 22:14:16.517018: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:734] TensorRT node TRTEngineOp_6 added for segment 6 consisting of 4 nodes succeeded.
2020-01-28 22:14:16.517043: E tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:101] Could not find any TF GPUs
2020-01-28 22:14:16.517048: W tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:723] Can't identify the cuda device. Running on device 0 
2020-01-28 22:14:16.517078: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:734] TensorRT node TRTEngineOp_7 added for segment 7 consisting of 3 nodes succeeded.
2020-01-28 22:14:16.517101: E tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:101] Could not find any TF GPUs
2020-01-28 22:14:16.517106: W tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:723] Can't identify the cuda device. Running on device 0 
2020-01-28 22:14:16.517134: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:734] TensorRT node GridAnchorGenerator/TRTEngineOp_8 added for segment 8 consisting of 8 nodes succeeded.
2020-01-28 22:14:16.517165: E tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:101] Could not find any TF GPUs
2020-01-28 22:14:16.517170: W tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:723] Can't identify the cuda device. Running on device 0 
2020-01-28 22:14:16.517196: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:734] TensorRT node TRTEngineOp_9 added for segment 9 consisting of 55 nodes succeeded.
2020-01-28 22:14:16.517335: E tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:101] Could not find any TF GPUs
2020-01-28 22:14:16.517341: W tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:723] Can't identify the cuda device. Running on device 0 
2020-01-28 22:14:16.517369: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:734] TensorRT node TRTEngineOp_10 added for segment 10 consisting of 7 nodes succeeded.
2020-01-28 22:14:16.517401: E tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:101] Could not find any TF GPUs
2020-01-28 22:14:16.517406: W tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:723] Can't identify the cuda device. Running on device 0 
2020-01-28 22:14:16.517430: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:734] TensorRT node TRTEngineOp_11 added for segment 11 consisting of 7 nodes succeeded.
2020-01-28 22:14:16.517466: E tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:101] Could not find any TF GPUs
2020-01-28 22:14:16.517472: W tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:723] Can't identify the cuda device. Running on device 0 
2020-01-28 22:14:16.517496: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:734] TensorRT node SecondStagePostprocessor/TRTEngineOp_12 added for segment 12 consisting of 5 nodes succeeded.
2020-01-28 22:14:16.921990: W tensorflow/compiler/tf2tensorrt/convert/trt_optimization_pass.cc:183] TensorRTOptimizer is probably called on funcdef! This optimizer must *NOT* be called on function objects.
2020-01-28 22:14:16.956974: W tensorflow/compiler/tf2tensorrt/convert/trt_optimization_pass.cc:183] TensorRTOptimizer is probably called on funcdef! This optimizer must *NOT* be called on function objects.
2020-01-28 22:14:16.983557: W tensorflow/compiler/tf2tensorrt/convert/trt_optimization_pass.cc:183] TensorRTOptimizer is probably called on funcdef! This optimizer must *NOT* be called on function objects.
2020-01-28 22:14:17.135528: W tensorflow/compiler/tf2tensorrt/convert/trt_optimization_pass.cc:183] TensorRTOptimizer is probably called on funcdef! This optimizer must *NOT* be called on function objects.
2020-01-28 22:14:17.216288: W tensorflow/compiler/tf2tensorrt/convert/trt_optimization_pass.cc:183] TensorRTOptimizer is probably called on funcdef! This optimizer must *NOT* be called on function objects.
2020-01-28 22:14:17.229627: W tensorflow/compiler/tf2tensorrt/convert/trt_optimization_pass.cc:183] TensorRTOptimizer is probably called on funcdef! This optimizer must *NOT* be called on function objects.
2020-01-28 22:14:17.244907: W tensorflow/compiler/tf2tensorrt/convert/trt_optimization_pass.cc:183] TensorRTOptimizer is probably called on funcdef! This optimizer must *NOT* be called on function objects.
2020-01-28 22:14:17.260996: W tensorflow/compiler/tf2tensorrt/convert/trt_optimization_pass.cc:183] TensorRTOptimizer is probably called on funcdef! This optimizer must *NOT* be called on function objects.
2020-01-28 22:14:17.275068: W tensorflow/compiler/tf2tensorrt/convert/trt_optimization_pass.cc:183] TensorRTOptimizer is probably called on funcdef! This optimizer must *NOT* be called on function objects.
2020-01-28 22:14:17.288798: W tensorflow/compiler/tf2tensorrt/convert/trt_optimization_pass.cc:183] TensorRTOptimizer is probably called on funcdef! This optimizer must *NOT* be called on function objects.
2020-01-28 22:14:17.302031: W tensorflow/compiler/tf2tensorrt/convert/trt_optimization_pass.cc:183] TensorRTOptimizer is probably called on funcdef! This optimizer must *NOT* be called on function objects.
2020-01-28 22:14:17.314723: W tensorflow/compiler/tf2tensorrt/convert/trt_optimization_pass.cc:183] TensorRTOptimizer is probably called on funcdef! This optimizer must *NOT* be called on function objects.
2020-01-28 22:14:17.329713: W tensorflow/compiler/tf2tensorrt/convert/trt_optimization_pass.cc:183] TensorRTOptimizer is probably called on funcdef! This optimizer must *NOT* be called on function objects.
2020-01-28 22:14:17.376069: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:786] Optimization results for grappler item: tf_graph
2020-01-28 22:14:17.376098: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   constant_folding: Graph size after: 11171 nodes (0), 19239 edges (1), time = 844.105ms.
2020-01-28 22:14:17.376102: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   layout: layout did nothing. time = 11.896ms.
2020-01-28 22:14:17.376106: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   constant_folding: Graph size after: 11171 nodes (0), 19239 edges (0), time = 337.749ms.
2020-01-28 22:14:17.376110: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   TensorRTOptimizer: Graph size after: 10496 nodes (-675), 18508 edges (-731), time = 673.054ms.
2020-01-28 22:14:17.376115: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   constant_folding: Graph size after: 10496 nodes (0), 18508 edges (0), time = 272.011ms.
2020-01-28 22:14:17.376118: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:786] Optimization results for grappler item: ClipToWindow/TRTEngineOp_0_native_segment
2020-01-28 22:14:17.376122: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   constant_folding: Graph size after: 14 nodes (0), 16 edges (0), time = 0.745ms.
2020-01-28 22:14:17.376126: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   layout: layout did nothing. time = 0.014ms.
2020-01-28 22:14:17.376129: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   constant_folding: Graph size after: 14 nodes (0), 16 edges (0), time = 0.653ms.
2020-01-28 22:14:17.376133: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   TensorRTOptimizer: Graph size after: 14 nodes (0), 16 edges (0), time = 0.048ms.
2020-01-28 22:14:17.376137: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   constant_folding: Graph size after: 14 nodes (0), 16 edges (0), time = 0.681ms.
2020-01-28 22:14:17.376141: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:786] Optimization results for grappler item: TRTEngineOp_9_native_segment
2020-01-28 22:14:17.376145: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   constant_folding: Graph size after: 57 nodes (0), 59 edges (0), time = 13.999ms.
2020-01-28 22:14:17.376151: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   layout: layout did nothing. time = 0.078ms.
2020-01-28 22:14:17.376154: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   constant_folding: Graph size after: 57 nodes (0), 59 edges (0), time = 10.911ms.
2020-01-28 22:14:17.376158: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   TensorRTOptimizer: Graph size after: 57 nodes (0), 59 edges (0), time = 0.161ms.
2020-01-28 22:14:17.376162: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   constant_folding: Graph size after: 57 nodes (0), 59 edges (0), time = 9.882ms.
2020-01-28 22:14:17.376171: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:786] Optimization results for grappler item: TRTEngineOp_10_native_segment
2020-01-28 22:14:17.376176: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   constant_folding: Graph size after: 9 nodes (0), 8 edges (0), time = 3.315ms.
2020-01-28 22:14:17.376182: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   layout: layout did nothing. time = 0.035ms.
2020-01-28 22:14:17.376194: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   constant_folding: Graph size after: 9 nodes (0), 8 edges (0), time = 3.15ms.
2020-01-28 22:14:17.376202: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   TensorRTOptimizer: Graph size after: 9 nodes (0), 8 edges (0), time = 0.103ms.
2020-01-28 22:14:17.376208: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   constant_folding: Graph size after: 9 nodes (0), 8 edges (0), time = 3.211ms.
2020-01-28 22:14:17.376211: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:786] Optimization results for grappler item: TRTEngineOp_5_native_segment
2020-01-28 22:14:17.376217: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   constant_folding: Graph size after: 524 nodes (0), 553 edges (0), time = 52.455ms.
2020-01-28 22:14:17.376223: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   layout: layout did nothing. time = 0.956ms.
2020-01-28 22:14:17.376227: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   constant_folding: Graph size after: 524 nodes (0), 553 edges (0), time = 45.113ms.
2020-01-28 22:14:17.376235: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   TensorRTOptimizer: Graph size after: 524 nodes (0), 553 edges (0), time = 4.344ms.
2020-01-28 22:14:17.376244: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   constant_folding: Graph size after: 524 nodes (0), 553 edges (0), time = 52.893ms.
2020-01-28 22:14:17.376254: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:786] Optimization results for grappler item: TRTEngineOp_7_native_segment
2020-01-28 22:14:17.376261: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   constant_folding: Graph size after: 5 nodes (0), 4 edges (0), time = 0.614ms.
2020-01-28 22:14:17.376269: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   layout: layout did nothing. time = 0.011ms.
2020-01-28 22:14:17.376278: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   constant_folding: Graph size after: 5 nodes (0), 4 edges (0), time = 0.604ms.
2020-01-28 22:14:17.376283: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   TensorRTOptimizer: Graph size after: 5 nodes (0), 4 edges (0), time = 0.045ms.
2020-01-28 22:14:17.376293: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   constant_folding: Graph size after: 5 nodes (0), 4 edges (0), time = 0.592ms.
2020-01-28 22:14:17.376297: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:786] Optimization results for grappler item: TRTEngineOp_11_native_segment
2020-01-28 22:14:17.376301: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   constant_folding: Graph size after: 9 nodes (0), 8 edges (0), time = 2.138ms.
2020-01-28 22:14:17.376306: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   layout: layout did nothing. time = 0.03ms.
2020-01-28 22:14:17.376311: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   constant_folding: Graph size after: 9 nodes (0), 8 edges (0), time = 2.149ms.
2020-01-28 22:14:17.376315: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   TensorRTOptimizer: Graph size after: 9 nodes (0), 8 edges (0), time = 0.093ms.
2020-01-28 22:14:17.376321: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   constant_folding: Graph size after: 9 nodes (0), 8 edges (0), time = 2.074ms.
2020-01-28 22:14:17.376324: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:786] Optimization results for grappler item: TRTEngineOp_1_native_segment
2020-01-28 22:14:17.376331: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   constant_folding: Graph size after: 24 nodes (0), 27 edges (0), time = 2.147ms.
2020-01-28 22:14:17.376337: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   layout: layout did nothing. time = 0.036ms.
2020-01-28 22:14:17.376343: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   constant_folding: Graph size after: 24 nodes (0), 27 edges (0), time = 2.229ms.
2020-01-28 22:14:17.376347: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   TensorRTOptimizer: Graph size after: 24 nodes (0), 27 edges (0), time = 0.115ms.
2020-01-28 22:14:17.376354: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   constant_folding: Graph size after: 24 nodes (0), 27 edges (0), time = 2.131ms.
2020-01-28 22:14:17.376357: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:786] Optimization results for grappler item: TRTEngineOp_3_native_segment
2020-01-28 22:14:17.376362: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   constant_folding: Graph size after: 24 nodes (0), 27 edges (0), time = 2.171ms.
2020-01-28 22:14:17.376369: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   layout: layout did nothing. time = 0.036ms.
2020-01-28 22:14:17.376373: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   constant_folding: Graph size after: 24 nodes (0), 27 edges (0), time = 2.18ms.
2020-01-28 22:14:17.376378: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   TensorRTOptimizer: Graph size after: 24 nodes (0), 27 edges (0), time = 0.11ms.
2020-01-28 22:14:17.376384: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   constant_folding: Graph size after: 24 nodes (0), 27 edges (0), time = 2.062ms.
2020-01-28 22:14:17.376389: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:786] Optimization results for grappler item: TRTEngineOp_4_native_segment
2020-01-28 22:14:17.376392: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   constant_folding: Graph size after: 24 nodes (0), 27 edges (0), time = 1.616ms.
2020-01-28 22:14:17.376398: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   layout: layout did nothing. time = 0.026ms.
2020-01-28 22:14:17.376403: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   constant_folding: Graph size after: 24 nodes (0), 27 edges (0), time = 1.494ms.
2020-01-28 22:14:17.376407: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   TensorRTOptimizer: Graph size after: 24 nodes (0), 27 edges (0), time = 0.078ms.
2020-01-28 22:14:17.376411: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   constant_folding: Graph size after: 24 nodes (0), 27 edges (0), time = 1.556ms.
2020-01-28 22:14:17.376417: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:786] Optimization results for grappler item: GridAnchorGenerator/TRTEngineOp_8_native_segment
2020-01-28 22:14:17.376423: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   constant_folding: Graph size after: 11 nodes (0), 12 edges (0), time = 1.682ms.
2020-01-28 22:14:17.376429: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   layout: layout did nothing. time = 0.025ms.
2020-01-28 22:14:17.376433: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   constant_folding: Graph size after: 11 nodes (0), 12 edges (0), time = 1.779ms.
2020-01-28 22:14:17.376438: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   TensorRTOptimizer: Graph size after: 11 nodes (0), 12 edges (0), time = 0.086ms.
2020-01-28 22:14:17.376443: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   constant_folding: Graph size after: 11 nodes (0), 12 edges (0), time = 1.733ms.
2020-01-28 22:14:17.376447: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:786] Optimization results for grappler item: SecondStagePostprocessor/TRTEngineOp_12_native_segment
2020-01-28 22:14:17.376453: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   constant_folding: Graph size after: 7 nodes (0), 6 edges (0), time = 1.665ms.
2020-01-28 22:14:17.376458: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   layout: layout did nothing. time = 0.021ms.
2020-01-28 22:14:17.376462: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   constant_folding: Graph size after: 7 nodes (0), 6 edges (0), time = 1.597ms.
2020-01-28 22:14:17.376467: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   TensorRTOptimizer: Graph size after: 7 nodes (0), 6 edges (0), time = 0.074ms.
2020-01-28 22:14:17.376472: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   constant_folding: Graph size after: 7 nodes (0), 6 edges (0), time = 1.659ms.
2020-01-28 22:14:17.376476: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:786] Optimization results for grappler item: TRTEngineOp_6_native_segment
2020-01-28 22:14:17.376482: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   constant_folding: Graph size after: 6 nodes (0), 5 edges (0), time = 1.515ms.
2020-01-28 22:14:17.376494: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   layout: layout did nothing. time = 0.019ms.
2020-01-28 22:14:17.376500: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   constant_folding: Graph size after: 6 nodes (0), 5 edges (0), time = 1.481ms.
2020-01-28 22:14:17.376505: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   TensorRTOptimizer: Graph size after: 6 nodes (0), 5 edges (0), time = 0.066ms.
2020-01-28 22:14:17.376510: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   constant_folding: Graph size after: 6 nodes (0), 5 edges (0), time = 1.712ms.
2020-01-28 22:14:17.376515: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:786] Optimization results for grappler item: TRTEngineOp_2_native_segment
2020-01-28 22:14:17.376522: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   constant_folding: Graph size after: 24 nodes (0), 27 edges (0), time = 2.185ms.
2020-01-28 22:14:17.376528: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   layout: layout did nothing. time = 0.037ms.
2020-01-28 22:14:17.376533: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   constant_folding: Graph size after: 24 nodes (0), 27 edges (0), time = 2.221ms.
2020-01-28 22:14:17.376537: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   TensorRTOptimizer: Graph size after: 24 nodes (0), 27 edges (0), time = 0.108ms.
2020-01-28 22:14:17.376543: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   constant_folding: Graph size after: 24 nodes (0), 27 edges (0), time = 2.335ms.
numb. of trt_engine_ops in trt_graph 13

Then I get this error during inference:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1348, in _run_fn
    self._extend_graph()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1388, in _extend_graph
    tf_session.ExtendSession(self._session)
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'TRTEngineOp' used by {{node import/TRTEngineOp_5}}with these attrs: [output_shapes=[], workspace_size_bytes=404994176, max_cached_engines_count=1, segment_func=TRTEngineOp_5_native_segment[], segment_funcdef_name="", use_calibration=false, fixed_input_size=true, input_shapes=[], OutT=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], precision_mode="FP16", static_engine=false, serialized_segment="", cached_engine_batches=[], InT=[DT_FLOAT], calibration_data=""]
Registered devices: [CPU, XLA_CPU, XLA_GPU]
Registered kernels:
  device='GPU'

	 [[import/TRTEngineOp_5]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "pb_to_TRT2.py", line 54, in <module>
    sess.run(output_node)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1180, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
    run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'TRTEngineOp' used by node import/TRTEngineOp_5 (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1748) with these attrs: [output_shapes=[], workspace_size_bytes=404994176, max_cached_engines_count=1, segment_func=TRTEngineOp_5_native_segment[], segment_funcdef_name="", use_calibration=false, fixed_input_size=true, input_shapes=[], OutT=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], precision_mode="FP16", static_engine=false, serialized_segment="", cached_engine_batches=[], InT=[DT_FLOAT], calibration_data=""]
Registered devices: [CPU, XLA_CPU, XLA_GPU]
Registered kernels:
  device='GPU'

	 [[import/TRTEngineOp_5]]

Could you tell me whether this pipeline is correct? Which version of TF should I use to get the largest speed-up without errors at inference time?

Ubuntu: 1.18
TensorRT Container: 19.07
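
A note on reading the error above: the "Registered kernels" section lists only device='GPU' for TRTEngineOp, while the "Registered devices" line shows CPU, XLA_CPU, and XLA_GPU but no plain GPU device. Together with the earlier "Could not dlopen library 'libcudart.so.10.0'" message, this suggests TensorFlow could not set up its CUDA GPU device in this container. The sketch below is a small illustrative helper (the names registered_devices and has_plain_gpu are hypothetical, not part of TensorFlow) that pulls the device list out of such an error message so the check can be scripted.

```python
import re


def registered_devices(error_text):
    """Return the device names from the first 'Registered devices: [...]' line."""
    match = re.search(r"Registered devices:\s*\[([^\]]*)\]", error_text)
    if match is None:
        return []
    return [name.strip() for name in match.group(1).split(",") if name.strip()]


def has_plain_gpu(error_text):
    """True only if a device named exactly 'GPU' is registered (XLA_GPU does not count)."""
    return "GPU" in registered_devices(error_text)


# Abbreviated copy of the error message from the traceback in this thread.
message = """No OpKernel was registered to support Op 'TRTEngineOp' used by node import/TRTEngineOp_5
Registered devices: [CPU, XLA_CPU, XLA_GPU]
Registered kernels:
  device='GPU'
"""

print(registered_devices(message))  # ['CPU', 'XLA_CPU', 'XLA_GPU']
print(has_plain_gpu(message))       # False
```

If has_plain_gpu returns False, the TRTEngineOp kernel has no device to run on, so the CUDA library setup is worth checking before the conversion itself.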

Hi,

Could you please share the script and model file so we can help better?
Also, which GPU are you using in this case?

In the meantime, please try running the code in the latest TF or TRT release container.
https://docs.nvidia.com/deeplearning/frameworks/tensorflow-release-notes/index.html
https://docs.nvidia.com/deeplearning/sdk/tensorrt-container-release-notes/index.html

Thanks

Hi,

Here is the script:

import tensorflow as tf
import numpy as np
import time
from PIL import Image

im = Image.open("image_test_resized.png")
np_image= np.array(im)
im=np.expand_dims(np_image, axis=0)

from tensorflow.python.compiler.tensorrt import trt_convert as trt
with tf.Session() as sess:
	with tf.gfile.GFile("frozen_inference_graph.pb", "rb") as f:
		frozen_graph = tf.GraphDef()
		frozen_graph.ParseFromString(f.read())
	converter = trt.TrtGraphConverter(
		is_dynamic_op=True,
		input_graph_def=frozen_graph,
		nodes_blacklist=["detection_boxes", "detection_scores", "detection_classes", "num_detections"],precision_mode='FP16')
	trt_graph = converter.convert()
	output_node = tf.import_graph_def(
		trt_graph,
		input_map={'image_tensor:0':im},
		return_elements=["detection_boxes", "detection_scores", "detection_classes", "num_detections"])

	trt_engine_ops = len([1 for n in trt_graph.node if str(n.op) == 'TRTEngineOp'])
	print("numb. of trt_engine_ops in trt_graph", trt_engine_ops)
	start = time.time()
	sess.run(output_node)
	end = time.time()
	print("Executed TF Detection on image in {0} seconds".format(end - start))

I am using the “frozen_inference_graph.pb” from this archive:
http://download.tensorflow.org/models/object_detection/faster_rcnn_resnet101_coco_2018_01_28.tar.gz

GPU: RTX 2080 Ti

In TensorFlow container 20.01 the script ran successfully. The inference time for faster_rcnn_resnet101 is ~50 ms. Although that is lower than before, it is still higher than the FasterRCNN demo of TensorRT (17 ms). Is there any room for further improvement?
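
When comparing these numbers, it may matter how the time is measured. The script above times a single sess.run call, and with is_dynamic_op=True TF-TRT builds its engines at run time, so the first call can include engine construction. Below is a minimal sketch of a warm-up-then-average timer. The name benchmark and its parameters are hypothetical, and the stand-in workload is only there to make the snippet runnable; in the script it would be replaced with lambda: sess.run(output_node).

```python
import time


def benchmark(run_once, warmup=5, runs=20):
    """Call run_once `warmup` times untimed, then return the mean latency in ms over `runs` calls."""
    for _ in range(warmup):
        run_once()  # untimed: absorbs one-off costs such as engine building
    start = time.perf_counter()
    for _ in range(runs):
        run_once()
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / runs


# Stand-in workload (assumption): replace with `lambda: sess.run(output_node)`.
calls = []
mean_ms = benchmark(lambda: calls.append(1), warmup=2, runs=3)
print("mean latency over 3 runs: {:.3f} ms".format(mean_ms))
```

Reporting the mean over several runs after warm-up makes the figure easier to compare with the numbers quoted for the TensorRT demo, which may have been measured differently.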

Hi,

Another alternative is to convert your model to ONNX using tf2onnx and then convert it to TensorRT using the ONNX parser. Any layers that are not supported need to be replaced by custom plugins.
https://github.com/onnx/tensorflow-onnx
https://github.com/onnx/onnx-tensorrt/blob/master/operators.md

Thanks
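
As a rough sketch of the first step of that route, the snippet below only assembles a tf2onnx command line for the frozen graph from this thread; it does not run it. The flag names (--graphdef, --inputs, --outputs, --output, --opset) follow the tf2onnx README as far as I know and should be checked against the installed version. The opset value and the output file name model.onnx are assumptions; the tensor names come from the script above. Whether every op in this model converts cleanly is not something this sketch can confirm.

```python
def tf2onnx_command(graphdef, onnx_path, inputs, outputs, opset=11):
    """Build (but do not execute) a tf2onnx conversion command as an argv list."""
    return [
        "python", "-m", "tf2onnx.convert",
        "--graphdef", graphdef,
        "--output", onnx_path,
        "--inputs", ",".join(inputs),
        "--outputs", ",".join(outputs),
        "--opset", str(opset),  # assumption: pick an opset your TensorRT version supports
    ]


# Node names from the script in this thread; ":0" selects each node's first output.
output_nodes = ["detection_boxes", "detection_scores", "detection_classes", "num_detections"]
cmd = tf2onnx_command(
    "frozen_inference_graph.pb",
    "model.onnx",  # hypothetical output path
    inputs=["image_tensor:0"],
    outputs=[name + ":0" for name in output_nodes],
)
print(" ".join(cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```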

Hi,

I tried to use the UFF parser and followed the steps for the Faster RCNN UFF sample. However, it seems that the config.py there is prepared for a Resnet10 backbone. Is there a config file for a Resnet101 or Resnet50 backbone publicly shared by NVIDIA?

I used the container tensorrt:20.03-py3
and ran these commands inside it:

/opt/tensorrt/install_opensource.sh
/opt/tensorrt/samples/sampleUffFasterRCNN/download_model.sh
cd /opt/tensorrt/samples
make
cd /opt/tensorrt/python
apt install ./graphsurgeon-tf_7.0.0-1+cuda10.2_amd64.deb
apt install ./uff-converter-tf_7.0.0-1+cuda10.2_amd64.deb
./python_setup.sh
cd /opt/tensorrt/samples/sampleUffFasterRCNN/
/usr/local/bin/convert-to-uff -p config.py -O dense_class/Softmax -O dense_regress/BiasAdd -O proposal uff_faster_rcnn/faster_rcnn.pb
cp uff_faster_rcnn/faster_rcnn.uff /opt/tensorrt/data/faster-rcnn/
cp uff_faster_rcnn/list.uff /opt/tensorrt/data/faster-rcnn/
cd ../..
# for FP16
./bin/sample_uff_faster_rcnn --profile --datadir /data/uff_faster_rcnn -W 480 -H 272 -I 004545.ppm

# for INT8
./bin/sample_uff_faster_rcnn --profile --datadir /data/uff_faster_rcnn -i -W 480 -H 272 -I 004545.ppm

It works for the sample Resnet10 pb provided with the open-source examples, but it does not work for Faster RCNN Resnet101; it gets stuck at this command:

/usr/local/bin/convert-to-uff -p config.py -O dense_class/Softmax -O dense_regress/BiasAdd -O proposal uff_faster_rcnn/faster_rcnn.pb

Hi,

At least, I am not aware of any other such samples being available.
I think config.py was created specifically for the “sampleUffFasterRCNN” sample; you will have to modify the config file to match your model.


Could you check this repo? It may be helpful.

Thanks