Inference times for the TensorRT graph and the native graph are the same

Greetings! I have managed to convert this model (Mobile SSD for face detection) to TensorRT for the Jetson Nano: https://drive.google.com/file/d/0B5ttP5kO_loUdWZWZVVrN2VmWFk/view. However, the converted graph has the same inference time as my native TensorFlow graph.

The conversion was done following this example: https://github.com/tensorflow/tensorrt/blob/master/tftrt/examples/object_detection/object_detection.py

Output is:

2019-08-17 12:48:07.737155: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1245 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X1, pci bus id: 0000:00:00.0, compute capability: 5.3)
2019-08-17 12:48:13.883319: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:460] There are 436 ops of 48 different types in the graph that are not converted to TensorRT: Sub, TopKV2, Gather, Where, Size, Greater, ExpandDims, Identity, Assert, LoopCond, Merge, Slice, Mul, LogicalAnd, ZerosLike, Less, Range, DataFormatVecPermute, Placeholder, TensorArrayV3, TensorArraySizeV3, TensorArrayReadV3, TensorArrayScatterV3, Reshape, Cast, Minimum, Shape, StridedSlice, Switch, ResizeBilinear, Enter, Squeeze, Add, NextIteration, Exit, NoOp, Pack, NonMaxSuppression, TensorArrayGatherV3, TensorArrayWriteV3, GreaterEqual, Const, Fill, Transpose, Unpack, ConcatV2, Equal, Tile, (For more information see https://docs.nvidia.com/deeplearning/dgx/tf-trt-user-guide/index.html#supported-ops).
2019-08-17 12:48:14.115043: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:733] Number of TensorRT candidate segments: 18
2019-08-17 12:48:14.794959: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-08-17 12:48:14.841150: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:835] TensorRT node TRTEngineOp_0 added for segment 0 consisting of 245 nodes succeeded.
2019-08-17 12:48:14.854771: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:835] TensorRT node TRTEngineOp_1 added for segment 1 consisting of 26 nodes succeeded.
2019-08-17 12:48:14.858770: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:835] TensorRT node TRTEngineOp_2 added for segment 2 consisting of 26 nodes succeeded.
2019-08-17 12:48:14.862750: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:835] TensorRT node TRTEngineOp_3 added for segment 3 consisting of 26 nodes succeeded.
2019-08-17 12:48:14.864412: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:835] TensorRT node TRTEngineOp_4 added for segment 4 consisting of 18 nodes succeeded.
2019-08-17 12:48:14.865049: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:835] TensorRT node TRTEngineOp_5 added for segment 5 consisting of 22 nodes succeeded.
2019-08-17 12:48:14.865583: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:835] TensorRT node TRTEngineOp_6 added for segment 6 consisting of 22 nodes succeeded.
2019-08-17 12:48:14.866068: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:835] TensorRT node TRTEngineOp_7 added for segment 7 consisting of 18 nodes succeeded.
2019-08-17 12:48:14.866512: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:835] TensorRT node TRTEngineOp_8 added for segment 8 consisting of 18 nodes succeeded.
2019-08-17 12:48:14.866823: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:835] TensorRT node Postprocessor/BatchMultiClassNonMaxSuppression/map/while/MultiClassNonMaxSuppression/TRTEngineOp_9 added for segment 9 consisting of 2 nodes succeeded.
2019-08-17 12:48:14.867047: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:835] TensorRT node Postprocessor/BatchMultiClassNonMaxSuppression/map/while/MultiClassNonMaxSuppression/TRTEngineOp_10 added for segment 10 consisting of 2 nodes succeeded.
2019-08-17 12:48:14.867268: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:835] TensorRT node Postprocessor/BatchMultiClassNonMaxSuppression/map/while/MultiClassNonMaxSuppression/TRTEngineOp_11 added for segment 11 consisting of 2 nodes succeeded.
2019-08-17 12:48:14.867484: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:835] TensorRT node Postprocessor/BatchMultiClassNonMaxSuppression/map/while/MultiClassNonMaxSuppression/TRTEngineOp_12 added for segment 12 consisting of 2 nodes succeeded.
2019-08-17 12:48:14.867713: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:835] TensorRT node Postprocessor/TRTEngineOp_13 added for segment 13 consisting of 2 nodes succeeded.
2019-08-17 12:48:14.868108: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:835] TensorRT node TRTEngineOp_14 added for segment 14 consisting of 6 nodes succeeded.
2019-08-17 12:48:14.868414: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:835] TensorRT node Preprocessor/TRTEngineOp_15 added for segment 15 consisting of 4 nodes succeeded.
2019-08-17 12:48:14.868657: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:835] TensorRT node TRTEngineOp_16 added for segment 16 consisting of 2 nodes succeeded.
2019-08-17 12:48:14.868972: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:835] TensorRT node TRTEngineOp_17 added for segment 17 consisting of 2 nodes succeeded.
2019-08-17 12:48:14.955975: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:739] Optimization results for grappler item: tf_graph
2019-08-17 12:48:14.956070: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:741]   constant folding: Graph size after: 1150 nodes (-1205), 1551 edges (-1361), time = 895.217ms.
2019-08-17 12:48:14.956098: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:741]   layout: Graph size after: 1191 nodes (41), 1611 edges (60), time = 173.179ms.
2019-08-17 12:48:14.956122: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:741]   constant folding: Graph size after: 1183 nodes (-8), 1603 edges (-8), time = 302.745ms.
2019-08-17 12:48:14.956161: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:741]   TensorRTOptimizer: Graph size after: 756 nodes (-427), 1123 edges (-480), time = 1173.91895ms.
graph_size(MB)(native_tf): 21.6
graph_size(MB)(trt): 42.9
num_nodes(native_tf): 2355
num_nodes(tftrt_total): 756
num_nodes(trt_only): 18
time(s) (trt_conversion): 115.7192
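
To make the log above easier to read, here is a minimal sketch that tallies how many nodes ended up inside each TensorRT engine. It assumes the log lines keep the "added for segment N consisting of M nodes" wording shown above; the names `segment_sizes`, `summarize`, and `SAMPLE_LOG` are made up for this illustration, and the sample contains only three of the lines from the output.

```python
import re

# Matches tf2tensorrt lines of the form
# "... added for segment <id> consisting of <n> nodes succeeded."
SEGMENT_RE = re.compile(r"added for segment (\d+) consisting of (\d+) nodes")


def segment_sizes(log_text):
    """Return {segment_id: node_count} for every converted segment in the log."""
    return {int(seg): int(n) for seg, n in SEGMENT_RE.findall(log_text)}


def summarize(sizes):
    """Return (total nodes inside TensorRT engines, engines with <= 2 nodes)."""
    total = sum(sizes.values())
    tiny = sum(1 for n in sizes.values() if n <= 2)
    return total, tiny


# Three lines copied from the output above (prefixes shortened), for illustration only.
SAMPLE_LOG = """\
convert_graph.cc:835] TensorRT node TRTEngineOp_0 added for segment 0 consisting of 245 nodes succeeded.
convert_graph.cc:835] TensorRT node TRTEngineOp_1 added for segment 1 consisting of 26 nodes succeeded.
convert_graph.cc:835] TensorRT node Postprocessor/TRTEngineOp_13 added for segment 13 consisting of 2 nodes succeeded.
"""

sizes = segment_sizes(SAMPLE_LOG)
total, tiny = summarize(sizes)
print(f"{len(sizes)} engines, {total} nodes converted, {tiny} engines with <= 2 nodes")
```

Adding up all 18 segments in the full output by hand gives 445 nodes inside TensorRT engines, which matches the TensorRTOptimizer line (445 nodes replaced by 18 engine ops is the reported -427). Seven of the 18 engines contain only 2 nodes each.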

Inference time on a 1280x720 image is ~0.145 for both graphs.
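
For comparing the two graphs, a timing loop along these lines may help rule out measurement noise. It is a sketch and not the code used for the number above: `benchmark` and `fake_inference` are hypothetical names, and `fake_inference` merely stands in for a single `sess.run(...)` call on one frame. The warm-up calls are discarded because the first few runs of a session typically include one-time setup costs.

```python
import statistics
import time


def benchmark(run_once, warmup=10, runs=50):
    """Time run_once() after discarding warm-up calls.

    Returns (mean, median) of the per-call wall-clock time in seconds.
    """
    for _ in range(warmup):
        run_once()  # warm-up calls are not timed
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        run_once()
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples), statistics.median(samples)


# Hypothetical stand-in for one inference call; replace with the real call.
def fake_inference():
    time.sleep(0.001)


mean_s, median_s = benchmark(fake_inference, warmup=2, runs=5)
print(f"mean {mean_s:.4f} s, median {median_s:.4f} s")
```

Running the same loop once with the native graph and once with the converted graph, on the same image, gives two numbers that can be compared directly.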

Thanks in advance.