No speedup from TensorRT model in inference (Xavier)

I converted my TF model to TRT on Xavier with FP16 as shown below, but when I run inference I get the same speed as without optimization. This is the code I used:
with graph.as_default():
    with tf.Session() as sess:
        trt_graph = trt.create_inference_graph(
            input_graph_def=gdef,
            outputs=['features'],
            max_batch_size=32,
            max_workspace_size_bytes=7000000000,
            is_dynamic_op=True,
            precision_mode='FP16')
What's wrong?
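For reference, I time it roughly like this (a minimal sketch; `run_inference` is a stand-in for my actual `sess.run` call on the graph). Note that with `is_dynamic_op=True` the first call also builds the TensorRT engine, so the warm-up iterations matter:

```python
import time

def benchmark(run_inference, batch, warmup=10, iters=100):
    # Warm-up runs so one-time costs (engine build, allocation)
    # are excluded from the measurement.
    for _ in range(warmup):
        run_inference(batch)
    start = time.perf_counter()
    for _ in range(iters):
        run_inference(batch)
    elapsed = time.perf_counter() - start
    return elapsed / iters  # average seconds per batch
```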

Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6960 MB memory) -> physical GPU (device: 0, name: Xavier, pci bus id: 0000:00:00.0, compute capability: 7.2)
2020-04-16 11:33:36.026142: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:460] There are 50 ops of 23 different types in the graph that are not converted to TensorRT: TensorArrayGatherV3, Exit, NextIteration, TensorArrayReadV3, Switch, StridedSlice, Shape, Cast, Reshape, TensorArrayScatterV3, TensorArraySizeV3, TensorArrayWriteV3, NoOp, TensorArrayV3, Placeholder, Range, Enter, Less, Merge, LogicalAnd, LoopCond, Identity, Add, (For more information see https://docs.nvidia.com/deeplearning/dgx/tf-trt-user-guide/index.html#supported-ops).
2020-04-16 11:33:36.041563: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:735] Number of TensorRT candidate segments: 4
2020-04-16 11:33:36.079573: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2020-04-16 11:33:36.080101: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:837] TensorRT node TRTEngineOp_0 added for segment 0 consisting of 7 nodes succeeded.
2020-04-16 11:33:36.083233: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:837] TensorRT node TRTEngineOp_1 added for segment 1 consisting of 137 nodes succeeded.
2020-04-16 11:33:36.084416: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:837] TensorRT node TRTEngineOp_2 added for segment 2 consisting of 10 nodes succeeded.
2020-04-16 11:33:36.097784: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:837] TensorRT node TRTEngineOp_3 added for segment 3 consisting of 12 nodes succeeded.
2020-04-16 11:33:36.109449: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:752] Optimization results for grappler item: tf_graph
2020-04-16 11:33:36.109576: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:754] constant folding: Graph size after: 226 nodes (-68), 257 edges (-68), time = 95.428ms.
2020-04-16 11:33:36.109630: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:754] layout: Graph size after: 234 nodes (8), 269 edges (12), time = 25.178ms.
2020-04-16 11:33:36.109690: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:754] constant folding: Graph size after: 234 nodes (0), 269 edges (0), time = 26.862ms.
2020-04-16 11:33:36.109718: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:754] TensorRTOptimizer: Graph size after: 72 nodes (-162), 100 edges (-169), time = 104.099ms.

I have no experience with this, but to my understanding the best inference performance is achieved with INT8 precision?

Thank you for your reply, but even with INT8 I get the same results:

batch_size = 30
batched_input = []
count = 0
listimg = os.listdir("./img/")
for i in listimg:
    img = cv2.imread("./img/" + i)
    batched_input.append(tf.image.convert_image_dtype(img, dtype=tf.uint8, saturate=False))
print('*****batched_input shape: ', len(batched_input))

conversion_params = trt.DEFAULT_TRT_CONVERSION_PARAMS
conversion_params = conversion_params._replace(
    max_workspace_size_bytes=(1 << 32))
conversion_params = conversion_params._replace(precision_mode="INT8")
converter = trt.TrtGraphConverterV2(input_saved_model_dir='./saved',
                                    conversion_params=conversion_params)

def calibration_input_fn():
    yield (batched_input, )

converter.convert(calibration_input_fn=calibration_input_fn)
converter.save("./v2trt")
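The calibration data here is just the image list yielded once; the batching itself is plain Python. A minimal sketch of the chunking logic (assuming `batched_input` holds one tensor per image):

```python
def make_batches(items, batch_size):
    # Split a flat list into consecutive batches of at most batch_size
    # items, so a calibration input_fn can yield one batch per iteration.
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
```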

Hi,

TF-TRT is a tool that integrates TensorRT into the TensorFlow framework.
In this mechanism, non-supported layers automatically fall back to the TensorFlow implementation.

The precision parameter only applies to the TensorRT portions and cannot change the TensorFlow implementation.
So it's recommended to first check how many layers run on TensorRT and how many fall back to TensorFlow.

This information should be available with the device placement flag:
https://www.tensorflow.org/guide/gpu#logging_device_placement
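As a quick sanity check, you can also count the converted segments directly from the TF-TRT conversion log you posted. A small sketch (the line patterns are taken from your log output above):

```python
import re

def summarize_trt_log(log_text):
    # Count the TRTEngineOp segments and the nodes they absorbed, plus the
    # number of ops left to native TensorFlow, from TF-TRT log output.
    engine_nodes = [int(n) for n in re.findall(
        r"consisting of (\d+) nodes succeeded", log_text)]
    m = re.search(r"There are (\d+) ops .* not converted to TensorRT", log_text)
    unconverted = int(m.group(1)) if m else 0
    return {
        "num_engines": len(engine_nodes),
        "nodes_in_trt": sum(engine_nodes),
        "ops_fallback_to_tf": unconverted,
    }
```

In your log this gives 4 engines covering 166 nodes, with 50 ops (TensorArray*, Enter, Exit, Merge, etc. — loop control flow) falling back to TensorFlow, which can explain the lack of speedup.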

Thanks.