No speedup from TensorRT model in inference (Xavier)

I converted my TF model to TRT on Xavier with FP16 as shown below, but when I run inference I get the same speed as without optimization. This is the code I used:
with graph.as_default():
    with tf.Session() as sess:
        trt_graph = trt.create_inference_graph(
            input_graph_def=gdef,
            outputs=['features'],
            max_batch_size=32,
            max_workspace_size_bytes=7000000000,
            is_dynamic_op=True,
            precision_mode='FP16')
What's wrong?
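For reference, I time it roughly like this (a minimal sketch; `run_inference` is a stand-in for my actual `sess.run` call on the graph). Note that with `is_dynamic_op=True` the first call also builds the TensorRT engine, so the warm-up iterations matter:

```python
import time

def benchmark(run_inference, batch, warmup=10, iters=100):
    # Warm-up runs so one-time costs (engine build, allocation)
    # are excluded from the measurement.
    for _ in range(warmup):
        run_inference(batch)
    start = time.perf_counter()
    for _ in range(iters):
        run_inference(batch)
    elapsed = time.perf_counter() - start
    return elapsed / iters  # average seconds per batch
```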

Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6960 MB memory) -> physical GPU (device: 0, name: Xavier, pci bus id: 0000:00:00.0, compute capability: 7.2)
2020-04-16 11:33:36.026142: I tensorflow/compiler/tf2tensorrt/segment/segment.cc:460] There are 50 ops of 23 different types in the graph that are not converted to TensorRT: TensorArrayGatherV3, Exit, NextIteration, TensorArrayReadV3, Switch, StridedSlice, Shape, Cast, Reshape, TensorArrayScatterV3, TensorArraySizeV3, TensorArrayWriteV3, NoOp, TensorArrayV3, Placeholder, Range, Enter, Less, Merge, LogicalAnd, LoopCond, Identity, Add, (For more information see https://docs.nvidia.com/deeplearning/dgx/tf-trt-user-guide/index.html#supported-ops).
2020-04-16 11:33:36.041563: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:735] Number of TensorRT candidate segments: 4
2020-04-16 11:33:36.079573: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2020-04-16 11:33:36.080101: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:837] TensorRT node TRTEngineOp_0 added for segment 0 consisting of 7 nodes succeeded.
2020-04-16 11:33:36.083233: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:837] TensorRT node TRTEngineOp_1 added for segment 1 consisting of 137 nodes succeeded.
2020-04-16 11:33:36.084416: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:837] TensorRT node TRTEngineOp_2 added for segment 2 consisting of 10 nodes succeeded.
2020-04-16 11:33:36.097784: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:837] TensorRT node TRTEngineOp_3 added for segment 3 consisting of 12 nodes succeeded.
2020-04-16 11:33:36.109449: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:752] Optimization results for grappler item: tf_graph
2020-04-16 11:33:36.109576: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:754] constant folding: Graph size after: 226 nodes (-68), 257 edges (-68), time = 95.428ms.
2020-04-16 11:33:36.109630: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:754] layout: Graph size after: 234 nodes (8), 269 edges (12), time = 25.178ms.
2020-04-16 11:33:36.109690: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:754] constant folding: Graph size after: 234 nodes (0), 269 edges (0), time = 26.862ms.
2020-04-16 11:33:36.109718: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:754] TensorRTOptimizer: Graph size after: 72 nodes (-162), 100 edges (-169), time = 104.099ms.

I have no experience with this, but to my understanding the best inference performance is achieved with INT8 precision?

Thank you for your reply, but even with INT8 I get the same results:

batch_size = 30
batched_input = []
count = 0
listimg = os.listdir("./img/")
for i in listimg:
    img = cv2.imread("./img/" + i)
    batched_input.append(tf.image.convert_image_dtype(img, dtype=tf.uint8, saturate=False))
print('*****batched_input shape: ', len(batched_input))

conversion_params = trt.DEFAULT_TRT_CONVERSION_PARAMS
conversion_params = conversion_params._replace(
    max_workspace_size_bytes=(1 << 32))
conversion_params = conversion_params._replace(precision_mode="INT8")
converter = trt.TrtGraphConverterV2(input_saved_model_dir='./saved',
                                    conversion_params=conversion_params)

def calibration_input_fn():
    yield (batched_input, )

converter.convert(calibration_input_fn=calibration_input_fn)
converter.save("./v2trt")
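The calibration data here is just the image list yielded once; the batching itself is plain Python. A minimal sketch of the chunking logic (assuming `batched_input` holds one tensor per image):

```python
def make_batches(items, batch_size):
    # Split a flat list into consecutive batches of at most batch_size
    # items, so a calibration input_fn can yield one batch per iteration.
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
```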

Hi,

TF-TRT is a tool that integrates TensorRT into the TensorFlow framework.
In this mechanism, non-supported layers automatically fall back to the TensorFlow implementation.

The precision parameter only applies to the TensorRT portions and cannot change the TensorFlow implementation.
So it's recommended to first check how many layers run on TensorRT and how many fall back to TensorFlow.

This information should be available with the device placement flag:
https://www.tensorflow.org/guide/gpu#logging_device_placement
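As a quick sanity check, you can also count the converted segments directly from the TF-TRT conversion log you posted. A small sketch (the line patterns are taken from your log output above):

```python
import re

def summarize_trt_log(log_text):
    # Count the TRTEngineOp segments and the nodes they absorbed, plus the
    # number of ops left to native TensorFlow, from TF-TRT log output.
    engine_nodes = [int(n) for n in re.findall(
        r"consisting of (\d+) nodes succeeded", log_text)]
    m = re.search(r"There are (\d+) ops .* not converted to TensorRT", log_text)
    unconverted = int(m.group(1)) if m else 0
    return {
        "num_engines": len(engine_nodes),
        "nodes_in_trt": sum(engine_nodes),
        "ops_fallback_to_tf": unconverted,
    }
```

In your log this gives 4 engines covering 166 nodes, with 50 ops (TensorArray*, Enter, Exit, Merge, etc. — loop control flow) falling back to TensorFlow, which can explain the lack of speedup.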

Thanks.