TF-TRT speed-up not reproducible on a custom-trained SSD Inception model from the TF model zoo.

Hi,

I have an observation on TF-TRT support for TensorFlow models. I can reproduce the 1.5x-2x TensorRT speed-up on TF model-zoo models, as well as on custom retrained ssd_inception_v2 models at the 300x300 resolution used in the model zoo, but I cannot reproduce that speed-up for ssd_inception_v2 models retrained at a higher resolution of 1920x1080. The TF frozen graph and the TensorRT-optimized graph run at roughly the same FPS.

Are there any aspects of the high-resolution model that might affect kernel fusion or the creation of TRTEngineOps for TF subgraphs (the weight matrices are much larger than in the 300x300 model)? Either way, I would still expect some amount of speed-up, since I see the same TRTEngineOps being created as in the 300x300 conversion. Given that the two models share the same architecture, differ only in training resolution, and end up with the same TRTEngineOps after conversion, what could be the cause of this discrepancy?

I use FP16 precision. An example of the logs printed during TRT graph conversion (identical for both 300x300 and 1920x1080):

[tensorflow/contrib/tensorrt/convert/convert_graph.cc:913] Number of TensorRT candidate segments: 4
[tensorflow/contrib/tensorrt/convert/convert_graph.cc:1015] TensorRT node TRTEngineOp_0 added for segment 0 consisting of 945 nodes succeeded.
[tensorflow/contrib/tensorrt/convert/convert_graph.cc:1015] TensorRT node Postprocessor/TRTEngineOp_1 added for segment 1 consisting of 3 nodes succeeded.
[tensorflow/contrib/tensorrt/convert/convert_graph.cc:1015] TensorRT node TRTEngineOp_2 added for segment 2 consisting of 5 nodes succeeded.
[tensorflow/contrib/tensorrt/convert/convert_graph.cc:1015] TensorRT node TRTEngineOp_3 added for segment 3 consisting of 4 nodes succeeded.
[tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:265] Returning from TensorRTOptimizer
[tensorflow/core/grappler/optimizers/meta_optimizer.cc:581] Optimization results for grappler item: tf_graph
[tensorflow/core/grappler/optimizers/meta_optimizer.cc:583] constant folding: Graph size after: 2014 nodes (-1393), 2809 edges (-1542), time = 331.582ms.
[tensorflow/core/grappler/optimizers/meta_optimizer.cc:583] layout: Graph size after: 2051 nodes (37), 2847 edges (38), time = 99.096ms.
[tensorflow/core/grappler/optimizers/meta_optimizer.cc:583] constant folding: Graph size after: 2041 nodes (-10), 2847 edges (0), time = 173.335ms.
[tensorflow/core/grappler/optimizers/meta_optimizer.cc:583] TensorRTOptimizer: Graph size after: 1088 nodes (-953), 1702 edges (-1145), time = 483.096ms.

Hello,

We are triaging this and will keep you updated.

Thank you. For reference, I am using the same approach as described here: https://github.com/tensorflow/tensorrt/tree/master/tftrt/examples/object_detection

Is the 1920x1080 input size propagated throughout the network?
In other words, do you know whether the input sizes of the TRTEngineOp_i nodes differ from the lower-resolution case?
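One quick way to check (assuming the model was trained with the TF Object Detection API, as in the model zoo) is the `image_resizer` block in the model's pipeline.config: with a `fixed_shape_resizer`, that resolution is what actually enters the graph, regardless of the source imagery. A sketch of what the high-resolution configuration would look like:

```
model {
  ssd {
    image_resizer {
      fixed_shape_resizer {
        height: 1080
        width: 1920
      }
    }
    # ... rest of the ssd config ...
  }
}
```

If this block still says 300x300, the network is not actually running at 1920x1080 internally.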

You can try verbose logging to see if any useful information appears: https://docs.nvidia.com/deeplearning/frameworks/tf-trt-user-guide/index.html#verbose
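Concretely, verbose conversion logging is enabled through an environment variable before running the conversion script. The module names below follow the linked TF-TRT user guide; the exact list may vary by TF version, and the script name is a placeholder:

```shell
# Turn on verbose logging for the TF-TRT conversion passes
# (module names per the TF-TRT user guide; adjust for your TF version).
export TF_CPP_VMODULE=segment=2,convert_graph=2,convert_nodes=2,trt_engine_op=2

# Then run the conversion/inference as usual, e.g.:
#   python my_conversion_script.py   # hypothetical script name
```

The resulting logs show, among other things, the input shapes each TRTEngineOp is built with, which should answer the question above.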

Could you post the performance numbers you get from TF and TF-TRT in both the lower- and higher-resolution tests?
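For comparable numbers, it helps to measure both graphs the same way: discard warm-up iterations (the first TRT runs can include engine build time) and average over many runs. A minimal, framework-agnostic sketch, where `infer` stands in for whatever executes one forward pass (e.g. a `sess.run` on the frozen or TRT-optimized graph):

```python
import time

def benchmark_fps(infer, num_warmup=20, num_iters=100):
    """Time an inference callable and return frames per second.

    `infer` is a placeholder for one forward pass, e.g. a lambda
    wrapping sess.run(...) on either graph.
    """
    for _ in range(num_warmup):
        infer()                     # warm-up: engine build, autotuning, caches
    start = time.time()
    for _ in range(num_iters):
        infer()
    elapsed = time.time() - start
    return num_iters / elapsed

# Usage with a stand-in inference function:
fps = benchmark_fps(lambda: time.sleep(0.001))
print("%.1f FPS" % fps)
```

Running this once per graph and resolution gives the four numbers asked for above.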

When MatMul sizes get very large, it's possible that TensorRT optimizations such as fusion become less effective, because native TF, which uses cuDNN for such sizes, is already fast too.