I have an observation about TF-TRT support for TensorFlow models. I can reproduce the 1.5x-2x TensorRT speed-up on TF model zoo models, and also on custom retrained ssd_inception_v2 models at 300x300 resolution (the resolution used in the model zoo). However, I cannot reproduce the same 1.5x-2x speed-up for ssd_inception_v2 models retrained at a high resolution of 1920x1080. For those, the TF frozen graph and the TensorRT-optimized graph run at roughly the same FPS.
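When comparing FPS between the frozen graph and the TF-TRT graph, exclude the first few runs. They can include one-time setup costs, such as engine building when TF-TRT ops are created with dynamic engine construction. The following is a minimal pure-Python timing sketch. `infer` is a hypothetical zero-argument callable that pushes one frame through either graph.

```python
import time
from typing import Callable


def measure_fps(infer: Callable[[], object], warmup: int = 10, iters: int = 100) -> float:
    """Return the average frames per second of `infer`, excluding warm-up runs.

    `infer` is a hypothetical callable that runs one frame through a graph,
    e.g. a lambda wrapping a sess.run(...) call on a fixed, preloaded input.
    """
    if iters <= 0:
        raise ValueError("iters must be positive")
    # Untimed warm-up: the first calls may include one-time setup work
    # (memory allocation, autotuning, or runtime engine building).
    for _ in range(warmup):
        infer()
    start = time.perf_counter()
    for _ in range(iters):
        infer()
    # Guard against a zero interval on very coarse clocks.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return iters / elapsed
```

For example, call `measure_fps(lambda: sess.run(fetches, feed_dict={image_tensor: frame}))`. Here `sess`, `fetches`, `image_tensor` and `frame` are placeholders for your own session, output tensors, input tensor and a preloaded test frame. Timing the same preloaded frame for both graphs keeps image decoding and resizing out of the comparison.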
Are there any aspects of the high-resolution model that might affect kernel fusion or the creation of TRTEngineOps for TF subgraphs, given that its input and intermediate activation tensors are much larger than those of the 300x300 models? Either way, I would still expect some speed-up, because the conversion creates the same TRTEngineOps as it does for the 300x300 model. The two models have the same architecture and differ only in the resolution of the imagery they were trained on. The conversion also introduces the same TRTEngineOps in both. What could cause this discrepancy?
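One line of arithmetic quantifies the size difference between the two inputs. For convolutional layers with fixed strides, the work per frame scales roughly with the number of input pixels. The convolution weights themselves do not change with resolution. This sketch assumes that each model actually consumes its stated input size. That is an assumption, because the image resizer settings are not shown here.

```python
def pixel_ratio(hi_res, lo_res):
    """Ratio of input pixel counts between two (width, height) resolutions."""
    hi_w, hi_h = hi_res
    lo_w, lo_h = lo_res
    return (hi_w * hi_h) / (lo_w * lo_h)


ratio = pixel_ratio((1920, 1080), (300, 300))
print(f"1920x1080 has {ratio:.2f}x the pixels of 300x300")
```

2,073,600 / 90,000 = 23.04, so the backbone processes about 23x more pixels per frame at 1920x1080. This is only a rough proxy for compute. By itself it does not predict how much TensorRT can speed up either model.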
I use FP16 precision. The TF-TRT conversion prints the same logs for both the 300x300 and the 1920x1080 models. For example:
[tensorflow/contrib/tensorrt/convert/convert_graph.cc:913] Number of TensorRT candidate segments: 4
[tensorflow/contrib/tensorrt/convert/convert_graph.cc:1015] TensorRT node TRTEngineOp_0 added for segment 0 consisting of 945 nodes succeeded.
[tensorflow/contrib/tensorrt/convert/convert_graph.cc:1015] TensorRT node Postprocessor/TRTEngineOp_1 added for segment 1 consisting of 3 nodes succeeded.
[tensorflow/contrib/tensorrt/convert/convert_graph.cc:1015] TensorRT node TRTEngineOp_2 added for segment 2 consisting of 5 nodes succeeded.
[tensorflow/contrib/tensorrt/convert/convert_graph.cc:1015] TensorRT node TRTEngineOp_3 added for segment 3 consisting of 4 nodes succeeded.
[tensorflow/contrib/tensorrt/convert/trt_optimization_pass.cc:265] Returning from TensorRTOptimizer
[tensorflow/core/grappler/optimizers/meta_optimizer.cc:581] Optimization results for grappler item: tf_graph
[tensorflow/core/grappler/optimizers/meta_optimizer.cc:583] constant folding: Graph size after: 2014 nodes (-1393), 2809 edges (-1542), time = 331.582ms.
[tensorflow/core/grappler/optimizers/meta_optimizer.cc:583] layout: Graph size after: 2051 nodes (37), 2847 edges (38), time = 99.096ms.
[tensorflow/core/grappler/optimizers/meta_optimizer.cc:583] constant folding: Graph size after: 2041 nodes (-10), 2847 edges (0), time = 173.335ms.
[tensorflow/core/grappler/optimizers/meta_optimizer.cc:583] TensorRTOptimizer: Graph size after: 1088 nodes (-953), 1702 edges (-1145), time = 483.096ms.
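The segment sizes in these logs can be cross-checked against the TensorRTOptimizer summary line. If each segment of k nodes is collapsed into a single TRTEngineOp, the graph should shrink by k - 1 nodes per segment. The following is a small parsing sketch. The regexes are written against the exact line formats quoted above and may need adjusting for other TF versions.

```python
import re

# Patterns for the two kinds of TF-TRT log lines quoted above.
SEGMENT_RE = re.compile(
    r"TensorRT node (\S+) added for segment (\d+) consisting of (\d+) nodes succeeded"
)
TRT_PASS_RE = re.compile(r"TensorRTOptimizer: Graph size after: (\d+) nodes \((-?\d+)\)")


def summarize_trt_log(log_text: str) -> dict:
    """Summarize how much of the graph TF-TRT absorbed into TRTEngineOps."""
    segments = [(name, int(size)) for name, _seg, size in SEGMENT_RE.findall(log_text)]
    absorbed = sum(size for _, size in segments)
    match = TRT_PASS_RE.search(log_text)
    if match is None:
        raise ValueError("no TensorRTOptimizer summary line found")
    after, delta = int(match.group(1)), int(match.group(2))
    return {
        "engines": [name for name, _ in segments],
        "nodes_absorbed": absorbed,
        "nodes_before": after - delta,
        "nodes_after": after,
        # Each segment of k nodes becomes one TRTEngineOp: net change -(k - 1).
        "expected_delta": -(absorbed - len(segments)),
        "observed_delta": delta,
    }


LOG = """
TensorRT node TRTEngineOp_0 added for segment 0 consisting of 945 nodes succeeded.
TensorRT node Postprocessor/TRTEngineOp_1 added for segment 1 consisting of 3 nodes succeeded.
TensorRT node TRTEngineOp_2 added for segment 2 consisting of 5 nodes succeeded.
TensorRT node TRTEngineOp_3 added for segment 3 consisting of 4 nodes succeeded.
TensorRTOptimizer: Graph size after: 1088 nodes (-953), 1702 edges (-1145), time = 483.096ms.
"""

summary = summarize_trt_log(LOG)
print(summary)
```

On this excerpt, the four segments hold 945 + 3 + 5 + 4 = 957 nodes. Replacing them with 4 TRTEngineOps removes 957 - 4 = 953 nodes, which matches the reported -953 (2041 nodes before, 1088 after). Because both conversions print these same numbers, the graph partitioning itself does not appear to differ between the two models.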