Conversion with no speed improvement, TRT-TF

My machine info:
Ubuntu 16.04
GeForce GTX 1080
Nvidia Driver Version: 410.78
CUDA Version: 10.0
CUDNN version : 7.5.0
TensorFlow version: 1.13.0, from the official Nvidia image on Docker Hub
TensorRT version: from the TensorFlow package (5.0.2)

Problem description:
I have managed to convert my model to TRT-TF using the create_inference_graph method from the tensorflow.contrib.tensorrt module. However, I do not get a statistically significant increase in speed, if any, whichever precision (FP16 or FP32) I use, and I am trying to work out what the reasons could be. (Below is the console output for the optimization.)


  1. Could it be because the tensor shapes are unknown, so the optimization cannot be applied?
  2. Is it OK that my converted .pb file is exactly twice as large as my original frozen, non-optimized .pb file?
  3. Could the small number of supported TensorRT operations be the reason?
  4. I am going to try the conversion on an Nvidia RTX 2080 Ti. Should I expect any benefit compared with my current Nvidia GTX 1080?

Nvidia GTX 1080 console output for optimization:

INFO:tensorflow:Running against TensorRT version 5.0.2
2019-04-11 13:01:33.811488: I tensorflow/stream_executor/cuda/] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-04-11 13:01:33.811947: I tensorflow/core/grappler/] Number of eligible GPUs (core count >= 8): 1
2019-04-11 13:01:33.812059: I tensorflow/core/grappler/clusters/] Starting new session
2019-04-11 13:01:33.839792: I tensorflow/core/platform/profile_utils/] CPU Frequency: 4200000000 Hz
2019-04-11 13:01:33.841161: I tensorflow/compiler/xla/service/] XLA service 0x1e5050c0 executing computations on platform Host. Devices:
2019-04-11 13:01:33.841243: I tensorflow/compiler/xla/service/]   StreamExecutor device (0): <undefined>, <undefined>
2019-04-11 13:01:33.845844: I tensorflow/compiler/xla/service/] XLA service 0x1e52c8e0 executing computations on platform CUDA. Devices:
2019-04-11 13:01:33.845926: I tensorflow/compiler/xla/service/]   StreamExecutor device (0): GeForce GTX 1080, Compute Capability 6.1
2019-04-11 13:01:33.846698: I tensorflow/core/common_runtime/gpu/] Found device 0 with properties: 
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.7335
pciBusID: 0000:01:00.0
totalMemory: 7.93GiB freeMemory: 7.21GiB
2019-04-11 13:01:33.846780: I tensorflow/core/common_runtime/gpu/] Adding visible gpu devices: 0
2019-04-11 13:01:34.155614: I tensorflow/core/common_runtime/gpu/] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-04-11 13:01:34.155652: I tensorflow/core/common_runtime/gpu/]      0 
2019-04-11 13:01:34.155661: I tensorflow/core/common_runtime/gpu/] 0:   N 
2019-04-11 13:01:34.155794: I tensorflow/core/common_runtime/gpu/] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6926 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0, compute capability: 6.1)
2019-04-11 13:01:34.772860: I tensorflow/contrib/tensorrt/segment/] There are 7 ops of 4 different types in the graph that are not converted to TensorRT: ConcatV2, Transpose, Placeholder, NoOp, (For more information see
2019-04-11 13:01:34.789697: I tensorflow/contrib/tensorrt/convert/] Number of TensorRT candidate segments: 2
2019-04-11 13:01:34.797911: W tensorflow/contrib/tensorrt/convert/] Validation failed for TensorRTInputPH_0 and input slot 0: Input tensor with shape [?,?,?,3] has an unknown non-batch dimension at dim 1
2019-04-11 13:01:34.797949: W tensorflow/contrib/tensorrt/convert/] TensorRT node TRTEngineOp_0 added for segment 0 consisting of 4 nodes failed: Invalid argument: Validation failed for TensorRTInputPH_0 and input slot 0: Input tensor with shape [?,?,?,3] has an unknown non-batch dimension at dim 1. Fallback to TF...
2019-04-11 13:01:34.798309: W tensorflow/contrib/tensorrt/convert/] Validation failed for TensorRTInputPH_0 and input slot 0: Input tensor with shape [?,3,?,?] has an unknown non-batch dimension at dim 2
2019-04-11 13:01:34.798325: W tensorflow/contrib/tensorrt/convert/] TensorRT node TRTEngineOp_1 added for segment 1 consisting of 461 nodes failed: Invalid argument: Validation failed for TensorRTInputPH_0 and input slot 0: Input tensor with shape [?,3,?,?] has an unknown non-batch dimension at dim 2. Fallback to TF...
2019-04-11 13:01:34.810444: I tensorflow/core/grappler/optimizers/] Optimization results for grappler item: tf_graph
2019-04-11 13:01:34.810478: I tensorflow/core/grappler/optimizers/]   constant folding: Graph size after: 468 nodes (0), 478 edges (0), time = 61.211ms.
2019-04-11 13:01:34.810485: I tensorflow/core/grappler/optimizers/]   layout: Graph size after: 478 nodes (10), 484 edges (6), time = 25.671ms.
2019-04-11 13:01:34.810490: I tensorflow/core/grappler/optimizers/]   constant folding: Graph size after: 473 nodes (-5), 484 edges (0), time = 40.989ms.
2019-04-11 13:01:34.810495: I tensorflow/core/grappler/optimizers/]   TensorRTOptimizer: Graph size after: 473 nodes (0), 484 edges (0), time = 261.698ms.
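
The warnings in the log above already hint at the answer: both candidate segments failed validation and fell back to TF, so no TensorRT engine was actually built. As a rough sketch (the helper name `failed_segments` and the regular expression are my own, not part of TensorFlow), the fallback lines can be pulled out of a saved log like this:

```python
import re

# Matches the warning TF-TRT prints when a candidate segment cannot be converted.
# The pattern is inferred from the two warning lines in the log above.
FAILED_SEGMENT = re.compile(
    r"TensorRT node (\S+) added for segment (\d+) "
    r"consisting of (\d+) nodes failed"
)


def failed_segments(log_text):
    """Return (engine_name, segment_index, node_count) for each failed segment."""
    return [
        (m.group(1), int(m.group(2)), int(m.group(3)))
        for m in FAILED_SEGMENT.finditer(log_text)
    ]


# The two fallback warnings, copied from the console output above.
SAMPLE_LOG = """\
2019-04-11 13:01:34.797949: W tensorflow/contrib/tensorrt/convert/] TensorRT node TRTEngineOp_0 added for segment 0 consisting of 4 nodes failed: Invalid argument: Validation failed for TensorRTInputPH_0 and input slot 0: Input tensor with shape [?,?,?,3] has an unknown non-batch dimension at dim 1. Fallback to TF...
2019-04-11 13:01:34.798325: W tensorflow/contrib/tensorrt/convert/] TensorRT node TRTEngineOp_1 added for segment 1 consisting of 461 nodes failed: Invalid argument: Validation failed for TensorRTInputPH_0 and input slot 0: Input tensor with shape [?,3,?,?] has an unknown non-batch dimension at dim 2. Fallback to TF...
"""

if __name__ == "__main__":
    segments = failed_segments(SAMPLE_LOG)
    for name, index, nodes in segments:
        print("segment %d (%s): %d nodes fell back to TF" % (index, name, nodes))
    print("total nodes left in TF:", sum(n for _, _, n in segments))
```

For this log it reports 4 and 461 nodes, 465 in total, which is nearly the whole 473-node graph. That would explain why the timings do not change between FP32 and FP16.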

The problem was the undetermined (dynamic) shape of the input placeholder, e.g. [?,?,?,3] or something like that. Changing it to [?,W,H,C], so that only the batch dimension stays unknown, solved the problem.
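
To make the rule from the warnings concrete, here is a small plain-Python sketch of the check. It does not call TensorFlow or TensorRT; the function names are hypothetical and 224 is only an example size, not the one from my model. It flags every unknown dimension except the batch dimension at index 0:

```python
def parse_shape(text):
    """Parse a shape string such as "[?,224,224,3]"; "?" becomes None."""
    dims = text.strip().strip("[]").split(",")
    return [None if d.strip() == "?" else int(d) for d in dims]


def unknown_non_batch_dims(shape):
    """Indices of unknown dimensions, ignoring the batch dimension (index 0)."""
    return [i for i, d in enumerate(shape) if i > 0 and d is None]


if __name__ == "__main__":
    # Shapes taken from the two warnings in the log.
    print(unknown_non_batch_dims(parse_shape("[?,?,?,3]")))      # [1, 2]
    print(unknown_non_batch_dims(parse_shape("[?,3,?,?]")))      # [2, 3]
    # Example of a fixed shape; 224 is an assumed value for illustration.
    print(unknown_non_batch_dims(parse_shape("[?,224,224,3]")))  # []
```

An empty result means only the batch dimension is dynamic, which is the form the placeholder had after my fix. The log reports only the first offending dimension (dim 1 and dim 2 respectively), while this sketch lists all of them.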