No speed improvement after TF-TRT conversion

My machine info:
Ubuntu 16.04
GeForce GTX 1080
Nvidia Driver Version: 410.78
CUDA Version: 10.0
cuDNN version: 7.5.0
TensorFlow version: 1.13.0, from the official NVIDIA Docker image nvcr.io/nvidia/tensorflow:18.09-py3
TensorRT version: 5.0.2 (bundled with the TensorFlow package)

Problem description:
I managed to convert my model with TF-TRT using the create_inference_graph method from the tensorflow.contrib.tensorrt module. However, I do not get a statistically significant speedup, if any, with either precision (FP16 or FP32), and I am trying to figure out what could cause this. (The console output from the optimization is below.)

Questions:

  1. Could it be because the tensor shapes are unknown, so the optimization cannot be applied?
  2. Is it OK that my converted .pb file is exactly twice as large as my original frozen, non-optimized .pb file?
  3. Could the small number of supported TensorRT operations be the reason?
  4. I am going to try the conversion on an Nvidia RTX 2080 Ti. Should I expect any benefit compared with my current Nvidia GTX 1080?

Console output of the optimization on the Nvidia GTX 1080:

INFO:tensorflow:Running against TensorRT version 5.0.2
2019-04-11 13:01:33.811488: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-04-11 13:01:33.811947: I tensorflow/core/grappler/devices.cc:51] Number of eligible GPUs (core count >= 8): 1
2019-04-11 13:01:33.812059: I tensorflow/core/grappler/clusters/single_machine.cc:359] Starting new session
2019-04-11 13:01:33.839792: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 4200000000 Hz
2019-04-11 13:01:33.841161: I tensorflow/compiler/xla/service/service.cc:161] XLA service 0x1e5050c0 executing computations on platform Host. Devices:
2019-04-11 13:01:33.841243: I tensorflow/compiler/xla/service/service.cc:168]   StreamExecutor device (0): <undefined>, <undefined>
2019-04-11 13:01:33.845844: I tensorflow/compiler/xla/service/service.cc:161] XLA service 0x1e52c8e0 executing computations on platform CUDA. Devices:
2019-04-11 13:01:33.845926: I tensorflow/compiler/xla/service/service.cc:168]   StreamExecutor device (0): GeForce GTX 1080, Compute Capability 6.1
2019-04-11 13:01:33.846698: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: 
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.7335
pciBusID: 0000:01:00.0
totalMemory: 7.93GiB freeMemory: 7.21GiB
2019-04-11 13:01:33.846780: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-04-11 13:01:34.155614: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-04-11 13:01:34.155652: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 
2019-04-11 13:01:34.155661: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N 
2019-04-11 13:01:34.155794: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6926 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0, compute capability: 6.1)
2019-04-11 13:01:34.772860: I tensorflow/contrib/tensorrt/segment/segment.cc:461] There are 7 ops of 4 different types in the graph that are not converted to TensorRT: ConcatV2, Transpose, Placeholder, NoOp, (For more information see https://docs.nvidia.com/deeplearning/dgx/integrate-tf-trt/index.html#support-ops).
2019-04-11 13:01:34.789697: I tensorflow/contrib/tensorrt/convert/convert_graph.cc:928] Number of TensorRT candidate segments: 2
2019-04-11 13:01:34.797911: W tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3728] Validation failed for TensorRTInputPH_0 and input slot 0: Input tensor with shape [?,?,?,3] has an unknown non-batch dimension at dim 1
2019-04-11 13:01:34.797949: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:1036] TensorRT node TRTEngineOp_0 added for segment 0 consisting of 4 nodes failed: Invalid argument: Validation failed for TensorRTInputPH_0 and input slot 0: Input tensor with shape [?,?,?,3] has an unknown non-batch dimension at dim 1. Fallback to TF...
2019-04-11 13:01:34.798309: W tensorflow/contrib/tensorrt/convert/convert_nodes.cc:3728] Validation failed for TensorRTInputPH_0 and input slot 0: Input tensor with shape [?,3,?,?] has an unknown non-batch dimension at dim 2
2019-04-11 13:01:34.798325: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:1036] TensorRT node TRTEngineOp_1 added for segment 1 consisting of 461 nodes failed: Invalid argument: Validation failed for TensorRTInputPH_0 and input slot 0: Input tensor with shape [?,3,?,?] has an unknown non-batch dimension at dim 2. Fallback to TF...
2019-04-11 13:01:34.810444: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:581] Optimization results for grappler item: tf_graph
2019-04-11 13:01:34.810478: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:583]   constant folding: Graph size after: 468 nodes (0), 478 edges (0), time = 61.211ms.
2019-04-11 13:01:34.810485: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:583]   layout: Graph size after: 478 nodes (10), 484 edges (6), time = 25.671ms.
2019-04-11 13:01:34.810490: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:583]   constant folding: Graph size after: 473 nodes (-5), 484 edges (0), time = 40.989ms.
2019-04-11 13:01:34.810495: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:583]   TensorRTOptimizer: Graph size after: 473 nodes (0), 484 edges (0), time = 261.698ms.
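
A quick way to read a log like this is to pull out the "failed ... Fallback to TF" warnings, since every segment listed there runs as plain TensorFlow. The following is a minimal sketch in plain Python; the function name `failed_segments` and the regular expression are illustrative assumptions based only on the wording of the two warning lines above, not part of any TensorFlow API.

```python
import re

# Matches the warning printed when a candidate segment cannot be converted.
# The wording is taken from the two warning lines in the log above.
FAILED_SEGMENT = re.compile(
    r"TensorRT node (?P<node>\S+) added for segment (?P<segment>\d+) "
    r"consisting of (?P<size>\d+) nodes failed"
)


def failed_segments(log_text):
    """Return (node_name, segment_index, node_count) for each failed segment."""
    return [
        (m.group("node"), int(m.group("segment")), int(m.group("size")))
        for m in FAILED_SEGMENT.finditer(log_text)
    ]


# The two fallback warnings, copied from the log above.
sample_log = """
2019-04-11 13:01:34.797949: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:1036] TensorRT node TRTEngineOp_0 added for segment 0 consisting of 4 nodes failed: Invalid argument: Validation failed for TensorRTInputPH_0 and input slot 0: Input tensor with shape [?,?,?,3] has an unknown non-batch dimension at dim 1. Fallback to TF...
2019-04-11 13:01:34.798325: W tensorflow/contrib/tensorrt/convert/convert_graph.cc:1036] TensorRT node TRTEngineOp_1 added for segment 1 consisting of 461 nodes failed: Invalid argument: Validation failed for TensorRTInputPH_0 and input slot 0: Input tensor with shape [?,3,?,?] has an unknown non-batch dimension at dim 2. Fallback to TF...
"""

for node, segment, size in failed_segments(sample_log):
    print("segment %d (%s): %d nodes fell back to TF" % (segment, node, size))
```

Here both candidate segments appear in the result, which matches the last log line: the TensorRTOptimizer pass leaves the graph at 473 nodes with a change of 0.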

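The problem was the undetermined (dynamic) shape of the input placeholder, e.g. [?,?,?,3]. The log shows both candidate segments (4 and 461 nodes) failing validation with "unknown non-batch dimension" and falling back to TF, so no TensorRT engine was actually built. Changing the placeholder shape to [?,W,H,C] with concrete values for W, H and C solved the problem.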
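
The rule behind the warning can be checked before conversion: only the batch dimension (dim 0) may be unknown. Below is a minimal sketch in plain Python, assuming the shape is given as a list with `None` for unknown dimensions (the form that `tensor.shape.as_list()` returns in TensorFlow). The helper name `unknown_non_batch_dims` and the 224x224 size are hypothetical and used only for illustration.

```python
def unknown_non_batch_dims(shape):
    """Return the indices of unknown dimensions, ignoring the batch dimension (index 0)."""
    return [i for i, dim in enumerate(shape) if i > 0 and dim is None]


# Shapes from the log: both have unknown spatial dimensions.
print(unknown_non_batch_dims([None, None, None, 3]))  # NHWC input of segment 0
print(unknown_non_batch_dims([None, 3, None, None]))  # NCHW input of segment 1

# A placeholder with fixed non-batch dimensions (224 is a made-up example size).
print(unknown_non_batch_dims([None, 224, 224, 3]))
```

An empty result means the input shape should pass this particular validation. Note that the log reports only the first offending dimension, while this helper lists all of them.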