No speedup after TensorRT 6 INT8 conversion

Hi,
My hardware is an i7 CPU with a GTX 1660 Ti GPU.
Software info:
TensorRT 6
TensorFlow 1.15.0
CUDA 10.1
cuDNN 7.6.3
Ubuntu 16 (Docker)

I converted my model to a TensorRT engine using TF-TRT. The conversion log is shown below:

2020-02-18 04:06:40.806897: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:633] Number of TensorRT candidate segments: 76
2020-02-18 04:06:41.444418: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:734] TensorRT node TRTEngineOp_0 added for segment 0 consisting of 10 nodes succeeded.
2020-02-18 04:06:41.444539: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:734] TensorRT node TRTEngineOp_1 added for segment 1 consisting of 19 nodes succeeded.
2020-02-18 04:06:41.444757: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:734] TensorRT node TRTEngineOp_2 added for segment 2 consisting of 17 nodes succeeded.
2020-02-18 04:06:41.444921: I tensorflow/compiler/tf2tensorrt/convert/convert_graph.cc:734] TensorRT node TRTEngineOp_3 added for segment 3 consisting of 17 nodes succeeded.

2020-02-18 04:06:42.007572: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:786] Optimization results for grappler item: TRTEngineOp_24_native_segment
2020-02-18 04:06:42.007581: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788] constant_folding: Graph size after: 20 nodes (0), 19 edges (0), time = 0.919ms.
2020-02-18 04:06:42.007587: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788] layout: Graph size after: 20 nodes (0), 19 edges (0), time = 0.652ms.
2020-02-18 04:06:42.007593: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788] constant_folding: Graph size after: 20 nodes (0), 19 edges (0), time = 0.765ms.
2020-02-18 04:06:42.007599: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788] TensorRTOptimizer: Graph size after: 20 nodes (0), 19 edges (0), time = 0.095ms.
2020-02-18 04:06:42.007605: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788] constant_folding: Graph size after: 20 nodes (0), 19 edges (0), time = 0.828ms.

graph_size(MB)(native_tf): 52.6
graph_size(MB)(trt): 52.8
num_nodes(native_tf): 4243
num_nodes(tftrt_total): 3429
num_nodes(trt_only): 76
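
One way to read the node counts above is to estimate how much of the original graph actually ended up inside TensorRT engines. The following is a minimal sketch, not part of TF-TRT: the function name `trt_coverage` and its field names are hypothetical, and it assumes that every node missing from the converted graph was absorbed into one of the 76 TRTEngineOp nodes (grappler passes can also add or remove nodes, so the result is only approximate).

```python
def trt_coverage(native_nodes, tftrt_total_nodes, trt_engine_nodes):
    """Estimate how much of the original graph was absorbed into TRT engines.

    Assumption: nodes that disappeared during conversion were all fused
    into TRTEngineOp nodes. Node count is only a rough proxy for runtime.
    """
    # Nodes in the converted graph that still run in native TensorFlow.
    remaining_tf = tftrt_total_nodes - trt_engine_nodes
    # Original nodes that no longer appear as separate TensorFlow nodes.
    absorbed = native_nodes - remaining_tf
    return {
        "remaining_tf_nodes": remaining_tf,
        "absorbed_nodes": absorbed,
        "absorbed_fraction": absorbed / native_nodes,
        "avg_nodes_per_engine": absorbed / trt_engine_nodes,
    }


# Values taken from the conversion summary above.
stats = trt_coverage(native_tf := 4243, 3429, 76) if False else trt_coverage(4243, 3429, 76)
print(stats)
```

With these numbers, roughly a fifth of the original nodes are inside TensorRT, split across many small engines of about a dozen nodes each. If that reading is right, most of the graph still runs in native TensorFlow regardless of the precision mode, although node count alone does not say how much of the runtime those nodes account for.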

Calibrating INT8…
2020-02-18 04:06:43.967599: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:812] Starting calibration thread on device 0, Calibration Resource @ 0x7f5e44009490
2020-02-18 04:06:43.967669: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer.so.6
2020-02-18 04:06:43.968091: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer_plugin.so.6
2020-02-18 04:06:58.017542: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-02-18 04:06:58.058014: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:812] Starting calibration thread on device 0, Calibration Resource @ 0x7f5e3c008780
2020-02-18 04:06:58.079357: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:812] Starting calibration thread on device 0, Calibration Resource @ 0x7f5e74007ed0
2020-02-18 04:06:58.152381: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:812] Starting calibration thread on device 0, Calibration Resource @ 0x7f5e6c01f3a0
2020-02-18 04:06:58.188232: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-02-18 04:06:58.188977: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:812] Starting calibration thread on device 0, Calibration Resource @ 0x7f5e5006f210
2020-02-18 04:06:58.258622: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:812] Starting calibration thread on device 0, Calibration Resource @ 0x7f5e6c0279a0
2020-02-18 04:06:58.301596: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:812] Starting calibration thread on device 0, Calibration Resource @ 0x7f5e6c027860
2020-02-18 04:06:58.382948: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:812] Starting calibration thread on device 0, Calibration Resource @ 0x7f5e6c03b280
2020-02-18 04:06:58.432091: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:812] Starting calibration thread on device 0, Calibration Resource @ 0x7f5e6c03fea0
2020-02-18 04:06:58.467576: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:812] Starting calibration thread on device 0, Calibration Resource @ 0x7f5e5007d720

2020-02-18 04:07:12.796185: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:812] Starting calibration thread on device 0, Calibration Resource @ 0x7f5e3c0b45c0
2020-02-18 04:07:12.836554: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:812] Starting calibration thread on device 0, Calibration Resource @ 0x7f5e3c0bbc70
2020-02-18 04:07:13.362552: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:812] Starting calibration thread on device 0, Calibration Resource @ 0x7f5e3c0c3520
2020-02-18 04:07:14.132879: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:812] Starting calibration thread on device 0, Calibration Resource @ 0x7f5e3c0c48f0
2020-02-18 04:07:14.132989: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:812] Starting calibration thread on device 0, Calibration Resource @ 0x7f5e4c001180
2020-02-18 04:07:14.331160: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:812] Starting calibration thread on device 0, Calibration Resource @ 0x7f5e3c0e3660

However, when I test the converted model, it runs at the same speed as the original TensorFlow pb model.
The inference log is shown below:

2020-02-18 04:10:37.066778: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:733] Building a new TensorRT engine for TRTEngineOp_0 input shapes: [[1,8192,3]]
2020-02-18 04:10:37.066912: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer.so.6
2020-02-18 04:10:37.067393: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer_plugin.so.6
2020-02-18 04:10:55.899338: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:733] Building a new TensorRT engine for fp_2/TRTEngineOp_14 input shapes: [[1,256,3]]
2020-02-18 04:10:55.916907: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:733] Building a new TensorRT engine for fp_1/TRTEngineOp_10 input shapes: [[1,1024,3]]
2020-02-18 04:10:55.988548: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:733] Building a new TensorRT engine for fp_0/TRTEngineOp_6 input shapes: [[1,8192,3]]
2020-02-18 04:10:56.048204: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-02-18 04:10:56.049951: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:733] Building a new TensorRT engine for TRTEngineOp_57 input shapes: [[1,64,8192,4]]
2020-02-18 04:10:56.470749: I tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:733] Building a new TensorRT engine for TRTEngineOp_58 input shapes: [[1,64,8192,2]]
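
The "Building a new TensorRT engine" messages above suggest that engines are built at inference time, on the first call with each input shape, and the timestamps show about 19 seconds passing between the first two build messages. A speed comparison that includes those first calls would mostly measure engine construction. The following is a minimal timing sketch under that assumption; `benchmark` and `fake_inference` are hypothetical names, and `fake_inference` is only a stand-in for a function that wraps one `sess.run` call on the model.

```python
import statistics
import time


def benchmark(run_once, warmup=10, iters=50):
    """Time run_once() after discarding warm-up calls; returns seconds per call."""
    for _ in range(warmup):
        # Early calls may include engine builds and library loading.
        run_once()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        run_once()
        samples.append(time.perf_counter() - start)
    return {
        "median_s": statistics.median(samples),
        "mean_s": statistics.mean(samples),
        "min_s": min(samples),
    }


def fake_inference():
    # Stand-in workload so the sketch runs anywhere; replace it with a
    # wrapper around one inference call on the real model.
    sum(i * i for i in range(1000))


result = benchmark(fake_inference, warmup=3, iters=20)
print(result)
```

Running the same harness on the native pb model and on each converted model (FP32, FP16, INT8), with identical inputs, gives numbers that can be compared directly.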

By the way, I also tested FP32, FP16, and INT8, and all of them run at the same speed. I think the model was converted successfully, but there is no speedup.
Is there a problem with my model or with TensorRT?
Could you please give me some ideas?
Thanks in advance for your help.

I suggest trying a different CUDA version. As far as I know, TRT only supports CUDA 10.0 or 10.2.
Also, as far as I know, some hardware does not support int8 or int16 very well, which can affect performance…