TensorRT model is double the size

Description

Hello,
I used the nvcr.io/nvidia/tensorflow:19.09-py3 container:
docker run -it --rm --gpus all -v /home/dalalalmotwaa/Desktop/tf_docker:/workspace/trt nvcr.io/nvidia/tensorflow:19.09-py3

I have tried to optimize my custom frozen model to run with TensorRT using create_inference_graph():

import tensorflow.contrib.tensorrt as trt      # TF 1.x TF-TRT integration
from tensorflow.python.platform import gfile

trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,               # frozen model
    outputs=output_names,
    max_batch_size=128,                         # specify your max batch size
    max_workspace_size_bytes=2 * (10 ** 9),     # specify the max workspace
    precision_mode="FP32",
    is_dynamic_op=True)

with gfile.FastGFile(MODEL_PATH + "_TensorRT_model_FP16_128_tf_docker.pb", 'wb') as f:
    f.write(trt_graph.SerializeToString())
print("TensorRT model is successfully stored!")
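For reference, a minimal TF 1.x sketch of how the saved graph can be loaded back and a forward pass timed; the input tensor name "input:0" and the (1, 224, 224, 3) input shape are placeholders, not the actual values from my graph:

import time
import numpy as np
import tensorflow as tf
from tensorflow.python.platform import gfile

# Load the TRT-optimized GraphDef from disk.
with gfile.FastGFile(MODEL_PATH + "_TensorRT_model_FP16_128_tf_docker.pb", 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

# Import it into a fresh graph and look up input/output tensors.
with tf.Graph().as_default() as graph:
    tf.import_graph_def(graph_def, name='')
    inp = graph.get_tensor_by_name("input:0")              # placeholder input tensor name
    out = graph.get_tensor_by_name(output_names[0] + ":0")

with tf.Session(graph=graph) as sess:
    batch = np.random.rand(1, 224, 224, 3).astype(np.float32)  # dummy batch, assumed shape
    sess.run(out, feed_dict={inp: batch})                   # warm-up run
    start = time.time()
    sess.run(out, feed_dict={inp: batch})
    print("inference time:", (time.time() - start) * 1000, "ms")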

However, the TensorRT model is slower than the frozen model, and it is also double the size.

Note: I used this repository to convert the model: GitHub - ardianumam/Tensorflow-TensorRT

---- size ----
frozen model size is 72.6 MB
FP16 model size is 145.3 MB
---- time ----
TensorRT model
4.43951940536499 ms
1.1245734691619873 ms
0.3470771312713623 ms
0.244171142578125 ms
0.24429726600646973 ms
0.2525475025177002 ms
frozen model
0.256026029586792 ms
0.2516140937805176 ms
0.25358080863952637 ms
0.2555859088897705 ms
0.24196743965148926 ms
0.22859597206115723 ms
0.24667620658874512 ms

Environment

TensorRT Version: 6.0.1
GPU Type: GeForce RTX 2080 Ti
Nvidia Driver Version: 418.87.00
CUDA Version: 10.1.243
CUDNN Version: 7.6.4
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.6
TensorFlow Version (if applicable): 1.14.0
Baremetal or Container (if container which image + tag): Container, nvcr.io/nvidia/tensorflow:19.09-py3

Hi,

Can you try using the latest TRT version on your system?

Also, if possible, please try the YOLO -> ONNX -> TRT approach for better performance. For any unsupported layer you will have to create a custom plugin.
Please refer to the sample below:

The "trtexec" command line tool is also very useful for testing, debugging and benchmarking your model:
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource/trtexec
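As an illustration, here is a minimal sketch of building an engine from an ONNX file with the TensorRT 6/7 Python API; the file names "model.onnx" / "model.trt" and the FP16 flag are assumptions, not taken from your setup:

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_engine(onnx_path, fp16=False, max_workspace=2 << 30):
    # Parse the ONNX model and build a TensorRT engine (explicit-batch network).
    explicit_batch = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    with trt.Builder(TRT_LOGGER) as builder, \
         builder.create_network(explicit_batch) as network, \
         trt.OnnxParser(network, TRT_LOGGER) as parser:
        builder.max_workspace_size = max_workspace
        builder.fp16_mode = fp16
        with open(onnx_path, 'rb') as f:
            if not parser.parse(f.read()):
                for i in range(parser.num_errors):
                    print(parser.get_error(i))
                return None
        return builder.build_cuda_engine(network)

engine = build_engine("model.onnx", fp16=True)   # "model.onnx" is a placeholder path
if engine is not None:
    with open("model.trt", "wb") as f:
        f.write(engine.serialize())               # save the serialized engine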

Thanks

Thank you very much, I will try YOLO -> ONNX -> TRT.
What about ResNet models?

You can try the same approach for other models as well, like pb -> ONNX -> TRT or pt -> ONNX -> TRT.

You can refer to the TRT samples:
https://github.com/NVIDIA/TensorRT/tree/master/samples/opensource
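For example, for a ResNet the pt -> ONNX step can be as simple as the sketch below; ResNet-50 from torchvision, opset 11 and the (1, 3, 224, 224) input shape are just assumptions for illustration:

import torch
import torchvision

# Export a torchvision ResNet-50 to ONNX; other ResNet variants work the same way.
model = torchvision.models.resnet50(pretrained=True).eval()
dummy_input = torch.randn(1, 3, 224, 224)   # assumed NCHW input shape

torch.onnx.export(model, dummy_input, "resnet50.onnx",
                  opset_version=11,
                  input_names=["input"], output_names=["output"])

The resulting resnet50.onnx can then be benchmarked quickly with trtexec (e.g. trtexec --onnx=resnet50.onnx) or built into an engine as in the earlier sketch.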

Thanks