Description
Hello
I used nvcr.io/nvidia/tensorflow:19.09-py3
docker run -it --rm --gpus all -v /home/dalalalmotwaa/Desktop/tf_docker:/workspace/trt nvcr.io/nvidia/tensorflow:19.09-py3
I tried to optimize my custom frozen model to run with TensorRT using create_inference_graph():
trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,        # frozen model
    outputs=output_names,
    max_batch_size=128,                  # specify your max batch size
    max_workspace_size_bytes=2*(10**9),  # specify the max workspace
    precision_mode="FP32",
    is_dynamic_op=True)

with gfile.FastGFile(MODEL_PATH + "_TensorRT_model_FP16_128_tf_docker.pb", 'wb') as f:
    f.write(trt_graph.SerializeToString())
print("TensorRT model is successfully stored!")
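One thing worth checking after a conversion like the one above is whether any TensorRT engines were actually created: with TF-TRT, each converted subgraph is replaced by a single "TRTEngineOp" node, and zero such nodes means the graph was not really optimized. A minimal diagnostic sketch (the `node_op_types` list stands in for `[n.op for n in trt_graph.node]`, where `trt_graph` is the GraphDef from the snippet above):

```python
# Count TRTEngineOp nodes in a converted GraphDef's op-type list.
# node_op_types: e.g. [n.op for n in trt_graph.node]
from collections import Counter

def summarize_graph_ops(node_op_types):
    """Return (number of TRTEngineOp nodes, histogram of all op types)."""
    counts = Counter(node_op_types)
    return counts.get("TRTEngineOp", 0), counts

# Toy op list standing in for a real converted graph:
ops = ["Placeholder", "TRTEngineOp", "Identity", "TRTEngineOp"]
num_engines, histogram = summarize_graph_ops(ops)
print(num_engines)            # 2
print(histogram["Identity"])  # 1
```

If `num_engines` is 0, TF-TRT fell back to the original TensorFlow ops everywhere, which would explain seeing no speedup.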
but the TensorRT model is slower than the frozen model, and it is also double the size.
Note: I used this repository to convert the model: GitHub - ardianumam/Tensorflow-TensorRT (a YouTube video series about optimizing TensorFlow models with TensorRT; it reports 3.7x and 1.5x speedups for a LeNet-like model and YOLOv3, respectively, compared to the original models).
---- size ----
frozen model size is 72.6 MB
FP16 model size is 145.3 MB
---- time ----
TensorRT model
4.43951940536499 ms
1.1245734691619873 ms
0.3470771312713623 ms
0.244171142578125 ms
0.24429726600646973 ms
0.2525475025177002 ms
frozen model
0.256026029586792 ms
0.2516140937805176 ms
0.25358080863952637 ms
0.2555859088897705 ms
0.24196743965148926 ms
0.22859597206115723 ms
0.24667620658874512 ms
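On the timings above: the first TensorRT measurement is much larger than the rest, which is consistent with is_dynamic_op=True building the engines on the first inference call(s), so those iterations should be excluded as warm-up before comparing averages. A small benchmarking sketch (here `run_once` is a hypothetical stand-in for one `sess.run(...)` of the model):

```python
# Average latency after warm-up; with is_dynamic_op=True the engine build
# happens during the first calls, so they are discarded as warm-up.
import time

def benchmark(run_once, warmup=10, iters=50):
    """Return mean latency of run_once() in milliseconds, after warm-up."""
    for _ in range(warmup):           # engine build / caches settle here
        run_once()
    total_ms = 0.0
    for _ in range(iters):
        t0 = time.perf_counter()
        run_once()
        total_ms += (time.perf_counter() - t0) * 1000.0
    return total_ms / iters

# Example with a dummy workload in place of sess.run(...):
avg_ms = benchmark(lambda: sum(range(1000)))
print(f"avg latency: {avg_ms:.3f} ms")
```

Averaging over many post-warm-up runs (and at the actual batch size used in production) gives a fairer comparison than the single-call numbers above.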
Environment
TensorRT Version: 6.0.1
GPU Type: GeForce RTX 2080 Ti
Nvidia Driver Version: 418.87.00
CUDA Version: 10.1.243
CUDNN Version: 7.6.4
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.6
TensorFlow Version (if applicable): 1.14.0
Baremetal or Container (if container which image + tag): TensorFlow Release 19.10 - NVIDIA Docs