TF-TRT optimized PB file is too large (variables merged?)

I used TF-TRT to optimize a graph, but the resulting PB file is too large.

I used the script below to convert the graph.

docker run --rm --runtime=nvidia -it \
    -v /data:/tmp tensorflow/tensorflow:1.15.2-gpu-py3 \
    /usr/local/bin/saved_model_cli convert \
    --dir /tmp/model/1 \
    --output_dir /tmp/model.trt/1 \
    --tag_set serve \
    tensorrt --precision_mode FP32 --max_batch_size 32 --is_dynamic_op True

original PB size: 1.7M
optimized PB size: 1.1G
After looking into the files, I found that there are no files under the variables directory. My best guess is that all variables were merged into the optimized PB file.
original: variables/variables.data-00000-of-00001 856MB
optimized: no variables/variables.data-00000-of-00001 file
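
To double-check this on any SavedModel, a small sketch can compare the proto size against the variable shards. `summarize_saved_model` is a hypothetical helper (not part of TensorFlow), and the demo below runs on a dummy directory mimicking the standard SavedModel layout:

```python
import os
import tempfile

def summarize_saved_model(export_dir):
    """Return (saved_model.pb size in bytes, list of variable shard files).

    An empty shard list after TF-TRT conversion means the weights were
    frozen into the graph proto itself, which is what inflates the PB.
    """
    pb_size = os.path.getsize(os.path.join(export_dir, "saved_model.pb"))
    var_dir = os.path.join(export_dir, "variables")
    shards = sorted(os.listdir(var_dir)) if os.path.isdir(var_dir) else []
    return pb_size, shards

# Demo on a dummy layout standing in for the original (unconverted) model.
with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "saved_model.pb"), "wb") as f:
        f.write(b"\x00" * 100)
    os.makedirs(os.path.join(d, "variables"))
    with open(os.path.join(d, "variables",
                           "variables.data-00000-of-00001"), "wb") as f:
        f.write(b"\x00" * 10)
    print(summarize_saved_model(d))
    # → (100, ['variables.data-00000-of-00001'])
```

On the converted model the shard list comes back empty, matching what is described above.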

Because of the PB size, I can't load the optimized model into TF Serving; there is a 1 GB limit for the PB file.
Is this a known issue, or did I use the wrong options for optimizing?

Hi,

Could you please share the script and model file so we can help better?
Also, can you provide details on the platforms you are using:
o Linux distro and version
o GPU type
o Nvidia driver version
o CUDA version
o CUDNN version
o Python version [if using python]
o Tensorflow and PyTorch version
o TensorRT version

Meanwhile, please try generating the model in static mode or low precision.
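
For concreteness, a static-mode / FP16 variant of the conversion would only change the trailing flags of the command already posted above (the output path /tmp/model.trt-fp16/1 is just an illustrative choice):

```shell
docker run --rm --runtime=nvidia -it \
    -v /data:/tmp tensorflow/tensorflow:1.15.2-gpu-py3 \
    /usr/local/bin/saved_model_cli convert \
    --dir /tmp/model/1 \
    --output_dir /tmp/model.trt-fp16/1 \
    --tag_set serve \
    tensorrt --precision_mode FP16 --max_batch_size 32 --is_dynamic_op False
```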

Thanks

model file : https://drive.google.com/file/d/1xZrVktHH0yjjO-M0HbK2oHo3ViPFoF_9/view?usp=sharing.
GPU: V100
Driver Version: 418.67
CUDA Version: 10.1
environment: as shown in the script above, I used the tensorflow/tensorflow:1.15.2-gpu-py3 Docker image, so anyone can reproduce the result.

I have tried static mode and FP16, but there was no difference at all:
the PB file is still 1.1 GB, with no variable files under the variables directory.

Hi,

From the previously provided link, I am able to access the model file.
Can you share the sample script as well so we can reproduce the issue?

Thanks

My script is here.
Change model_path and output_path before running the script.

model_path='path/to/saved_model'
output_path='path/to/optimized_model'

docker pull tensorflow/tensorflow:1.15.2-gpu-py3

docker run --rm --runtime=nvidia -it \
    -v $model_path:/model \
    -v $output_path:/output \
    tensorflow/tensorflow:1.15.2-gpu-py3 \
    /usr/local/bin/saved_model_cli convert \
    --dir /model \
    --output_dir /output \
    --tag_set serve \
    tensorrt --precision_mode FP32 --max_batch_size 32 --is_dynamic_op True
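
The same conversion can also be driven from Python inside that container, which additionally exposes knobs such as `minimum_segment_size` and `max_workspace_size_bytes`. A minimal sketch, assuming TF 1.15's `TrtGraphConverter` API (must be run inside the tensorflow/tensorflow:1.15.2-gpu-py3 image with a GPU; the /model and /output paths are the mount points from the script above):

```python
# Run inside the tensorflow/tensorflow:1.15.2-gpu-py3 container.
from tensorflow.python.compiler.tensorrt import trt_convert as trt

converter = trt.TrtGraphConverter(
    input_saved_model_dir="/model",   # mounted $model_path
    max_batch_size=32,
    precision_mode="FP32",
    is_dynamic_op=True)
converter.convert()      # freezes variables and builds TRT segments
converter.save("/output")  # writes the converted SavedModel
```

This mirrors what `saved_model_cli convert ... tensorrt` does under the hood, so it is mainly useful for experimenting with the extra parameters.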

Hi,

Please try the NGC TF container with Triton Inference Server and let us know if the issue persists.


https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html

Thanks