TF-TRT optimized PB file is too large (variables merged?)

I used TF-TRT to optimize a graph, but the resulting PB file is too big.

I used the script below to convert the graph.

docker run --rm --runtime=nvidia -it \
    -v /data:/tmp tensorflow/tensorflow:1.15.2-gpu-py3 \
    /usr/local/bin/saved_model_cli convert \
    --dir /tmp/model/1 \
    --output_dir /tmp/model.trt/1 \
    --tag_set serve \
    tensorrt --precision_mode FP32 --max_batch_size 32 --is_dynamic_op True

original PB size: 1.7M
optimized PB size: 1.1G
After looking into the files, I found that there are no files under the variables directory. My best guess is that all variables were merged into the optimized PB file.
original: variables/ 856MB
optimized: no variables/ file
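The size comparison above can be reproduced by summing file sizes per entry of each SavedModel directory. A minimal sketch using only the standard library (the saved_model.pb / variables/ layout is the standard SavedModel structure; the directory path is a placeholder):

```python
import os

def dir_sizes(saved_model_dir):
    """Map each entry of a SavedModel directory (saved_model.pb,
    variables/, assets/, ...) to its total size in bytes."""
    sizes = {}
    for entry in os.listdir(saved_model_dir):
        path = os.path.join(saved_model_dir, entry)
        if os.path.isfile(path):
            sizes[entry] = os.path.getsize(path)
        else:
            # Sum every file below the subdirectory (e.g. variables/).
            total = 0
            for root, _dirs, files in os.walk(path):
                for name in files:
                    total += os.path.getsize(os.path.join(root, name))
            sizes[entry] = total
    return sizes
```

Running this on the original and the optimized model makes the suspicion concrete: if the variables/ entry drops to zero while saved_model.pb grows by roughly the same amount, the variables were folded into the graph as constants.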

Because of the PB size, I can't load the optimized model into TF Serving; there is a 1 GB limit for the PB file.
Is this a known issue, or did I use the wrong options when optimizing?


Could you please share the script and model file so we can help better?
Also, can you provide details on the platforms you are using:
o Linux distro and version
o GPU type
o Nvidia driver version
o CUDA version
o CUDNN version
o Python version [if using python]
o Tensorflow and PyTorch version
o TensorRT version

Meanwhile, please try generating the model in static mode or at a lower precision.
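For reference, a static-mode, FP16 variant of the earlier command might look like the following. This is a sketch, not a verified invocation: it assumes that omitting --is_dynamic_op leaves the converter in its default static mode, and it reuses the paths from the original script as placeholders.

```shell
docker run --rm --runtime=nvidia -it \
    -v /data:/tmp tensorflow/tensorflow:1.15.2-gpu-py3 \
    /usr/local/bin/saved_model_cli convert \
    --dir /tmp/model/1 \
    --output_dir /tmp/model.trt/1 \
    --tag_set serve \
    tensorrt --precision_mode FP16 --max_batch_size 32
```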


model file :
GPU: V100
Driver Version: 418.67
CUDA Version: 10.1
environment: As written in the script above, I used the tensorflow/tensorflow:1.15.2-gpu-py3 Docker image, so anyone can reproduce the result.

I have tried static mode and FP16, but there was no difference at all.
The PB file is still 1.1G, with no variable files under the variables directory.


From the previously provided link I am able to access the model file.
Can you share the sample script file as well to reproduce the issue?


My script is below.
Change model_path and output_path before running it.


docker pull tensorflow/tensorflow:1.15.2-gpu-py3

docker run --rm --runtime=nvidia -it \
    -v $model_path:/model \
    -v $output_path:/output \
    tensorflow/tensorflow:1.15.2-gpu-py3 \
    /usr/local/bin/saved_model_cli convert \
    --dir /model \
    --output_dir /output \
    --tag_set serve \
    tensorrt --precision_mode FP32 --max_batch_size 32 --is_dynamic_op True
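The same conversion can also be driven from Python inside the container, which makes parameters easier to tweak than through saved_model_cli. A hedged sketch against the TF 1.15 TF-TRT API, mirroring the options in the shell script (model_dir and output_dir are placeholders; a TensorFlow build with TensorRT support is required to actually run the conversion):

```python
# Sketch of the equivalent TF-TRT conversion via the Python API (TF 1.15.x).
# The import is guarded so the module loads even where TF is unavailable.
try:
    from tensorflow.python.compiler.tensorrt import trt_convert as trt
    HAVE_TFTRT = True
except ImportError:
    HAVE_TFTRT = False

def convert_saved_model(model_dir, output_dir):
    """Convert a SavedModel with the same options as the saved_model_cli call."""
    converter = trt.TrtGraphConverter(
        input_saved_model_dir=model_dir,
        input_saved_model_tags=["serve"],
        precision_mode="FP32",
        max_batch_size=32,
        is_dynamic_op=True,
    )
    converter.convert()
    converter.save(output_dir)
```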


Could you please try the NGC TF container with Triton Inference Server and let us know if the issue persists.