TF-TRT optimized PB file is too large (variables merged?)

I used TF-TRT to optimize a graph, but the resulting PB file is too large.

I used the script below to convert the graph.

docker run --rm --runtime=nvidia -it \
    -v /data:/tmp tensorflow/tensorflow:1.15.2-gpu-py3 \
    /usr/local/bin/saved_model_cli convert \
    --dir /tmp/model/1 \
    --output_dir /tmp/model.trt/1 \
    --tag_set serve \
    tensorrt --precision_mode FP32 --max_batch_size 32 --is_dynamic_op True

original PB size: 1.7M
optimized PB size: 1.1G
After looking into the files, I found that there are no files under the variables directory. My best guess is that all variables were merged into the optimized PB file.
original: variables/variables.data-00000-of-00001 856MB
optimized: no variables/variables.data-00000-of-00001 file
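
To double-check this on any SavedModel, a small sketch can compare the proto size against the variable shards. `summarize_saved_model` is a hypothetical helper (not part of TensorFlow), and the demo below runs on a dummy directory mimicking the standard SavedModel layout:

```python
import os
import tempfile

def summarize_saved_model(export_dir):
    """Return (saved_model.pb size in bytes, list of variable shard files).

    An empty shard list after TF-TRT conversion means the weights were
    frozen into the graph proto itself, which is what inflates the PB.
    """
    pb_size = os.path.getsize(os.path.join(export_dir, "saved_model.pb"))
    var_dir = os.path.join(export_dir, "variables")
    shards = sorted(os.listdir(var_dir)) if os.path.isdir(var_dir) else []
    return pb_size, shards

# Demo on a dummy layout standing in for the original (unconverted) model.
with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "saved_model.pb"), "wb") as f:
        f.write(b"\x00" * 100)
    os.makedirs(os.path.join(d, "variables"))
    with open(os.path.join(d, "variables",
                           "variables.data-00000-of-00001"), "wb") as f:
        f.write(b"\x00" * 10)
    print(summarize_saved_model(d))
    # → (100, ['variables.data-00000-of-00001'])
```

On the converted model the shard list comes back empty, matching what is described above.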

Because of the PB size, I can't load the optimized model into TF Serving; there is a 1 GB limit for the PB file.
Is this a known issue, or did I use the wrong options for optimizing?

Hi,

Could you please share the script and model file so we can help better?
Also, can you provide details on the platforms you are using:
o Linux distro and version
o GPU type
o Nvidia driver version
o CUDA version
o CUDNN version
o Python version [if using python]
o Tensorflow and PyTorch version
o TensorRT version

Meanwhile, please try generating the model in static mode or low precision.
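
For concreteness, a static-mode / FP16 variant of the conversion would only change the trailing flags of the command already posted above (the output path /tmp/model.trt-fp16/1 is just an illustrative choice):

```shell
docker run --rm --runtime=nvidia -it \
    -v /data:/tmp tensorflow/tensorflow:1.15.2-gpu-py3 \
    /usr/local/bin/saved_model_cli convert \
    --dir /tmp/model/1 \
    --output_dir /tmp/model.trt-fp16/1 \
    --tag_set serve \
    tensorrt --precision_mode FP16 --max_batch_size 32 --is_dynamic_op False
```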

Thanks

model file : https://drive.google.com/file/d/1xZrVktHH0yjjO-M0HbK2oHo3ViPFoF_9/view?usp=sharing.
GPU: V100
Driver Version: 418.67
CUDA Version: 10.1
environment: as shown in the script above, I used the tensorflow/tensorflow:1.15.2-gpu-py3 Docker image, so anyone can reproduce the result.

I have tried static mode and FP16, but there was no difference at all:
the PB file is still 1.1 GB, with no variable files under the variables directory.

Hi,

From the previously provided link, I am able to access the model file.
Can you share the sample script as well so we can reproduce the issue?

Thanks

My script is here.
Change model_path and output_path before running the script.

model_path='path/to/saved_model'
output_path='path/to/optimized_model'

docker pull tensorflow/tensorflow:1.15.2-gpu-py3

docker run --rm --runtime=nvidia -it \
    -v $model_path:/model \
    -v $output_path:/output \
    tensorflow/tensorflow:1.15.2-gpu-py3 \
    /usr/local/bin/saved_model_cli convert \
    --dir /model \
    --output_dir /output \
    --tag_set serve \
    tensorrt --precision_mode FP32 --max_batch_size 32 --is_dynamic_op True
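
The same conversion can also be driven from Python inside that container, which additionally exposes knobs such as `minimum_segment_size` and `max_workspace_size_bytes`. A minimal sketch, assuming TF 1.15's `TrtGraphConverter` API (must be run inside the tensorflow/tensorflow:1.15.2-gpu-py3 image with a GPU; the /model and /output paths are the mount points from the script above):

```python
# Run inside the tensorflow/tensorflow:1.15.2-gpu-py3 container.
from tensorflow.python.compiler.tensorrt import trt_convert as trt

converter = trt.TrtGraphConverter(
    input_saved_model_dir="/model",   # mounted $model_path
    max_batch_size=32,
    precision_mode="FP32",
    is_dynamic_op=True)
converter.convert()      # freezes variables and builds TRT segments
converter.save("/output")  # writes the converted SavedModel
```

This mirrors what `saved_model_cli convert ... tensorrt` does under the hood, so it is mainly useful for experimenting with the extra parameters.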

Hi,

Please try the NGC TF container with Triton Inference Server and let us know if the issue persists.


https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html

Thanks