When I used the TF-TRT tool to deploy a TensorFlow model, I found that the files produced for FP32, FP16, and INT8 precision varied greatly in size and were all much larger than the original TensorFlow model file, with the INT8 deployment producing the largest file. In theory, lower precision stores each weight with fewer bits, so the file size should shrink. I would like to ask a few questions.
- Why is the deployed file much larger than the original TensorFlow model file?
- Why does a lower deployment precision produce a larger file?
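For reference, here is the back-of-the-envelope arithmetic behind the expectation in the question: storage needed for just the weights at each precision. The parameter count is an assumption (a ResNet-18 has roughly 11.7 million parameters), used only to show that it lines up with the ~45 MB TensorFlow model reported below.

```python
# Bytes needed per weight at each precision; lower precision should,
# in theory, shrink the file proportionally.
BYTES_PER_WEIGHT = {"FP32": 4, "FP16": 2, "INT8": 1}

def theoretical_weight_size_mb(num_params: int, precision: str) -> float:
    """MB required to store num_params weights at the given precision."""
    return num_params * BYTES_PER_WEIGHT[precision] / (1024 ** 2)

# Assumed parameter count for a ResNet-18 classifier.
n_params = 11_700_000
for p in ("FP32", "FP16", "INT8"):
    print(f"{p}: {theoretical_weight_size_mb(n_params, p):.1f} MB")
# FP32 comes out near 45 MB, matching the original SavedModel size;
# FP16 and INT8 should be roughly half and a quarter of that.
```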
TensorRT Version: 220.127.116.11
GPU Type: Titan Xp
Nvidia Driver Version: 440.36
CUDA Version: 10.1
CUDNN Version: 18.104.22.168
Operating System + Version: Ubuntu18.04
Python Version (if applicable): 3.6.9
TensorFlow Version (if applicable): 2.3.0
TensorFlow model: 45 MB
It looks like you're using a very old version of TensorRT. We recommend trying the latest TensorRT version and letting us know if you still face this issue.
I installed JetPack 5.0 DP on a Jetson AGX Xavier; its environment is:
TensorRT Version : 22.214.171.124
CUDA Version : 11.4.4
CUDNN Version : 8.3.2
Operating System + Version : Ubuntu20.04
Python Version (if applicable) : 3.8.10
TensorFlow Version (if applicable) : 2.8.0
After running the same TF-TRT conversion experiment, the resulting files still show the two problems above. The file sizes after deployment are:
TensorFlow model: 45 MB
Could you please tell me why this happens?
We will get back to you on your queries. Could you please share the repro script/model with us, here or via DM, for better debugging?
Thank you for your reply. The TensorFlow model file and the deployed model files for the ResNet-18 classification model I used are attached below.
tf_model.zip (39.7 MB)
tf_model_FP32.zip (79.2 MB)
tf_model_FP16.zip (79.2 MB)
tf_model_INT8.zip (89.1 MB)
The TF-TRT deployment code and demo image files I used are in the following file:
tf-trt.zip (21.3 KB)
Currently, the saved model includes both the TF and the TRT portions of the network. That is why a larger model size is expected.
Even if INT8 precision is enabled, there is no guarantee that it will be used (TRT is allowed to fall back to FP16/FP32 if that is faster), so having similar engine sizes in all the cases is possible. INT8 models also save the calibration table, which can further increase the size. We are verifying this.
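To make the "precision is only a request" point concrete, here is a minimal sketch of a TF-TRT conversion at a requested precision, using the TF 2.x TrtGraphConverterV2 API. The directory names and the helper function itself are placeholders for illustration, not the attached code.

```python
def convert_saved_model(saved_model_dir: str, output_dir: str,
                        precision: str, calibration_input_fn=None):
    """Convert a SavedModel with TF-TRT at the *requested* precision.

    precision_mode is only a request: TensorRT may still run layers in
    FP16/FP32 if that is faster, so file size need not track precision.
    """
    if precision not in ("FP32", "FP16", "INT8"):
        raise ValueError(f"unsupported precision: {precision}")
    # Imported lazily so the sketch is readable without TensorRT installed.
    from tensorflow.python.compiler.tensorrt import trt_convert as trt

    params = trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(precision_mode=precision)
    converter = trt.TrtGraphConverterV2(
        input_saved_model_dir=saved_model_dir, conversion_params=params)
    if precision == "INT8":
        # INT8 needs calibration data; the calibration table is saved too.
        converter.convert(calibration_input_fn=calibration_input_fn)
    else:
        converter.convert()
    converter.save(output_dir)

# Example usage (requires TensorFlow with TensorRT support and a GPU):
#   convert_saved_model("tf_model", "tf_model_FP16", "FP16")
```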
Thank you very much for your reply. I now have a general understanding of the reason. If a detailed description of this is published, please let me know.
Thank you so much for bringing this issue to our attention.
Actually, the calibration table is lightweight; some other aspect of TF-TRT is responsible for the large saved model size.
We are tracking this issue internally and have also created an issue on the TensorFlow GitHub: Variables saved in converted model · Issue #305 · tensorflow/tensorrt · GitHub.
Note that for the FP32 and FP16 conversions, the model was not built (which means the TRT engines were not saved to disk; they are created on the fly). If we call converter.build(input_fn) (input_fn can be the same function that was used for calibration) before converter.save(), then we should see FP32 model size >= FP16 model size >= INT8 model size.
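The build-before-save ordering described above can be sketched as follows (a minimal illustration assuming the TF 2.x TrtGraphConverterV2 API; the SavedModel paths and the function name are placeholders):

```python
def convert_and_build_fp16(saved_model_dir: str, output_dir: str, input_fn):
    """Convert to FP16 and build the TRT engines *before* saving,
    so the serialized engines are written to disk with the model."""
    # Imported lazily so the sketch is readable without TensorRT installed.
    from tensorflow.python.compiler.tensorrt import trt_convert as trt

    params = trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(precision_mode="FP16")
    converter = trt.TrtGraphConverterV2(
        input_saved_model_dir=saved_model_dir, conversion_params=params)
    converter.convert()
    # input_fn yields batches matching the model's input signature; it can
    # be the same generator used for INT8 calibration.
    converter.build(input_fn=input_fn)  # engines built here get serialized
    converter.save(output_dir)

# Example usage (requires TensorFlow with TensorRT support and a GPU):
#   convert_and_build_fp16("tf_model", "tf_model_FP16", calibration_input_fn)
```

With build() called first, the saved directory contains the prebuilt engines, which is why the sizes then follow FP32 >= FP16 >= INT8.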