Description
After converting the same ONNX model to TensorRT engines in FP16 and INT8 with trtexec, the INT8 engine runs more than twice as slow as the FP16 engine on the same data (15.277s vs. 6.716s), even though INT8 is expected to be at least as fast as FP16.
Environment
TensorRT Version: TensorRT 8.0.1
GPU Type: RTX 3070
Nvidia Driver Version: 470.63.01
CUDA Version: 11.3
CUDNN Version: 8.2.2.26
Operating System + Version: Ubuntu 20.04
Python Version (if applicable): 3.6
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.7
Baremetal or Container (if container which image + tag): Baremetal
Relevant Files
onnx_file
calibration
int8_model
fp16_model
infer.py
Steps To Reproduce
- Convert the ONNX model to TensorRT engines in FP16 and INT8 (a sketch of a calibrator that can produce calibration.cache follows the results below)
./trtexec --onnx=model.onnx --minShapes=input0:1x1x1024x256 --optShapes=input0:1x1x1024x500 --maxShapes=input0:1x1x1024x650 --fp16 --workspace=5000 --verbose --saveEngine=model_fp16.bin
./trtexec --onnx=model.onnx --minShapes=input0:1x1x1024x256 --optShapes=input0:1x1x1024x500 --maxShapes=input0:1x1x1024x650 --int8 --calib=calibration.cache --workspace=5000 --verbose --saveEngine=model_int8.bin
- Run inference on the same data with both engines (a timing sketch in the style of infer.py also follows below)
The results show that the FP16 engine is much faster than the INT8 engine:
Time Used for model model_fp16.bin: 6.716s
Time Used for model model_int8.bin: 15.277s
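For context, here is a minimal sketch of how a calibration cache like calibration.cache is typically produced with the TensorRT Python API. This is an illustration rather than the exact calibration script; the EntropyCalibrator class name, the batches iterable, and the fixed 1x1x1024x500 batch shape are placeholder assumptions.

```python
import os
import numpy as np
import pycuda.autoinit  # noqa: F401 (creates a CUDA context)
import pycuda.driver as cuda
import tensorrt as trt

class EntropyCalibrator(trt.IInt8EntropyCalibrator2):
    """Feeds preprocessed batches to TensorRT and caches the INT8 scales."""

    def __init__(self, batches, cache_file="calibration.cache"):
        trt.IInt8EntropyCalibrator2.__init__(self)
        self.batches = iter(batches)  # iterable of np.float32 arrays
        self.cache_file = cache_file
        # Device buffer for one batch; 1x1x1024x500 is an assumed shape.
        self.device_input = cuda.mem_alloc(1 * 1 * 1024 * 500 * 4)

    def get_batch_size(self):
        return 1

    def get_batch(self, names):
        try:
            batch = next(self.batches)
        except StopIteration:
            return None  # no more data: calibration is finished
        cuda.memcpy_htod(self.device_input, np.ascontiguousarray(batch))
        return [int(self.device_input)]

    def read_calibration_cache(self):
        # Reuse an existing cache so calibration runs only once.
        if os.path.exists(self.cache_file):
            with open(self.cache_file, "rb") as f:
                return f.read()

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)
```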
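For reference, infer.py times the engines roughly along these lines. This is a simplified sketch, assuming pycuda, a single input and a single output binding, and a float32 output; the time_engine function and its names are illustrative, not the exact script.

```python
import time
import numpy as np
import pycuda.autoinit  # noqa: F401 (creates a CUDA context)
import pycuda.driver as cuda
import tensorrt as trt

def time_engine(engine_path, input_array, n_iters=100):
    """Deserializes a saved engine and times n_iters executions."""
    logger = trt.Logger(trt.Logger.WARNING)
    with open(engine_path, "rb") as f, trt.Runtime(logger) as runtime:
        engine = runtime.deserialize_cuda_engine(f.read())
    context = engine.create_execution_context()
    # The engine was built with dynamic shapes, so fix the real input shape.
    context.set_binding_shape(0, input_array.shape)

    d_input = cuda.mem_alloc(input_array.nbytes)
    h_output = np.empty(tuple(context.get_binding_shape(1)), dtype=np.float32)
    d_output = cuda.mem_alloc(h_output.nbytes)

    stream = cuda.Stream()
    cuda.memcpy_htod_async(d_input, np.ascontiguousarray(input_array), stream)
    start = time.time()
    for _ in range(n_iters):
        context.execute_async_v2([int(d_input), int(d_output)], stream.handle)
    stream.synchronize()
    elapsed = time.time() - start
    cuda.memcpy_dtoh(h_output, d_output)
    print(f"Time Used for model {engine_path}: {elapsed:.3f}s")

# Both engines are timed on an identical 1x1x1024x500 input.
x = np.random.rand(1, 1, 1024, 500).astype(np.float32)
time_engine("model_fp16.bin", x)
time_engine("model_int8.bin", x)
```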
Could you please help me understand why the INT8 engine is slower than FP16 here?
Thank you.
Lanny