Why does INT8 quantization use more GPU memory than float16 in TensorRT?


TensorRT Version:
GPU Type: NVIDIA GeForce RTX 3090
NVIDIA Driver Version: 515.65.01
CUDA Version: 11.3
cuDNN Version: 8.4
Operating System + Version: CentOS 7
Python Version (if applicable): 3.8
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.11
Baremetal or Container (if container which image + tag):

Model size:

int8: 8.1 MB
float16: 15.4 MB
float32: 29.9 MB
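
For reference, here is a rough sketch of how the reported model sizes compare to each other. It assumes the "M" in the list means MB. The names `sizes_mb` and `size_ratios` are just made up for this illustration. It only covers the model sizes listed here and says nothing about GPU memory use at runtime.

```python
# Model sizes as reported in the post, in MB
# (assumption: "M" in the post means MB).
sizes_mb = {"int8": 8.1, "float16": 15.4, "float32": 29.9}


def size_ratios(sizes, baseline="float32"):
    """Return each size as a fraction of the baseline size."""
    base = sizes[baseline]
    return {name: size / base for name, size in sizes.items()}


ratios = size_ratios(sizes_mb)
for name, ratio in ratios.items():
    print(f"{name}: {sizes_mb[name]} MB, {ratio:.2f}x of float32")
```

So the model size shrinks roughly in line with the precision (about one half for float16 and a bit more than one quarter for int8), which is why the higher GPU memory use with int8 is unexpected.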

