Why does INT8 quantization occupy more GPU memory than FP16? (TensorRT quantization)


Description

When I quantize the same model with TensorRT, the INT8 engine is smaller on disk than the FP16 engine (sizes below), but during inference the INT8 engine occupies more GPU memory than the FP16 one. Why does this happen?


TensorRT Version:
GPU Type: NVIDIA GeForce RTX 3090
Nvidia Driver Version: 515.65.01
CUDA Version: 11.3
CUDNN Version: 8.4
Operating System + Version: CentOS 7
Python Version (if applicable): 3.8
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.11
Baremetal or Container (if container which image + tag):

Relevant Files

Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive, Dropbox, etc.)

Steps To Reproduce

Please include:

  • Exact steps/commands to build your repro
  • Exact steps/commands to run your repro
  • Full traceback of errors encountered

Model size

int8      8.1 MB
float16  15.4 MB
float32  29.9 MB
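The on-disk sizes above roughly match the expected per-weight storage ratios (about 4x and 2x smaller than FP32). A small sketch of that arithmetic, using only the numbers reported above:

```python
# On-disk engine sizes reported above, in MB.
sizes_mb = {"int8": 8.1, "float16": 15.4, "float32": 29.9}

# Compression ratio of each engine relative to float32.
ratios = {p: round(sizes_mb["float32"] / s, 2) for p, s in sizes_mb.items()}
print(ratios)  # {'int8': 3.69, 'float16': 1.94, 'float32': 1.0}
```

Note that file size only tracks weight precision; runtime GPU memory additionally holds activations, workspace, and any higher-precision copies of layers that cannot run in INT8, so the smallest file does not necessarily mean the smallest GPU footprint.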

Hi, please refer to the links below to perform inference in INT8.
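One common reason an "INT8" engine can out-consume an FP16 one: layers without INT8 kernel support fall back to higher precision, so their weights are stored at the fallback precision and extra reformat layers are inserted. The toy accounting below is purely illustrative (`weight_footprint` and the layer sizes are hypothetical, not TensorRT's actual allocator), but it shows how a mostly-INT8 engine with an FP32 fallback can end up heavier than a uniform FP16 engine:

```python
BYTES_PER_PARAM = {"int8": 1, "float16": 2, "float32": 4}

def weight_footprint(layers):
    """Toy per-layer weight accounting (hypothetical helper).

    `layers` is a list of (num_params, precision) pairs. Layers that
    fall back from INT8 are charged at their fallback precision.
    """
    return sum(n * BYTES_PER_PARAM[p] for n, p in layers)

# 10M-parameter model: uniform FP16 vs INT8 with a 4M-param FP32 fallback.
fp16_engine = weight_footprint([(10_000_000, "float16")])
int8_engine = weight_footprint([(6_000_000, "int8"),
                                (4_000_000, "float32")])
print(fp16_engine, int8_engine, int8_engine > fp16_engine)
# 20000000 22000000 True
```

Real engine memory also includes activation buffers and the builder workspace, which are independent of weight precision, so comparing `nvidia-smi` numbers alone does not isolate the quantization effect.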