Description
I am trying to quantize a ConvNeXt model to INT8, but when I run inference the quantized engine is slower than my non-quantized model.
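For reference, a minimal sketch of the kind of build/comparison being done (file names are placeholders, and this assumes the model has already been exported to ONNX via torch.onnx.export):

```shell
# FP16 baseline engine for comparison
trtexec --onnx=convnext.onnx --fp16 --saveEngine=convnext_fp16.engine

# INT8 build; --fp16 is also enabled so layers without INT8 kernels
# can fall back to FP16 instead of FP32
trtexec --onnx=convnext.onnx --int8 --fp16 --saveEngine=convnext_int8.engine

# Per-layer timing, to check which layers actually ran in INT8
trtexec --loadEngine=convnext_int8.engine --dumpProfile --separateProfileRun
```

The `--dumpProfile` output is useful here: if many layers show Q/DQ reformat overhead or did not fuse into INT8 kernels, the INT8 engine can end up slower than the FP16 baseline.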
Environment
TensorRT Version: 10.5.0
GPU Type: RTX 4090
Nvidia Driver Version: 556.12
CUDA Version: 12.0 (per nvcc output below)
CUDNN Version:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Fri_Jan__6_16:45:21_PST_2023
Cuda compilation tools, release 12.0, V12.0.140
Build cuda_12.0.r12.0/compiler.32267302_0
Operating System + Version: Ubuntu 24.04.1 LTS
Python Version (if applicable): 3.11
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 2.4.1+cu118 (onnxruntime 1.16.1 is used when calling torch.onnx.export)
Baremetal or Container (if container which image + tag):
Relevant Files
quantization - Google Drive (Drive folder with the notebook, requirements file, and ONNX files to reproduce)