ConvNeXT inference with INT8 quantization slower on TensorRT than FP32/FP16

Description

I am trying to quantize a ConvNeXT model to INT8, but when I run inference the quantized engine is slower than my non-quantized (FP32/FP16) model.
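
For context, the export and engine build roughly follow the flow below. This is only a minimal sketch assuming a torchvision ConvNeXT-Tiny and trtexec for engine building; the model variant, file names, and the quantized (Q/DQ) ONNX are placeholders, and the exact quantization steps are in the notebook in the Drive folder.

```python
# Minimal sketch of the export/build flow (placeholder names; the actual
# notebook and ONNX files are in the linked Drive folder).
import torch
import torchvision

# Assumed model variant: ConvNeXT-Tiny from torchvision.
model = torchvision.models.convnext_tiny(weights="DEFAULT").eval()
dummy = torch.randn(1, 3, 224, 224)

# Export the baseline FP32 model to ONNX.
torch.onnx.export(
    model, dummy, "convnext.onnx",
    opset_version=17,
    input_names=["input"], output_names=["output"],
)

# Engines are then built with trtexec, e.g.:
#   trtexec --onnx=convnext.onnx --fp16 --saveEngine=convnext_fp16.engine
#   trtexec --onnx=convnext_int8_qdq.onnx --int8 --fp16 --saveEngine=convnext_int8.engine
```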

Environment

TensorRT Version: 10.5.0

GPU Type: RTX 4090

Nvidia Driver Version: 556.12

CUDA Version:

CUDNN Version:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Fri_Jan__6_16:45:21_PST_2023
Cuda compilation tools, release 12.0, V12.0.140
Build cuda_12.0.r12.0/compiler.32267302_0

Operating System + Version: Ubuntu 24.04.1 LTS

Python Version (if applicable): 3.11
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 2.4.1+cu118 (uses onnxruntime 1.16.1 when calling torch.onnx.export)
Baremetal or Container (if container which image + tag):

Relevant Files

quantization - Google Drive (Drive folder with the notebook, requirements file, and ONNX files to reproduce)

Hi @Raj1234,
Are you still facing this issue?