TensorRT inference runs slower on RTX 4090 than RTX 3090 Ti


Recently we tested an RTX 4090 by running a YOLOv5 TensorRT INT8 model engine, and found that inference is slower than on an RTX 3090 Ti. We can't figure out what's wrong. Which TensorRT version first supports the RTX 4090?
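One way to compare the two cards without any application overhead is to benchmark the raw engine with trtexec. This is only a sketch: the engine filename is a placeholder, and the engine must be rebuilt on each GPU since serialized engines are not portable across architectures.

```shell
# Hypothetical engine name; rebuild the INT8 engine separately on the
# 3090 Ti and the 4090 before timing (engines are architecture-specific).
trtexec --loadEngine=yolov5_int8.engine \
        --warmUp=500 \
        --iterations=200 \
        --verbose
```

Comparing the reported latency/throughput from both machines isolates whether the slowdown is in the engine itself or in the surrounding pre/post-processing code.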


TensorRT Version: TensorRT-
GPU Type: RTX 4090
Nvidia Driver Version: 522.06 DCH/win10 64
CUDA Version: 11.8
CUDNN Version: 8.6.0
Operating System + Version: Win10
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):

Relevant Files

Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive, Dropbox, etc.)

Steps To Reproduce

Please include:

  • Exact steps/commands to build your repro
  • Exact steps/commands to run your repro
  • Full traceback of errors encountered

Hi, please refer to the links below to perform inference in INT8.


Thanks for your quick response, but that isn't quite my question: I can run the INT8 engine successfully, but it is not faster than on the 3090 Ti as expected. Which TensorRT version supports the RTX 4090 best?
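When timing the two cards in application code, it is easy to measure asynchronous GPU work incorrectly. Below is a minimal, generic timing harness (a sketch; `run_once` is a hypothetical callable you would wrap around one TensorRT inference plus a CUDA stream synchronize, so the GPU has actually finished before the timer stops):

```python
import time

def benchmark(run_once, warmup=50, iters=200):
    """Return mean latency in milliseconds of a single-inference callable.

    `run_once` should execute one full inference and block until the GPU
    is done (e.g. execute_async_v2 followed by stream.synchronize()).
    """
    for _ in range(warmup):  # let GPU clocks ramp up before measuring
        run_once()
    start = time.perf_counter()
    for _ in range(iters):
        run_once()
    elapsed = time.perf_counter() - start
    return elapsed / iters * 1000.0  # mean milliseconds per inference
```

Running the same harness with identical batch size, input resolution, and precision on both GPUs makes the comparison apples-to-apples; differing results then point at the engine or driver rather than the measurement.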


Could you please share with us the ONNX model and the complete verbose logs from both GPUs for better debugging?

Thank you.