Question about TensorRT precision conversion


A clear and concise description of the bug or issue.


TensorRT Version :
GPU Type : host RTX 3090, target Jetson TX2
Nvidia Driver Version : 455.32
CUDA Version : 11.1
CUDNN Version :
Operating System + Version : host Ubuntu 18.04, target Jetson TX2 (JetPack 4.4)
Python Version (if applicable) : 3.8.5
TensorFlow Version (if applicable) :
PyTorch Version (if applicable) : 1.9.0+cu111
Baremetal or Container (if container which image + tag) : container

Relevant Files

Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue. (Github repo, Google Drive, Dropbox, etc.)

Steps To Reproduce

Please include:

  • Exact steps/commands to build your repro
  • Exact steps/commands to run your repro
  • Full traceback of errors encountered

The original idea was to create the model in PyTorch in FP32, convert it to ONNX, and finally build FP16 and INT8 engines from that. But after a lot of debugging, it seems I have to decide from the very beginning to build the engine in exactly FP16. Is the workflow supposed to be like this? Or is there a problem with my code? I took the sample code from TensorRT and ran it. Or is it that my current host (Ubuntu 18.04 + RTX 3090) does not support FP16?

Hi, please refer to the links below to perform inference in INT8


Hello, I’m not interested in converting to INT8; I’d like to use FP16.
I have to pass the right parameters to enable FP16 when creating the engine, right? I’m wondering why the flag doesn’t seem to take effect.

I’m going to keep trying FP16. Is it possible that the RTX 3090 can’t do the conversion? Thank you


Please refer to the support matrix doc to check hardware and precision support based on the GPU architecture and CUDA compute capability.
You can also refer to this doc, which talks more about the RTX GPU architecture and the DL precisions it supports.
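As a quick rule of thumb (my own summary of the support matrix, not an official check): hardware FP16 is available from CUDA compute capability 5.3 upwards, and both of your devices clear that bar, so FP16 support itself should not be the problem here.

```python
# Rule of thumb from the support matrix: native FP16 requires
# CUDA compute capability >= 5.3.
def supports_fp16(major, minor):
    return (major, minor) >= (5, 3)

# RTX 3090 is compute capability 8.6 (Ampere); Jetson TX2 is 6.2 (Pascal).
print(supports_fp16(8, 6))  # host RTX 3090
print(supports_fp16(6, 2))  # target TX2
```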

Thank you.