Question about the TensorRT precision transformation

yeongjae8066 · July 9, 2021, 6:23am


Environment

TensorRT Version: 7.2.2.1
GPU Type: RTX 3090 (host), Jetson TX2 (target)
Nvidia Driver Version: 455.32
CUDA Version: 11.1
cuDNN Version: 8.0.5.43
Operating System + Version: Ubuntu 18.04 (host), JetPack 4.4 (target TX2)
Python Version: 3.8.5
PyTorch Version: 1.9.0+cu111
Baremetal or Container: container, nvcr.io/nvidia/tensorrt:20.12-py3



My original plan was to build the FP32 model in PyTorch, export it to ONNX, and finally convert it to FP16 or INT8. But after a lot of debugging, it seems I have to decide from the very beginning that the conversion will be to FP16. Is the workflow supposed to be like this, or is there a problem with my code? I took the sample code from TensorRT and ran it. Or is it because my current host (Ubuntu 18.04 + RTX 3090) does not support FP16?
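For reference, here is a minimal sketch of the first step of that workflow, assuming a standard PyTorch `nn.Module`: the model is exported to ONNX in FP32 and no reduced precision is chosen at this stage. The helper name `export_fp32_onnx`, the input/output names, the dummy input shape, and opset 11 are illustrative assumptions, not taken from the thread.

```python
try:
    import torch  # PyTorch (1.9.0 in the thread's environment)
except ImportError:  # lets the sketch be imported where PyTorch is absent
    torch = None


def export_fp32_onnx(model, onnx_path, input_shape=(1, 3, 224, 224), opset=11):
    """Export a PyTorch model to ONNX with FP32 weights (hypothetical helper).

    No reduced precision is selected here; FP16/INT8 is chosen later,
    when the TensorRT engine is built from the ONNX file.
    """
    if torch is None:
        raise RuntimeError("PyTorch is required for the ONNX export")
    # Keep weights in FP32, export on CPU, and switch layers such as
    # dropout/batchnorm to inference behaviour.
    model = model.float().cpu().eval()
    # Placeholder input; the shape is an assumption and must match your model.
    dummy_input = torch.randn(*input_shape)
    torch.onnx.export(
        model,
        dummy_input,
        onnx_path,
        opset_version=opset,
        input_names=["input"],
        output_names=["output"],
    )
    return onnx_path
```

In this flow the ONNX file stays FP32; FP16 or INT8 is requested afterwards, when the TensorRT engine is built from it.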

NVES · July 9, 2021, 6:37am

Hi, please refer to the link below to perform inference in INT8:
https://github.com/NVIDIA/TensorRT/blob/master/samples/opensource/sampleINT8/README.md

Thanks!

yeongjae8066 · July 9, 2021, 7:47am

Hello, I'm not interested in converting to INT8; I'd like to use FP16.
I have to pass the FP16 parameter when creating the engine, right? I'm wondering why the FP16 flag isn't being set.
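A minimal sketch of building an FP16 engine from an FP32 ONNX file with the TensorRT Python API, assuming the TensorRT 7.x bindings shipped in the thread's container. The helper name `build_fp16_engine`, the 1 GiB workspace, and the file paths are hypothetical; the relevant call is `config.set_flag(trt.BuilderFlag.FP16)` on the builder config, here guarded by `builder.platform_has_fast_fp16`.

```python
try:
    import tensorrt as trt  # TensorRT Python bindings; written against the 7.x API
except ImportError:  # lets the sketch be imported where TensorRT is absent
    trt = None


def build_fp16_engine(onnx_path, engine_path, workspace_bytes=1 << 30):
    """Parse an FP32 ONNX model and build an engine with FP16 enabled (hypothetical helper)."""
    if trt is None:
        raise RuntimeError("TensorRT Python bindings are not installed")

    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    # The ONNX parser requires an explicit-batch network definition.
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    )
    parser = trt.OnnxParser(network, logger)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            errors = [str(parser.get_error(i)) for i in range(parser.num_errors)]
            raise RuntimeError("ONNX parse failed:\n" + "\n".join(errors))

    config = builder.create_builder_config()
    # TRT 7.x/8.x attribute; newer releases use set_memory_pool_limit instead.
    config.max_workspace_size = workspace_bytes

    # Precision is chosen here, at build time, not when the model is exported.
    if builder.platform_has_fast_fp16:
        config.set_flag(trt.BuilderFlag.FP16)
    else:
        print("Warning: no fast FP16 on this GPU; building an FP32 engine")

    # TRT 7.x API; TRT 8+ prefers build_serialized_network.
    engine = builder.build_engine(network, config)
    if engine is None:
        raise RuntimeError("Engine build failed")
    with open(engine_path, "wb") as f:
        f.write(engine.serialize())
    return engine_path
```

From the command line, `trtexec --onnx=model.onnx --fp16 --saveEngine=model.engine` does the equivalent. The FP16 flag allows TensorRT to choose FP16 kernels rather than forcing every layer into FP16, so some layers may still run in FP32. A serialized engine is also tied to the GPU and TensorRT version it was built with, so an engine built on the RTX 3090 host would have to be rebuilt on the TX2.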

yeongjae8066 · July 12, 2021, 8:25am

I'm trying to use FP16. Can't the RTX 3090 do the conversion? Thank you

spolisetty · July 12, 2021, 10:34am

@yeongjae8066,

Please refer to the support matrix doc to check hardware and precision support based on the GPU architecture and CUDA compute capability.

You can also refer to this doc, which talks more about the RTX GPU architecture and the DL precisions it supports: https://www.nvidia.com/content/PDF/nvidia-ampere-ga-102-gpu-architecture-whitepaper-v2.pdf

Thank you.
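To see which entry of the support matrix applies, you can query the compute capability of the visible GPU. This sketch assumes PyTorch is available (it is in the thread's environment); the helper name `cuda_compute_capability` is hypothetical. For reference, the RTX 3090 (GA102, Ampere) reports 8.6 and the Jetson TX2 reports 6.2.

```python
try:
    import torch  # PyTorch is already part of the thread's environment
except ImportError:
    torch = None


def cuda_compute_capability(device_index=0):
    """Return (major, minor) compute capability, or None if no CUDA GPU is visible."""
    if torch is None or not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(device_index)


if __name__ == "__main__":
    cc = cuda_compute_capability()
    if cc is None:
        print("No CUDA device visible to PyTorch")
    else:
        print(f"{torch.cuda.get_device_name(0)}: compute capability {cc[0]}.{cc[1]}")
```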
