What affects the floating-point accuracy of a TensorRT engine output?

I first posted this issue to the TensorRT forum.
However, AakankshaS advised that it would be better to post it to the Jetson forum.
So I moved this topic to the Jetson forum.

Description

We know that the order of floating-point operations may affect floating-point accuracy, as described here:

CUDA Floating Point (nvidia.com)

So I want to know which modules determine the order of calculation, and therefore the output, of a TensorRT engine (a small demonstration of the order effect follows the list below):

i) the TensorRT engine only
ii) the TensorRT runtime
iii) others
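
As a small demonstration of the order effect (my own example, not taken from the linked whitepaper), single-precision addition is not associative, so the grouping of the same three terms changes the result:

```python
# Floating-point addition is not associative: regrouping the same
# three terms changes the single-precision result.
import numpy as np

a = np.float32(1e8)
b = np.float32(-1e8)
c = np.float32(1.0)

left = (a + b) + c   # the large terms cancel first, so c survives -> 1.0
right = a + (b + c)  # c is absorbed into -1e8 before the cancellation -> 0.0
print(left, right, left == right)  # 1.0 0.0 False
```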

Environment

TensorRT Version: 8.5.2.2
GPU Type: Jetson Orin 32 GB
Nvidia Driver Version: 35.2.1
CUDA Version: 11.4
CUDNN Version: 8.6.0
Operating System + Version: Ubuntu 20.04.6 LTS
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if container which image + tag):


Hi,

This looks like a Jetson issue. Please refer to the samples below in case they are useful.

For any further assistance, we will move this post to the Jetson-related forum.

Thanks!

Dear AakankshaS,

Thank you for your information.

However, I could not find the answer in the samples.

Could you move this post to the Jetson-related forum?

Regards,
hiro

Hi,

When inferring with TensorRT, you first need to build the model (e.g. ONNX) into a TensorRT engine.
At that point, you can choose which data precision to convert to: for example, INT8, FP32, or FP16.
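
For illustration only, here is a minimal sketch of such a build using the TensorRT Python API; the file names are placeholders and error handling is reduced to the essentials:

```python
# Build an ONNX model into a TensorRT engine with FP16 enabled.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("model.onnx", "rb") as f:  # placeholder path
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # INT8 and the default FP32 are other options

engine_bytes = builder.build_serialized_network(network, config)
with open("model.engine", "wb") as f:  # placeholder path
    f.write(engine_bytes)
```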

If inference is run with the same engine, the outputs are expected to be similar.
Thanks.

Dear AastaLLL,

If inference is run with the same engine, the outputs are expected to be similar.

We plan to use the FP16 format.
And we know that the order of floating-point operations may affect floating-point accuracy.

So I want to know which modules determine the order of calculation, and therefore the output, of a TensorRT engine:

i) the TensorRT engine only
ii) the TensorRT runtime
iii) others

Regards,
hiro

Hi,

The accuracy loss comes from data precision loss, so it should be i).

If PTQ (post-training quantization) is used, the accuracy loss cannot be reduced.
But with QAT (quantization-aware training), the DNN can learn to compensate for the possible accuracy loss.
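
As a hedged sketch of the practical difference between the two paths, assuming the TensorRT Python API (the calibrator here is a non-functional stub; a real one must feed representative input batches):

```python
# PTQ vs. QAT from the builder-config point of view.
import tensorrt as trt

class StubCalibrator(trt.IInt8EntropyCalibrator2):
    """Placeholder PTQ calibrator; a real one supplies calibration batches."""
    def __init__(self):
        trt.IInt8EntropyCalibrator2.__init__(self)
    def get_batch_size(self):
        return 1
    def get_batch(self, names):
        return None  # None signals "no more calibration data"
    def read_calibration_cache(self):
        return None
    def write_calibration_cache(self, cache):
        pass

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
config = builder.create_builder_config()

# PTQ: quantization scales are computed after training from calibration data.
config.set_flag(trt.BuilderFlag.INT8)
config.int8_calibrator = StubCalibrator()

# QAT: the ONNX model already carries Q/DQ nodes learned during training,
# so the INT8 flag alone is enough and no calibrator is needed.
```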

Thanks.

Hi, AastaLLL,

The accuracy loss comes from data precision loss, so it should be i).

My question is not about data precision loss but about operation order and accuracy.
Please see Section 2.2, Operations and Accuracy, on the following NVIDIA page:

CUDA Floating Point (nvidia.com)

In floating point, the value of ((A+B)+C) does not necessarily equal the value of (A+(B+C)).

So I want to confirm which modules determine the order of calculation.

Regards,
hiro

Hi,

The implementation is included in the TensorRT runtime.
The TensorRT engine only contains the serialized, quantized data.
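
A minimal sketch of that split, assuming the TensorRT Python API ("model.engine" is a placeholder path):

```python
# The engine file is just serialized data; the runtime executes it.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
runtime = trt.Runtime(logger)  # the runtime holds the execution implementation

with open("model.engine", "rb") as f:  # placeholder path
    engine = runtime.deserialize_cuda_engine(f.read())  # engine = serialized network

context = engine.create_execution_context()  # inference goes through the runtime
```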

Could you share more about your use case?
Do you want to minimize the accuracy loss when inferring with fp16 mode?

Thanks.

Hi, AastaLLL,

We are considering how to validate the output of our program.
If the output values depended only on the TensorRT engine, we could focus verification on the engine itself.

However, we now understand that the TensorRT runtime may also affect the output accuracy.
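
One possible tolerance-based check for that validation (my own suggestion, not something proposed in this thread): compare the FP16 engine outputs against an FP32 reference within tolerances instead of expecting bit-exact results.

```python
# Compare two output tensors within FP16-scale tolerances.
import numpy as np

def outputs_match(reference: np.ndarray, candidate: np.ndarray,
                  rtol: float = 1e-2, atol: float = 1e-3) -> bool:
    """The tolerance values are illustrative; suitable ones depend on the model."""
    return np.allclose(reference.astype(np.float32),
                       candidate.astype(np.float32),
                       rtol=rtol, atol=atol)

# Tiny synthetic example: outputs that differ by less than the tolerance.
ref = np.array([0.1234, 5.678, -3.21], dtype=np.float32)
out = ref + np.float32(1e-4)
print(outputs_match(ref, out))  # True
```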

Regards,
hiro
