I originally posted this issue to the TensorRT forum, but AakankashaS suggested it would be better suited to the Jetson forum, so I have moved the topic here.
Description
We know that the order of floating-point calculations can affect the accuracy of the result.
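As a minimal illustration of this effect (plain Python, not TensorRT-specific), summing the same values in a different order can produce a different floating-point result:

```python
# Floating-point addition is not associative: the same three values
# summed in a different order give different results.
values = [1e16, 1.0, -1e16]

# 1.0 is smaller than the rounding step at 1e16, so it is absorbed.
left_to_right = (values[0] + values[1]) + values[2]

# The two large terms cancel first, so 1.0 survives.
reordered = (values[0] + values[2]) + values[1]

print(left_to_right)  # 0.0
print(reordered)      # 1.0
```

This is why reordered reductions inside an inference engine can legitimately change low-order bits of the output even at the same precision.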
To run inference with TensorRT, you first need to build the model (e.g. ONNX) into a TensorRT engine.
At that point, you can choose which precision to build with, for example int8, fp16, or fp32.
If inference is run with the same engine, the outputs are expected to be similar.
Thanks.
The accuracy loss comes from the loss of data precision, so it should be i).
If PTQ (post-training quantization) is used, the accuracy loss cannot be reduced. But with QAT (quantization-aware training), the DNN can learn to compensate for the expected quantization error during training.
We are considering how to validate the output of our program.
If the output values depend only on the TensorRT engine, we can focus our verification on the engine itself. However, we understand that the TensorRT runtime may also affect output accuracy.
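For that kind of validation, one common approach (a sketch, not an official TensorRT utility; the function name, reference values, and tolerances below are all illustrative) is to compare the engine's outputs against a reference result element-wise within relative and absolute tolerances, rather than expecting bit-exact equality:

```python
import math

def outputs_match(reference, candidate, rtol=1e-3, atol=1e-5):
    """Compare two flat output lists element-wise using math.isclose
    semantics: |r - c| <= max(rtol * max(|r|, |c|), atol).
    Tolerances are illustrative; tune them per model and precision."""
    if len(reference) != len(candidate):
        return False
    return all(
        math.isclose(c, r, rel_tol=rtol, abs_tol=atol)
        for r, c in zip(reference, candidate)
    )

# Hypothetical fp32 reference vs. engine outputs with small rounding drift:
ref = [0.1234, 5.678, -3.1415]
out = [0.1235, 5.677, -3.1413]
print(outputs_match(ref, out))                    # True
print(outputs_match(ref, [0.2, 5.678, -3.1415]))  # False: real divergence
```

A tolerance-based check like this absorbs the benign run-to-run and build-to-build differences caused by reordered floating-point operations, while still catching genuine accuracy regressions.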