I use TensorRT to run inference on a BEiT ONNX model (opset 17). FP32 works correctly on my GPU, but the FP16 result is wrong. When I convert the model through onnx2trt (C++) with FP16 enabled, I get these warnings:
[TRT] Warning: TensorRT encountered issues when converting weights between types and that could affect accuracy.
[TRT] Warning: - 73 weights are affected by this issue: Detected subnormal FP16 values.
[TRT] Warning: - 47 weights are affected by this issue: Detected values less than smallest positive FP16 subnormal value and converted them to the FP16 minimum subnormalized value.
Actually, when my own inference engine runs BEiT in FP16, the accuracy stays within a reasonable range.
So I wonder: is the incorrect TensorRT FP16 result related to these warnings, or is there an overflow in some optimized kernel, e.g. fused layers?
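(For reference, the two warnings correspond to the IEEE 754 half-precision limits: magnitudes below 2^-14 ≈ 6.1e-5 are representable only as FP16 subnormals with reduced precision, and magnitudes below 2^-24 ≈ 6.0e-8 are not representable at all, so TensorRT clamps them to the smallest positive subnormal. A minimal standalone sketch of how one might scan FP32 weights for both cases; the weight values are made up:)

#include <cmath>
#include <cstdio>
#include <vector>

int main() {
    const float kFp16MinSubnormal = 5.9604645e-8f; // 2^-24, smallest positive FP16 subnormal
    const float kFp16MinNormal    = 6.1035156e-5f; // 2^-14, smallest positive FP16 normal

    std::vector<float> weights = {1.0e-3f, 3.0e-5f, 2.0e-8f, -4.0e-9f}; // made-up example values

    int subnormal = 0, belowSubnormal = 0;
    for (float w : weights) {
        float a = std::fabs(w);
        if (a == 0.0f) continue;
        if (a < kFp16MinSubnormal)
            ++belowSubnormal; // would be clamped, as the second warning reports
        else if (a < kFp16MinNormal)
            ++subnormal;      // representable only as an FP16 subnormal, as the first warning reports
    }
    std::printf("subnormal: %d, below subnormal: %d\n", subnormal, belowSubnormal);
    return 0;
}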
Environment
TensorRT Version: 8.5.1.7
GPU Type: RTX 4070 Laptop
Nvidia Driver Version: 536.25
CUDA Version: 11.8
CUDNN Version: 8.9.1
Operating System + Version: Windows 11
We recommend that you try the latest TensorRT version, 8.6.1.
If you still face the same issue, please share a repro ONNX model and complete verbose logs for better debugging.
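For reference, a minimal sketch of a logger that captures everything the builder emits (pass it to createInferBuilder); alternatively, trtexec --verbose prints the same information:

#include <NvInfer.h>
#include <iostream>

// Forward every message, including kVERBOSE, to stdout so the full build log is captured.
class VerboseLogger : public nvinfer1::ILogger {
    void log(Severity severity, const char* msg) noexcept override {
        std::cout << static_cast<int>(severity) << ": " << msg << std::endl;
    }
};

// Usage: VerboseLogger logger; auto* builder = nvinfer1::createInferBuilder(logger);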
@spolisetty
Also, is there any related API that would help us trade off accuracy and performance? It is very hard to debug when an accuracy error occurs.
For example, could I do something like steps 3-5 below?
// 1. ... some init & config setup
// 2. build the model
nvinfer1::IHostMemory* serializedModel = builder->buildSerializedNetwork(*network, *config);
// 3. then I want to get the warnings about affected FP16 weight layers emitted by buildSerializedNetwork
... some API call
// 4. based on the warnings from step 3, switch the affected FP16 weight layers to FP32
... some API call
// 5. finally, rebuild the TRT engine
... some API call
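As far as I know, TensorRT does not report the affected weights through a structured API, so step 3 still means identifying suspect layers yourself (e.g. by bisecting). Once you have their names, steps 4 and 5 do map onto existing APIs: ILayer::setPrecision plus the kOBEY_PRECISION_CONSTRAINTS builder flag pin individual layers to FP32 inside an otherwise-FP16 build. A sketch, where the set of layer names is a hypothetical input you supply from step 3:

#include <NvInfer.h>
#include <set>
#include <string>

// Pin the given layers to FP32 inside an otherwise-FP16 build (steps 4-5).
// suspectLayers holds whatever layer names you identified in step 3.
void pinLayersToFp32(nvinfer1::INetworkDefinition& network,
                     nvinfer1::IBuilderConfig& config,
                     const std::set<std::string>& suspectLayers) {
    config.setFlag(nvinfer1::BuilderFlag::kFP16);
    // Without this flag, setPrecision() is only a hint the builder may ignore.
    config.setFlag(nvinfer1::BuilderFlag::kOBEY_PRECISION_CONSTRAINTS);

    for (int i = 0; i < network.getNbLayers(); ++i) {
        nvinfer1::ILayer* layer = network.getLayer(i);
        if (suspectLayers.count(layer->getName()) != 0) {
            layer->setPrecision(nvinfer1::DataType::kFLOAT); // run this layer in FP32
            for (int j = 0; j < layer->getNbOutputs(); ++j)
                layer->setOutputType(j, nvinfer1::DataType::kFLOAT); // keep its outputs in FP32 too
        }
    }
    // Step 5: rebuild with the updated constraints, e.g.
    // nvinfer1::IHostMemory* serialized = builder->buildSerializedNetwork(network, config);
}

Polygraphy's debug precision tool can automate the bisection that finds which layers need this treatment.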
We think this level of difference is normal for FP16 networks. If you face any accuracy issues with FP16 in your real application, could you please provide minimal issue repro steps and scripts for better debugging?