Hi TRT team. When I run inference with an FP16 TensorRT engine in C++, I get NaN output, but with the same C++ code and only the FP16 model replaced by the FP32 model, the output is correct. Also, both the FP16 and FP32 models produce correct output in Python. So, is there anything that needs to be done differently when running an FP16 model from C++?
I have uploaded my FP32 ONNX model (best.onnx), FP32 engine (best.engine), FP16 ONNX model (best_fp16.onnx), and FP16 engine (best_fp16.engine).
Environment
TensorRT Version: 8.2.1.8
GPU Type: RTX 3050
Nvidia Driver Version: 528.49
CUDA Version: 11.4
CUDNN Version: 8.2.2.26
Operating System + Version: Windows 10
Python Version (if applicable):
TensorFlow Version (if applicable):
PyTorch Version (if applicable): 1.11.0
Baremetal or Container (if container which image + tag):
Relevant Files
Please attach or include links to any models, data, files, or scripts necessary to reproduce your issue (GitHub repo, Google Drive, Dropbox, etc.).
best.engine (13.2 MB)
best.onnx (7.9 MB)
best_fp16.engine (7.9 MB)
best_fp16.onnx (4.0 MB)
Hi,
Could you try running your model with the trtexec command and share the --verbose log if the issue persists?
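For example, using the model file names attached above (the exact paths and flags here are an assumption; adjust them to your setup):

```
trtexec --onnx=best.onnx --fp16 --verbose --saveEngine=best_fp16.engine
```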
You can refer to the link below for the full list of supported operators; if any operator is not supported, you will need to write a custom plugin for that operation.
Also, please share your model and script if you have not already, so that we can help you better.
Meanwhile, for some common errors and queries, please refer to the link below:
I also tested another model, yolov5s. It produced correct results in Python in both FP32 and FP16, but in C++ it produced NaN in both FP32 and FP16. So I guess the problem may be in my preprocessing code?
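For comparison, here is a minimal C++ sketch of the usual yolov5-style preprocessing (RGB order, 1/255 scaling, NCHW float32). This is an assumption about what the exported model expects, and the real yolov5 pipeline also letterboxes (aspect-preserving resize with padding), which is omitted here for brevity:

```cpp
#include <opencv2/dnn.hpp>

// Hypothetical helper: typical yolov5-style preprocessing.
// BGR->RGB swap, resize to 640x640, scale to [0,1], HWC->NCHW float32.
cv::Mat preprocess(const cv::Mat& bgr) {
    cv::Mat blob;
    cv::dnn::blobFromImage(bgr, blob, 1.0 / 255.0, cv::Size(640, 640),
                           cv::Scalar(), /*swapRB=*/true, /*crop=*/false);
    // blob is CV_32F with shape 1x3x640x640; copy blob.ptr<float>(0)
    // into the engine's input binding.
    return blob;
}
```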
I have encountered this problem too. I have a TF model; with TensorRT 8.5.1.7 + FP16 all scores are OK, but after updating TensorRT to 8.6.1 + FP16, all scores are NaN.
My problem is solved. An ONNX model with FP32 inputs has a problem with 8.6.1 + FP16, but it is fine with 8.5.1. So I converted the ONNX model to FP16 before feeding it to TensorRT, and it now runs correctly.
My ONNX model was FP16, and after converting it to an FP16 engine, the output was NaN in C++ but correct in Python. I tried both TensorRT 8.6.1 and 8.5.1, and things did not get better. My ONNX model came from yolov5/export.py.
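One difference that can explain "correct in Python, NaN in C++" is the data type of the engine's I/O bindings: if the engine was built with FP16 inputs, the C++ code must feed __half data, whereas a Python wrapper may already be casting the buffers for you. This is an assumption, since the thread doesn't show the C++ buffer code; a minimal sketch (helper names hypothetical) for checking it with the TensorRT 8.x binding API:

```cpp
#include <NvInfer.h>
#include <cuda_fp16.h>
#include <cstdio>
#include <vector>

// Hypothetical helper: convert a float host buffer to __half for an
// input binding that expects FP16.
std::vector<__half> toHalf(const std::vector<float>& src) {
    std::vector<__half> dst(src.size());
    for (size_t i = 0; i < src.size(); ++i) dst[i] = __float2half(src[i]);
    return dst;
}

// Hypothetical helper: report the data type of each engine binding.
void printBindingTypes(const nvinfer1::ICudaEngine& engine) {
    for (int i = 0; i < engine.getNbBindings(); ++i) {
        const bool half = engine.getBindingDataType(i) == nvinfer1::DataType::kHALF;
        std::printf("binding %d (%s): %s\n", i, engine.getBindingName(i),
                    half ? "FP16 -- feed __half data" : "FP32/other");
    }
}
```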
When FP16 is used to represent absolute values of around 1000, some accuracy loss is to be expected.
Our general advice is to either limit the maximum absolute value of the activations or explicitly set the precision of the affected layers to FP32.
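As a sketch of the second suggestion, assuming a TensorRT 8.x build flow where `network` and `config` already exist (the name filter is a hypothetical way of selecting the affected layers):

```cpp
#include <NvInfer.h>
#include <cstring>

// Hypothetical helper: build in FP16, but pin any layer whose name
// contains `needle` (e.g. a layer known to overflow) to FP32.
void pinLayersToFp32(nvinfer1::INetworkDefinition* network,
                     nvinfer1::IBuilderConfig* config,
                     const char* needle) {
    config->setFlag(nvinfer1::BuilderFlag::kFP16);
    // Require the builder to honor the per-layer precision requests
    // below instead of silently ignoring them (TensorRT 8.2+).
    config->setFlag(nvinfer1::BuilderFlag::kOBEY_PRECISION_CONSTRAINTS);
    for (int i = 0; i < network->getNbLayers(); ++i) {
        nvinfer1::ILayer* layer = network->getLayer(i);
        if (std::strstr(layer->getName(), needle) != nullptr) {
            layer->setPrecision(nvinfer1::DataType::kFLOAT);
            for (int j = 0; j < layer->getNbOutputs(); ++j)
                layer->setOutputType(j, nvinfer1::DataType::kFLOAT);
        }
    }
}
```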