I’m trying to convert a transformer model from ONNX to a TensorRT engine. When I convert the model in fp32 precision, everything is fine (the outputs of the ONNX model and the TRT engine match). But when I use fp16 precision, I get completely different results (not comparable). I’ve stumbled across this issue on GitHub:
fp16 onnx -> fp16 tensorrt mismatched outputs · Issue #2336 · NVIDIA/TensorRT · GitHub. The problem seems very similar to mine, as some nodes appear to saturate at different values.
So my question is fairly simple: is it possible to force the precision of certain types of ONNX nodes (in my case, run all Pow or ReduceMean nodes in fp32)? I know about the --layerPrecisions option in trtexec, but I don’t think it does exactly what I want, since it takes individual layer names rather than layer types.
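For reference, here is a sketch of what I’d like to achieve with the TensorRT Python API: iterate over the parsed network and pin whole layer *types* to fp32 inside an otherwise-fp16 build. This assumes TensorRT 8.5 Python bindings, and assumes ONNX Pow lowers to an ElementWise layer and ReduceMean to a Reduce layer (the function name `force_fp32_layers` is mine).

```python
def force_fp32_layers(network, config, fp32_types=None):
    """Pin selected TensorRT layer types to fp32 in an otherwise-fp16 build.

    network -- a trt.INetworkDefinition (e.g. filled by the ONNX parser)
    config  -- the trt.IBuilderConfig used for the build
    """
    import tensorrt as trt  # imported here so the sketch is self-contained

    if fp32_types is None:
        # Assumption: Pow maps to an ElementWise layer, ReduceMean to Reduce.
        fp32_types = {trt.LayerType.ELEMENTWISE, trt.LayerType.REDUCE}

    for i in range(network.num_layers):
        layer = network.get_layer(i)
        if layer.type in fp32_types:
            layer.precision = trt.float32              # compute in fp32
            for j in range(layer.num_outputs):
                layer.set_output_type(j, trt.float32)  # keep outputs fp32

    # Make the builder honor per-layer constraints instead of treating
    # them as hints (available since TensorRT 8.2).
    config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)
```

If I understand correctly, the trtexec equivalent would be listing every matching layer by name via --layerPrecisions together with --precisionConstraints=obey, which is why a type-based approach would be much more convenient for a large transformer.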
Thank you for your help!
TensorRT Version: 8.5.03
GPU Type: RTX A4000