TRTExec - Force precision on certain ONNX Op Nodes


I’m trying to convert a transformer in ONNX format to a TensorRT engine. When I convert the model in fp32 precision, everything is fine (the outputs of the ONNX model and the TRT engine match). But when I use fp16 precision, the results are so different that they are not comparable. I came across this issue on GitHub:
fp16 onnx -> fp16 tensorrt mismatched outputs · Issue #2336 · NVIDIA/TensorRT · GitHub. The problem looks very similar to mine: some nodes appear to have different saturation values.
So my question is fairly simple: is it possible to force the precision of certain types of ONNX nodes (in my case, run all Pow or ReduceMean nodes in fp32)? I know about the --layerPrecisions option, but I don’t think it does exactly what I want.
TensorRT Version: 8.5.03
GPU Type: RTX A4000
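
To make the saturation concern concrete, here is a minimal sketch of why a Pow node is a plausible culprit in fp16. It only assumes the IEEE 754 half-precision limit (largest finite value 65504); the helper name square_overflows_fp16 and the sample inputs are made up for illustration and are not taken from the model in question.

```python
FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def square_overflows_fp16(x: float) -> bool:
    """Return True if x**2 is larger than the largest finite fp16 value."""
    return x * x > FP16_MAX


# Hypothetical activation magnitudes feeding a Pow(x, 2) node.
for x in (100.0, 255.0, 256.0, 1000.0):
    print(x, x * x, square_overflows_fp16(x))
```

Any activation with a magnitude of 256 or more already exceeds the fp16 range once squared, so a ReduceMean that follows would average over saturated or infinite values. Whether that is what actually happens in this model would have to be confirmed from the layer outputs.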

Please share the ONNX model and the script, if you have not already done so, so that we can assist you better.
In the meantime, you can try a few things:

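  1. Validate your model with the snippet below:

import onnx
filename = "model.onnx"  # replace with the path to your ONNX model
model = onnx.load(filename)
onnx.checker.check_model(model)  # raises an exception if the model is invalid

  2. Try running your model with the trtexec command.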

If you are still facing issues, please share the trtexec --verbose log for further debugging.
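
On the original question of pinning whole op types to fp32: trtexec takes per-layer constraints by layer name, not by op type, so one possible workaround is to generate the layer list from the ONNX graph. The sketch below is only an illustration under stated assumptions: it assumes the TensorRT ONNX parser names each layer after the corresponding ONNX node name (unnamed nodes are skipped), and the helper names build_layer_precisions and spec_from_onnx as well as the demo node names are hypothetical.

```python
from types import SimpleNamespace


def build_layer_precisions(nodes, op_types, precision="fp32"):
    """Build a 'name:precision,...' spec for every named node whose op_type matches."""
    names = [n.name for n in nodes if n.op_type in op_types and n.name]
    return ",".join(f"{name}:{precision}" for name in names)


def spec_from_onnx(path, op_types, precision="fp32"):
    """Load an ONNX file and build the spec from its graph nodes."""
    import onnx  # imported here so the helper above works without onnx installed

    model = onnx.load(path)
    return build_layer_precisions(model.graph.node, op_types, precision)


# Stand-in nodes (hypothetical names) so the sketch runs without a model file.
demo_nodes = [
    SimpleNamespace(name="/ln/Pow", op_type="Pow"),
    SimpleNamespace(name="/ln/ReduceMean", op_type="ReduceMean"),
    SimpleNamespace(name="/attn/MatMul", op_type="MatMul"),
    SimpleNamespace(name="", op_type="Pow"),  # unnamed node, skipped
]
spec = build_layer_precisions(demo_nodes, {"Pow", "ReduceMean"})
print(spec)
```

The resulting string could then be passed to trtexec along the lines of: trtexec --onnx=model.onnx --fp16 --precisionConstraints=obey --layerPrecisions=<spec>. It is worth checking trtexec --help on your installed version for the exact spec syntax, and checking the --verbose log to confirm that the layer names match the ONNX node names, since fused or renamed layers would not be covered by this approach.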