Hi,
We request you to share the ONNX model and the script, if not already shared, so that we can assist you better.
Meanwhile, you can try a few things:
1) Validate your model with the below snippet:
check_model.py
import sys
import onnx

# Usage: python check_model.py yourmodel.onnx
filename = sys.argv[1]  # path to your ONNX model
model = onnx.load(filename)
onnx.checker.check_model(model)
2) Try running your model with the trtexec command, as shown below.
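For example, to build and test an engine directly from your ONNX file (the file names here are placeholders):
trtexec --onnx=yourmodel.onnx --saveEngine=yourmodel.trt --verbose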
In case you are still facing the issue, we request you to share the trtexec --verbose log for further debugging.
Thanks!
I checked the model using check_model.py and I don't see any errors in the ONNX model.
The following are the commands I used to get inference data:
For INT8 precision:
trtexec --useCudaGraph --loadEngine=bert_large_v1_1-int8.trt --int8 --shapes=segment_ids:1x384,input_mask:1x384,input_ids:1x384 --duration=300 --verbose
For FP16 precision:
trtexec --useCudaGraph --loadEngine=bert_large_v1_1-fp16.trt --fp16 --shapes=segment_ids:1x384,input_mask:1x384,input_ids:1x384 --duration=300 --verbose
For BEST precision:
trtexec --useCudaGraph --loadEngine=bert_large_v1_1-best.trt --best --shapes=segment_ids:1x384,input_mask:1x384,input_ids:1x384 --duration=300 --verbose
We do not support INT8-PTQ for the ONNX-BERT path yet.
To use ONNX-BERT with INT8, please use the QAT path (by explicitly inserting Q/DQ nodes).
Please refer,
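As a rough illustration of what explicit Q/DQ insertion looks like at the ONNX level (this is a sketch, not the official workflow; the model path, tensor names, and scale values below are hypothetical placeholders):

import numpy as np
import onnx
from onnx import helper, numpy_helper

model = onnx.load("bert_large_v1_1.onnx")  # placeholder path
graph = model.graph

# Per-tensor quantization parameters (illustrative values).
scale = numpy_helper.from_array(np.array(0.02, dtype=np.float32), "act_scale")
zero_point = numpy_helper.from_array(np.array(0, dtype=np.int8), "act_zp")
graph.initializer.extend([scale, zero_point])

# Quantize then immediately dequantize the tensor; TensorRT recognizes
# the Q/DQ pair and can run the consumer layer in INT8.
q = helper.make_node("QuantizeLinear",
                     ["fp32_tensor", "act_scale", "act_zp"],
                     ["fp32_tensor_q"])
dq = helper.make_node("DequantizeLinear",
                      ["fp32_tensor_q", "act_scale", "act_zp"],
                      ["fp32_tensor_dq"])
graph.node.extend([q, dq])

# The consumer of "fp32_tensor" must then be rewired to read
# "fp32_tensor_dq", and the node list re-sorted topologically.
onnx.save(model, "bert_large_v1_1_qdq.onnx")

In practice, these Q/DQ nodes are typically emitted automatically by QAT tooling (for example, NVIDIA's pytorch-quantization toolkit during ONNX export) rather than added by hand.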