A clear and concise description of the bug or issue.
TensorRT Version: 220.127.116.11
GPU Type: RTX2080
Nvidia Driver Version: 470.63.01
CUDA Version: 11.4
CUDNN Version: 8.5
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 1.7
PyTorch Version (if applicable): 1.9
I'm trying to run inference on an 8-bit quantized model. I went through the quantization process and exported an ONNX file. Inference with ONNX Runtime works correctly, while inference with a TensorRT engine built from the same ONNX file does not work correctly at all.
I also tried to compare ONNX and TensorRT inference with the Polygraphy Python API; however, all output comparisons return FAIL, and the statistics comparison for one output even returns NaNs.
Here is my simple Polygraphy script:
from polygraphy.backend.common import bytes_from_path
from polygraphy.backend.onnxrt import OnnxrtRunner, SessionFromOnnx
from polygraphy.backend.trt import TrtRunner, engine_from_bytes
from polygraphy.comparator import Comparator

build_onnxrt_session = SessionFromOnnx('./quantized_detnet.onnx')
engine = engine_from_bytes(bytes_from_path('./quantized_detnet.eng'))
runners = [OnnxrtRunner(build_onnxrt_session), TrtRunner(engine)]
run_results = Comparator.run(runners)
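To make the FAIL/NaN symptom concrete, here is a minimal, self-contained sketch (plain NumPy, with a hypothetical output name) of the kind of per-output statistics a comparison reports; a single NaN element poisons the mean:

```python
import numpy as np

def summarize_output(name, arr):
    # Basic per-output statistics; NaNs propagate into the mean.
    arr = np.asarray(arr, dtype=np.float64)
    return {
        "name": name,
        "num_nan": int(np.isnan(arr).sum()),
        "mean": float(np.mean(arr)),
    }

clean = summarize_output("output0", [0.1, 0.2, 0.3])
broken = summarize_output("output0", [0.1, float("nan"), 0.3])
```

Here `clean["mean"]` is about 0.2, while `broken["mean"]` is NaN even though only one element is bad.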
I also noticed that the input and output binding positions in the engine were swapped compared to the ONNX file (and the original model).
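Since the bindings may come back in a different order, matching outputs by tensor name rather than by position avoids spurious comparison failures. A minimal sketch, with hypothetical tensor names:

```python
def align_outputs(reference, candidate):
    # Pair up output tensors by name, not by binding position,
    # so a swapped binding order doesn't misalign the comparison.
    return {name: (reference[name], candidate[name])
            for name in reference if name in candidate}

ref = {"boxes": [1, 2], "scores": [0.9]}
cand = {"scores": [0.8], "boxes": [1, 2]}  # same tensors, swapped order
pairs = align_outputs(ref, cand)
```

With this pairing, `pairs["boxes"]` compares boxes against boxes regardless of where each backend placed them.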
I am attaching both the ONNX model and the engine file built from it.