Description
I have two simple ONNX files, say o1.onnx and o3.onnx. o1.onnx is a subgraph of o3.onnx; the only difference is that o3.onnx declares two more outputs than o1.onnx.
When I convert o1.onnx to a TensorRT engine, everything works fine. However, when I convert o3.onnx to a TensorRT engine, the engine produces a large error.
Environment
Official NVIDIA TensorRT Docker container, release 22.12
Relevant Files
Related files: https://cloud.tsinghua.edu.cn/f/09c8c8a1d6a44fa0915a/?dl=1
Steps To Reproduce
import os

import numpy as np
import onnxruntime as ort
import tensorrt as trt
from polygraphy.backend.onnxrt import OnnxrtRunner
from polygraphy.backend.trt import TrtRunner

BASE = 'o3'  # set to 'o1' for the working model
feed_dict = {'input_0': np.load('bug.npy')}

# Reference run with ONNX-Runtime.
sess = ort.InferenceSession('{}.onnx'.format(BASE), providers=['CUDAExecutionProvider'])
with OnnxrtRunner(sess) as runner:
    outputs_ort = runner.infer(feed_dict)

TRT_LOGGER = trt.Logger()
trt.init_libnvinfer_plugins(TRT_LOGGER, '')

def load_engine(engine_file_path):
    assert os.path.exists(engine_file_path)
    print("Reading engine from file {}".format(engine_file_path))
    with open(engine_file_path, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
        return runtime.deserialize_cuda_engine(f.read())

# Build the FP16 engine with trtexec, then run it through Polygraphy.
os.system('trtexec --onnx={}.onnx --saveEngine={}.trt --fp16 --buildOnly'.format(BASE, BASE))
engine = load_engine('{}.trt'.format(BASE))
with TrtRunner(engine) as runner:
    outputs_trt = runner.infer(feed_dict)

print('max error', np.abs(outputs_ort['output_0'] - outputs_trt['output_0']).max())
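Since the engine above is built with --fp16, it can be useful to report relative error alongside the absolute maximum: a large absolute difference on a large-magnitude output may still be a small relative one. A minimal sketch (report_error is a hypothetical helper; ref and test stand in for outputs_ort['output_0'] and outputs_trt['output_0']):

```python
import numpy as np

def report_error(ref, test):
    # Max absolute error, plus max relative error with a small epsilon
    # to avoid dividing by zero on zero-valued reference elements.
    abs_err = np.abs(ref - test)
    rel_err = abs_err / (np.abs(ref) + 1e-6)
    return abs_err.max(), rel_err.max()

# Illustrative values: a 0.5 absolute error on a 1000-magnitude output
# is only a 5e-4 relative error.
ref = np.array([1000.0, 1.0, 0.5])
test = np.array([1000.5, 1.0, 0.5])
print(report_error(ref, test))
```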
When BASE = 'o1', the max error is only about 9e-6, but when BASE = 'o3', the max error is over 30.
Moreover, the error is only triggered by my input npy file (which is a real input for my model). If I compare with the polygraphy run command instead, the outputs match.