Description
I have two simple ONNX files, say o1.onnx and o3.onnx. o1.onnx is a subgraph of o3.onnx; the only difference is that o3.onnx declares two more outputs than o1.onnx.
When I convert o1.onnx to a TensorRT engine, everything works fine. However, when I convert o3.onnx to a TensorRT engine, the engine produces a large error.
Environment
Official NVIDIA TensorRT Docker container, release 22.12
Relevant Files
Related files: https://cloud.tsinghua.edu.cn/f/09c8c8a1d6a44fa0915a/?dl=1
Steps To Reproduce
import os

import numpy as np
import onnxruntime as ort
import tensorrt as trt
from polygraphy.backend.onnxrt import OnnxrtRunner
from polygraphy.backend.trt import TrtRunner

BASE = 'o3'  # set to 'o1' for the working model
feed_dict = {'input_0': np.load('bug.npy')}

# Reference run with ONNX-Runtime.
sess = ort.InferenceSession('{}.onnx'.format(BASE), providers=['CUDAExecutionProvider'])
with OnnxrtRunner(sess) as runner:
    outputs_ort = runner.infer(feed_dict)

TRT_LOGGER = trt.Logger()
trt.init_libnvinfer_plugins(TRT_LOGGER, '')

def load_engine(engine_file_path):
    assert os.path.exists(engine_file_path)
    print("Reading engine from file {}".format(engine_file_path))
    with open(engine_file_path, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
        return runtime.deserialize_cuda_engine(f.read())

# Build the FP16 engine with trtexec, then run it through Polygraphy.
os.system('trtexec --onnx={}.onnx --saveEngine={}.trt --fp16 --buildOnly'.format(BASE, BASE))
engine = load_engine('{}.trt'.format(BASE))
with TrtRunner(engine) as runner:
    outputs_trt = runner.infer(feed_dict)

print('max error', np.abs(outputs_ort['output_0'] - outputs_trt['output_0']).max())
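Since the engine above is built with --fp16, it can be useful to report relative error alongside the absolute maximum: a large absolute difference on a large-magnitude output may still be a small relative one. A minimal sketch (report_error is a hypothetical helper; ref and test stand in for outputs_ort['output_0'] and outputs_trt['output_0']):

```python
import numpy as np

def report_error(ref, test):
    # Max absolute error, plus max relative error with a small epsilon
    # to avoid dividing by zero on zero-valued reference elements.
    abs_err = np.abs(ref - test)
    rel_err = abs_err / (np.abs(ref) + 1e-6)
    return abs_err.max(), rel_err.max()

# Illustrative values: a 0.5 absolute error on a 1000-magnitude output
# is only a 5e-4 relative error.
ref = np.array([1000.0, 1.0, 0.5])
test = np.array([1000.5, 1.0, 0.5])
print(report_error(ref, test))
```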
When BASE = 'o1', the max error is only about 9e-6, but when BASE = 'o3', the max error is over 30.
Moreover, the error is only triggered by my input npy file (which is a real input for my model). If I compare with the polygraphy run command instead, the outputs match.