Hi, as described in the title, we made three ONNX models to reproduce this issue; they consist only of Conv and InstanceNormalization ops.
- [demo_conv_in.onnx](https://www.dropbox.com/s/s8zplpn3106m22t/demo_conv_in.onnx?dl=0)
- [demo_conv_10x_in.onnx](https://www.dropbox.com/s/2rxjoa2zvtjoixm/demo_conv_10x_in.onnx?dl=0)
- [demo_conv.onnx](https://www.dropbox.com/s/jvebtt96vc557hl/demo_conv.onnx?dl=0)
- demo_conv_in.onnx consists of a Conv op followed by an InstanceNormalization op.
- demo_conv_10x_in.onnx has the same structure, but the weights of its Conv op are 10x larger than in the first model; everything else is identical to the first model.
- demo_conv.onnx contains only the Conv op, with the same weights as the first model.
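For reference, models of this shape could be generated with a short script like the one below. This is a minimal sketch, not our exact export code: the input shape, channel count, padding, and random weights are illustrative assumptions; only the Conv -> InstanceNormalization structure matters.

```python
import numpy as np
import onnx
from onnx import TensorProto, helper, numpy_helper

def make_conv_in_model(weight_scale=1.0, path="demo_conv_in.onnx"):
    # NCHW input matching the preprocessing in the scripts below: (1, 3, 256, 128)
    inp = helper.make_tensor_value_info("input", TensorProto.FLOAT, [1, 3, 256, 128])
    out = helper.make_tensor_value_info("output", TensorProto.FLOAT, [1, 16, 256, 128])

    # Fixed seed so both variants share the same base weights and the
    # "10x" model is an exact 10x scaling of the first one
    np.random.seed(0)
    w = (np.random.randn(16, 3, 3, 3) * weight_scale).astype(np.float32)
    conv_w = numpy_helper.from_array(w, name="conv_w")
    scale = numpy_helper.from_array(np.ones(16, np.float32), name="scale")
    bias = numpy_helper.from_array(np.zeros(16, np.float32), name="bias")

    conv = helper.make_node("Conv", ["input", "conv_w"], ["conv_out"],
                            pads=[1, 1, 1, 1])
    inorm = helper.make_node("InstanceNormalization",
                             ["conv_out", "scale", "bias"],
                             ["output"], epsilon=1e-5)

    graph = helper.make_graph([conv, inorm], "demo_conv_in", [inp], [out],
                              initializer=[conv_w, scale, bias])
    model = helper.make_model(graph)
    onnx.checker.check_model(model)
    onnx.save(model, path)

make_conv_in_model(1.0, "demo_conv_in.onnx")
make_conv_in_model(10.0, "demo_conv_10x_in.onnx")
```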
Then we converted each of them to a TRT engine file in the same way and compared the engine's inference results against the original ONNX model, using onnxruntime (CPU version) for the ONNX inference. The results are as follows:
- demo_conv_in.onnx vs. its converted trt engine: error around 1e-3
- demo_conv_10x_in.onnx vs. its converted trt engine: error around 1e-5
- demo_conv.onnx vs. its converted trt engine: error around 1e-8
We think an error around 1e-5 is acceptable, but 1e-3 is not, because our model is not a simple classification model.
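For concreteness, the errors above are comparison numbers between the two outputs, computed roughly like the sketch below (assuming max absolute difference as the metric; `max_abs_error` is a hypothetical helper name, and its arguments are the outputs of the two inference scripts shown later in this post):

```python
import numpy as np

def max_abs_error(ort_out, trt_out):
    """Max absolute elementwise difference between the two outputs."""
    ort_out = np.asarray(ort_out, dtype=np.float32).ravel()
    trt_out = np.asarray(trt_out, dtype=np.float32).ravel()
    return float(np.max(np.abs(ort_out - trt_out)))

# e.g. max_abs_error(ort_outputs, trt_outputs) -> ~1e-3 for demo_conv_in.onnx
```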
The result of the third model indicates that this issue is not caused by the Conv op itself, and the only difference between the first and second models is the weights of their Conv ops. So we guess there may be some fusion trick applied when a Conv op is followed by an InstanceNormalization op? (According to the OSS code, InstanceNormalization is actually implemented with cudnnBatchNormalizationForwardTraining in cuDNN.)
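One way to check this guess would be to rebuild the engine with a VERBOSE logger and watch for fusion messages in the build log. This is a sketch, under the assumption that the TensorRT 7 builder reports its layer fusion decisions at VERBOSE level:

```python
import tensorrt as trt

# Build with a VERBOSE logger so the builder prints its optimization steps;
# any fusion message mentioning the Conv / InstanceNormalization pair would
# support the fusion hypothesis.
VERBOSE_LOGGER = trt.Logger(trt.Logger.VERBOSE)
EXPLICIT_BATCH = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)

with trt.Builder(VERBOSE_LOGGER) as builder, \
        builder.create_network(EXPLICIT_BATCH) as network, \
        trt.OnnxParser(network, VERBOSE_LOGGER) as parser:
    with open("demo_conv_in.onnx", "rb") as f:
        parser.parse(f.read())
    builder.max_workspace_size = 2 << 30
    engine = builder.build_cuda_engine(network)  # inspect the verbose log here
```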
Environment settings:
- TensorRT 7.0 (including the libs compiled from the newest OSS)
- CUDA 10.0
- cuDNN 7.6.3
And here is the code we use to run inference on the ONNX model:
```python
import cv2
import numpy as np
import onnxruntime

# Preprocess: BGR -> RGB, resize to 128x256, scale to [0, 1], NHWC -> NCHW
img_path = "test_image.jpg"
image = cv2.imread(img_path)
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
image = cv2.resize(image, (128, 256)).astype(np.float32)
image = image / 255.0
img_data = np.expand_dims(image, 0)
img_data = np.transpose(img_data, [0, 3, 1, 2])

model_path = "demo_conv_in.onnx"
session_option = onnxruntime.SessionOptions()
session_option.log_severity_level = 4
model = onnxruntime.InferenceSession(model_path, sess_options=session_option)

ort_input_name = model.get_inputs()[0].name
ort_output_names = [out.name for out in model.get_outputs()]
ort_outs = model.run(ort_output_names, {ort_input_name: img_data})
outputs = np.array(ort_outs[0], dtype=np.float32)
print(outputs)
```
Code for converting the ONNX model to a TRT engine:
```python
import logging
import os

import tensorrt as trt

os.environ["CUDA_VISIBLE_DEVICES"] = "0"

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
EXPLICIT_BATCH = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)


def GiB(val):
    return val * (1 << 30)


onnx_file_path = "demo_conv_in.onnx"
engine_file_path = "demo_conv_in.engine"
max_workspace_size = GiB(2)

with trt.Builder(TRT_LOGGER) as builder, \
        builder.create_network(EXPLICIT_BATCH) as network, \
        trt.OnnxParser(network, TRT_LOGGER) as parser:
    builder.max_batch_size = 1
    builder.max_workspace_size = max_workspace_size
    with open(onnx_file_path, "rb") as model:
        logging.info("Beginning ONNX file parsing")
        parser.parse(model.read())
    engine = builder.build_cuda_engine(network)
    if engine is None:
        exit()
    with open(engine_file_path, "wb") as f:
        f.write(engine.serialize())
```
Code for TRT engine inference (`common` is the helper module from the TensorRT Python samples, providing `allocate_buffers` and `do_inference_v2`):
```python
import cv2
import numpy as np
import tensorrt as trt

import common  # helper module from the TensorRT Python samples

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
trt.init_libnvinfer_plugins(TRT_LOGGER, "")

# Same preprocessing as in the onnxruntime script above
img_path = "test_image.jpg"
image = cv2.imread(img_path)
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
image = cv2.resize(image, (128, 256)).astype(np.float32)
image = image / 255.0
img_data = np.expand_dims(image, 0)
img_data = np.transpose(img_data, [0, 3, 1, 2])

model_path = "demo_conv_in.engine"
with open(model_path, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
    context = engine.create_execution_context()
    inputs, outputs, bindings, stream = common.allocate_buffers(engine)
    inputs[0].host = img_data.ravel()
    trt_outputs = common.do_inference_v2(
        context,
        bindings=bindings,
        inputs=inputs,
        outputs=outputs,
        stream=stream,
    )
    outputs = np.reshape(trt_outputs, (1, -1))
    print(outputs)
```