Hi, as described in the title, we made three ONNX models to reproduce this issue; they consist only of Conv and InstanceNormalization ops.
- [demo_conv_in.onnx](https://www.dropbox.com/s/s8zplpn3106m22t/demo_conv_in.onnx?dl=0)
- [demo_conv_10x_in.onnx](https://www.dropbox.com/s/2rxjoa2zvtjoixm/demo_conv_10x_in.onnx?dl=0)
- [demo_conv.onnx](https://www.dropbox.com/s/jvebtt96vc557hl/demo_conv.onnx?dl=0)
- demo_conv_in.onnx consists of a Conv op followed by an InstanceNormalization op.
- demo_conv_10x_in.onnx has the same structure, but the weights of its Conv op are 10x larger than in the first model; everything else is identical to the first model.
- demo_conv.onnx contains only the Conv op, with the same weights as the first model.
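For reference, models of this shape could be generated with a short script like the one below. This is a minimal sketch, not our exact export code: the input shape, channel count, padding, and random weights are illustrative assumptions; only the Conv -> InstanceNormalization structure matters.

```python
import numpy as np
import onnx
from onnx import TensorProto, helper, numpy_helper

def make_conv_in_model(weight_scale=1.0, path="demo_conv_in.onnx"):
    # NCHW input matching the preprocessing in the scripts below: (1, 3, 256, 128)
    inp = helper.make_tensor_value_info("input", TensorProto.FLOAT, [1, 3, 256, 128])
    out = helper.make_tensor_value_info("output", TensorProto.FLOAT, [1, 16, 256, 128])

    # Fixed seed so both variants share the same base weights and the
    # "10x" model is an exact 10x scaling of the first one
    np.random.seed(0)
    w = (np.random.randn(16, 3, 3, 3) * weight_scale).astype(np.float32)
    conv_w = numpy_helper.from_array(w, name="conv_w")
    scale = numpy_helper.from_array(np.ones(16, np.float32), name="scale")
    bias = numpy_helper.from_array(np.zeros(16, np.float32), name="bias")

    conv = helper.make_node("Conv", ["input", "conv_w"], ["conv_out"],
                            pads=[1, 1, 1, 1])
    inorm = helper.make_node("InstanceNormalization",
                             ["conv_out", "scale", "bias"],
                             ["output"], epsilon=1e-5)

    graph = helper.make_graph([conv, inorm], "demo_conv_in", [inp], [out],
                              initializer=[conv_w, scale, bias])
    model = helper.make_model(graph)
    onnx.checker.check_model(model)
    onnx.save(model, path)

make_conv_in_model(1.0, "demo_conv_in.onnx")
make_conv_in_model(10.0, "demo_conv_10x_in.onnx")
```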
Then we converted each of them to a TRT engine file in the same way and compared the engine's inference results against the original ONNX model, using onnxruntime (CPU version) for the ONNX inference. The results are as follows:
- demo_conv_in.onnx vs. its converted trt engine: error around 1e-3
- demo_conv_10x_in.onnx vs. its converted trt engine: error around 1e-5
- demo_conv.onnx vs. its converted trt engine: error around 1e-8
We think an error around 1e-5 is acceptable, but 1e-3 is not, because our model is not a simple classification model.
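For concreteness, the errors above are comparison numbers between the two outputs, computed roughly like the sketch below (assuming max absolute difference as the metric; `max_abs_error` is a hypothetical helper name, and its arguments are the outputs of the two inference scripts shown later in this post):

```python
import numpy as np

def max_abs_error(ort_out, trt_out):
    """Max absolute elementwise difference between the two outputs."""
    ort_out = np.asarray(ort_out, dtype=np.float32).ravel()
    trt_out = np.asarray(trt_out, dtype=np.float32).ravel()
    return float(np.max(np.abs(ort_out - trt_out)))

# e.g. max_abs_error(ort_outputs, trt_outputs) -> ~1e-3 for demo_conv_in.onnx
```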
The result of the third model indicates that this issue is not caused by the Conv op itself, and the only difference between the first and second models is the weights of their Conv ops. So we guess there may be some fusion trick applied when a Conv op is followed by an InstanceNormalization op? (According to the OSS code, InstanceNormalization is actually implemented with cudnnBatchNormalizationForwardTraining in cuDNN.)
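One way to check this guess would be to rebuild the engine with a VERBOSE logger and watch for fusion messages in the build log. This is a sketch, under the assumption that the TensorRT 7 builder reports its layer fusion decisions at VERBOSE level:

```python
import tensorrt as trt

# Build with a VERBOSE logger so the builder prints its optimization steps;
# any fusion message mentioning the Conv / InstanceNormalization pair would
# support the fusion hypothesis.
VERBOSE_LOGGER = trt.Logger(trt.Logger.VERBOSE)
EXPLICIT_BATCH = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)

with trt.Builder(VERBOSE_LOGGER) as builder, \
        builder.create_network(EXPLICIT_BATCH) as network, \
        trt.OnnxParser(network, VERBOSE_LOGGER) as parser:
    with open("demo_conv_in.onnx", "rb") as f:
        parser.parse(f.read())
    builder.max_workspace_size = 2 << 30
    engine = builder.build_cuda_engine(network)  # inspect the verbose log here
```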
Environment settings:
- TensorRT 7.0 (including the libs compiled from the newest OSS)
- CUDA 10.0
- cuDNN 7.6.3
And here is the code we use to run inference on the ONNX model:
```python
import cv2
import numpy as np
import onnxruntime

# Preprocess: BGR -> RGB, resize to 128x256, scale to [0, 1], NHWC -> NCHW
img_path = "test_image.jpg"
image = cv2.imread(img_path)
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
image = cv2.resize(image, (128, 256)).astype(np.float32)
image = image / 255.0
img_data = np.expand_dims(image, 0)
img_data = np.transpose(img_data, [0, 3, 1, 2])

model_path = "demo_conv_in.onnx"
session_option = onnxruntime.SessionOptions()
session_option.log_severity_level = 4
model = onnxruntime.InferenceSession(model_path, sess_options=session_option)

ort_input_name = model.get_inputs()[0].name
ort_output_names = [out.name for out in model.get_outputs()]
ort_outs = model.run(ort_output_names, {ort_input_name: img_data})
outputs = np.array(ort_outs[0], dtype=np.float32)
print(outputs)
```
Code for converting the ONNX model to a TRT engine:
```python
import logging
import os

import tensorrt as trt

os.environ["CUDA_VISIBLE_DEVICES"] = "0"

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
EXPLICIT_BATCH = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)


def GiB(val):
    return val * (1 << 30)


onnx_file_path = "demo_conv_in.onnx"
engine_file_path = "demo_conv_in.engine"
max_workspace_size = GiB(2)

with trt.Builder(TRT_LOGGER) as builder, \
        builder.create_network(EXPLICIT_BATCH) as network, \
        trt.OnnxParser(network, TRT_LOGGER) as parser:
    builder.max_batch_size = 1
    builder.max_workspace_size = max_workspace_size
    with open(onnx_file_path, "rb") as model:
        logging.info("Beginning ONNX file parsing")
        parser.parse(model.read())
    engine = builder.build_cuda_engine(network)
    if engine is None:
        exit()
    with open(engine_file_path, "wb") as f:
        f.write(engine.serialize())
```
Code for TRT engine inference (`common` is the helper module from the TensorRT Python samples, providing `allocate_buffers` and `do_inference_v2`):
```python
import cv2
import numpy as np
import tensorrt as trt

import common  # helper module from the TensorRT Python samples

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
trt.init_libnvinfer_plugins(TRT_LOGGER, "")

# Same preprocessing as in the onnxruntime script above
img_path = "test_image.jpg"
image = cv2.imread(img_path)
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
image = cv2.resize(image, (128, 256)).astype(np.float32)
image = image / 255.0
img_data = np.expand_dims(image, 0)
img_data = np.transpose(img_data, [0, 3, 1, 2])

model_path = "demo_conv_in.engine"
with open(model_path, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
    context = engine.create_execution_context()
    inputs, outputs, bindings, stream = common.allocate_buffers(engine)
    inputs[0].host = img_data.ravel()
    trt_outputs = common.do_inference_v2(
        context,
        bindings=bindings,
        inputs=inputs,
        outputs=outputs,
        stream=stream,
    )
    outputs = np.reshape(trt_outputs, (1, -1))
    print(outputs)
```