Output from ONNX inference and trt inference are different

hoangtm.fami · April 17, 2021, 8:57pm

I am having a problem converting an ONNX model to TensorRT. Specifically, the two models produce different results at inference time.
Code for running the ONNX model:

import numpy as np
import onnxruntime as rt

sess = rt.InferenceSession('efficientdet-d0_nwonly.onnx')
img = np.ones((1, 512, 512, 3), dtype=np.float32)
# Request all ten outputs: output_0 ... output_9
output_names = ['output_%d' % i for i in range(10)]
y = sess.run(output_names, {"images:0": img})

This gives nonzero outputs.

However, when running the TRT engine, the outputs are mostly 0.

TRT engine code:
import numpy as np
import tensorrt as trt
import pycuda.autoinit
import pycuda.driver as cuda
import time, math


class TRTInference:
    def __init__(self, trt_engine_path, trt_engine_datatype, net_type='EfficientNet'):
        self.cfx = cuda.Device(0).make_context()
        stream = cuda.Stream()

        TRT_LOGGER = trt.Logger(trt.Logger.INFO)
        trt.init_libnvinfer_plugins(TRT_LOGGER, '')
        runtime = trt.Runtime(TRT_LOGGER)

        # deserialize engine
        with open(trt_engine_path, 'rb') as f:
            buf = f.read()
            engine = runtime.deserialize_cuda_engine(buf)
        context = engine.create_execution_context()

        # prepare buffer
        host_inputs  = []
        cuda_inputs  = []
        host_outputs = []
        cuda_outputs = []
        bindings = []

        for binding in engine:
            size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
            host_mem = cuda.pagelocked_empty(size, np.float32)
            cuda_mem = cuda.mem_alloc(host_mem.nbytes)

            bindings.append(int(cuda_mem))
            if engine.binding_is_input(binding):
                host_inputs.append(host_mem)
                cuda_inputs.append(cuda_mem)
            else:
                host_outputs.append(host_mem)
                cuda_outputs.append(cuda_mem)

        # store
        self.stream  = stream
        self.context = context
        self.engine  = engine

        self.host_inputs = host_inputs
        self.cuda_inputs = cuda_inputs
        self.host_outputs = host_outputs
        self.cuda_outputs = cuda_outputs
        self.bindings = bindings
        self.net_type = net_type


    def infer(self, image,result):
        np.copyto(self.host_inputs[0], image.ravel())

        # inference
        start_time = time.time()
        cuda.memcpy_htod_async(self.cuda_inputs[0], self.host_inputs[0], self.stream)
        self.context.execute_async(bindings=self.bindings, stream_handle=self.stream.handle)
        cuda.memcpy_dtoh_async(self.host_outputs[0], self.cuda_outputs[0], self.stream)
        self.stream.synchronize()
        print(" execute times of "+ self.net_type +':' +str(time.time()-start_time)+'\n')        
        result[self.net_type] = self.host_outputs
     
trt_engine=TRTInference('efficientdet-d0_nwonly.trt',trt_engine_datatype=trt.DataType.FLOAT,net_type='EfficientDet')
result={}
trt_engine.infer(2*np.ones((1,512, 512,3)),result)

Attachments:
efficientdet-d0_nwonly_trt.zip
efficientdet-d0_nwonly_onnx.zip

TRT version: 7.1.3.4
What might be the reason?

Same as: Output from ONNX inference and trt inference are different · Issue #1194 · NVIDIA/TensorRT · GitHub

AastaLLL · April 19, 2021, 3:37am

Hi,

Could you first try running inference on your model with trtexec to see whether you get nonzero output?

$ /usr/src/tensorrt/bin/trtexec --loadEngine=efficientdet-d0_nwonly.trt --dumpOutput

Thanks.

hoangtm.fami · April 19, 2021, 6:12am

Hi,
It’s nonzero. Why might my code above not work with TensorRT?
Thanks
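
One possible cause, judging only from the code posted above: infer copies back host_outputs[0] alone, so the other nine output buffers are never filled and keep whatever pagelocked_empty left in them. Note also that the TRT test feeds 2*np.ones (float64) while the ONNX test feeds np.ones (float32), so the two runs are not directly comparable. Below is a minimal sketch of copying every output back. The helper name fetch_all_outputs and the CPU-only stand-ins are hypothetical; the copy function is passed in as a parameter so the sketch runs without a GPU, and with pycuda it would be cuda.memcpy_dtoh_async.

```python
def fetch_all_outputs(host_outputs, cuda_outputs, stream, memcpy_dtoh_async):
    """Copy every device output buffer to its host buffer, then wait.

    memcpy_dtoh_async is injected so this runs without a GPU; with pycuda
    it would be pycuda.driver.memcpy_dtoh_async(dest, src, stream).
    """
    for host_mem, cuda_mem in zip(host_outputs, cuda_outputs):
        memcpy_dtoh_async(host_mem, cuda_mem, stream)
    # Wait for all queued copies before the host buffers are read.
    stream.synchronize()
    return host_outputs


# --- CPU-only stand-ins (hypothetical, for illustration only) ---
class FakeStream:
    def __init__(self):
        self.synced = False

    def synchronize(self):
        self.synced = True


def fake_memcpy(dest, src, stream):
    # Pretend device-to-host copy: overwrite dest in place.
    dest[:] = src


device = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]   # three "device" outputs
host = [[0.0] * len(buf) for buf in device]      # zero-filled host buffers
stream = FakeStream()
fetch_all_outputs(host, device, stream, fake_memcpy)
print(host)
```

Inside infer, the equivalent change would be to loop over zip(self.host_outputs, self.cuda_outputs) instead of copying index 0 only.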

hoangtm.fami · April 20, 2021, 6:57pm

@AastaLLL
The results from onnxruntime and TensorRT still differ quite significantly, and this is unrelated to the bug in my code above. Here is the output of polygraphy run efficientdet-d0_nwonly.onnx --trt --onnxrt:

[I] Runner: trt-runner-N0-04/20/21-20:43:57          | Activating and starting inference
[TensorRT] WARNING: [TRT]/home/jenkins/workspace/OSS/L0_MergeRequest/oss/parsers/onnx/onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[I] Building engine with configuration: max_workspace_size=16777216 bytes (16.00 MB) | tf32=False, fp16=False, int8=False, strict_types=False | 1 profiles
[I] Runner: trt-runner-N0-04/20/21-20:43:57          | Completed 1 iterations.
[I] Runner: onnxrt-runner-N0-04/20/21-20:43:57       | Activating and starting inference
[I] Runner: onnxrt-runner-N0-04/20/21-20:43:57       | Completed 1 iterations.
[I] Accuracy Comparison | trt-runner-N0-04/20/21-20:43:57 vs. onnxrt-runner-N0-04/20/21-20:43:57
[I]     Comparing Output: 'output_4' (dtype=float32, shape=(1, 4, 4, 810)) with 'output_4' (dtype=float32, shape=(1, 4, 4, 810))
[I]         Required tolerances: [atol=0.87389] OR [rtol=1e-05, atol=0.87383] OR [rtol=0.20267, atol=1e-05]
        Runner: trt-runner-N0-04/20/21-20:43:57          | Stats: mean=-6.5328, min=-9.6361 at (0, 1, 2, 664), max=-3.5694 at (0, 2, 2, 54)
        Runner: onnxrt-runner-N0-04/20/21-20:43:57       | Stats: mean=-6.4176, min=-9.3779 at (0, 1, 2, 664), max=-3.2235 at (0, 1, 1, 324)
[E]         FAILED | Difference exceeds tolerance (rtol=1e-05, atol=1e-05)
[I]     Comparing Output: 'output_9' (dtype=float32, shape=(1, 4, 4, 36)) with 'output_9' (dtype=float32, shape=(1, 4, 4, 36))
[I]         Required tolerances: [atol=0.17882] OR [rtol=1e-05, atol=0.17882] OR [rtol=7.0958, atol=1e-05]
        Runner: trt-runner-N0-04/20/21-20:43:57          | Stats: mean=-0.091448, min=-0.54466 at (0, 2, 2, 19), max=0.32343 at (0, 0, 0, 0)
        Runner: onnxrt-runner-N0-04/20/21-20:43:57       | Stats: mean=-0.096405, min=-0.56071 at (0, 2, 2, 31), max=0.30159 at (0, 0, 0, 0)
[E]         FAILED | Difference exceeds tolerance (rtol=1e-05, atol=1e-05)
[I]     Comparing Output: 'output_3' (dtype=float32, shape=(1, 8, 8, 810)) with 'output_3' (dtype=float32, shape=(1, 8, 8, 810))
[I]         Required tolerances: [atol=0.68331] OR [rtol=1e-05, atol=0.68324] OR [rtol=0.17018, atol=1e-05]
        Runner: trt-runner-N0-04/20/21-20:43:57          | Stats: mean=-6.8013, min=-9.7306 at (0, 3, 1, 162), max=-2.9203 at (0, 4, 3, 594)
        Runner: onnxrt-runner-N0-04/20/21-20:43:57       | Stats: mean=-6.8167, min=-9.4594 at (0, 3, 3, 162), max=-2.7687 at (0, 4, 4, 594)
[E]         FAILED | Difference exceeds tolerance (rtol=1e-05, atol=1e-05)
[I]     Comparing Output: 'output_8' (dtype=float32, shape=(1, 8, 8, 36)) with 'output_8' (dtype=float32, shape=(1, 8, 8, 36))
[I]         Required tolerances: [atol=0.20207] OR [rtol=1e-05, atol=0.20206] OR [rtol=119.12, atol=1e-05]
        Runner: trt-runner-N0-04/20/21-20:43:57          | Stats: mean=-0.0084417, min=-0.42068 at (0, 2, 0, 3), max=0.49135 at (0, 4, 4, 30)
        Runner: onnxrt-runner-N0-04/20/21-20:43:57       | Stats: mean=-0.0094595, min=-0.42487 at (0, 3, 7, 11), max=0.54484 at (0, 4, 4, 30)
[E]         FAILED | Difference exceeds tolerance (rtol=1e-05, atol=1e-05)
[I]     Comparing Output: 'output_2' (dtype=float32, shape=(1, 16, 16, 810)) with 'output_2' (dtype=float32, shape=(1, 16, 16, 810))
[I]         Required tolerances: [atol=0.92894] OR [rtol=1e-05, atol=0.92887] OR [rtol=0.16824, atol=1e-05]
        Runner: trt-runner-N0-04/20/21-20:43:57          | Stats: mean=-7.1851, min=-10.13 at (0, 7, 6, 72), max=-4.2356 at (0, 11, 5, 594)
        Runner: onnxrt-runner-N0-04/20/21-20:43:57       | Stats: mean=-7.1618, min=-9.8995 at (0, 6, 5, 72), max=-4.3057 at (0, 10, 5, 594)
[E]         FAILED | Difference exceeds tolerance (rtol=1e-05, atol=1e-05)
[I]     Comparing Output: 'output_7' (dtype=float32, shape=(1, 16, 16, 36)) with 'output_7' (dtype=float32, shape=(1, 16, 16, 36))
[I]         Required tolerances: [atol=0.40775] OR [rtol=1e-05, atol=0.40774] OR [rtol=14476, atol=1e-05]
        Runner: trt-runner-N0-04/20/21-20:43:57          | Stats: mean=-0.0096572, min=-1.2089 at (0, 0, 0, 3), max=1.4409 at (0, 0, 0, 0)
        Runner: onnxrt-runner-N0-04/20/21-20:43:57       | Stats: mean=-0.011599, min=-0.80111 at (0, 0, 0, 3), max=1.0734 at (0, 0, 0, 0)
[E]         FAILED | Difference exceeds tolerance (rtol=1e-05, atol=1e-05)
[I]     Comparing Output: 'output_1' (dtype=float32, shape=(1, 32, 32, 810)) with 'output_1' (dtype=float32, shape=(1, 32, 32, 810))
[I]         Required tolerances: [atol=3.3337] OR [rtol=1e-05, atol=3.3336] OR [rtol=0.48781, atol=1e-05]
        Runner: trt-runner-N0-04/20/21-20:43:57          | Stats: mean=-7.4557, min=-11.968 at (0, 1, 1, 683), max=-4.2842 at (0, 28, 28, 603)
        Runner: onnxrt-runner-N0-04/20/21-20:43:57       | Stats: mean=-7.4016, min=-10.791 at (0, 15, 21, 72), max=-4.0198 at (0, 29, 29, 291)
[E]         FAILED | Difference exceeds tolerance (rtol=1e-05, atol=1e-05)
[I]     Comparing Output: 'output_6' (dtype=float32, shape=(1, 32, 32, 36)) with 'output_6' (dtype=float32, shape=(1, 32, 32, 36))
[I]         Required tolerances: [atol=1.6836] OR [rtol=1e-05, atol=1.6835] OR [rtol=8588.9, atol=1e-05]
        Runner: trt-runner-N0-04/20/21-20:43:57          | Stats: mean=0.0020516, min=-5.0281 at (0, 0, 0, 3), max=5.0922 at (0, 0, 0, 0)
        Runner: onnxrt-runner-N0-04/20/21-20:43:57       | Stats: mean=-0.00058793, min=-3.3446 at (0, 0, 0, 3), max=3.5692 at (0, 0, 0, 0)
[E]         FAILED | Difference exceeds tolerance (rtol=1e-05, atol=1e-05)
[I]     Comparing Output: 'output_0' (dtype=float32, shape=(1, 64, 64, 810)) with 'output_0' (dtype=float32, shape=(1, 64, 64, 810))
[I]         Required tolerances: [atol=4.199] OR [rtol=1e-05, atol=4.1989] OR [rtol=0.53565, atol=1e-05]
        Runner: trt-runner-N0-04/20/21-20:43:57          | Stats: mean=-7.9652, min=-13.541 at (0, 4, 1, 702), max=-2.5416 at (0, 0, 0, 0)
        Runner: onnxrt-runner-N0-04/20/21-20:43:57       | Stats: mean=-7.9062, min=-13.093 at (0, 1, 1, 702), max=-2.4043 at (0, 0, 0, 0)
[E]         FAILED | Difference exceeds tolerance (rtol=1e-05, atol=1e-05)
[I]     Comparing Output: 'output_5' (dtype=float32, shape=(1, 64, 64, 36)) with 'output_5' (dtype=float32, shape=(1, 64, 64, 36))
[I]         Required tolerances: [atol=11.113] OR [rtol=1e-05, atol=11.113] OR [rtol=21514, atol=1e-05]
        Runner: trt-runner-N0-04/20/21-20:43:57          | Stats: mean=-0.0075114, min=-11.986 at (0, 1, 0, 2), max=10.978 at (0, 0, 0, 1)
        Runner: onnxrt-runner-N0-04/20/21-20:43:57       | Stats: mean=-0.0072811, min=-19.534 at (0, 0, 0, 2), max=13.411 at (0, 1, 0, 1)
[E]         FAILED | Difference exceeds tolerance (rtol=1e-05, atol=1e-05)
[E]     FAILED | Mismatched outputs: ['output_4', 'output_9', 'output_3', 'output_8', 'output_2', 'output_7', 'output_1', 'output_6', 'output_0', 'output_5']
[E] FAILED | Command: /usr/local/bin/polygraphy run efficientdet-d0_nwonly.onnx --trt --onnxrt
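
The "Required tolerances" lines in the log report how far the tolerances would have to be loosened for each output to pass. As a rough illustration, assuming an allclose-style criterion |a - r| <= atol + rtol * |r| (NumPy's convention; the exact formula Polygraphy uses is not shown in the log), the smallest passing atol for a fixed rtol can be computed as below. The function names and sample values are made up for the example.

```python
def is_close(actual, reference, rtol=1e-5, atol=1e-5):
    """True if every pair satisfies |a - r| <= atol + rtol * |r|."""
    return all(abs(a - r) <= atol + rtol * abs(r)
               for a, r in zip(actual, reference))


def required_atol(actual, reference, rtol=0.0):
    """Smallest atol for which is_close passes at the given rtol."""
    return max(max(abs(a - r) - rtol * abs(r), 0.0)
               for a, r in zip(actual, reference))


# Made-up values standing in for one flattened output tensor.
trt_out = [1.0, 2.5, -4.0]
onnx_out = [1.0, 2.0, -4.25]

print(is_close(trt_out, onnx_out))        # False
print(required_atol(trt_out, onnx_out))   # 0.5
```

Read this way, an entry such as atol=0.87389 for output_4 means the largest element-wise gap is about 0.87, far above the default 1e-05.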

AastaLLL · April 23, 2021, 6:40am

Hi,

Thanks for testing.

We are checking this internally.
Will share more information with you later.

AastaLLL · May 14, 2021, 4:43am

Hi,

We can confirm that this issue is fixed in our next release.
We will let you know once the new software is available.

Thanks.
