Output from ONNX inference and trt inference are different

I am having a problem converting an ONNX model to TensorRT. Specifically, the two models give different inference results.
Code for running the ONNX model:

import onnxruntime as rt
import numpy as np

sess = rt.InferenceSession('efficientdet-d0_nwonly.onnx')
img = np.ones((1, 512, 512, 3), dtype=np.float32)
output_names = ['output_%d' % i for i in range(10)]  # output_0 ... output_9
y = sess.run(output_names, {"images:0": img})

This gives nonzero outputs.

However, when running the TRT engine, the outputs are mostly 0.

TRT engine code:

import time

import numpy as np
import tensorrt as trt
import pycuda.autoinit
import pycuda.driver as cuda


class TRTInference:
    def __init__(self, trt_engine_path, trt_engine_datatype, net_type='EfficientNet'):
        self.cfx = cuda.Device(0).make_context()
        stream = cuda.Stream()

        TRT_LOGGER = trt.Logger(trt.Logger.INFO)
        trt.init_libnvinfer_plugins(TRT_LOGGER, '')
        runtime = trt.Runtime(TRT_LOGGER)

        # deserialize engine
        with open(trt_engine_path, 'rb') as f:
            buf = f.read()
            engine = runtime.deserialize_cuda_engine(buf)
        context = engine.create_execution_context()

        # prepare buffer
        host_inputs  = []
        cuda_inputs  = []
        host_outputs = []
        cuda_outputs = []
        bindings = []

        for binding in engine:
            size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
            host_mem = cuda.pagelocked_empty(size, np.float32)
            cuda_mem = cuda.mem_alloc(host_mem.nbytes)

            bindings.append(int(cuda_mem))
            if engine.binding_is_input(binding):
                host_inputs.append(host_mem)
                cuda_inputs.append(cuda_mem)
            else:
                host_outputs.append(host_mem)
                cuda_outputs.append(cuda_mem)

        # store
        self.stream  = stream
        self.context = context
        self.engine  = engine

        self.host_inputs = host_inputs
        self.cuda_inputs = cuda_inputs
        self.host_outputs = host_outputs
        self.cuda_outputs = cuda_outputs
        self.bindings = bindings
        self.net_type = net_type


    def infer(self, image, result):
        # copy the flattened image into the page-locked input buffer
        np.copyto(self.host_inputs[0], image.ravel())

        # inference
        start_time = time.time()
        cuda.memcpy_htod_async(self.cuda_inputs[0], self.host_inputs[0], self.stream)
        self.context.execute_async(bindings=self.bindings, stream_handle=self.stream.handle)
        # copy the first output buffer back to the host
        cuda.memcpy_dtoh_async(self.host_outputs[0], self.cuda_outputs[0], self.stream)
        self.stream.synchronize()
        print(" execute times of " + self.net_type + ':' + str(time.time() - start_time) + '\n')
        result[self.net_type] = self.host_outputs


trt_engine = TRTInference('efficientdet-d0_nwonly.trt', trt_engine_datatype=trt.DataType.FLOAT, net_type='EfficientDet')
result = {}
trt_engine.infer(2 * np.ones((1, 512, 512, 3)), result)

Attachments:
efficientdet-d0_nwonly_trt.zip
efficientdet-d0_nwonly_onnx.zip

TRT version: 7.1.3.4
What might be the reason?

Same as GitHub issue #1194 in NVIDIA/TensorRT: "Output from ONNX inference and trt inference are different".

Hi,

Could you first run your model with trtexec to check whether it produces non-zero output?

$ /usr/src/tensorrt/bin/trtexec --loadEngine=efficientdet-d0_nwonly.trt --dumpOutput

Thanks.

Hi,
The trtexec output is nonzero. Why might my code above not work with TensorRT?
Thanks
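
The thread never states which bug was found in the script, but two things in it stand out. First, `infer` copies only `cuda_outputs[0]` back to the host although the model has ten outputs, so the other nine host buffers keep whatever `pagelocked_empty` left in them. Second, the TensorRT script feeds `2*np.ones(...)` (float64) while the ONNX script feeds `np.ones(..., dtype=np.float32)`, so the two runs are not directly comparable. A minimal sketch of copying every output back follows. `fetch_all_outputs` is a hypothetical helper, not part of TensorRT or PyCUDA; the copy function is passed in as a parameter so that the sketch itself does not need a GPU.

```python
def fetch_all_outputs(host_outputs, cuda_outputs, stream, memcpy_dtoh_async):
    """Copy every output binding from device to host and return host copies.

    host_outputs / cuda_outputs: matching lists of host and device buffers.
    stream: object with a synchronize() method (e.g. a pycuda.driver.Stream).
    memcpy_dtoh_async: callable(dest, src, stream),
        e.g. pycuda.driver.memcpy_dtoh_async.
    """
    # Queue one device-to-host copy per output, not only for output 0.
    for host_mem, device_mem in zip(host_outputs, cuda_outputs):
        memcpy_dtoh_async(host_mem, device_mem, stream)
    # Wait until all queued copies have finished before reading the buffers.
    stream.synchronize()
    # Return copies so a later call that reuses the buffers cannot
    # overwrite results the caller is still holding.
    return [host_mem.copy() for host_mem in host_outputs]
```

Inside `infer`, this could stand in for the single `memcpy_dtoh_async` call and the `synchronize()` call, for example `result[self.net_type] = fetch_all_outputs(self.host_outputs, self.cuda_outputs, self.stream, cuda.memcpy_dtoh_async)`.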

@AastaLLL
The results from onnxruntime and TensorRT still differ significantly, and this is unrelated to the bug in my code above. Please see the output of the following command:

polygraphy run efficientdet-d0_nwonly.onnx --trt --onnxrt

[I] Runner: trt-runner-N0-04/20/21-20:43:57          | Activating and starting inference
[TensorRT] WARNING: [TRT]/home/jenkins/workspace/OSS/L0_MergeRequest/oss/parsers/onnx/onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[I] Building engine with configuration: max_workspace_size=16777216 bytes (16.00 MB) | tf32=False, fp16=False, int8=False, strict_types=False | 1 profiles
[I] Runner: trt-runner-N0-04/20/21-20:43:57          | Completed 1 iterations.
[I] Runner: onnxrt-runner-N0-04/20/21-20:43:57       | Activating and starting inference
[I] Runner: onnxrt-runner-N0-04/20/21-20:43:57       | Completed 1 iterations.
[I] Accuracy Comparison | trt-runner-N0-04/20/21-20:43:57 vs. onnxrt-runner-N0-04/20/21-20:43:57
[I]     Comparing Output: 'output_4' (dtype=float32, shape=(1, 4, 4, 810)) with 'output_4' (dtype=float32, shape=(1, 4, 4, 810))
[I]         Required tolerances: [atol=0.87389] OR [rtol=1e-05, atol=0.87383] OR [rtol=0.20267, atol=1e-05]
        Runner: trt-runner-N0-04/20/21-20:43:57          | Stats: mean=-6.5328, min=-9.6361 at (0, 1, 2, 664), max=-3.5694 at (0, 2, 2, 54)
        Runner: onnxrt-runner-N0-04/20/21-20:43:57       | Stats: mean=-6.4176, min=-9.3779 at (0, 1, 2, 664), max=-3.2235 at (0, 1, 1, 324)
[E]         FAILED | Difference exceeds tolerance (rtol=1e-05, atol=1e-05)
[I]     Comparing Output: 'output_9' (dtype=float32, shape=(1, 4, 4, 36)) with 'output_9' (dtype=float32, shape=(1, 4, 4, 36))
[I]         Required tolerances: [atol=0.17882] OR [rtol=1e-05, atol=0.17882] OR [rtol=7.0958, atol=1e-05]
        Runner: trt-runner-N0-04/20/21-20:43:57          | Stats: mean=-0.091448, min=-0.54466 at (0, 2, 2, 19), max=0.32343 at (0, 0, 0, 0)
        Runner: onnxrt-runner-N0-04/20/21-20:43:57       | Stats: mean=-0.096405, min=-0.56071 at (0, 2, 2, 31), max=0.30159 at (0, 0, 0, 0)
[E]         FAILED | Difference exceeds tolerance (rtol=1e-05, atol=1e-05)
[I]     Comparing Output: 'output_3' (dtype=float32, shape=(1, 8, 8, 810)) with 'output_3' (dtype=float32, shape=(1, 8, 8, 810))
[I]         Required tolerances: [atol=0.68331] OR [rtol=1e-05, atol=0.68324] OR [rtol=0.17018, atol=1e-05]
        Runner: trt-runner-N0-04/20/21-20:43:57          | Stats: mean=-6.8013, min=-9.7306 at (0, 3, 1, 162), max=-2.9203 at (0, 4, 3, 594)
        Runner: onnxrt-runner-N0-04/20/21-20:43:57       | Stats: mean=-6.8167, min=-9.4594 at (0, 3, 3, 162), max=-2.7687 at (0, 4, 4, 594)
[E]         FAILED | Difference exceeds tolerance (rtol=1e-05, atol=1e-05)
[I]     Comparing Output: 'output_8' (dtype=float32, shape=(1, 8, 8, 36)) with 'output_8' (dtype=float32, shape=(1, 8, 8, 36))
[I]         Required tolerances: [atol=0.20207] OR [rtol=1e-05, atol=0.20206] OR [rtol=119.12, atol=1e-05]
        Runner: trt-runner-N0-04/20/21-20:43:57          | Stats: mean=-0.0084417, min=-0.42068 at (0, 2, 0, 3), max=0.49135 at (0, 4, 4, 30)
        Runner: onnxrt-runner-N0-04/20/21-20:43:57       | Stats: mean=-0.0094595, min=-0.42487 at (0, 3, 7, 11), max=0.54484 at (0, 4, 4, 30)
[E]         FAILED | Difference exceeds tolerance (rtol=1e-05, atol=1e-05)
[I]     Comparing Output: 'output_2' (dtype=float32, shape=(1, 16, 16, 810)) with 'output_2' (dtype=float32, shape=(1, 16, 16, 810))
[I]         Required tolerances: [atol=0.92894] OR [rtol=1e-05, atol=0.92887] OR [rtol=0.16824, atol=1e-05]
        Runner: trt-runner-N0-04/20/21-20:43:57          | Stats: mean=-7.1851, min=-10.13 at (0, 7, 6, 72), max=-4.2356 at (0, 11, 5, 594)
        Runner: onnxrt-runner-N0-04/20/21-20:43:57       | Stats: mean=-7.1618, min=-9.8995 at (0, 6, 5, 72), max=-4.3057 at (0, 10, 5, 594)
[E]         FAILED | Difference exceeds tolerance (rtol=1e-05, atol=1e-05)
[I]     Comparing Output: 'output_7' (dtype=float32, shape=(1, 16, 16, 36)) with 'output_7' (dtype=float32, shape=(1, 16, 16, 36))
[I]         Required tolerances: [atol=0.40775] OR [rtol=1e-05, atol=0.40774] OR [rtol=14476, atol=1e-05]
        Runner: trt-runner-N0-04/20/21-20:43:57          | Stats: mean=-0.0096572, min=-1.2089 at (0, 0, 0, 3), max=1.4409 at (0, 0, 0, 0)
        Runner: onnxrt-runner-N0-04/20/21-20:43:57       | Stats: mean=-0.011599, min=-0.80111 at (0, 0, 0, 3), max=1.0734 at (0, 0, 0, 0)
[E]         FAILED | Difference exceeds tolerance (rtol=1e-05, atol=1e-05)
[I]     Comparing Output: 'output_1' (dtype=float32, shape=(1, 32, 32, 810)) with 'output_1' (dtype=float32, shape=(1, 32, 32, 810))
[I]         Required tolerances: [atol=3.3337] OR [rtol=1e-05, atol=3.3336] OR [rtol=0.48781, atol=1e-05]
        Runner: trt-runner-N0-04/20/21-20:43:57          | Stats: mean=-7.4557, min=-11.968 at (0, 1, 1, 683), max=-4.2842 at (0, 28, 28, 603)
        Runner: onnxrt-runner-N0-04/20/21-20:43:57       | Stats: mean=-7.4016, min=-10.791 at (0, 15, 21, 72), max=-4.0198 at (0, 29, 29, 291)
[E]         FAILED | Difference exceeds tolerance (rtol=1e-05, atol=1e-05)
[I]     Comparing Output: 'output_6' (dtype=float32, shape=(1, 32, 32, 36)) with 'output_6' (dtype=float32, shape=(1, 32, 32, 36))
[I]         Required tolerances: [atol=1.6836] OR [rtol=1e-05, atol=1.6835] OR [rtol=8588.9, atol=1e-05]
        Runner: trt-runner-N0-04/20/21-20:43:57          | Stats: mean=0.0020516, min=-5.0281 at (0, 0, 0, 3), max=5.0922 at (0, 0, 0, 0)
        Runner: onnxrt-runner-N0-04/20/21-20:43:57       | Stats: mean=-0.00058793, min=-3.3446 at (0, 0, 0, 3), max=3.5692 at (0, 0, 0, 0)
[E]         FAILED | Difference exceeds tolerance (rtol=1e-05, atol=1e-05)
[I]     Comparing Output: 'output_0' (dtype=float32, shape=(1, 64, 64, 810)) with 'output_0' (dtype=float32, shape=(1, 64, 64, 810))
[I]         Required tolerances: [atol=4.199] OR [rtol=1e-05, atol=4.1989] OR [rtol=0.53565, atol=1e-05]
        Runner: trt-runner-N0-04/20/21-20:43:57          | Stats: mean=-7.9652, min=-13.541 at (0, 4, 1, 702), max=-2.5416 at (0, 0, 0, 0)
        Runner: onnxrt-runner-N0-04/20/21-20:43:57       | Stats: mean=-7.9062, min=-13.093 at (0, 1, 1, 702), max=-2.4043 at (0, 0, 0, 0)
[E]         FAILED | Difference exceeds tolerance (rtol=1e-05, atol=1e-05)
[I]     Comparing Output: 'output_5' (dtype=float32, shape=(1, 64, 64, 36)) with 'output_5' (dtype=float32, shape=(1, 64, 64, 36))
[I]         Required tolerances: [atol=11.113] OR [rtol=1e-05, atol=11.113] OR [rtol=21514, atol=1e-05]
        Runner: trt-runner-N0-04/20/21-20:43:57          | Stats: mean=-0.0075114, min=-11.986 at (0, 1, 0, 2), max=10.978 at (0, 0, 0, 1)
        Runner: onnxrt-runner-N0-04/20/21-20:43:57       | Stats: mean=-0.0072811, min=-19.534 at (0, 0, 0, 2), max=13.411 at (0, 1, 0, 1)
[E]         FAILED | Difference exceeds tolerance (rtol=1e-05, atol=1e-05)
[E]     FAILED | Mismatched outputs: ['output_4', 'output_9', 'output_3', 'output_8', 'output_2', 'output_7', 'output_1', 'output_6', 'output_0', 'output_5']
[E] FAILED | Command: /usr/local/bin/polygraphy run efficientdet-d0_nwonly.onnx --trt --onnxrt
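
The "Required tolerances" lines in the log above report how large `atol` or `rtol` would have to be for each output to pass. A minimal sketch of that calculation follows, assuming an elementwise `numpy.isclose`-style criterion, `|trt - onnx| <= atol + rtol * |onnx|`; this is an assumption about the check, not something taken from Polygraphy's source. `required_tolerances` is a hypothetical name, and it works on flat sequences of floats (for a NumPy array, `arr.ravel().tolist()`).

```python
def required_tolerances(actual, expected, rtol=1e-5, atol=1e-5):
    """Return (passed, min_atol, min_rtol) for two flat float sequences.

    passed:   True if every element satisfies
              |actual - expected| <= atol + rtol * |expected|.
    min_atol: smallest atol that would pass with rtol = 0.
    min_rtol: smallest rtol that would pass with the given atol
              (elements whose expected value is 0 are skipped).
    """
    actual = list(actual)
    expected = list(expected)
    abs_diffs = [abs(a - e) for a, e in zip(actual, expected)]
    refs = [abs(e) for e in expected]

    passed = all(d <= atol + rtol * r for d, r in zip(abs_diffs, refs))
    min_atol = max(abs_diffs, default=0.0)
    min_rtol = max(
        ((d - atol) / r for d, r in zip(abs_diffs, refs) if r > 0 and d > atol),
        default=0.0,
    )
    return passed, min_atol, min_rtol


# Example with made-up values (not taken from the model):
passed, min_atol, min_rtol = required_tolerances([-6.5, 0.32], [-6.4, 0.30])
print(passed, min_atol, min_rtol)
```

Read this way, a required `atol` between roughly 0.2 and 11 on outputs of this magnitude is much larger than the default tolerance of 1e-05 used in the comparison.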

Hi,

Thanks for testing.

We are checking this internally.
Will share more information with you later.

Hi,

We have confirmed that this issue is fixed in our next release.
We will let you know once the new software is available.

Thanks.