Output from ONNX inference and trt inference are different

I am having a problem converting an ONNX model to TensorRT. Specifically, the results produced by the two models at inference time are different.
Code for running the ONNX model:

import onnxruntime as rt
import numpy as np
sess = rt.InferenceSession('efficientdet-d0_nwonly.onnx')
img=np.ones((1, 512, 512, 3),dtype=np.float32)
y=sess.run(['output_0', 'output_1', 'output_2', 'output_3', 'output_4', 'output_5', 'output_6', 'output_7', 'output_8', 'output_9'],{"images:0":img})

This gives non-zero outputs.
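A quick check along these lines (just a sketch, reusing the session above) shows that every output contains non-zero values:

# Print the largest absolute value in each output tensor
for out_info, out in zip(sess.get_outputs(), y):
    print(out_info.name, np.abs(out).max())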

However, when running the TRT engine, the outputs are mostly 0.

TRT engine code:
import numpy as np
import tensorrt as trt
import pycuda.autoinit
import pycuda.driver as cuda
import time, math


class TRTInference:
    def __init__(self, trt_engine_path, trt_engine_datatype, net_type='EfficientNet'):
        self.cfx = cuda.Device(0).make_context()
        stream = cuda.Stream()

        TRT_LOGGER = trt.Logger(trt.Logger.INFO)
        trt.init_libnvinfer_plugins(TRT_LOGGER, '')
        runtime = trt.Runtime(TRT_LOGGER)

        # deserialize engine
        with open(trt_engine_path, 'rb') as f:
            buf = f.read()
            engine = runtime.deserialize_cuda_engine(buf)
        context = engine.create_execution_context()

        # prepare buffer
        host_inputs  = []
        cuda_inputs  = []
        host_outputs = []
        cuda_outputs = []
        bindings = []

        for binding in engine:
            size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
            host_mem = cuda.pagelocked_empty(size, np.float32)
            cuda_mem = cuda.mem_alloc(host_mem.nbytes)

            bindings.append(int(cuda_mem))
            if engine.binding_is_input(binding):
                host_inputs.append(host_mem)
                cuda_inputs.append(cuda_mem)
            else:
                host_outputs.append(host_mem)
                cuda_outputs.append(cuda_mem)

        # store
        self.stream  = stream
        self.context = context
        self.engine  = engine

        self.host_inputs = host_inputs
        self.cuda_inputs = cuda_inputs
        self.host_outputs = host_outputs
        self.cuda_outputs = cuda_outputs
        self.bindings = bindings
        self.net_type = net_type


    def infer(self, image,result):
        np.copyto(self.host_inputs[0], image.ravel())

        # inference
        start_time = time.time()
        cuda.memcpy_htod_async(self.cuda_inputs[0], self.host_inputs[0], self.stream)
        self.context.execute_async(bindings=self.bindings, stream_handle=self.stream.handle)
        cuda.memcpy_dtoh_async(self.host_outputs[0], self.cuda_outputs[0], self.stream)
        self.stream.synchronize()
        print(" execute times of "+ self.net_type +':' +str(time.time()-start_time)+'\n')        
        result[self.net_type] = self.host_outputs
     
trt_engine=TRTInference('efficientdet-d0_nwonly.trt',trt_engine_datatype=trt.DataType.FLOAT,net_type='EfficientDet')
result={}
trt_engine.infer(2*np.ones((1,512, 512,3)),result)
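(For reference, infer() above only copies host_outputs[0] back from the device, so the remaining output buffers are never updated. Copying every output back, reusing the same buffer lists, would look roughly like this:)

        # Copy each output tensor back to its page-locked host buffer
        for host_out, cuda_out in zip(self.host_outputs, self.cuda_outputs):
            cuda.memcpy_dtoh_async(host_out, cuda_out, self.stream)
        self.stream.synchronize()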

Attachments:
efficientdet-d0_nwonly_trt.zip
efficientdet-d0_nwonly_onnx.zip

TRT version: 7.1.3.4
What might be the reason?

Same as: Output from ONNX inference and trt inference are different · Issue #1194 · NVIDIA/TensorRT · GitHub

Hi,

Could you try running inference on your model with trtexec first, to see if you get a non-zero output?

$ /usr/src/tensorrt/bin/trtexec --loadEngine=efficientdet-d0_nwonly.trt --dumpOutput

Thanks.

Hi,
It’s non-zero. What might be the reason my code above does not work for TensorRT?
Thanks

@AastaLLL
The results from onnxruntime and TensorRT still differ quite significantly, and it is not related to the bug in my code above. Please see the output of polygraphy run efficientdet-d0_nwonly.onnx --trt --onnxrt below:

[I] Runner: trt-runner-N0-04/20/21-20:43:57          | Activating and starting inference
[TensorRT] WARNING: [TRT]/home/jenkins/workspace/OSS/L0_MergeRequest/oss/parsers/onnx/onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[I] Building engine with configuration: max_workspace_size=16777216 bytes (16.00 MB) | tf32=False, fp16=False, int8=False, strict_types=False | 1 profiles
[I] Runner: trt-runner-N0-04/20/21-20:43:57          | Completed 1 iterations.
[I] Runner: onnxrt-runner-N0-04/20/21-20:43:57       | Activating and starting inference
[I] Runner: onnxrt-runner-N0-04/20/21-20:43:57       | Completed 1 iterations.
[I] Accuracy Comparison | trt-runner-N0-04/20/21-20:43:57 vs. onnxrt-runner-N0-04/20/21-20:43:57
[I]     Comparing Output: 'output_4' (dtype=float32, shape=(1, 4, 4, 810)) with 'output_4' (dtype=float32, shape=(1, 4, 4, 810))
[I]         Required tolerances: [atol=0.87389] OR [rtol=1e-05, atol=0.87383] OR [rtol=0.20267, atol=1e-05]
        Runner: trt-runner-N0-04/20/21-20:43:57          | Stats: mean=-6.5328, min=-9.6361 at (0, 1, 2, 664), max=-3.5694 at (0, 2, 2, 54)
        Runner: onnxrt-runner-N0-04/20/21-20:43:57       | Stats: mean=-6.4176, min=-9.3779 at (0, 1, 2, 664), max=-3.2235 at (0, 1, 1, 324)
[E]         FAILED | Difference exceeds tolerance (rtol=1e-05, atol=1e-05)
[I]     Comparing Output: 'output_9' (dtype=float32, shape=(1, 4, 4, 36)) with 'output_9' (dtype=float32, shape=(1, 4, 4, 36))
[I]         Required tolerances: [atol=0.17882] OR [rtol=1e-05, atol=0.17882] OR [rtol=7.0958, atol=1e-05]
        Runner: trt-runner-N0-04/20/21-20:43:57          | Stats: mean=-0.091448, min=-0.54466 at (0, 2, 2, 19), max=0.32343 at (0, 0, 0, 0)
        Runner: onnxrt-runner-N0-04/20/21-20:43:57       | Stats: mean=-0.096405, min=-0.56071 at (0, 2, 2, 31), max=0.30159 at (0, 0, 0, 0)
[E]         FAILED | Difference exceeds tolerance (rtol=1e-05, atol=1e-05)
[I]     Comparing Output: 'output_3' (dtype=float32, shape=(1, 8, 8, 810)) with 'output_3' (dtype=float32, shape=(1, 8, 8, 810))
[I]         Required tolerances: [atol=0.68331] OR [rtol=1e-05, atol=0.68324] OR [rtol=0.17018, atol=1e-05]
        Runner: trt-runner-N0-04/20/21-20:43:57          | Stats: mean=-6.8013, min=-9.7306 at (0, 3, 1, 162), max=-2.9203 at (0, 4, 3, 594)
        Runner: onnxrt-runner-N0-04/20/21-20:43:57       | Stats: mean=-6.8167, min=-9.4594 at (0, 3, 3, 162), max=-2.7687 at (0, 4, 4, 594)
[E]         FAILED | Difference exceeds tolerance (rtol=1e-05, atol=1e-05)
[I]     Comparing Output: 'output_8' (dtype=float32, shape=(1, 8, 8, 36)) with 'output_8' (dtype=float32, shape=(1, 8, 8, 36))
[I]         Required tolerances: [atol=0.20207] OR [rtol=1e-05, atol=0.20206] OR [rtol=119.12, atol=1e-05]
        Runner: trt-runner-N0-04/20/21-20:43:57          | Stats: mean=-0.0084417, min=-0.42068 at (0, 2, 0, 3), max=0.49135 at (0, 4, 4, 30)
        Runner: onnxrt-runner-N0-04/20/21-20:43:57       | Stats: mean=-0.0094595, min=-0.42487 at (0, 3, 7, 11), max=0.54484 at (0, 4, 4, 30)
[E]         FAILED | Difference exceeds tolerance (rtol=1e-05, atol=1e-05)
[I]     Comparing Output: 'output_2' (dtype=float32, shape=(1, 16, 16, 810)) with 'output_2' (dtype=float32, shape=(1, 16, 16, 810))
[I]         Required tolerances: [atol=0.92894] OR [rtol=1e-05, atol=0.92887] OR [rtol=0.16824, atol=1e-05]
        Runner: trt-runner-N0-04/20/21-20:43:57          | Stats: mean=-7.1851, min=-10.13 at (0, 7, 6, 72), max=-4.2356 at (0, 11, 5, 594)
        Runner: onnxrt-runner-N0-04/20/21-20:43:57       | Stats: mean=-7.1618, min=-9.8995 at (0, 6, 5, 72), max=-4.3057 at (0, 10, 5, 594)
[E]         FAILED | Difference exceeds tolerance (rtol=1e-05, atol=1e-05)
[I]     Comparing Output: 'output_7' (dtype=float32, shape=(1, 16, 16, 36)) with 'output_7' (dtype=float32, shape=(1, 16, 16, 36))
[I]         Required tolerances: [atol=0.40775] OR [rtol=1e-05, atol=0.40774] OR [rtol=14476, atol=1e-05]
        Runner: trt-runner-N0-04/20/21-20:43:57          | Stats: mean=-0.0096572, min=-1.2089 at (0, 0, 0, 3), max=1.4409 at (0, 0, 0, 0)
        Runner: onnxrt-runner-N0-04/20/21-20:43:57       | Stats: mean=-0.011599, min=-0.80111 at (0, 0, 0, 3), max=1.0734 at (0, 0, 0, 0)
[E]         FAILED | Difference exceeds tolerance (rtol=1e-05, atol=1e-05)
[I]     Comparing Output: 'output_1' (dtype=float32, shape=(1, 32, 32, 810)) with 'output_1' (dtype=float32, shape=(1, 32, 32, 810))
[I]         Required tolerances: [atol=3.3337] OR [rtol=1e-05, atol=3.3336] OR [rtol=0.48781, atol=1e-05]
        Runner: trt-runner-N0-04/20/21-20:43:57          | Stats: mean=-7.4557, min=-11.968 at (0, 1, 1, 683), max=-4.2842 at (0, 28, 28, 603)
        Runner: onnxrt-runner-N0-04/20/21-20:43:57       | Stats: mean=-7.4016, min=-10.791 at (0, 15, 21, 72), max=-4.0198 at (0, 29, 29, 291)
[E]         FAILED | Difference exceeds tolerance (rtol=1e-05, atol=1e-05)
[I]     Comparing Output: 'output_6' (dtype=float32, shape=(1, 32, 32, 36)) with 'output_6' (dtype=float32, shape=(1, 32, 32, 36))
[I]         Required tolerances: [atol=1.6836] OR [rtol=1e-05, atol=1.6835] OR [rtol=8588.9, atol=1e-05]
        Runner: trt-runner-N0-04/20/21-20:43:57          | Stats: mean=0.0020516, min=-5.0281 at (0, 0, 0, 3), max=5.0922 at (0, 0, 0, 0)
        Runner: onnxrt-runner-N0-04/20/21-20:43:57       | Stats: mean=-0.00058793, min=-3.3446 at (0, 0, 0, 3), max=3.5692 at (0, 0, 0, 0)
[E]         FAILED | Difference exceeds tolerance (rtol=1e-05, atol=1e-05)
[I]     Comparing Output: 'output_0' (dtype=float32, shape=(1, 64, 64, 810)) with 'output_0' (dtype=float32, shape=(1, 64, 64, 810))
[I]         Required tolerances: [atol=4.199] OR [rtol=1e-05, atol=4.1989] OR [rtol=0.53565, atol=1e-05]
        Runner: trt-runner-N0-04/20/21-20:43:57          | Stats: mean=-7.9652, min=-13.541 at (0, 4, 1, 702), max=-2.5416 at (0, 0, 0, 0)
        Runner: onnxrt-runner-N0-04/20/21-20:43:57       | Stats: mean=-7.9062, min=-13.093 at (0, 1, 1, 702), max=-2.4043 at (0, 0, 0, 0)
[E]         FAILED | Difference exceeds tolerance (rtol=1e-05, atol=1e-05)
[I]     Comparing Output: 'output_5' (dtype=float32, shape=(1, 64, 64, 36)) with 'output_5' (dtype=float32, shape=(1, 64, 64, 36))
[I]         Required tolerances: [atol=11.113] OR [rtol=1e-05, atol=11.113] OR [rtol=21514, atol=1e-05]
        Runner: trt-runner-N0-04/20/21-20:43:57          | Stats: mean=-0.0075114, min=-11.986 at (0, 1, 0, 2), max=10.978 at (0, 0, 0, 1)
        Runner: onnxrt-runner-N0-04/20/21-20:43:57       | Stats: mean=-0.0072811, min=-19.534 at (0, 0, 0, 2), max=13.411 at (0, 1, 0, 1)
[E]         FAILED | Difference exceeds tolerance (rtol=1e-05, atol=1e-05)
[E]     FAILED | Mismatched outputs: ['output_4', 'output_9', 'output_3', 'output_8', 'output_2', 'output_7', 'output_1', 'output_6', 'output_0', 'output_5']
[E] FAILED | Command: /usr/local/bin/polygraphy run efficientdet-d0_nwonly.onnx --trt --onnxrt
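In case it helps the investigation, a per-layer comparison with Polygraphy (marking all tensors as outputs in both runtimes) might show where the results start to diverge; something along these lines, assuming a recent Polygraphy version:

$ polygraphy run efficientdet-d0_nwonly.onnx --trt --onnxrt --trt-outputs mark all --onnx-outputs mark all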

Hi,

Thanks for the testing.

We are checking this internally.
Will share more information with you later.


Hi,

We confirm that this issue is fixed in our next release.
Will let you know once the new software is available.

Thanks.