Inference Discrepancy Between TensorRT 10.13.2 (Thor) and 8.6.2 (Orin)

Hi NVIDIA Team,

I’m observing inconsistent inference results for the same ONNX model on two platforms:

  • Thor (TensorRT 10.13.2)

  • Orin (TensorRT 8.6.2)

Key Details:

  1. The input tensor is identical across both platforms (shape, dtype, values).

  2. ONNX Runtime inference matches Orin’s output but differs from Thor’s.

  3. Model link for repro: https://drive.google.com/file/d/1rNg2gJrOMgkCLsQTXj9bbensLPP5XpSn/view?usp=sharing

Request:
Could you help investigate what could cause this discrepancy? How can I systematically troubleshoot this?

Thank you for your expertise!

Best regards,

Hi,

Could you share more details about this issue?
Is the result generated from Thor correct?

The model might run with a different algorithm between Thor and Orin, so the output won’t be identical.

Thanks.

Hi,

Thank you for your prompt reply.

I’d like to clarify that the output of this model generated on Thor is indeed incorrect. To ensure a fair comparison, I’ve verified that the inference code running on both Thor and Orin is identical. Additionally, the TensorRT engines used on both platforms were generated separately from the same ONNX model, using the exact same conversion command on each device.

To further isolate the issue, I saved the input tensor (just before model inference) as a binary file on each platform. The two saved inputs are nearly identical.
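For anyone who wants to repeat this input check, here is a minimal sketch of how two raw dumps could be compared. It assumes the dumps are flat float32 buffers in the machine's native byte order; the function names and file names are hypothetical and are not taken from my actual script.

```python
import array


def f32_from_bytes(raw):
    """Decode a raw float32 buffer (native byte order) into a list of floats."""
    values = array.array("f")
    values.frombytes(raw)  # raises ValueError if len(raw) is not a multiple of 4
    return values.tolist()


def load_f32(path):
    """Read a raw float32 dump from disk."""
    with open(path, "rb") as f:
        return f32_from_bytes(f.read())


def max_abs_diff(a, b):
    """Largest elementwise absolute difference between two equal-length dumps.

    NaN values are not handled specially here; scan for them separately.
    """
    if len(a) != len(b):
        raise ValueError(f"length mismatch: {len(a)} vs {len(b)}")
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)
```

Usage would look like `max_abs_diff(load_f32("input_thor.bin"), load_f32("input_orin.bin"))`, where both file names are placeholders.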

Moreover, I implemented a local ONNX-based inference script using the same ONNX model and fed it the saved input tensor. The result from this ONNX reference implementation matches the output from Orin and is functionally correct, whereas the output from Thor deviates significantly and is incorrect.
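To make "matches" and "deviates significantly" concrete, the comparison can be sketched as an elementwise tolerance check like the one below. This is only an illustration: the function name is hypothetical, the pass rule (absolute OR relative tolerance) is my own simplification rather than the exact rule of any tool, and the default tolerances are just the 1e-05 values that appear in the Polygraphy logs in this thread.

```python
import math


def outputs_match(reference, candidate, atol=1e-5, rtol=1e-5):
    """Return True if every element is within atol OR rtol of the reference.

    Any NaN or infinite value on either side is treated as a mismatch.
    """
    if len(reference) != len(candidate):
        return False
    for ref, got in zip(reference, candidate):
        if not (math.isfinite(ref) and math.isfinite(got)):
            return False
        abs_err = abs(ref - got)
        # Relative error is undefined for a zero reference; fall back to abs only.
        rel_err = abs_err / abs(ref) if ref != 0 else math.inf
        if abs_err > atol and rel_err > rtol:
            return False
    return True
```

Here `reference` would be the flattened ONNX output and `candidate` the flattened engine output from the device under test.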

For your convenience, I’m happy to provide:

1. The ONNX model,

2. A representative input binary file (with matching dimensions, randomly initialized for data privacy),

3. And the local ONNX inference script I used for validation.

https://drive.google.com/file/d/16jWhq8y68iYhvKGtqAvI0YPNu8aL-p05/view?usp=sharing

These should allow you to reproduce the discrepancy on your end. If you need any additional materials or information to help reproduce or diagnose the issue, please don’t hesitate to let me know—I’d be glad to assist.

Thank you again for your support—I look forward to your insights.

Best regards.

Hi,

Thanks for providing more details about this issue.

We tried to reproduce the accuracy drop with Polygraphy using the attached ONNX model (ubt_20251229.onnx).

However, we found that the output is NaN for both the ONNX Runtime and TensorRT backends.
Could you help us check why the output is not valid?

$ polygraphy run ubt_20251229.onnx --onnxrt --trt --verbose
...
[I] Accuracy Comparison | onnxrt-runner-N0-01/05/26-05:25:14 vs. trt-runner-N0-01/05/26-05:25:14
[I]     Comparing Output: 'scores' (dtype=float32, shape=(1, 1)) with 'scores' (dtype=float32, shape=(1, 1))
[I]         Tolerance: [abs=1e-05, rel=1e-05] | Checking elemwise error
[I]         onnxrt-runner-N0-01/05/26-05:25:14: scores | Stats: mean=nan, std-dev=nan, var=nan, median=nan, min=nan at (0, 0), max=nan at (0, 0), avg-magnitude=nan, p90=nan, p95=nan, p99=nan
[I]             ---- Values ----
                    [[nan]]
[V]             Could not generate histogram. Note: Error was: supplied range of [nan, nan] is not finite
[I]             
[I]         trt-runner-N0-01/05/26-05:25:14: scores | Stats: mean=nan, std-dev=nan, var=nan, median=nan, min=nan at (0, 0), max=nan at (0, 0), avg-magnitude=nan, p90=nan, p95=nan, p99=nan
[I]             ---- Values ----
                    [[nan]]
[V]             Could not generate histogram. Note: Error was: supplied range of [nan, nan] is not finite
[I]             
[I]         Error Metrics: scores
[I]             Minimum Required Tolerance: elemwise error | [abs=nan] OR [rel=nan] (requirements may be lower if both abs/rel tolerances are set)
[I]             Absolute Difference | Stats: mean=nan, std-dev=nan, var=nan, median=nan, min=nan at (0, 0), max=nan at (0, 0), avg-magnitude=nan, p90=nan, p95=nan, p99=nan
[I]                 ---- Values ----
                        [[nan]]
[V]                 Could not generate histogram. Note: Error was: autodetected range of [nan, nan] is not finite
[I]                 
[I]             Relative Difference | Stats: mean=nan, std-dev=nan, var=nan, median=nan, min=nan at (0, 0), max=nan at (0, 0), avg-magnitude=nan, p90=nan, p95=nan, p99=nan
[I]                 ---- Values ----
                        [[nan]]
[V]                 Could not generate histogram. Note: Error was: autodetected range of [nan, nan] is not finite
[I]                 
[E]         FAILED | Output: 'scores' | Difference exceeds tolerance (rel=1e-05, abs=1e-05)
[E]     FAILED | Mismatched outputs: ['scores']
[E] Accuracy Summary | onnxrt-runner-N0-01/05/26-05:25:14 vs. trt-runner-N0-01/05/26-05:25:14 | Passed: 0/1 iterations | Pass Rate: 0.0%
[E] FAILED | Runtime: 12.508s | Command: /home/nvidia/topic_356086/env/bin/polygraphy run ubt_20251229.onnx --onnxrt --trt --verbose

This tool helps our internal team check the issue.
It can be installed with the following command:

$ pip3 install polygraphy
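Since a NaN output makes every comparison fail, a quick scan of the buffers involved (inputs or outputs) for non-finite values may help narrow things down. Below is a minimal sketch, assuming the buffer is a flat float32 dump in native byte order; the function name is hypothetical and this is not part of Polygraphy.

```python
import array
import math


def count_non_finite(raw):
    """Return (nan_count, inf_count) for a raw float32 buffer."""
    values = array.array("f")
    values.frombytes(raw)  # raises ValueError if len(raw) is not a multiple of 4
    nan_count = sum(1 for v in values if math.isnan(v))
    inf_count = sum(1 for v in values if math.isinf(v))
    return nan_count, inf_count
```

A result other than `(0, 0)` on the input buffer would suggest the invalid values are already present before inference.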

Thanks.

Hi,

With the source attached on Jan 4, we can reproduce this issue internally.
We need to check this with our internal team and will provide more information later.

Thanks.

Hi,

Thank you for confirming the issue and providing an update. We appreciate your team’s efforts in reproducing and investigating the problem.

Please feel free to reach out if any additional information or collaboration is needed from our side. We look forward to your further insights and resolution steps.

Thanks.

Hi,

Thanks a lot for your patience.

Our internal team needs more time for this issue.
We will keep you updated on any progress.

Thanks.

Hi,

Thank you for the update — I really appreciate your team’s continued attention to this matter.

I understand that these things can take time, and I truly value the effort you’re putting in. Please do keep me posted as things progress. I’m looking forward to hearing more updates from you soon.

Thanks.

Hi,

I hope you’re doing well. I’d like to check if there’s any update on this issue. Could you also confirm whether your team was able to reproduce the problem—specifically, that the engine’s inference results indeed don’t align with those from ONNX?

Thank you!

Hi,

We are able to reproduce this issue.
But unfortunately, our internal team doesn’t have the resources to check this issue in the upcoming release.

Will keep you updated on any progress.
Thanks.

Hi,

Thank you for confirming the issue and for your proactive efforts in reproducing it.
We truly appreciate your team’s dedication to addressing this matter and look forward to hearing about your progress.

Thanks.

Hi,

Thanks for your patience.

On further debugging, we found that TensorRT generates the correct output (matching ONNX Runtime) when the batch size is set to 1.
Could you check if this is a possible workaround for your use case?

$ polygraphy run ./ubt_20260104.onnx --onnxrt --trt --input-shapes render_input:[1,160,160,6] transf_input:[1,160,160,6] --load-inputs input_01.json --save-outputs out_b01_thor.json
[I] RUNNING | Command: /home/nvidia/bug_5781693/env/bin/polygraphy run ./ubt_20260104.onnx --onnxrt --trt --input-shapes render_input:[1,160,160,6] transf_input:[1,160,160,6] --load-inputs input_01.json --save-outputs out_b01_thor.json
[I] Loading input data from input_01.json
[I] TF32 is disabled by default. Turn on TF32 for better performance with minor accuracy differences.
[I] onnxrt-runner-N0-01/26/26-03:32:02  | Activating and starting inference
2026-01-26 03:32:02.197725798 [W:onnxruntime:Default, device_discovery.cc:164 DiscoverDevicesForPlatform] GPU device discovery failed: device_discovery.cc:89 ReadFileContents Failed to open file: "/sys/class/drm/card3/device/vendor"
[I] Creating ONNX-Runtime Inference Session with providers: ['CPUExecutionProvider']
[I] onnxrt-runner-N0-01/26/26-03:32:02 
    ---- Inference Input(s) ----
    {render_input [dtype=float32, shape=(1, 160, 160, 6)],
     transf_input [dtype=float32, shape=(1, 160, 160, 6)]}
[I] onnxrt-runner-N0-01/26/26-03:32:02 
    ---- Inference Output(s) ----
    {scores [dtype=float32, shape=(1, 1)]}
[I] onnxrt-runner-N0-01/26/26-03:32:02  | Completed 1 iteration(s) in 112.5 ms | Average inference time: 112.5 ms.
[I] trt-runner-N0-01/26/26-03:32:02     | Activating and starting inference
[I] Configuring with profiles:[
        Profile 0:
            {render_input [min=[1, 160, 160, 6], opt=[1, 160, 160, 6], max=[1, 160, 160, 6]],
             transf_input [min=[1, 160, 160, 6], opt=[1, 160, 160, 6], max=[1, 160, 160, 6]]}
    ]
[W] profileSharing0806 is on by default in TensorRT 10.0. This flag is deprecated and has no effect.
[I] Building engine with configuration:
    Flags                  | []
    Engine Capability      | EngineCapability.STANDARD
    Memory Pools           | [WORKSPACE: 125771.70 MiB, TACTIC_DRAM: 115847.00 MiB, TACTIC_SHARED_MEMORY: 1024.00 MiB]
    Tactic Sources         | [EDGE_MASK_CONVOLUTIONS, JIT_CONVOLUTIONS]
    Profiling Verbosity    | ProfilingVerbosity.DETAILED
    Preview Features       | [PROFILE_SHARING_0806]
[I] Finished engine building in 8.646 seconds
[I] trt-runner-N0-01/26/26-03:32:02    
    ---- Inference Input(s) ----
    {render_input [dtype=float32, shape=(1, 160, 160, 6)],
     transf_input [dtype=float32, shape=(1, 160, 160, 6)]}
[I] trt-runner-N0-01/26/26-03:32:02    
    ---- Inference Output(s) ----
    {scores [dtype=float32, shape=(1, 1)]}
[I] trt-runner-N0-01/26/26-03:32:02     | Completed 1 iteration(s) in 7.963 ms | Average inference time: 7.963 ms.
[I] Saving inference results to out_b01_thor.json
[I] Accuracy Comparison | onnxrt-runner-N0-01/26/26-03:32:02 vs. trt-runner-N0-01/26/26-03:32:02
[I]     Comparing Output: 'scores' (dtype=float32, shape=(1, 1)) with 'scores' (dtype=float32, shape=(1, 1))
[I]         Tolerance: [abs=1e-05, rel=1e-05] | Checking elemwise error
[I]         onnxrt-runner-N0-01/26/26-03:32:02: scores | Stats: mean=8.6679, std-dev=0, var=0, median=8.6679, min=8.6679 at (0, 0), max=8.6679 at (0, 0), avg-magnitude=8.6679, p90=8.6679, p95=8.6679, p99=8.6679
[I]         trt-runner-N0-01/26/26-03:32:02: scores | Stats: mean=8.6679, std-dev=0, var=0, median=8.6679, min=8.6679 at (0, 0), max=8.6679 at (0, 0), avg-magnitude=8.6679, p90=8.6679, p95=8.6679, p99=8.6679
[I]         Error Metrics: scores
[I]             Minimum Required Tolerance: elemwise error | [abs=1.9073e-06] OR [rel=2.2005e-07] (requirements may be lower if both abs/rel tolerances are set)
[I]             Absolute Difference | Stats: mean=1.9073e-06, std-dev=0, var=0, median=1.9073e-06, min=1.9073e-06 at (0, 0), max=1.9073e-06 at (0, 0), avg-magnitude=1.9073e-06, p90=1.9073e-06, p95=1.9073e-06, p99=1.9073e-06
[I]             Relative Difference | Stats: mean=2.2005e-07, std-dev=0, var=0, median=2.2005e-07, min=2.2005e-07 at (0, 0), max=2.2005e-07 at (0, 0), avg-magnitude=2.2005e-07, p90=2.2005e-07, p95=2.2005e-07, p99=2.2005e-07
[I]         PASSED | Output: 'scores' | Difference is within tolerance (rel=1e-05, abs=1e-05)
[I]     PASSED | All outputs matched | Outputs: ['scores']
[I] Accuracy Summary | onnxrt-runner-N0-01/26/26-03:32:02 vs. trt-runner-N0-01/26/26-03:32:02 | Passed: 1/1 iterations | Pass Rate: 100.0%
[I] PASSED | Runtime: 11.781s | Command: /home/nvidia/bug_5781693/env/bin/polygraphy run ./ubt_20260104.onnx --onnxrt --trt --input-shapes render_input:[1,160,160,6] transf_input:[1,160,160,6] --load-inputs input_01.json --save-outputs out_b01_thor.json
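If a batch size of 1 is acceptable, one way to apply the workaround to a larger batch is to run the samples one at a time and collect the results. The sketch below only illustrates that idea: `run_batch1` is a hypothetical callable standing in for whatever code executes the batch-1 engine, and it is assumed to take a list with one sample and return a list with one result.

```python
def run_in_batches_of_one(samples, run_batch1):
    """Run `run_batch1` once per sample and return the results in input order."""
    results = []
    for sample in samples:
        # Each call sees a batch that contains exactly one sample.
        results.extend(run_batch1([sample]))
    return results
```

This trades throughput for correctness, so it may not be suitable when latency is critical.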

Thanks.

Hi,

Thank you for your follow-up.

I apologize for not following up on this forum thread recently. I have already communicated via other channels that setting batchsize=1 is not feasible for our use case. However, I have received the latest workaround suggestion. I will test this solution shortly and provide feedback on the results. I hope this resolves the issue. Thank you again for your ongoing support and dedication to this matter.

Best regards,

Issue resolved by offline support via another channel.