Hi,
Thanks for your patience.
On further debugging, we found that TensorRT generates correct output (matching ONNX Runtime) when the batch size is set to 1.
Could you check whether this is a viable workaround for your use case?
$ polygraphy run ./ubt_20260104.onnx --onnxrt --trt --input-shapes render_input:[1,160,160,6] transf_input:[1,160,160,6] --load-inputs input_01.json --save-outputs out_b01_thor.json
[I] RUNNING | Command: /home/nvidia/bug_5781693/env/bin/polygraphy run ./ubt_20260104.onnx --onnxrt --trt --input-shapes render_input:[1,160,160,6] transf_input:[1,160,160,6] --load-inputs input_01.json --save-outputs out_b01_thor.json
[I] Loading input data from input_01.json
[I] TF32 is disabled by default. Turn on TF32 for better performance with minor accuracy differences.
[I] onnxrt-runner-N0-01/26/26-03:32:02 | Activating and starting inference
2026-01-26 03:32:02.197725798 [W:onnxruntime:Default, device_discovery.cc:164 DiscoverDevicesForPlatform] GPU device discovery failed: device_discovery.cc:89 ReadFileContents Failed to open file: "/sys/class/drm/card3/device/vendor"
[I] Creating ONNX-Runtime Inference Session with providers: ['CPUExecutionProvider']
[I] onnxrt-runner-N0-01/26/26-03:32:02
---- Inference Input(s) ----
{render_input [dtype=float32, shape=(1, 160, 160, 6)],
transf_input [dtype=float32, shape=(1, 160, 160, 6)]}
[I] onnxrt-runner-N0-01/26/26-03:32:02
---- Inference Output(s) ----
{scores [dtype=float32, shape=(1, 1)]}
[I] onnxrt-runner-N0-01/26/26-03:32:02 | Completed 1 iteration(s) in 112.5 ms | Average inference time: 112.5 ms.
[I] trt-runner-N0-01/26/26-03:32:02 | Activating and starting inference
[I] Configuring with profiles:[
Profile 0:
{render_input [min=[1, 160, 160, 6], opt=[1, 160, 160, 6], max=[1, 160, 160, 6]],
transf_input [min=[1, 160, 160, 6], opt=[1, 160, 160, 6], max=[1, 160, 160, 6]]}
]
[W] profileSharing0806 is on by default in TensorRT 10.0. This flag is deprecated and has no effect.
[I] Building engine with configuration:
Flags | []
Engine Capability | EngineCapability.STANDARD
Memory Pools | [WORKSPACE: 125771.70 MiB, TACTIC_DRAM: 115847.00 MiB, TACTIC_SHARED_MEMORY: 1024.00 MiB]
Tactic Sources | [EDGE_MASK_CONVOLUTIONS, JIT_CONVOLUTIONS]
Profiling Verbosity | ProfilingVerbosity.DETAILED
Preview Features | [PROFILE_SHARING_0806]
[I] Finished engine building in 8.646 seconds
[I] trt-runner-N0-01/26/26-03:32:02
---- Inference Input(s) ----
{render_input [dtype=float32, shape=(1, 160, 160, 6)],
transf_input [dtype=float32, shape=(1, 160, 160, 6)]}
[I] trt-runner-N0-01/26/26-03:32:02
---- Inference Output(s) ----
{scores [dtype=float32, shape=(1, 1)]}
[I] trt-runner-N0-01/26/26-03:32:02 | Completed 1 iteration(s) in 7.963 ms | Average inference time: 7.963 ms.
[I] Saving inference results to out_b01_thor.json
[I] Accuracy Comparison | onnxrt-runner-N0-01/26/26-03:32:02 vs. trt-runner-N0-01/26/26-03:32:02
[I] Comparing Output: 'scores' (dtype=float32, shape=(1, 1)) with 'scores' (dtype=float32, shape=(1, 1))
[I] Tolerance: [abs=1e-05, rel=1e-05] | Checking elemwise error
[I] onnxrt-runner-N0-01/26/26-03:32:02: scores | Stats: mean=8.6679, std-dev=0, var=0, median=8.6679, min=8.6679 at (0, 0), max=8.6679 at (0, 0), avg-magnitude=8.6679, p90=8.6679, p95=8.6679, p99=8.6679
[I] trt-runner-N0-01/26/26-03:32:02: scores | Stats: mean=8.6679, std-dev=0, var=0, median=8.6679, min=8.6679 at (0, 0), max=8.6679 at (0, 0), avg-magnitude=8.6679, p90=8.6679, p95=8.6679, p99=8.6679
[I] Error Metrics: scores
[I] Minimum Required Tolerance: elemwise error | [abs=1.9073e-06] OR [rel=2.2005e-07] (requirements may be lower if both abs/rel tolerances are set)
[I] Absolute Difference | Stats: mean=1.9073e-06, std-dev=0, var=0, median=1.9073e-06, min=1.9073e-06 at (0, 0), max=1.9073e-06 at (0, 0), avg-magnitude=1.9073e-06, p90=1.9073e-06, p95=1.9073e-06, p99=1.9073e-06
[I] Relative Difference | Stats: mean=2.2005e-07, std-dev=0, var=0, median=2.2005e-07, min=2.2005e-07 at (0, 0), max=2.2005e-07 at (0, 0), avg-magnitude=2.2005e-07, p90=2.2005e-07, p95=2.2005e-07, p99=2.2005e-07
[I] PASSED | Output: 'scores' | Difference is within tolerance (rel=1e-05, abs=1e-05)
[I] PASSED | All outputs matched | Outputs: ['scores']
[I] Accuracy Summary | onnxrt-runner-N0-01/26/26-03:32:02 vs. trt-runner-N0-01/26/26-03:32:02 | Passed: 1/1 iterations | Pass Rate: 100.0%
[I] PASSED | Runtime: 11.781s | Command: /home/nvidia/bug_5781693/env/bin/polygraphy run ./ubt_20260104.onnx --onnxrt --trt --input-shapes render_input:[1,160,160,6] transf_input:[1,160,160,6] --load-inputs input_01.json --save-outputs out_b01_thor.json
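If you normally run with a larger batch, one way to try the workaround is to feed the samples through one at a time and stack the results. The snippet below is only a minimal sketch under assumptions: `run_per_sample` and `infer_fn` are hypothetical names, and `infer_fn` stands in for whatever call executes your batch-size-1 engine on a `render_input` / `transf_input` pair and returns `scores` with shape (1, 1). It is not part of TensorRT or Polygraphy.

```python
import numpy as np


def run_per_sample(infer_fn, render_input, transf_input):
    """Run a batch as a sequence of batch-size-1 inferences and stack the outputs.

    infer_fn is a hypothetical callable wrapping your own engine execution;
    it is assumed to accept two arrays of shape (1, H, W, C) and to return
    an array whose leading axis has length 1.
    """
    batch = render_input.shape[0]
    if batch == 0:
        raise ValueError("expected at least one sample")
    if transf_input.shape[0] != batch:
        raise ValueError("both inputs must have the same batch dimension")

    outputs = []
    for i in range(batch):
        # Slicing with i:i+1 keeps the leading batch axis, so every call
        # sees inputs with a batch dimension of exactly 1.
        scores = infer_fn(render_input[i:i + 1], transf_input[i:i + 1])
        outputs.append(np.asarray(scores))

    # Re-assemble the per-sample results along the batch axis.
    return np.concatenate(outputs, axis=0)
```

This costs one inference call per sample, so throughput will be lower than a true batched run. It is meant only as a way to check whether batch size 1 is acceptable for your use case.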
Thanks.