TensorRT’s result is different between 1080ti and jetson tx2

Arleyzhang · January 20, 2019, 7:31pm

I deployed my SSD model on a 1080 Ti and a Jetson TX2, but they produce different results. The differences have little impact on the final detection output, but I want to know why they occur.

I printed statistics for some of the intermediate layers: min, min_index, max, max_index, sum, mean, tss (sum of squares), and var, and I found something strange:
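For readers who want to reproduce this kind of per-layer comparison, here is a minimal sketch of how the statistics listed above could be computed from a flattened layer output. The function name `layer_stats` is hypothetical, and two definitions are assumptions because the post does not spell them out: tss is taken as the sum of squared deviations from the mean, and var as tss divided by the element count.

```python
def layer_stats(values):
    """Summarize a flattened layer output (a list of floats)."""
    n = len(values)
    # Index of the smallest / largest element (first occurrence on ties).
    min_index = min(range(n), key=values.__getitem__)
    max_index = max(range(n), key=values.__getitem__)
    total = sum(values)
    mean = total / n
    # Assumption: tss = sum of squared deviations from the mean.
    tss = sum((v - mean) ** 2 for v in values)
    return {
        "min": values[min_index],
        "min_index": min_index,
        "max": values[max_index],
        "max_index": max_index,
        "sum": total,
        "mean": mean,
        "tss": tss,
        "var": tss / n,  # Assumption: population variance.
    }


stats = layer_stats([0.5, -2.0, 3.0, 1.5, -1.0])
for name, value in stats.items():
    print(f"{name}:{value:.6f}")
```

Running the same function on the outputs of two runs and subtracting the fields gives the kind of per-statistic difference shown in parentheses in the logs.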

1080ti:
[mbox_conf]:
min:-6.643838(0.000000) min_index:174167.000000(0.000000) max:16.168436(0.000000) max_index:140217.000000(0.000000) sum:-708.053076(-0.000049) mean:-0.003861(-0.000000) tss:1259940.186802(-0.002180) var:6.870952(-0.000000)

Jetson TX2:
[mbox_conf]:
min:-6.640625(0.000000) min_index:174167.000000(0.000000) max:16.162971(0.000000) max_index:140217.000000(0.000000) sum:-709.651303(0.000066) mean:-0.003870(0.000000) tss:1259667.312911(0.002610) var:6.869464(0.000000)

The numbers in parentheses are the difference between the current run and the previous run on the same picture on the same platform; both models use FP32 precision. The 1080 Ti and the Jetson TX2 differ slightly from each other, and any two inference runs on the same platform also differ slightly; that run-to-run difference (the numbers in parentheses) shows up only in sum and tss. I also found that these slight changes begin at conv4_3.

Are random numbers generated during inference? Or is this accumulated error from CUDA's low-level numerical computation?

Is there any documentation stating that TensorRT reduces the precision of some heavy operations, for example from 32-bit to 16-bit, to cut the computation cost during inference, and then restores the 16-bit results to 32-bit by filling in random numbers?

And is TensorRT's optimization hardware-dependent or not?
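As background for the accumulated-error question: floating-point addition is not associative, so adding the same values in a different order can change a sum without any random numbers being involved, while min and max are unaffected. Whether a changing accumulation order is what actually happens inside TensorRT here is not confirmed in this thread. The sketch below is only an illustration, and it emulates float32 in plain Python with `struct`; the helper names `f32` and `sum_f32` are hypothetical.

```python
import struct


def f32(x):
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_f32(values):
    """Accumulate left to right, rounding to float32 after every addition."""
    acc = 0.0
    for v in values:
        acc = f32(acc + f32(v))
    return acc


values = [1e8, 1.0, -1e8]
reordered = [1e8, -1e8, 1.0]  # same elements, different order

# 1e8 + 1.0 rounds back to 1e8 in float32, so the 1.0 is lost.
sum_a = sum_f32(values)
# Here the large terms cancel first, so the 1.0 survives.
sum_b = sum_f32(reordered)

print("sums:", sum_a, sum_b)
print("same min:", min(values) == min(reordered))
print("same max:", max(values) == max(reordered))
```

The two orderings give different sums but identical min and max, which is the same pattern as in the logs above, where only sum and tss change between runs.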

NVES · January 24, 2019, 9:28pm

Hello,

TensorRT optimization is GPU-dependent. A generated TensorRT engine is valid for a specific GPU, or more precisely, a specific CUDA Compute Capability. For example, if you generate a PLAN for an NVIDIA P4 (compute capability 6.1), you can't use that PLAN on an NVIDIA Tesla V100 (compute capability 7.0).
