Is TensorRT inference deterministic/reproducible?

daniel.widmann · November 19, 2020, 10:45am

So I was searching the net, but I still can’t find a clear answer to the question: is TensorRT inference deterministic?

I have seen that the developer guide has a section about determinism of the builder:

My question is about what happens afterwards:

If I use the same engine to do inference on the same data, will I always get the same (bit correct) results?
Are there perhaps some layers that are deterministic and others that are not? If so, is there a list somewhere available?
If I use the algorithm selector to make the builder deterministic, can I also reproduce the same inference results on two different GPUs?
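One way to test the first question empirically is to hash the raw output bytes of repeated runs and check that all hashes agree. The following is a minimal sketch, not a TensorRT API: `run_inference` is a hypothetical callable that runs the already-built engine on a fixed input and returns the flattened float32 output, and `output_digest` and `is_run_to_run_deterministic` are names made up for illustration.

```python
import hashlib
from array import array
from typing import Callable, Sequence


def output_digest(values: Sequence[float]) -> str:
    """Hash the float32 byte representation of a flattened output tensor."""
    raw = array("f", values).tobytes()
    return hashlib.sha256(raw).hexdigest()


def is_run_to_run_deterministic(
    run_inference: Callable[[], Sequence[float]],  # hypothetical: wraps engine execution
    runs: int = 10,
) -> bool:
    """Return True if every run produces bit-identical output."""
    digests = {output_digest(run_inference()) for _ in range(runs)}
    return len(digests) == 1
```

Comparing digests is stricter than a numeric comparison: any single differing bit changes the hash, which is what "bit correct" asks for.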

AakankshaS · November 19, 2020, 11:10am

Hi @daniel.widmann,
If you are using the same engine with the same input, TensorRT inference should be deterministic.
However, I don't think engine building is supposed to be deterministic, as tactics are chosen based on observed runtime. If you output your log at info level, you should be able to compare the tactic selection between two engines. Since different tactics/kernels can change the order of operations, you should expect floating-point differences between engines built separately.
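To illustrate why a changed order of operations leads to floating-point differences, here is a small sketch in plain Python (it does not use TensorRT): adding the same three numbers with a different grouping gives results that differ in the last bit. The helper name `bits` is made up for illustration.

```python
import struct


def bits(x: float) -> str:
    """Hex dump of the IEEE 754 double representation of x."""
    return struct.pack(">d", x).hex()


# Same operands, different grouping, as two kernels might accumulate them.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

print(left_to_right == right_to_left)  # False
print(bits(left_to_right), bits(right_to_left))
```

Floating-point addition is not associative, so two kernels that reduce the same values in a different order can both be correct and still disagree bitwise.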
You can refer to the below link.

Thanks!

daniel.widmann · November 24, 2020, 2:05pm

Thank you for your fast answer. It is very good to hear that TensorRT can be deterministic!

Regarding your example: it does indeed show a way to make the builder deterministic as well, which is very interesting. In the example, custom algorithm selectors are provided to cache the chosen tactics and to read them back in the next build. So, to come back to my third question once more: could I use this approach to cache my chosen tactics, then build the network on a different GPU using the cached tactics, and finally get a network that behaves the same on both GPUs? Or is it unavoidable that two different types of GPUs will always have small deviations in the network output?

AakankshaS · December 1, 2020, 5:43am

Hi @daniel.widmann,

I believe so.
Even with the same tactics, the output may still have small differences across GPUs with different architectures. Also, there is no reason in general to believe that a set of tactics valid on one GPU is valid on a different GPU.
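Given that, outputs from two different GPUs are usually compared with a tolerance rather than bitwise. A minimal sketch in plain Python, assuming the outputs are available as flattened lists of floats; the function names and the default tolerances are illustrative assumptions, not values recommended by TensorRT:

```python
import math
from typing import Sequence


def max_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest element-wise absolute difference between two outputs."""
    if len(a) != len(b):
        raise ValueError("outputs must have the same number of elements")
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)


def outputs_match(
    a: Sequence[float],
    b: Sequence[float],
    rel_tol: float = 1e-3,  # assumed tolerance; tune per model and precision
    abs_tol: float = 1e-5,  # assumed tolerance; guards values near zero
) -> bool:
    """True if every element pair agrees within the given tolerances."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)
        for x, y in zip(a, b)
    )
```

Reporting `max_abs_diff` alongside the pass/fail result makes it easier to judge whether a deviation is rounding noise or a real discrepancy.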

Thanks!

daniel.widmann · December 1, 2020, 9:57am

Ok, I suppose that was to be expected. Thanks a lot for the information!
