CUDA 9 for learning, CUDA 8 for inference: different results on test set

I have a case where the learning was done on a workstation using CUDA 9 and the inference on a TX2 with CUDA 8 (the carrier board does not support JetPack 3.2 yet). The same test set gave different results on the workstation and on the TX2.

Once I reinstalled the workstation from scratch to be as similar as possible to the TX2, including CUDA 8, the results on both machines were the same.

What gives? Is it a precision issue?

Just a thought – have you compared cuDNN versions?
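
To tell a precision issue apart from a real mismatch, it may help to dump the raw network outputs for the same test samples on both machines and compare them numerically. Below is a minimal sketch in plain Python. The function name `compare_outputs`, the list-of-score-vectors layout, and the default tolerance are all assumptions made for illustration, not part of any framework.

```python
def compare_outputs(ref, other, atol=1e-3):
    """Compare per-sample score vectors from two machines.

    ref, other: lists of equal length, one score vector (list of floats)
    per test sample. atol is an arbitrary, hypothetical tolerance.
    """
    if len(ref) != len(other):
        raise ValueError("both runs must cover the same samples")

    max_abs_diff = 0.0
    flipped = []  # indices of samples whose top-scoring class differs
    for i, (r, o) in enumerate(zip(ref, other)):
        if len(r) != len(o):
            raise ValueError("score vectors differ in length at sample %d" % i)
        # Largest element-wise difference for this sample
        diff = max(abs(a - b) for a, b in zip(r, o))
        max_abs_diff = max(max_abs_diff, diff)
        # Compare the index of the highest score (the predicted class)
        if r.index(max(r)) != o.index(max(o)):
            flipped.append(i)

    return {
        "max_abs_diff": max_abs_diff,
        "within_tolerance": max_abs_diff <= atol,
        "flipped": flipped,
    }


# Example with made-up scores: sample 0 differs slightly, sample 1 flips class
workstation = [[0.1, 0.9], [0.6, 0.4]]
tx2 = [[0.1001, 0.8999], [0.4, 0.6]]
print(compare_outputs(workstation, tx2))
```

If the differences are tiny and no predictions flip, that would point toward rounding. Large differences or many flipped predictions would suggest something else, such as a different cuDNN version or a preprocessing difference.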