TX2 produces different results across runs and compared with a server when using TensorFlow and Keras

Hi AastaLLL,

Thanks very much for the analysis. We finally found that the large discrepancy between server execution and TX2 execution originates from an atrous (dilated) convolutional layer (line 299 in [1]). After replacing this unusual layer with a normal convolutional layer, the difference between the server and TX2 outputs becomes negligible. We do not know the root cause of why this atrous convolutional layer produces such a difference in execution; it may lie in Keras, TensorFlow, an embedded computation library, cuDNN, CUDA, or the hardware. The software stack is too deep for us to locate the root cause.
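
For anyone unfamiliar with the term, here is a minimal pure-Python sketch of how an atrous convolution differs from a normal one, plus the kind of check we mean by "difference of the output". It is an illustration only, not the code from [1]: the function names `conv1d` and `max_abs_diff`, the 1D signal, and the kernel values are all made up for this example, and the real layer is 2D and runs through Keras.

```python
def conv1d(signal, kernel, dilation=1):
    """Valid (no padding) 1D cross-correlation.

    dilation=1 is a normal convolution; dilation>1 is an atrous
    convolution, where kernel taps are spaced `dilation` samples apart.
    """
    # Number of input samples covered by one kernel placement.
    span = (len(kernel) - 1) * dilation + 1
    out = []
    for start in range(len(signal) - span + 1):
        acc = 0.0
        for k, w in enumerate(kernel):
            acc += w * signal[start + k * dilation]
        out.append(acc)
    return out


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two outputs."""
    return max(abs(x - y) for x, y in zip(a, b))


# Hypothetical input and kernel, chosen only to make the effect visible.
signal = [float(i) for i in range(10)]
kernel = [1.0, 0.0, -1.0]

normal = conv1d(signal, kernel, dilation=1)  # taps at offsets 0, 1, 2
atrous = conv1d(signal, kernel, dilation=3)  # taps at offsets 0, 3, 6

print("normal:", normal)
print("atrous:", atrous)
```

With the same three weights, the atrous version reads samples that are further apart, so it covers a wider input window and likely goes through a different code path in the underlying libraries. In practice, `max_abs_diff` would be applied to the same layer's output dumped on the server and on the TX2 for identical input.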

Reference:
[1] https://github.com/pierluigiferrari/ssd_keras/blob/master/models/keras_ssd512.py

Many thanks.