Different results with TensorFlow on a 1080Ti and a Tesla K80 (seeds make runs deterministic on the same machine)

Is there a script out there to check if parallelization produces the same results across different GPUs?

I have a personal machine with a 1080Ti. I recently started using a Tesla K80 on Google Compute so I can run a lot more trainings. Both machines run Ubuntu 16.04, TensorFlow 1.10.0, and Python 3.5.2, and both run the same code from GitHub.

The code is deterministic on a single machine. TensorFlow doesn't make this easy, but I found a way to do it with tf.contrib.stateless.stateless_random_uniform(). So on my personal machine with the 1080Ti, running the training twice gives the same results.
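For reference, here is a minimal sketch of the kind of setup I mean (not my actual training code, just the idea of feeding an explicit, fixed seed to a stateless random op so repeated runs on one machine match):

```python
# Sketch only: drive randomness through a stateless op with a fixed seed
# so the same run on the same machine reproduces exactly.
import tensorflow as tf

seed_t = tf.constant([42, 7], dtype=tf.int64)  # explicit [2]-element seed

# stateless_random_uniform always returns the same values for a given
# seed and shape, independent of op execution order.
noise = tf.contrib.stateless.stateless_random_uniform([3], seed=seed_t)

with tf.Session() as sess:
    print(sess.run(noise))  # identical output every run on the same GPU
```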

The code is also deterministic on Google Compute with the Tesla K80. If I run the training twice, it gives me the same results.

The problem is that the results differ between the 1080Ti and the Tesla K80. I believe this is related to the difference in GPUs, because the results do match between two Tesla K80s in different server farm zones. So my belief is that different GPU models cannot guarantee identical results because they parallelize the computation differently. Can anyone confirm this?
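To illustrate why I suspect parallelism: float32 addition is not associative, so a different reduction order (which different GPUs or kernels can easily produce) can change a sum even with identical inputs. A toy example in plain NumPy (not my training code):

```python
# Float32 addition is order-dependent: grouping changes the result.
import numpy as np

x = np.float32(1e8)
y = np.float32(-1e8)
z = np.float32(1e-3)

print((x + y) + z)  # 0.001  -- small term survives
print(x + (y + z))  # 0.0    -- small term absorbed by the large one
```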

Code for reproducibility is available on request; however, I am really looking for a general answer. To further confirm this, is there a script out there to check whether parallelization produces the same results across different GPUs?
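If no such script exists, I imagine something along these lines would do: a hedged sketch of a fixed, seeded workload whose digest you compare across machines (the name gpu_fingerprint is just for illustration). Matching digests suggest matching arithmetic; differing digests point at parallelism- or hardware-dependent reductions.

```python
# Sketch: run an identical, seeded GPU workload and print a digest of the
# result. Run this on each machine and diff the output.
import hashlib
import numpy as np
import tensorflow as tf

def gpu_fingerprint():
    rng = np.random.RandomState(0)               # same inputs on every machine
    a = rng.randn(1024, 1024).astype(np.float32)
    b = rng.randn(1024, 1024).astype(np.float32)
    with tf.device('/gpu:0'):
        # matmul + sum: the reduction order is parallelism-sensitive
        c = tf.reduce_sum(tf.matmul(tf.constant(a), tf.constant(b)))
    with tf.Session() as sess:
        value = sess.run(c)
    # Hash the exact bytes so even 1-ulp differences show up.
    return hashlib.md5(np.float32(value).tobytes()).hexdigest(), value

if __name__ == '__main__':
    digest, value = gpu_fingerprint()
    print('sum = %.10f  md5 = %s' % (value, digest))
```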

[Image: plot showing different reward curves between the 1080Ti and the Tesla K80]