P100 and GTX 1060 tf serving results are different

I'm serving the same SavedModel on both machines, but the results differ. What do I need to keep identical to get the same results on two different machines, e.g. TensorFlow version, cuBLAS version, …?
Can someone help me please?

There is no guarantee that you will obtain bitwise identical results when changing GPUs. Floating-point math has finite precision and is sensitive to things like the order in which numbers are summed, and that order can differ between GPU architectures. Such variance is expected to be small and generally should not affect confident predictions of your model.
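To illustrate the summation-order point, here is a minimal, TF-free sketch of how floating-point addition is not associative: adding the same three numbers in a different order produces a different result.

```python
# Floating-point addition is not associative: the rounding at each
# intermediate step depends on the order of operations.
vals = [1e16, 1.0, -1e16]

# (1e16 + 1.0) rounds back to 1e16 (the 1.0 is below the precision
# of a float64 near 1e16), so the final sum is 0.0.
s1 = (vals[0] + vals[1]) + vals[2]

# (1e16 + -1e16) is exactly 0.0, so the 1.0 survives.
s2 = (vals[0] + vals[2]) + vals[1]

print(s1)  # 0.0
print(s2)  # 1.0
```

A parallel reduction on a GPU effectively changes this ordering relative to a serial sum, which is why different hardware can give slightly different numbers from the same model.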

For example, if you are running image classification, the probability assigned to each category may change a little from architecture to architecture, but such a small change can only alter the ranking of predictions in cases where the probabilities were already close to one another and the model wasn't very confident in the first place.

All that said, the GTX 1060 and P100 are closely related architectures (both Pascal). It is possible that at least some of the observed difference is the result of TF choosing different cuDNN algorithms on each GPU. If you are using TF 1.14+, you can try exporting the environment variable TF_DETERMINISTIC_OPS=1 to get a more consistent algorithm choice. But even if this happens to work, it is not something you can rely on when moving between more distantly related architectures (going from Pascal to Turing, for example).

Of course, bugs are always possible, so if you see large discrepancies between your model's predictions on the P100 and GTX 1060, please provide a reproducer (here, or privately by filing a bug through the registered developer program).