Hi All,
I’m running a frozen PyTorch model (in eval mode).
I get different outputs (~1e-5 difference) when running with different cuDNN versions.
It looks like the difference starts at the first conv2d layer and accumulates into a larger change in the final output.
Flags used (that didn’t help):
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
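For context, here is a minimal sketch of my setup; the model and input below are placeholders standing in for my actual frozen model:

```python
import torch
import torchvision

# Deterministic settings (these alone did not make results match across cuDNN versions)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

print("torch:", torch.__version__)
print("cuDNN:", torch.backends.cudnn.version())  # e.g. 8500, 8700, 8902

# Placeholder model/input; the real model is frozen and in eval mode
model = torchvision.models.resnet18(weights=None).cuda().eval()
x = torch.ones(1, 3, 224, 224, device="cuda")  # fixed input, no randomness

with torch.no_grad():
    out = model(x)

# Save the output under one cuDNN version so it can be diffed against another
torch.save(out.cpu(), "output_cudnn_%d.pt" % torch.backends.cudnn.version())
```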
Tested on Ubuntu 20.04:
torch 1.13, CUDA 11.7: cuDNN 8700, 8902, 8500.
P.S. I’ve also tried different CUDA versions (11.7 vs. 11.6), but did not observe the differences reported above.
Hi @valtmaneddy,
If anything in the hardware-software stack changes, floating-point arithmetic results cannot (currently) be guaranteed to be bit-identical. Changes in the hardware-software stack include, but are not limited to: chip architecture, chip version within an architecture, amount of system memory, driver version, CUDA version, NCCL version, DL library version, CPU version, versions of other software packages such as NumPy, the chip and/or node physical interconnect, and the multi-node/multi-GPU regime. To guarantee bit-exact results, it is necessary to freeze the software stack using containers and to freeze the hardware versions, including all components involved in compute (CPU, GPU, DPU, interconnect, etc.).
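In practice, this means comparisons across stack versions should check numerical closeness rather than bit-exact equality. A minimal sketch, assuming outputs were saved as in the snippet above (filenames and tolerances are illustrative):

```python
import torch

# Outputs saved under two different cuDNN versions (filenames are illustrative)
out_a = torch.load("output_cudnn_8700.pt")
out_b = torch.load("output_cudnn_8902.pt")

# Bit-exact equality is not guaranteed when the stack changes...
print("bitwise equal:", torch.equal(out_a, out_b))

# ...but the outputs should still agree within a small floating-point tolerance
print("max abs diff:", (out_a - out_b).abs().max().item())
print("allclose:", torch.allclose(out_a, out_b, rtol=1e-4, atol=1e-5))
```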
Thanks