My environment: Power9 with Volta GPUs, CUDA 9.1, cuDNN 7.0.5.
I’m running a PyTorch test called test_Conv2d_groups_nobias on tensors of various sizes in both float16 and float32. The test first runs in float32 and then repeats the same check in float16. The float32 run passes, but the float16 run fails with a precision error.
But if I reverse the order of which dtype is tested first (i.e. run the float16 test first, then float32), both runs pass.
Also, if I run the float32 test on CUDA device 0 and then run the float16 test on device 1, both runs pass. A minimal sketch of the order dependence is below.
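The actual test lives in PyTorch's test suite; the shapes and helper below are illustrative assumptions on my part, not the exact test code. It's just a rough sketch of the kind of check involved (grouped conv vs. the equivalent split convs) in the float32-then-float16 order that fails for me:

```python
import torch
import torch.nn.functional as F

def run_case(dtype, device="cuda:0"):
    # Grouped conv vs. the same computation done as two separate convs;
    # the real test compares results like these within a tolerance.
    torch.manual_seed(0)
    x = torch.randn(2, 4, 6, 6, device=device, dtype=dtype)
    w = torch.randn(4, 2, 3, 3, device=device, dtype=dtype)

    grouped = F.conv2d(x, w, bias=None, groups=2)

    # Equivalent computation with input/weight split along the channel dim.
    out1 = F.conv2d(x[:, :2], w[:2], bias=None)
    out2 = F.conv2d(x[:, 2:], w[2:], bias=None)
    split = torch.cat([out1, out2], dim=1)

    max_diff = (grouped - split).abs().max().item()
    print(f"{dtype}: max abs diff = {max_diff}")

# Same order as the failing case: float32 first, then float16 on the same device.
run_case(torch.float32)
run_case(torch.float16)
```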
My conclusion is that leftover state (garbage) in GPU memory from the float32 run is affecting the float16 run when it uses the same device afterwards.
This same failure does not occur on Power8 with Pascal GPUs, CUDA 9.1, and cuDNN 7.0.4.