I’m investigating an issue with a model trained on a system with an RTX 6000 GPU that I’m running inference on with a second system that has an RTX A6000 installed. The model is converted at runtime to a DAG built around cuDNN. It appears that using the TF32 floating-point format during inference on the RTX A6000, with a model trained in FP32, lets enough error accumulate that the output of the inference engine is unsatisfactory. I’ve verified that the mismatch between cuDNN 7.6 for training and cuDNN 8.1 for inference isn’t the cause by using an RTX 6000 in both environments.

Is there a way to disable TF32 computation at run-time, on demand, for a specific cuDNN context? It looks like PyTorch has a way to do this, but I can’t find an equivalent documented in the cuDNN API.
Edit: It looks like I can call cudnnSetConvolutionMathType(…, CUDNN_FMA_MATH), which supposedly avoids TF32, but the results are bit-for-bit identical whether or not I call it.
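For context, the call looks roughly like this (a trimmed-down sketch, not my actual graph-builder code; the function name is just a placeholder, descriptor creation and error handling are elsewhere):

```c
#include <cudnn.h>

/* Sketch: set the math type on an existing convolution descriptor
 * before the convolution is launched.  CUDNN_FMA_MATH is documented
 * (cuDNN 8.x) as restricting the convolution to FP32 FMA kernels,
 * i.e. no Tensor Core / TF32 paths.                                 */
static cudnnStatus_t disable_tf32_for_conv(cudnnConvolutionDescriptor_t convDesc)
{
    return cudnnSetConvolutionMathType(convDesc, CUDNN_FMA_MATH);
}
```

The descriptor itself is set up elsewhere with cudnnCreateConvolutionDescriptor / cudnnSetConvolution2dDescriptor; only the math-type call is shown here, since that is the part that appears to have no effect on the output.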