I would like to run convolutions on an A100 without using the Tensor Cores.
When I call cudnnFindConvolutionForwardAlgorithm, cudnnFindConvolutionBackwardFilterAlgorithm and cudnnFindConvolutionBackwardDataAlgorithm with a convolution descriptor whose math type is set to CUDNN_FMA_MATH, the returned results contain only entries measured with CUDNN_FMA_MATH, even though the documentation says that
“[the function] will attempt both the provided convDesc mathType and CUDNN_DEFAULT_MATH (assuming the two differ)”.
Can I rely on this behavior? That is, will passing a CUDNN_FMA_MATH descriptor to these functions always produce a list containing only CUDNN_FMA_MATH attempts?
PS from the docs: “With NVIDIA Ampere Architecture and CUDA toolkit 11, CUDNN_DEFAULT_MATH permits TF32 Tensor Core operation and CUDNN_FMA_MATH does not.”
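If this behavior turns out not to be guaranteed, one defensive option would be to filter the returned list myself and keep only the successful entries whose mathType is CUDNN_FMA_MATH. Below is a minimal sketch of that idea. To keep it compilable without cuDNN, it uses hypothetical stand-in types (`sketch::AlgoPerf`, `sketch::MathType`) that only mimic the relevant fields of cudnnConvolutionFwdAlgoPerf_t; the helper `keepMathType` is my own name, not a cuDNN function.

```cpp
#include <vector>

// Hypothetical stand-ins so the sketch builds without cuDNN installed.
// With the real library, the perf struct and math type enum come from <cudnn.h>.
namespace sketch {

enum MathType {
    DEFAULT_MATH,                    // stands in for CUDNN_DEFAULT_MATH
    TENSOR_OP_MATH,                  // stands in for CUDNN_TENSOR_OP_MATH
    TENSOR_OP_MATH_ALLOW_CONVERSION, // stands in for CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION
    FMA_MATH                         // stands in for CUDNN_FMA_MATH
};

// Only the fields this sketch needs from the perf result struct.
struct AlgoPerf {
    int algo;          // algorithm identifier
    int status;        // 0 stands in for CUDNN_STATUS_SUCCESS
    float time;        // measured time in milliseconds
    MathType mathType; // math type the attempt was run with
};

// Keep only the attempts that succeeded and used the wanted math type.
// The relative order of the input (fastest first, as returned by the
// Find functions) is preserved.
inline std::vector<AlgoPerf> keepMathType(const std::vector<AlgoPerf>& results,
                                          MathType wanted) {
    std::vector<AlgoPerf> kept;
    for (const AlgoPerf& r : results) {
        if (r.status == 0 && r.mathType == wanted) {
            kept.push_back(r);
        }
    }
    return kept;
}

}  // namespace sketch
```

With the real API, the same loop would run over the perfResults array filled by the Find call, up to returnedAlgoCount entries, and the first surviving entry would be the fastest attempt that does not use Tensor Cores.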