cudnnFindConvolution*Algorithm excluding Tensor Cores

Hi,

I would like to perform convolutions on an A100 without using the Tensor Cores.

When I invoke the cudnnFindConvolutionForwardAlgorithm, cudnnFindConvolutionBackwardFilterAlgorithm, and cudnnFindConvolutionBackwardDataAlgorithm functions with a convolution descriptor whose math type is set to CUDNN_FMA_MATH, the results contain only timings obtained with CUDNN_FMA_MATH, even though the documentation says that
“[the function] will attempt both the provided convDesc mathType and CUDNN_DEFAULT_MATH (assuming the two differ)”.

Can I rely on this behavior? That is, will passing these functions a descriptor set to CUDNN_FMA_MATH always produce a list containing only CUDNN_FMA_MATH attempts?
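
For reference, this is roughly how I set things up (a minimal sketch: the tensor shapes are arbitrary placeholders and error checking is omitted for brevity):

```c
#include <cudnn.h>
#include <stdio.h>

int main(void) {
    cudnnHandle_t handle;
    cudnnCreate(&handle);

    cudnnTensorDescriptor_t xDesc, yDesc;
    cudnnFilterDescriptor_t wDesc;
    cudnnConvolutionDescriptor_t convDesc;
    cudnnCreateTensorDescriptor(&xDesc);
    cudnnCreateTensorDescriptor(&yDesc);
    cudnnCreateFilterDescriptor(&wDesc);
    cudnnCreateConvolutionDescriptor(&convDesc);

    /* Placeholder fp32 NCHW shapes: 1x64x56x56 in, 64 3x3 filters,
     * pad 1, stride 1 -> 1x64x56x56 out. */
    cudnnSetTensor4dDescriptor(xDesc, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT,
                               1, 64, 56, 56);
    cudnnSetFilter4dDescriptor(wDesc, CUDNN_DATA_FLOAT, CUDNN_TENSOR_NCHW,
                               64, 64, 3, 3);
    cudnnSetConvolution2dDescriptor(convDesc, 1, 1, 1, 1, 1, 1,
                                    CUDNN_CROSS_CORRELATION, CUDNN_DATA_FLOAT);
    cudnnSetTensor4dDescriptor(yDesc, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT,
                               1, 64, 56, 56);

    /* The point of the question: restrict math to FMA (no Tensor Cores). */
    cudnnSetConvolutionMathType(convDesc, CUDNN_FMA_MATH);

    const int requested = 8;
    int returned = 0;
    cudnnConvolutionFwdAlgoPerf_t perf[8];
    cudnnFindConvolutionForwardAlgorithm(handle, xDesc, wDesc, convDesc, yDesc,
                                         requested, &returned, perf);

    /* Print the math type each result was actually measured with;
     * in my runs every entry reports CUDNN_FMA_MATH. */
    for (int i = 0; i < returned; ++i) {
        printf("algo=%d status=%d time=%.3f ms mathType=%d\n",
               (int)perf[i].algo, (int)perf[i].status, perf[i].time,
               (int)perf[i].mathType);
    }

    cudnnDestroyConvolutionDescriptor(convDesc);
    cudnnDestroyFilterDescriptor(wDesc);
    cudnnDestroyTensorDescriptor(yDesc);
    cudnnDestroyTensorDescriptor(xDesc);
    cudnnDestroy(handle);
    return 0;
}
```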

Many thanks,
Paolo

PS from the docs: “With NVIDIA Ampere Architecture and CUDA toolkit 11, CUDNN_DEFAULT_MATH permits TF32 Tensor Core operation and CUDNN_FMA_MATH does not.”

Hi,

Yes. With the NVIDIA Ampere architecture and CUDA Toolkit 11, CUDNN_DEFAULT_MATH permits TF32 Tensor Core operation and CUDNN_FMA_MATH does not.
https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnnMathType_t
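
For quick reference, these are the math types declared in cudnn.h (as of cuDNN 8.x):

```c
typedef enum {
    CUDNN_DEFAULT_MATH                    = 0, /* Tensor Cores may be used (incl. TF32 on Ampere) */
    CUDNN_TENSOR_OP_MATH                  = 1, /* Tensor Cores permitted, no down conversion */
    CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION = 2, /* Tensor Cores with active datatype down conversion */
    CUDNN_FMA_MATH                        = 3, /* FMA instructions only, no Tensor Cores */
} cudnnMathType_t;
```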

Thank you.

Thanks for your answer, spolisetty.

I suppose that CUDNN_TENSOR_OP_MATH also allows TF32 Tensor Core operations (i.e., converting fp32 inputs to tf32) on CUDA 11 and the Ampere architecture. In other words, the documentation doesn’t consider converting fp32 to tf32 an active datatype down conversion (the behavior reserved for CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION).

Did I understand correctly?

Thanks again,
Paolo