cuDNN convolution: which dimensions trigger Tensor Core usage?


I would like cuDNN convolutions to use the computing power of Tensor Cores.

I found the cuDNN convolution requirements for Tensor Core operations here: Developer Guide :: NVIDIA Deep Learning cuDNN Documentation

I created an example that satisfies those conditions:

  • Using a supported convolution function: I use cudnnConvolutionForward()
  • Using a supported algorithm: I use CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM
  • Using a supported data type: I work in FP32, so I set the convolution math type to CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION
  • Using supported channel dimensions: my numbers of input and output channels are multiples of 8
  • Using a supported data layout: all the tensors use the CUDNN_TENSOR_NCHW layout

I get the following results:

1D experiment

| Layout format | Input dim | Kernel dim | Input channels | Output channels | TC* utilization |
|---|---|---|---|---|---|
| NCHW | 16 | 6 | 8 | 8 | Yes |
| NCHW | 32 | 6 | 8 | 8 | No |
| NCHW | 64 | 6 | 8 | 8 | No |
| NCHW | 128 | 6 | 8 | 8 | No |

*TC = Tensor Cores

Tensor Cores are only used when the input dimension is 16 (the same results hold with kernel dim = 3).

I know that Tensor Cores support the following matrix sizes:

| Matrix A | Matrix B | Accumulator | Matrix size (m-n-k) |
|---|---|---|---|
| __half | __half | float | 16x16x16 |
| __half | __half | float | 32x8x16 |
| __half | __half | float | 8x32x16 |

Is there a reason why input dimensions that are multiples of 16 do not use Tensor Cores?

2D experiment

| Layout format | Input dim | Kernel dim | Input channels | Output channels | TC utilization |
|---|---|---|---|---|---|
| NCHW | 16x16 | 3x3 | 8 | 8 | Yes |
| NCHW | 32x32 | 3x3 | 8 | 8 | No |
| NCHW | 64x64 | 3x3 | 8 | 8 | Yes |

I don’t understand why TC would not be used when the input dimension is 32x32.

Thanks in advance.

Hi @julie.fraysse,
Apologies for missing this.
Are you still facing this issue?