Hi,

I would like the cuDNN convolution to use the computing power of Tensor Cores.

I found the cuDNN convolution requirements for Tensor Core operations here: Developer Guide :: NVIDIA Deep Learning cuDNN Documentation

I created an example that satisfies those conditions:

- Using a supported convolution function: I use cudnnConvolutionForward()
- Using a supported algorithm: I use CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM
- Using a supported data type: I work in FP32, so I set the convolution math type to CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION
- Using supported channel dimensions: my numbers of input and output channels are multiples of 8
- Using a supported data layout: all tensors use the CUDNN_TENSOR_NCHW layout
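For reference, the descriptor setup can be sketched roughly as below. This is a minimal illustration, not my full program: the dimensions (N=1, C=8, K=8, 64x64 input, 3x3 kernel) are example values, and allocation, workspace queries, and error checking are elided.

```cpp
#include <cudnn.h>

// Sketch: configure a 2D FP32 convolution so cuDNN may use Tensor Cores.
// Illustrative sizes only: N=1, 8 input channels, 8 output channels,
// 64x64 input, 3x3 kernel, padding 1, stride 1.
void setup_conv(cudnnHandle_t handle) {
    cudnnTensorDescriptor_t xDesc, yDesc;
    cudnnFilterDescriptor_t wDesc;
    cudnnConvolutionDescriptor_t convDesc;

    cudnnCreateTensorDescriptor(&xDesc);
    cudnnSetTensor4dDescriptor(xDesc, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT,
                               1, 8, 64, 64);

    cudnnCreateFilterDescriptor(&wDesc);
    cudnnSetFilter4dDescriptor(wDesc, CUDNN_DATA_FLOAT, CUDNN_TENSOR_NCHW,
                               8, 8, 3, 3);

    cudnnCreateConvolutionDescriptor(&convDesc);
    cudnnSetConvolution2dDescriptor(convDesc,
                                    1, 1,   // padding h, w
                                    1, 1,   // stride h, w
                                    1, 1,   // dilation h, w
                                    CUDNN_CROSS_CORRELATION, CUDNN_DATA_FLOAT);
    // Allow cuDNN to down-convert FP32 to FP16 internally for Tensor Cores.
    cudnnSetConvolutionMathType(convDesc, CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION);

    cudnnCreateTensorDescriptor(&yDesc);
    cudnnSetTensor4dDescriptor(yDesc, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT,
                               1, 8, 64, 64);

    // ... allocate x, w, y and workspace on the device, then call:
    // cudnnConvolutionForward(handle, &alpha, xDesc, x, wDesc, w, convDesc,
    //     CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM,
    //     workspace, workspaceSize, &beta, yDesc, y);
}
```

Tensor Core usage was checked by profiling (e.g. looking for `s884`/tensor-op kernels in the profiler output).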

I get the following results:

*1D experiment*

Layout format | Input dim | Kernel dim | Input channels | Output channels | TC* utilization |
---|---|---|---|---|---|
NCHW | 16 | 6 | 8 | 8 | Yes |
NCHW | 32 | 6 | 8 | 8 | No |
NCHW | 64 | 6 | 8 | 8 | No |
NCHW | 128 | 6 | 8 | 8 | No |

\*TC = Tensor Cores

Tensor Cores are only used when the input dimension is 16 (same results with kernel dim = 3).

I know that Tensor Cores support the following matrix sizes:

Matrix A | Matrix B | Accumulator | Matrix size (m-n-k) |
---|---|---|---|
__half | __half | float | 16x16x16 |
__half | __half | float | 32x8x16 |
__half | __half | float | 8x32x16 |
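These shapes correspond to the fragment sizes exposed by the CUDA WMMA API. A minimal device-side sketch of the 16x16x16 case (one warp multiplying __half tiles into a float accumulator, matching the table; leading dimensions of 16 are illustrative):

```cpp
#include <mma.h>
using namespace nvcuda;

// One warp computes C = A * B for a single 16x16 tile (m-n-k = 16x16x16).
// A and B hold __half values; the accumulator is float, as in the table.
__global__ void wmma_16x16x16(const __half *a, const __half *b, float *c) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, __half, wmma::row_major> fa;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, __half, wmma::col_major> fb;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> fc;

    wmma::fill_fragment(fc, 0.0f);          // zero the accumulator tile
    wmma::load_matrix_sync(fa, a, 16);      // leading dimension 16
    wmma::load_matrix_sync(fb, b, 16);
    wmma::mma_sync(fc, fa, fb, fc);         // Tensor Core matrix multiply
    wmma::store_matrix_sync(c, fc, 16, wmma::mem_row_major);
}
```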

Is there a reason why an input dimension that is a multiple of 16 does not use Tensor Cores?

*2D experiment*

Layout format | Input dim | Kernel dim | Input channels | Output channels | TC utilization |
---|---|---|---|---|---|
NCHW | 16x16 | 3x3 | 8 | 8 | Yes |
NCHW | 32x32 | 3x3 | 8 | 8 | No |
NCHW | 64x64 | 3x3 | 8 | 8 | Yes |

I don’t understand why Tensor Cores would not be used when the input dimension is 32x32, given that they are used at 16x16 and 64x64.

Thanks in advance.