Hi,

I would like the cuDNN convolution to use the computing power of Tensor Cores.

I found the cuDNN convolution requirements for Tensor Core operations here: Developer Guide :: NVIDIA Deep Learning cuDNN Documentation

I created an example that satisfies those conditions:

- Using a supported convolution function: I use cudnnConvolutionForward()
- Using a supported algorithm: I use CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM
- Using a supported data type: I work in FP32, so I set the convolution math type to CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION
- Using supported channel dimensions: my numbers of input and output channels are multiples of 8
- Using a supported data layout: all tensors use the CUDNN_TENSOR_NCHW layout
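For reference, the descriptor setup can be sketched roughly as below. This is a minimal illustration, not my full program: the dimensions (N=1, C=8, K=8, 64x64 input, 3x3 kernel) are example values, and allocation, workspace queries, and error checking are elided.

```cpp
#include <cudnn.h>

// Sketch: configure a 2D FP32 convolution so cuDNN may use Tensor Cores.
// Illustrative sizes only: N=1, 8 input channels, 8 output channels,
// 64x64 input, 3x3 kernel, padding 1, stride 1.
void setup_conv(cudnnHandle_t handle) {
    cudnnTensorDescriptor_t xDesc, yDesc;
    cudnnFilterDescriptor_t wDesc;
    cudnnConvolutionDescriptor_t convDesc;

    cudnnCreateTensorDescriptor(&xDesc);
    cudnnSetTensor4dDescriptor(xDesc, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT,
                               1, 8, 64, 64);

    cudnnCreateFilterDescriptor(&wDesc);
    cudnnSetFilter4dDescriptor(wDesc, CUDNN_DATA_FLOAT, CUDNN_TENSOR_NCHW,
                               8, 8, 3, 3);

    cudnnCreateConvolutionDescriptor(&convDesc);
    cudnnSetConvolution2dDescriptor(convDesc,
                                    1, 1,   // padding h, w
                                    1, 1,   // stride h, w
                                    1, 1,   // dilation h, w
                                    CUDNN_CROSS_CORRELATION, CUDNN_DATA_FLOAT);
    // Allow cuDNN to down-convert FP32 to FP16 internally for Tensor Cores.
    cudnnSetConvolutionMathType(convDesc, CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION);

    cudnnCreateTensorDescriptor(&yDesc);
    cudnnSetTensor4dDescriptor(yDesc, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT,
                               1, 8, 64, 64);

    // ... allocate x, w, y and workspace on the device, then call:
    // cudnnConvolutionForward(handle, &alpha, xDesc, x, wDesc, w, convDesc,
    //     CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM,
    //     workspace, workspaceSize, &beta, yDesc, y);
}
```

Tensor Core usage was checked by profiling (e.g. looking for `s884`/tensor-op kernels in the profiler output).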

I get the following results:

*1D experiment*

Layout format | Input dim | Kernel dim | Input channels | Output channels | TC* utilization |
---|---|---|---|---|---|
NCHW | 16 | 6 | 8 | 8 | Yes |
NCHW | 32 | 6 | 8 | 8 | No |
NCHW | 64 | 6 | 8 | 8 | No |
NCHW | 128 | 6 | 8 | 8 | No |

\*TC = Tensor Cores

Tensor Cores are only used when the input dimension is 16 (same results with kernel dim = 3).

I know that Tensor Cores support the following matrix sizes:

Matrix A | Matrix B | Accumulator | Matrix size (m-n-k) |
---|---|---|---|
__half | __half | float | 16x16x16 |
__half | __half | float | 32x8x16 |
__half | __half | float | 8x32x16 |
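These shapes correspond to the fragment sizes exposed by the CUDA WMMA API. A minimal device-side sketch of the 16x16x16 case (one warp multiplying __half tiles into a float accumulator, matching the table; leading dimensions of 16 are illustrative):

```cpp
#include <mma.h>
using namespace nvcuda;

// One warp computes C = A * B for a single 16x16 tile (m-n-k = 16x16x16).
// A and B hold __half values; the accumulator is float, as in the table.
__global__ void wmma_16x16x16(const __half *a, const __half *b, float *c) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, __half, wmma::row_major> fa;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, __half, wmma::col_major> fb;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> fc;

    wmma::fill_fragment(fc, 0.0f);          // zero the accumulator tile
    wmma::load_matrix_sync(fa, a, 16);      // leading dimension 16
    wmma::load_matrix_sync(fb, b, 16);
    wmma::mma_sync(fc, fa, fb, fc);         // Tensor Core matrix multiply
    wmma::store_matrix_sync(c, fc, 16, wmma::mem_row_major);
}
```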

Is there a reason why an input dimension that is a multiple of 16 does not use Tensor Cores?

*2D experiment*

Layout format | Input dim | Kernel dim | Input channels | Output channels | TC utilization |
---|---|---|---|---|---|
NCHW | 16x16 | 3x3 | 8 | 8 | Yes |
NCHW | 32x32 | 3x3 | 8 | 8 | No |
NCHW | 64x64 | 3x3 | 8 | 8 | Yes |

I don’t understand why Tensor Cores would not be used when the input dimension is 32x32, given that they are used at 16x16 and 64x64.

Thanks in advance.