I tested 2D and 3D convolution with the cuDNN library through its C++ API, aiming for Tensor Core acceleration.

**The environment is as follows:**

Windows 10

CUDA 10.0

cuDNN 7.6.5

Visual Studio 2017

RTX 2080 Ti

It seems that 3D convolution has no fp16-optimized Tensor Core kernel and gets no acceleration. I used the Nsight Systems profiler to identify the kernel launched in each test case.

**I tested the following configurations:**

# Argument format

[tensorcore flag, data type, format, # of iterations, batch_size, in_channels, out_channels, image height, image width] --> [used kernel, time (sec)]

The 3D tests list one additional spatial dimension at the end.

# 2D Convolution test (3x3 conv)

[CUDNN_DEFAULT_MATH, CUDNN_DATA_FLOAT, CUDNN_TENSOR_NCHW, 4000, 8, 64, 64, 128, 128] --> [volta_scudnn_128x64_relu_small_nn_v1, 3.1 sec]

[CUDNN_DEFAULT_MATH, CUDNN_DATA_HALF, CUDNN_TENSOR_NCHW, 4000, 8, 64, 64, 128, 128] --> [volta_hcudnn_128x128_relu_small_nn_v1, 3.1 sec]

[CUDNN_TENSOR_OP_MATH, CUDNN_DATA_HALF, CUDNN_TENSOR_NCHW, 4000, 8, 64, 64, 128, 128] --> [turing_h1688cudnn_256x64_sliced1x2_ldg8_relu_exp_small_nhwc_tn_v1, 1.3 sec]

# 3D Convolution test (3x3x3 conv)

[CUDNN_DEFAULT_MATH, CUDNN_DATA_FLOAT, CUDNN_TENSOR_NCHW, 100, 1, 64, 64, 128, 128, 128] --> [volta_scudnn_128x64_stridedB_splitK_small_nn_v1, 3.8 sec]

[CUDNN_DEFAULT_MATH, CUDNN_DATA_HALF, CUDNN_TENSOR_NCHW, 100, 1, 64, 64, 128, 128, 128] --> [volta_hcudnn_128x128_stridedB_splitK_small_nn_v1, 3.75 sec]

[CUDNN_TENSOR_OP_MATH, CUDNN_DATA_HALF, CUDNN_TENSOR_NCHW, 100, 1, 64, 64, 128, 128, 128] --> [volta_hcudnn_128x128_stridedB_splitK_small_nn_v1, 3.8 sec]

These results show that 2D convolution uses the optimized kernel ‘turing_h1688cudnn_256x64_sliced1x2_ldg8_relu_exp_small_nhwc_tn_v1’ and is accelerated (1.3 sec versus 3.1 sec) when the CUDNN_TENSOR_OP_MATH flag and the fp16 data type are used together.

However, 3D convolution does not use an optimized kernel. Even with CUDNN_TENSOR_OP_MATH it runs the non-Tensor Core kernel ‘volta_hcudnn_128x128_stridedB_splitK_small_nn_v1’, the same one selected with CUDNN_DEFAULT_MATH, and the time does not improve.

I would like to know whether an optimized Tensor Core kernel for 3D convolution exists.

If it does, what is its name?