Is there a Tensor Core kernel for 3D convolution?

nei0964 · November 18, 2019, 7:24am

I have tested 2D and 3D convolution with the cuDNN library through its C++ API, aiming for Tensor Core acceleration.

The environment is as follows:
Windows 10
CUDA 10.0
cuDNN 7.6.5
Visual Studio 2017
RTX 2080 Ti

It seems that 3D convolution has no fp16-optimized Tensor Core kernel and gets no acceleration. I used the Nsight Systems profiler to identify the kernel launched in each test case.

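I tested the following configurations. Each line lists the arguments and the result in this form (the 3D tests list three spatial dimensions instead of two):

[Tensor Core flag, data type, format, # of iterations, batch_size, in_channels, out_channels, image height, image width] → [kernel used, time (sec)]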

2D Convolution test (3x3 conv)

[CUDNN_DEFAULT_MATH, CUDNN_DATA_FLOAT, CUDNN_TENSOR_NCHW, 4000, 8, 64, 64, 128, 128] → [volta_scudnn_128x64_relu_small_nn_v1, 3.1 sec]
[CUDNN_DEFAULT_MATH, CUDNN_DATA_HALF, CUDNN_TENSOR_NCHW, 4000, 8, 64, 64, 128, 128] → [volta_hcudnn_128x128_relu_small_nn_v1, 3.1 sec]
[CUDNN_TENSOR_OP_MATH, CUDNN_DATA_HALF, CUDNN_TENSOR_NCHW, 4000, 8, 64, 64, 128, 128] → [turing_h1688cudnn_256x64_sliced1x2_ldg8_relu_exp_small_nhwc_tn_v1, 1.3 sec]

3D Convolution test (3x3x3 conv)

[CUDNN_DEFAULT_MATH, CUDNN_DATA_FLOAT, CUDNN_TENSOR_NCHW, 100, 1, 64, 64, 128, 128, 128] → [volta_scudnn_128x64_stridedB_splitK_small_nn_v1, 3.8 sec]
[CUDNN_DEFAULT_MATH, CUDNN_DATA_HALF, CUDNN_TENSOR_NCHW, 100, 1, 64, 64, 128, 128, 128] → [volta_hcudnn_128x128_stridedB_splitK_small_nn_v1, 3.75 sec]
[CUDNN_TENSOR_OP_MATH, CUDNN_DATA_HALF, CUDNN_TENSOR_NCHW, 100, 1, 64, 64, 128, 128, 128] → [volta_hcudnn_128x128_stridedB_splitK_small_nn_v1, 3.8 sec]

These results show that the 2D convolution uses the optimized kernel ‘turing_h1688cudnn_256x64_sliced1x2_ldg8_relu_exp_small_nhwc_tn_v1’ and speeds up from 3.1 sec to 1.3 sec when the CUDNN_TENSOR_OP_MATH flag is combined with the fp16 data type.

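However, the 3D convolution does not use an optimized kernel. With CUDNN_TENSOR_OP_MATH it still runs the non-Tensor Core kernel ‘volta_hcudnn_128x128_stridedB_splitK_small_nn_v1’, the same kernel as with CUDNN_DEFAULT_MATH, and the time does not improve.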

I would like to know whether an optimized Tensor Core kernel for 3D convolution exists.

If it does, what is its name?

184699559 · November 25, 2019, 12:12pm

I have the same problem! I use the CUDNN_TENSOR_OP_MATH or CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION math type with the CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM algorithm, input [256,256,64,64,64], and kernel [3,3,3], and I can’t activate the Tensor Cores.