Is there tensorcore kernel for 3D convolution?

nei0964 · November 18, 2019, 7:25am

I have tested 2D and 3D convolution using the cuDNN library with the C++ API, trying to get Tensor Core acceleration.

The environment is as follows:
Windows 10
CUDA 10.0
cuDNN 7.6.5
Visual Studio 2017
RTX 2080 Ti

It seems that 3D convolution has no FP16-optimized Tensor Core kernel and gets no acceleration. I used the Nsight Systems profiling tool to find out which kernel function each test case runs.

I tested the following configurations.

Arguments:

[tensorcore flag, data type, format, # of iterations, batch_size, in_channels, out_channels, image height, image width] → [used kernel, time (sec)]

The 3D tests list three spatial dimensions instead of two.

2D Convolution test (3x3 conv)

[CUDNN_DEFAULT_MATH, CUDNN_DATA_FLOAT, CUDNN_TENSOR_NCHW, 4000, 8, 64, 64, 128, 128] → [volta_scudnn_128x64_relu_small_nn_v1, 3.1 sec]
[CUDNN_DEFAULT_MATH, CUDNN_DATA_HALF, CUDNN_TENSOR_NCHW, 4000, 8, 64, 64, 128, 128] → [volta_hcudnn_128x128_relu_small_nn_v1, 3.1 sec]
[CUDNN_TENSOR_OP_MATH, CUDNN_DATA_HALF, CUDNN_TENSOR_NCHW, 4000, 8, 64, 64, 128, 128] → [turing_h1688cudnn_256x64_sliced1x2_ldg8_relu_exp_small_nhwc_tn_v1, 1.3 sec]

3D Convolution test (3x3x3 conv)

[CUDNN_DEFAULT_MATH, CUDNN_DATA_FLOAT, CUDNN_TENSOR_NCHW, 100, 1, 64, 64, 128, 128, 128] → [volta_scudnn_128x64_stridedB_splitK_small_nn_v1, 3.8 sec]
[CUDNN_DEFAULT_MATH, CUDNN_DATA_HALF, CUDNN_TENSOR_NCHW, 100, 1, 64, 64, 128, 128, 128] → [volta_hcudnn_128x128_stridedB_splitK_small_nn_v1, 3.75 sec]
[CUDNN_TENSOR_OP_MATH, CUDNN_DATA_HALF, CUDNN_TENSOR_NCHW, 100, 1, 64, 64, 128, 128, 128] → [volta_hcudnn_128x128_stridedB_splitK_small_nn_v1, 3.8 sec]
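To make the timings above easier to compare, here is a minimal sketch of the arithmetic. The helper names `ms_per_iteration` and `speedup` are made up for illustration and are not part of cuDNN. Since the 2D and 3D tests use different iteration counts and problem sizes, the numbers are only comparable within each group.

```cpp
#include <cassert>

// Average time of one convolution call, in milliseconds.
// total_seconds is the wall time reported for the whole loop.
double ms_per_iteration(double total_seconds, int iterations) {
    assert(iterations > 0);
    return total_seconds * 1000.0 / iterations;
}

// Speedup of a candidate run over a baseline run.
// A value above 1 means the candidate is faster.
double speedup(double baseline_seconds, double candidate_seconds) {
    assert(candidate_seconds > 0.0);
    return baseline_seconds / candidate_seconds;
}
```

With the numbers reported above, `speedup(3.1, 1.3)` is roughly 2.4 for the 2D FP16 case with CUDNN_TENSOR_OP_MATH, while `speedup(3.75, 3.8)` is slightly below 1 for the 3D case, i.e. no gain.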

This shows that 2D convolution uses the optimized kernel ‘turing_h1688cudnn_256x64_sliced1x2_ldg8_relu_exp_small_nhwc_tn_v1’ and is accelerated when the CUDNN_TENSOR_OP_MATH flag is used with the fp16 data type.

However, 3D convolution does not use an optimized kernel; it falls back to the non-Tensor-Core kernel ‘volta_hcudnn_128x128_stridedB_splitK_small_nn_v1’.
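When scanning profiler output for many test cases, a name-based check like the one below can help. This is only a heuristic sketch: it assumes that Tensor Core kernels carry an instruction-shape tag such as `h884`, `s884`, `h1688`, `s1688` or `i8816` in their names, which matches the kernels seen in this test but is not a documented guarantee. The function name `looks_like_tensor_core_kernel` is hypothetical.

```cpp
#include <array>
#include <string>

// Heuristic (assumption, not a documented rule): Tensor Core kernels
// tend to have an MMA shape tag in their name, e.g. "h1688" on Turing
// or "h884" on Volta. Plain "hcudnn"/"scudnn" kernels have no such tag.
bool looks_like_tensor_core_kernel(const std::string& kernel_name) {
    static const std::array<std::string, 5> tags = {
        "h884", "s884", "h1688", "s1688", "i8816"};
    for (const std::string& tag : tags) {
        if (kernel_name.find(tag) != std::string::npos) {
            return true;
        }
    }
    return false;
}
```

Applied to the kernels above, the `turing_h1688cudnn_...` name is flagged, while the `volta_hcudnn_...` and `volta_scudnn_...` names are not.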

I would like to know whether an optimized Tensor Core kernel for 3D convolution exists.

If it does, what is its name?

SunilJB · December 18, 2019, 11:59am

Hi,

cuDNN 7.6.5 has some 3D Tensor Core support for Volta.
Please refer to the link below for more details:
https://docs.nvidia.com/deeplearning/sdk/cudnn-best-practices/index.html#rec-settings-3d-conv

Thanks

273235057 · December 27, 2019, 8:30am

I tested 3D convolution on a V100 and it is accelerated, but the same code on a T4 shows no acceleration.
Is there any difference between using Tensor Cores on the T4 and on the V100?

The environment is as follows:
CentOS 6.6
CUDA 10.0
cuDNN 7.6.5
V100/T4

Convolution info:
kernel(3,3,3)
pad(1,1,1)
stride(1,1,1)
dilate(1,1,1)
input shape(32, 32, 32)
The input, output, and filter dtype is fp16; batch size and channel counts are all multiples of 8.
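As a quick sanity check of this configuration, the sketch below computes the output extent along one spatial axis with the standard convolution size formula and checks the multiple-of-8 condition. The helper names `conv_output_size` and `is_multiple_of_8` are invented for illustration and are not cuDNN calls.

```cpp
#include <cassert>

// Output extent along one spatial axis:
//   out = (in + 2*pad - (dilation*(kernel-1) + 1)) / stride + 1
int conv_output_size(int in, int kernel, int pad, int stride, int dilation) {
    assert(kernel > 0 && stride > 0 && dilation > 0);
    const int effective_kernel = dilation * (kernel - 1) + 1;
    return (in + 2 * pad - effective_kernel) / stride + 1;
}

// Alignment check for batch size and channel counts, as described in the post.
bool is_multiple_of_8(int n) {
    return n % 8 == 0;
}
```

For the settings above, `conv_output_size(32, 3, 1, 1, 1)` gives 32 on each axis, so the output keeps the 32 x 32 x 32 shape of the input.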

SunilJB · December 30, 2019, 6:07am

Hi,

As mentioned earlier, cuDNN 7.6.5 has some 3D Tensor Core support only for Volta, which is why you are getting acceleration on the V100 and not on the T4.
T4 - NVIDIA Turing architecture
V100 - NVIDIA Volta architecture

Thanks
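
The distinction above can be written down as a small check on the GPU's compute capability. This is a sketch under stated assumptions: it relies on V100 reporting compute capability 7.0 (Volta) and T4 reporting 7.5 (Turing), and it encodes only the statement made in this thread about cuDNN 7.6.5, not an official support matrix. Both function names are hypothetical.

```cpp
#include <string>

// Map a compute capability to an architecture name.
// Only the two architectures discussed in this thread are distinguished.
std::string architecture_name(int major, int minor) {
    if (major == 7 && minor == 5) return "Turing";  // e.g. T4, RTX 2080 Ti
    if (major == 7) return "Volta";                 // e.g. V100 (7.0)
    return "other";
}

// Per the reply above (an assumption taken from this thread only):
// cuDNN 7.6.5 has some 3D Tensor Core support on Volta, not on Turing.
bool expects_3d_tensor_core_in_cudnn_765(int major, int minor) {
    return architecture_name(major, minor) == "Volta";
}
```

In a CUDA program the `major` and `minor` values would typically come from the `cudaDeviceProp` structure filled in by `cudaGetDeviceProperties`.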
