CUDNN half2 convolution

Hi, I’m reading about mixed-precision programming with cuDNN: Mixed-Precision Programming with CUDA 8 | NVIDIA Developer Blog. Does cuDNN have any convolution kernels that support the half2 data type? It seems half2 could greatly improve peak throughput compared to the half type. I see there is a kernel named turing_fp16_s1688cudnn_fp16_256x64_sliced1x2_ldg8_relu_f2f_exp_small_nhwc_tn_v1. Does it use half2 or half?


Hi @user39874,
Currently it supports only half.
The kernel turing_fp16_s1688cudnn_fp16_256x64_sliced1x2_ldg8_relu_f2f_exp_small_nhwc_tn_v1 uses fp16 input/output with fp32 accumulation.
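
In case it helps to see what "fp16 input/output, fp32 accumulate" means numerically, here is a rough sketch in plain Python. It is not cuDNN code and says nothing about how the kernel is implemented internally. It only emulates the rounding behaviour with the standard `struct` module, and the helper names (`to_fp16`, `to_fp32`, `dot_fp16_accumulate`, `dot_fp32_accumulate`) are made up for illustration.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def dot_fp16_accumulate(a, b):
    """Dot product where the running sum is rounded to fp16 after every step."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = to_fp16(acc + to_fp16(x * y))
    return acc


def dot_fp32_accumulate(a, b):
    """Dot product with an fp32 running sum; only the final result is fp16."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = to_fp32(acc + to_fp32(x * y))
    return to_fp16(acc)  # the output is stored as fp16


if __name__ == "__main__":
    # Inputs are fp16 values in both cases; only the accumulator differs.
    a = [to_fp16(0.01)] * 4096
    b = [1.0] * 4096
    print("fp16 accumulate:", dot_fp16_accumulate(a, b))
    print("fp32 accumulate:", dot_fp32_accumulate(a, b))
```

With these inputs the fp16 accumulator stops growing once the increment drops below half the spacing between neighbouring fp16 values, while the fp32 accumulator keeps the full sum and rounds only once, at the end. That is the usual reason for pairing fp16 storage with fp32 accumulation.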