Hi, I’m reading about mixed-precision programming in cuDNN (Mixed-Precision Programming with CUDA 8 | NVIDIA Technical Blog). I’m wondering whether cuDNN has any convolution kernels that use the half2 data type internally, since half2 seems to offer much higher peak throughput than plain half. I see there is a kernel named turing_fp16_s1688cudnn_fp16_256x64_sliced1x2_ldg8_relu_f2f_exp_small_nhwc_tn_v1 — does it use half2 or half?
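For context, by half2 I mean the packed type from `cuda_fp16.h`, where intrinsics such as `__hmul2` operate on two fp16 values in a single instruction. A minimal sketch of the kind of packed arithmetic I have in mind (an illustrative kernel, not anything from cuDNN itself):

```cuda
#include <cuda_fp16.h>

// Illustrative only: element-wise multiply on packed half2 data,
// so each thread performs two fp16 multiplies per instruction.
// Requires a GPU with compute capability 5.3 or higher.
__global__ void mul_half2(const __half2 *a, const __half2 *b,
                          __half2 *out, int n2) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n2) {
        out[i] = __hmul2(a[i], b[i]);  // two fp16 products in one op
    }
}
```

This packing is where the 2x peak-throughput advantage over scalar half comes from, which is why I’m curious whether the cuDNN kernel above exploits it.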
Thanks!