Regarding enabling Turing-specific kernels in cuDNN

Hi Nvidia,

I’m working on an application that aims to make the most of the underlying GPU, an RTX 2080 Ti, for the highest possible performance.

Below is a snapshot of the profiler report. In it, most of the top hotspot kernels have names starting with volta_.
My question is: since my target GPU is based on Turing, is there anything I have to do with cuDNN to make it use kernels optimized for Turing? (I assume the volta_ kernels are optimized for the Volta architecture.)

Kindly clarify. Thank you in advance.

Type Time(%) Time Calls Avg Min Max Name
GPU activities: 16.73% 700.07ms 4824 145.12us 47.874us 1.1891ms volta_scudnn_winograd_128x128_ldg1_ldg4_relu_tile148t_nt_v1
9.93% 415.49ms 3358 123.73us 37.698us 362.61us void cudnn::detail::implicit_convolve_sgemm<float, float, int=1024, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>(int, int, int, float const , int, float, cudnn::detail::implicit_convolve_sgemm<float, float, int=1024, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>*, kernel_conv_params, int, float, float, int, float, float, int, int)
6.09% 254.76ms 3315 76.851us 33.442us 182.47us volta_scudnn_128x64_relu_interior_nn_v1
5.87% 245.62ms 710 345.95us 17.921us 21.536ms volta_gcgemm_32x32_nt
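
For reference, my mental model is that cuDNN benchmarks the algorithms available for each convolution at runtime and uses the fastest one. Below is a minimal sketch of that selection step, just to show what I mean; the tensor/filter shapes, file name, and error-check macro are placeholders I made up, not my actual layers.

// Sketch: ask cuDNN to time its forward-convolution algorithms for one
// (placeholder) layer and report which ones it would pick on this GPU.
// Build (sketch): nvcc find_algos.cu -lcudnn -o find_algos
#include <cstdio>
#include <cstdlib>
#include <cudnn.h>

#define CHECK_CUDNN(call)                                             \
  do {                                                                \
    cudnnStatus_t s = (call);                                         \
    if (s != CUDNN_STATUS_SUCCESS) {                                  \
      fprintf(stderr, "cuDNN error %s at line %d\n",                  \
              cudnnGetErrorString(s), __LINE__);                      \
      exit(1);                                                        \
    }                                                                 \
  } while (0)

int main() {
  cudnnHandle_t handle;
  CHECK_CUDNN(cudnnCreate(&handle));

  // Input: NCHW float tensor (placeholder shape).
  cudnnTensorDescriptor_t xDesc, yDesc;
  cudnnFilterDescriptor_t wDesc;
  cudnnConvolutionDescriptor_t convDesc;
  CHECK_CUDNN(cudnnCreateTensorDescriptor(&xDesc));
  CHECK_CUDNN(cudnnSetTensor4dDescriptor(xDesc, CUDNN_TENSOR_NCHW,
                                         CUDNN_DATA_FLOAT, 1, 64, 56, 56));

  // 3x3 convolution filter, 64 -> 128 channels (placeholder).
  CHECK_CUDNN(cudnnCreateFilterDescriptor(&wDesc));
  CHECK_CUDNN(cudnnSetFilter4dDescriptor(wDesc, CUDNN_DATA_FLOAT,
                                         CUDNN_TENSOR_NCHW, 128, 64, 3, 3));

  CHECK_CUDNN(cudnnCreateConvolutionDescriptor(&convDesc));
  CHECK_CUDNN(cudnnSetConvolution2dDescriptor(convDesc, 1, 1, 1, 1, 1, 1,
                                              CUDNN_CROSS_CORRELATION,
                                              CUDNN_DATA_FLOAT));

  // Output descriptor derived from the input/filter/convolution settings.
  int n, c, h, w;
  CHECK_CUDNN(cudnnGetConvolution2dForwardOutputDim(convDesc, xDesc, wDesc,
                                                    &n, &c, &h, &w));
  CHECK_CUDNN(cudnnCreateTensorDescriptor(&yDesc));
  CHECK_CUDNN(cudnnSetTensor4dDescriptor(yDesc, CUDNN_TENSOR_NCHW,
                                         CUDNN_DATA_FLOAT, n, c, h, w));

  // Let cuDNN time every forward algorithm it supports for this problem.
  cudnnConvolutionFwdAlgoPerf_t perf[8];
  int returned = 0;
  CHECK_CUDNN(cudnnFindConvolutionForwardAlgorithm(handle, xDesc, wDesc,
                                                   convDesc, yDesc, 8,
                                                   &returned, perf));
  for (int i = 0; i < returned; ++i)
    printf("algo %d: %.3f ms (status %d)\n",
           (int)perf[i].algo, perf[i].time, (int)perf[i].status);

  cudnnDestroyConvolutionDescriptor(convDesc);
  cudnnDestroyFilterDescriptor(wDesc);
  cudnnDestroyTensorDescriptor(yDesc);
  cudnnDestroyTensorDescriptor(xDesc);
  cudnnDestroy(handle);
  return 0;
}

If there is any flag, build step, or setting beyond this kind of algorithm selection that is needed to get Turing-tuned kernels, that is exactly what I'd like to know.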

Thanks and Regards,
Sandeep

Hi Again,

From the Turing Tuning Guide, I found the statement below:

Any binary compiled for Volta will run on Turing, but Volta binaries using Tensor Cores will only be able to reach half of Turing’s Tensor Core peak performance. Recompiling the binary specifically for Turing would allow it to reach the peak performance. See the Turing Compatibility Guide for more information.

So will I have to recompile cuDNN for Turing to see a further performance boost? Please correct/clarify.
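
In the meantime, for my own CUDA sources (as opposed to the cuDNN library binary), my understanding of "recompiling for Turing" is adding an sm_75 target to the build. A sketch of what I am trying is below; the build line and file names are placeholders, not my real build system.

// Sketch: build my own .cu sources with an explicit Turing target so the
// "recompile for Turing" advice from the tuning guide applies to them.
// Example build line (file name is a placeholder):
//   nvcc -gencode arch=compute_70,code=sm_70 \
//        -gencode arch=compute_75,code=sm_75 \
//        -gencode arch=compute_75,code=compute_75 \
//        my_kernels.cu -o my_app
#include <cstdio>
#include <cuda_runtime.h>

int main() {
  cudaDeviceProp prop;
  cudaError_t err = cudaGetDeviceProperties(&prop, 0);
  if (err != cudaSuccess) {
    fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
            cudaGetErrorString(err));
    return 1;
  }
  // An RTX 2080 Ti (Turing) should report compute capability 7.5.
  printf("%s: compute capability %d.%d\n", prop.name, prop.major, prop.minor);
  return 0;
}

The code=compute_75 entry embeds PTX so the driver can JIT-compile for newer architectures if needed; the runtime query is just a sanity check that the 2080 Ti reports compute capability 7.5.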

Thanks and Regards,
Sandeep Soni