Hi NVIDIA,
I’m working on an application that aims to fully utilize an RTX 2080 Ti GPU for the highest possible performance.
Below is a snapshot of the profiler report. Most of the top hotspot kernels in it have names starting with volta_.
My question is: since my target GPU is based on the Turing architecture, is there anything I have to do on the cuDNN side to make it use kernels optimized for Turing? (I assume kernels prefixed with volta_ are optimized for the Volta architecture.)
Kindly clarify. Thank you in advance.
Type Time(%) Time Calls Avg Min Max Name
GPU activities: 16.73% 700.07ms 4824 145.12us 47.874us 1.1891ms volta_scudnn_winograd_128x128_ldg1_ldg4_relu_tile148t_nt_v1
9.93% 415.49ms 3358 123.73us 37.698us 362.61us void cudnn::detail::implicit_convolve_sgemm<float, float, int=1024, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>(int, int, int, float const , int, float, cudnn::detail::implicit_convolve_sgemm<float, float, int=1024, int=5, int=5, int=3, int=3, int=3, int=1, bool=1, bool=0, bool=1>*, kernel_conv_params, int, float, float, int, float, float, int, int)
6.09% 254.76ms 3315 76.851us 33.442us 182.47us volta_scudnn_128x64_relu_interior_nn_v1
5.87% 245.62ms 710 345.95us 17.921us 21.536ms volta_gcgemm_32x32_nt
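In case it clarifies what I’m asking: below is a minimal, hedged sketch of how I understand cuDNN’s algorithm selection to work, using cudnnFindConvolutionForwardAlgorithm to benchmark the available forward-convolution kernels on the installed GPU. The shapes are illustrative, not taken from my actual application. Is this auto-tuning path expected to pick Turing-tuned kernels on its own, or is something extra required?

// Sketch only (illustrative shapes, not my production code):
// ask cuDNN to time every forward algorithm for one convolution
// and report them fastest-first on the installed GPU.
#include <cudnn.h>
#include <stdio.h>

#define CHECK(call) do { \
    cudnnStatus_t s = (call); \
    if (s != CUDNN_STATUS_SUCCESS) { \
        fprintf(stderr, "cuDNN error %s at line %d\n", cudnnGetErrorString(s), __LINE__); \
        return 1; \
    } \
} while (0)

int main(void) {
    cudnnHandle_t handle;
    CHECK(cudnnCreate(&handle));

    cudnnTensorDescriptor_t xDesc, yDesc;
    cudnnFilterDescriptor_t wDesc;
    cudnnConvolutionDescriptor_t convDesc;
    CHECK(cudnnCreateTensorDescriptor(&xDesc));
    CHECK(cudnnCreateTensorDescriptor(&yDesc));
    CHECK(cudnnCreateFilterDescriptor(&wDesc));
    CHECK(cudnnCreateConvolutionDescriptor(&convDesc));

    // Illustrative shapes: N=1, C=64, 56x56 input, 64 3x3 filters, pad 1, stride 1, FP32 NCHW.
    CHECK(cudnnSetTensor4dDescriptor(xDesc, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT, 1, 64, 56, 56));
    CHECK(cudnnSetFilter4dDescriptor(wDesc, CUDNN_DATA_FLOAT, CUDNN_TENSOR_NCHW, 64, 64, 3, 3));
    CHECK(cudnnSetConvolution2dDescriptor(convDesc, 1, 1, 1, 1, 1, 1,
                                          CUDNN_CROSS_CORRELATION, CUDNN_DATA_FLOAT));

    // Derive the output tensor shape from the input and filter descriptors.
    int n, c, h, w;
    CHECK(cudnnGetConvolution2dForwardOutputDim(convDesc, xDesc, wDesc, &n, &c, &h, &w));
    CHECK(cudnnSetTensor4dDescriptor(yDesc, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT, n, c, h, w));

    // Benchmark all forward algorithms; results come back sorted by measured time.
    cudnnConvolutionFwdAlgoPerf_t perf[CUDNN_CONVOLUTION_FWD_ALGO_COUNT];
    int returned = 0;
    CHECK(cudnnFindConvolutionForwardAlgorithm(handle, xDesc, wDesc, convDesc, yDesc,
                                               CUDNN_CONVOLUTION_FWD_ALGO_COUNT,
                                               &returned, perf));
    for (int i = 0; i < returned; ++i)
        printf("algo %d: status %d, %.3f ms\n", (int)perf[i].algo, (int)perf[i].status, perf[i].time);

    cudnnDestroyConvolutionDescriptor(convDesc);
    cudnnDestroyFilterDescriptor(wDesc);
    cudnnDestroyTensorDescriptor(yDesc);
    cudnnDestroyTensorDescriptor(xDesc);
    cudnnDestroy(handle);
    return 0;
}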
Thanks and Regards,
Sandeep