Seeing calls to volta_gcgemm_64x64_nt and volta_cgemm_64x32_tn on a Turing GPU with AMP

I have enabled AMP, but the nvprof output below shows that most of the compute time is still spent in non-AMP kernels. Can anyone offer insight into why volta_gcgemm_64x64_nt and volta_cgemm_64x32_tn are picked instead of AMP (FP16/Tensor Core) variants? What other options can I use to gain more visibility into which kernels get dispatched? Also, I was surprised to see the "volta_" prefix on a Turing GPU!
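For context, AMP is enabled roughly as in the minimal sketch below. This is an assumption-laden reconstruction, not the actual workload: I'm assuming PyTorch's native torch.cuda.amp API and an inference path (the bn_fw_inf kernels in the profile suggest inference), with a placeholder model and input shape.

```python
import torch
import torch.nn as nn
from torch.cuda.amp import autocast

# Placeholder model standing in for the real network (hypothetical).
model = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),
    nn.BatchNorm2d(64),
    nn.ReLU(),
).cuda().eval()

x = torch.randn(8, 3, 224, 224, device="cuda")  # placeholder input shape

with torch.no_grad(), autocast():
    # Under autocast, convolutions should dispatch to FP16/Tensor Core
    # kernels where cuDNN has them; the profile below shows several paths
    # still landing on non-AMP kernels.
    out = model(x)

print(out.dtype)  # torch.float16 for autocast-eligible ops
```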

```
======== Profiling result:
"Type","Time(%)","Time","Calls","Avg","Min","Max","Name"
,%,s,ms,ms,ms,
"GPU activities",37.990251,718.518716,2780,258.459969,25.534796,589.099815,"volta_gcgemm_64x64_nt"
"GPU activities",8.646679,163.536706,1366,119.719404,54.162721,213.920884,"volta_cgemm_64x32_tn"
"GPU activities",6.893051,130.369919,3732,34.932989,0.083008,86.454414,"void cudnn::detail::implicit_convolve_sgemm<__half, __half, int=512, int=6, int=8, int=3, int=3, int=5, int=1, bool=1, bool=0, bool=1>(int, int, int, __half const , int, __half, cudnn::detail::implicit_convolve_sgemm<__half, __half, int=512, int=6, int=8, int=3, int=3, int=5, int=1, bool=1, bool=0, bool=1>*, kernel_conv_params, int, float, float, int, __half, __half, int, int)"
"GPU activities",5.083105,96.137988,3046,31.562044,0.081248,93.622654,"void cudnn::detail::explicit_convolve_sgemm<__half, int, int=512, int=6, int=8, int=3, int=3, int=5, int=0, bool=1>(int, int, int, __half const , int, __half const , int, cudnn::detail::explicit_convolve_sgemm<__half, int, int=512, int=6, int=8, int=3, int=3, int=5, int=0, bool=1>, kernel_conv_params, int, int, float, float, int, __half const *, __half const *)"
"GPU activities",3.115977,58.933228,187030,0.315100,0.002304,6.883499,"void cudnn::detail::bn_fw_inf_1C11_kernel_NCHW<__half, float, bool=1, int=1>(float, cudnn::detail::bn_fw_inf_1C11_kernel_NCHW<__half, float, bool=1, int=1>, cudnnTensorStruct, __half const , float, cudnnTensorStruct, float, cudnn::detail::bn_fw_inf_1C11_kernel_NCHW<__half, float, bool=1, int=1> const *, cudnn::detail::bn_fw_inf_1C11_kernel_NCHW<__half, float, bool=1, int=1> const , cudnn::detail::bn_fw_inf_1C11_kernel_NCHW<__half, float, bool=1, int=1> const , cudnn::detail::bn_fw_inf_1C11_kernel_NCHW<__half, float, bool=1, int=1> const , cudnn::detail::bn_fw_inf_1C11_kernel_NCHW<__half, float, bool=1, int=1>)"
"GPU activities",2.642579,49.979722,110832,0.450950,0.019904,8.392769,"turing_fp16_s1688cudnn_fp16_128x128_ldg8_relu_f2f_exp_interior_nhwc_tn_v1"
"GPU activities",2.408994,45.561889,10354,4.400414,0.233118,14.193811,"turing_fp16_s1688cudnn_fp16_256x128_ldg8_relu_f2f_exp_small_nhwc_tn_v1"
"GPU activities",2.187566,41.373952,9388,4.407110,0.097215,34.017020,"turing_fp16_s1688cudnn_fp16_256x128_ldg8_relu_f2f_exp_interior_nhwc_tn_v1"
```
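One avenue for more visibility I'm considering is annotating the run with NVTX ranges, so nvprof can attribute each kernel to the op that launched it. A sketch, assuming the same placeholder model/input as above and PyTorch's torch.autograd.profiler.emit_nvtx:

```python
import torch

# Launch under: nvprof --profile-from-start off -o trace.prof python script.py
with torch.cuda.profiler.profile():
    with torch.no_grad():
        model(x)  # warmup pass, outside the annotated region
        with torch.autograd.profiler.emit_nvtx():
            with torch.cuda.amp.autocast():
                model(x)  # kernels launched here carry NVTX op ranges
```

Opening trace.prof in nvvp (or rerunning with --print-gpu-trace) should then show which ops launch the volta_gcgemm/volta_cgemm kernels.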