Questions about "__c_mset16_sky" from the profiling summary

Hi,

I am doing some profiling on an NVIDIA Tesla T4 GPU, using the compiler and profiler from NVHPC 21.3.

As you can see from the excerpt of the profiling summary below, there are several __c_mset16_sky entries that I cannot identify. I would like to know what __c_mset16_sky means and how to trace it back to the source code.

  2.24%   298.4ms  | |     | rhs_m_add_flux_contribution_ (/home/x_ql/ess_openacc/src/rhs_m.F90:255 0xa31c)
  2.24%   298.4ms  | |     | | hyperviscosity_m_hyper_viscosity_ (/home/x_ql/ess_openacc/src/hyperViscosity_m.F90:0 0x156f)
  0.62%  82.316ms  | |     | |   hyperviscosity_m_traditionalsbp_dissipation_ (/home/x_ql/ess_openacc/src/hyperViscosity_m.F90:110 0xc83)
  0.39%  51.448ms  | |     | |   | __c_mset16_sky (0xcf02a157)
  0.15%  20.579ms  | |     | |   | sbp_operators_m_diff_xi_all_ (/home/x_ql/ess_openacc/src/sbp_operators_m.F90:892 0x68e)
  0.15%  20.579ms  | |     | |   | | sbp_operators_m_diff_xi_one_ (/home/x_ql/ess_openacc/src/sbp_operators_m.F90:736 0x41f)
  0.15%  20.579ms  | |     | |   | |   sbp_operators_m_dcsrmv_acc_ (/home/x_ql/ess_openacc/src/sbp_operators_m.F90:5047 0x4fb)
  0.15%  20.579ms  | |     | |   | |     __pgi_uacc_computedone (../../src/computeexitdone.c:90 0xd161002b)
  0.15%  20.579ms  | |     | |   | |       __pgi_uacc_computedone2 (../../src/computeexitdone.c:59 0xd1610196)
  0.15%  20.579ms  | |     | |   | |         __pgi_uacc_cuda_wait (../../src/cuda_wait.c:77 0xd119644c)
  0.15%  20.579ms  | |     | |   | |           cuStreamSynchronize (0xce4fc25f)
  0.08%   10.29ms  | |     | |   | __c_mset16_sky (0xcf02a168)
  0.54%  72.027ms  | |     | |   hyperviscosity_m_traditionalsbp_dissipation_ (/home/x_ql/ess_openacc/src/hyperViscosity_m.F90:111 0xca7)
  0.23%  30.869ms  | |     | |   | __c_mset16_sky (0xcf02a157)
  0.08%   10.29ms  | |     | |   | __c_mset16_sky (0xcf02a171)

Thanks in advance for your help.
Best
qiang

These look like host-side memset calls. The compiler’s idiom recognition will often replace bulk array operations (like the array syntax “Array=value”) with calls to an optimized memset routine, since that is more efficient than generating an implicit loop.
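
To make the array-syntax case concrete, here is a minimal sketch of the kind of code that could trigger this. The program and its variable names are hypothetical and not taken from hyperViscosity_m.F90. Whether a given assignment actually becomes a __c_mset16_sky call depends on the element type, the optimization level, and the compiler version, so treat this as an illustration of the pattern rather than a guaranteed reproducer.

```fortran
program memset_idiom
  implicit none
  integer, parameter :: n = 1000000   ! hypothetical array size
  real(8), allocatable :: a(:), b(:)
  integer :: i

  allocate(a(n), b(n))

  ! Array-syntax assignment of a scalar to a whole array. This is the
  ! "Array=value" form that idiom recognition may turn into a call to
  ! a runtime memset routine instead of an inline loop.
  a = 1.5d0

  ! The equivalent explicit loop. A compiler may recognize this as the
  ! same idiom, or it may leave it as an ordinary loop.
  do i = 1, n
     b(i) = 1.5d0
  end do

  ! Both forms must produce identical contents.
  if (all(a == b)) then
     print *, 'arrays match'
  else
     print *, 'arrays differ'
  end if

  deallocate(a, b)
end program memset_idiom
```

For tracing the calls back to the source, the profile already lists the calling lines (hyperViscosity_m.F90:110 and :111), so those are the places to look for whole-array assignments or initializations. Compiling with -Minfo should also help: as far as I know, it reports the source lines where a loop or array assignment was replaced by a memory set, zero, or copy idiom.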

Thanks Mat.