Using the per-kernel command-line switch

I have profiled some metrics for a kernel regex, “-k ElecEw_Vd”. Now, when I import the results with “--summary per-kernel”, I see multiple instances of the same kernel:

  nbnxn_kernel_ElecEw_VdwLJFsw_F_cuda(cu_atomdata,cu_nbparam,Nbnxm::gpu_plist,bool), Block Size 64, Grid Size 3550, Device 0, 99 invocations
    Section: Command line profiler metrics
    Metric Name                                         Metric Unit Minimum      Maximum       Average
    --------------------------------------------------- ----------- ------------ ------------- ------------
    dram__sectors_read.sum                              sector      94003.000000 207336.000000 97702.292929
    dram__sectors_write.sum                             sector      14638.000000 33130.000000  25976.222222
    sm__cycles_active.avg.pct_of_peak_sustained_elapsed %           91.110182    96.204656     94.053113

  nbnxn_kernel_ElecEw_VdwLJFsw_F_cuda(cu_atomdata,cu_nbparam,Nbnxm::gpu_plist,bool), Block Size 64, Grid Size 3572, Device 0, 99 invocations
    Section: Command line profiler metrics
    Metric Name                                         Metric Unit Minimum      Maximum       Average
    --------------------------------------------------- ----------- ------------ ------------- ------------
    dram__sectors_read.sum                              sector      94047.000000 240262.000000 98858.282828
    dram__sectors_write.sum                             sector      14671.000000 33208.000000  24987.060606
    sm__cycles_active.avg.pct_of_peak_sustained_elapsed %           91.364360    96.112414     94.184479

Any comment on that?

The instances are launched with different grid sizes, so the tool reports them separately:

Grid Size 3550
Grid Size 3572
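
If a single set of numbers per kernel name is what you are after, the per-grid-size rows can be combined by hand. The following is only a minimal sketch, not a feature of the profiler: `MetricSummary` and `merge` are hypothetical helper names, and the values are the `dram__sectors_read.sum` rows copied from the two blocks above. It takes the smallest minimum, the largest maximum, and an average weighted by the invocation count of each block.

```python
from dataclasses import dataclass


@dataclass
class MetricSummary:
    """One Minimum/Maximum/Average row from a per-kernel summary block."""
    invocations: int
    minimum: float
    maximum: float
    average: float


def merge(summaries):
    """Combine rows of the same metric reported for different grid sizes."""
    total = sum(s.invocations for s in summaries)
    # Weight each block's average by how many invocations it covers.
    weighted = sum(s.average * s.invocations for s in summaries)
    return MetricSummary(
        invocations=total,
        minimum=min(s.minimum for s in summaries),
        maximum=max(s.maximum for s in summaries),
        average=weighted / total,
    )


# dram__sectors_read.sum, copied from the output above (hypothetical variable names)
grid_3550 = MetricSummary(99, 94003.0, 207336.0, 97702.292929)
grid_3572 = MetricSummary(99, 94047.0, 240262.0, 98858.282828)

combined = merge([grid_3550, grid_3572])
print(combined)
```

Because both blocks here cover 99 invocations, the weighted average reduces to the plain mean of the two averages; the weighting only matters when the invocation counts differ.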