Ncu misses kernels during loop

Problem as titled:
A kernel selected by name with the kernel filter is only profiled in the 1st iteration (its 1st launch) of the loop, no matter how I set the capture count.

Each iteration contains several kernel launches. Here’s what I have checked already:

  1. When no filter is set, all kernels are profiled in the 1st iteration, only the first two kernels in the loop are profiled in the 2nd iteration, and only the second kernel in the loop is profiled in later iterations.
  2. In interactive profiling, all kernel launches are visible, but they are still not profiled.

Task specific:
I need to profile a specific kernel (aggregated over all its launches) called “compute_V_collect_spike_learnFF_fast” in the code repo “GitHub - g13/patchV1: a simulation framework for a patch of binocular V1”. The source file of the kernel is in “./src/”, and the kernel is launched from the main function in “./src/” at line 5934. The code can be compiled with “./src/compile” (the paths for the output file need to be modified).

Example output of ncu:

==PROF== Profiling "logRand_init" - 1: 
==PROF== Profiling "rand_spInit" - 2: 
==PROF== Profiling "store_PM" - 3: 
==PROF== Profiling "recal_G_mat" - 4: 
==PROF== Profiling "recal_G_mat" - 5: 

1st iteration begins:

==PROF== Profiling "virtual_LGN_convol" - 6: 
==PROF== Profiling "LGN_nonlinear" - 7: 
==PROF== Profiling "compute_V_collect_spike_learn..." - 8:  ### Target Kernel
==PROF== Profiling "recal_G_mat" - 9: 
==PROF== Profiling "recal_G_mat" - 10: 

later iterations:

==PROF== Profiling "virtual_LGN_convol" - 11:      # iteration 2
==PROF== Profiling "LGN_nonlinear" - 12: 
==PROF== Profiling "LGN_nonlinear" - 13:       # iteration 3
==PROF== Profiling "LGN_nonlinear" - 14:       # iteration 4
==PROF== Profiling "LGN_nonlinear" - 15:       # iteration 5
==PROF== Profiling "LGN_nonlinear" - 16:       # ...
==PROF== Profiling "LGN_nonlinear" - 17: 
==PROF== Profiling "LGN_nonlinear" - 18:
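
To make the pattern above easier to check on longer runs, the ==PROF== lines can be tallied per kernel name. Below is a minimal Python sketch, assuming the log keeps the same line format as the output shown above. The regex and the function name count_profiled are illustrative and not part of ncu, and the sample text is just a few lines copied from the output above.

```python
import re
from collections import Counter

# Matches lines such as: ==PROF== Profiling "LGN_nonlinear" - 7:
# Anything after the colon (progress text, my own annotations) is ignored.
PROF_LINE = re.compile(r'^==PROF== Profiling "([^"]+)" - (\d+):')


def count_profiled(log_text):
    """Return a Counter mapping kernel name -> number of profiled launches."""
    counts = Counter()
    for line in log_text.splitlines():
        match = PROF_LINE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # A few lines copied from the output above, used only as sample input.
    sample = '''
==PROF== Profiling "virtual_LGN_convol" - 6:
==PROF== Profiling "LGN_nonlinear" - 7:
==PROF== Profiling "compute_V_collect_spike_learn..." - 8:  ### Target Kernel
==PROF== Profiling "virtual_LGN_convol" - 11:      # iteration 2
==PROF== Profiling "LGN_nonlinear" - 12:
==PROF== Profiling "LGN_nonlinear" - 13:       # iteration 3
'''
    for name, n in count_profiled(sample).most_common():
        print(f"{name}: {n}")
```

If every launch in the loop were profiled, each kernel in the loop would show roughly one count per iteration (two for a kernel launched twice per iteration); a count of 1 for the target kernel matches what I see.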

I’ve also tried different options without success. Any idea why this happens?