Ncu profile file not created


I’m using ncu version 2021.1.0.0. Running with MPI on 4 GPUs, I’m trying to profile a particular kernel in my code with:

srun ncu -o profile_output --kernel-regex nvkernel_axhelm_omp__F1L614_1_ --launch-skip 297 --launch-count 1 "./nek5000"

This command (apart from srun) was suggested by the Nsight Systems GUI. I added -o profile_output to get a profile file that I can open with nsys-ui. I get:

==WARNING== No kernels were profiled.
==WARNING== Profiling kernels launched by child processes requires the --target-processes all option.

and no profiling is done. Adding --target-processes all as suggested, I get:

==WARNING== No kernels were profiled.

I would like to have a profile output to open in nsys-ui. What am I doing wrong?

Profiling the entire application, on the other hand, appears to work. It is running right now, but after 1 hour it still has not finished (using application replay, as ncu suggests), so I cannot replay the entire application each time.

Could you help me? Thanks.

The command generally looks fine. Is it possible that the number of times this kernel is launched varies between runs? Also, is nvkernel_axhelm_omp__F1L614_1 the base function name of the kernel, or is it the mangled name? You can use

--kernel-name-base [function,demangled,mangled]

to switch between what ncu uses to check against the --kernel-regex filter string.
You could also check whether it works without --launch-skip, to see if the kernel filtering/name matching works at all.
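
The checks suggested above can be scripted as a dry run. The sketch below only prints candidate ncu command lines, one per --kernel-name-base value and without --launch-skip, so that any match at all becomes visible. The helper name print_name_base_trials and the shortened regex axhelm_omp are assumptions for illustration; the flags are the ones already used in this thread.

```shell
#!/bin/sh
# Dry-run helper (hypothetical name): prints, but does not run, one ncu
# command per --kernel-name-base value so the name matching can be checked
# without --launch-skip.
# Usage: print_name_base_trials REGEX APP
print_name_base_trials() {
    regex=$1
    app=$2
    for base in function demangled mangled; do
        # --launch-count 1 keeps each trial short: one matching launch is enough.
        printf 'srun ncu --target-processes all --kernel-name-base %s --kernel-regex %s --launch-count 1 %s\n' \
            "$base" "$regex" "$app"
    done
}

# Example: print the three trial commands for the kernel discussed here.
print_name_base_trials axhelm_omp ./nek5000
```

If one of the printed commands profiles a kernel when run, the filter string and name base are fine, and --launch-skip can be added back afterwards.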

Hi, it worked with the following command:

srun ncu -o profile --target-processes all --kernel-name-base=function --kernel-regex axhelm_omp --launch-skip 297 --launch-count 1 "./nek5000"

Thanks for your help. Profiling the entire application still had not finished after 10 hours of running. Very strange.

"Profiling the entire application still had not finished after 10 hours of running."
Can you comment on how many kernels are being profiled in this setup? What is the set of metrics/sections you are collecting (would be the default if you didn’t set anything specific)? Is the application still in the first application replay pass, or did it move to a subsequent pass by now?

Hi Felix, this is my command line:

srun ncu --replay-mode application --target-processes all -f -o ncu_profile ./nek5000

I don’t know how many kernels are profiled, but the code is quite big.

The command line output of ncu should give you an idea of this: look for lines such as “Profiling (name) - (number)”, where the number shows you which kernel instance this is. Please note that ncu is not designed to profile huge numbers of kernels, particularly as the results obtained are unlikely to be actionable for what you want to achieve.
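
As a rough way to count those lines, the following sketch tallies them from a saved log. It assumes that each profiled kernel instance produces one line containing ==PROF== Profiling "<kernel name>"; the exact line format may differ between ncu versions, and the helper names count_profiled and tally_profiled as well as the log file name are made up for illustration.

```shell
#!/bin/sh
# Count ncu progress lines in a saved log (helper names are hypothetical).
# Assumption: each profiled kernel instance produces one line containing
# '==PROF== Profiling "<kernel name>"'; adjust the pattern if your ncu
# version prints something different.

# Total number of profiled kernel instances recorded in the log.
count_profiled() {
    grep -c '==PROF== Profiling "' "$1"
}

# Per-kernel tally, most frequently profiled kernel first.
tally_profiled() {
    grep -o 'Profiling "[^"]*"' "$1" | sort | uniq -c | sort -rn
}

# Usage, once a log has been captured (for example by appending
# "2>&1 | tee ncu.log" to the ncu command line):
#   count_profiled ncu.log
#   tally_profiled ncu.log
```

Watching the total grow over time also shows whether the run is still making progress or has stalled.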

Instead, you would normally use Nsight Systems first to identify which kernels are valuable optimization targets, if any, and then use Nsight Compute to target these specific kernels for more detailed performance metric analysis.