Error: failed to profile kernel

I am trying to profile a single-process, single-threaded program using Nsight Compute, but it fails when it tries to profile the specified kernel:

ncu --section SpeedOfLight --export ncu/ncompute -k score_generation_long ./my_program
==PROF== Connected to process 811598 (/home/liuxs/profile/minimap2/minimap2)
[config_batch] mem_per_stream: 38154063052 x 1 streams
[config_stream] max_grid: 118002, max_anchors_per_stream: 1465540000, max_num_cut_per_stream: 2935660
==PROF== Profiling "score_generation_long": 0%
==ERROR== Failed to profile kernel "score_generation_long" in process 811598
==PROF== Trying to shutdown target application
==ERROR== The application returned an error code (9).
==ERROR== An error occurred while trying to profile.
==WARNING== No kernels were profiled.
==WARNING== Profiling kernels launched by child processes requires the --target-processes all option.

The output doesn’t contain any useful information. How can I find out what’s wrong with the program? It runs without error on its own, outside of ncu.

It turns out to be an out-of-memory error. How can NCU use so much more memory than running the program without it? Is there any way to limit the memory it requires?

Nsight Compute saves and restores kernel state in memory so that it can replay the kernel multiple times. That can double the memory footprint. To avoid this, you can switch to application replay with “--replay-mode application”, which reruns the whole application for each pass instead of saving and restoring device memory. Let me know if that solves your issue.
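For reference, here is a rough sketch of the command from the question with application replay added. It reuses the section, export path, kernel name, and ./my_program from the original post. The NCU_ARGS variable and the "ncu not found" guard are only illustrative additions so the snippet is harmless on a machine without ncu. The log above reports mem_per_stream of about 38 GB, so a second copy for save/restore would plausibly not fit on the device, which would match the out-of-memory diagnosis.

```shell
#!/bin/sh
# Same options as in the question, plus application replay.
# NCU_ARGS is a hypothetical helper variable, not something ncu requires.
NCU_ARGS="--replay-mode application --section SpeedOfLight --export ncu/ncompute -k score_generation_long"

if command -v ncu >/dev/null 2>&1; then
    # Unquoted on purpose so the shell splits NCU_ARGS into separate options.
    ncu $NCU_ARGS ./my_program
else
    echo "ncu not found; would run: ncu $NCU_ARGS ./my_program"
fi
```

Note that application replay runs the whole program once per collection pass, so total profiling time grows with the program's runtime, and it works best when the program launches the same kernels in the same order on every run.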