Kernel launch error with ncu

Dear ncu users,

I’m trying to profile a kernel using ncu version 2021.1.0.0:

srun ncu -o profile -f --target-processes all --kernel-name-base=function --kernel-regex axhelm_omp --launch-count 1 "./nek5000"

and I get the following error:

==ERROR== Failed to profile kernel “nvkernel_axhelm_omp__F1L608_1_” in process 21044
Fatal error: expression ‘HX_CU_CALL_CHECK(p_cuStreamSynchronize(stream[dev]))’ (value 1) is not equal to expression ‘HX_SUCCESS’ (value 0)
==PROF== Profiling “nvkernel_axhelm_omp__F1L608_1_”: 0%
==WARNING== Backing up device memory in system memory. Kernel replay might be slow. Consider using application replay to avoid memory save-and-restore.
==ERROR== The application returned an error code (6).
==ERROR== An error occurred while trying to profile.
==WARNING== No kernels were profiled.
…50%…100% - 11 passes
==PROF== Profiling “nvkernel_axhelm_omp__F1L608_1_”: 0%==PROF== Received signal
==PROF== Trying to shutdown target application
==PROF== Profiling “nvkernel_axhelm_omp__F1L608_1_”: ==PROF== Received signal
==PROF== Trying to shutdown target application
==PROF== Received signal
==PROF== Trying to shutdown target application

The application runs correctly under nsys and without profiling. What is the problem with this kernel? Thanks.

It is hard to tell right away whether this is an Nsight Compute issue or an OpenMP issue (I assume you are using OpenMP, given the kernel name). Can you make a small reproducer? Does your app use most or all of the GPU memory? The “Backing up device memory in system memory” message seems suspect.
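If you want a quick number for the memory question, here is a minimal sketch, assuming nvidia-smi is available on the compute node and that you can run it there while nek5000 is running. The function names mem_used_percent and report_gpu_memory are hypothetical helpers written for this post, not part of any NVIDIA tool.

```shell
# Hypothetical helper: integer percentage of GPU memory in use.
# $1 = used MiB, $2 = total MiB (plain numbers, as printed with "nounits").
mem_used_percent() {
    echo $(( $1 * 100 / $2 ))
}

# Hypothetical helper: print one line per visible GPU.
# Assumes nvidia-smi is on PATH and reports numeric values for both fields.
report_gpu_memory() {
    nvidia-smi --query-gpu=memory.used,memory.total \
               --format=csv,noheader,nounits |
    while IFS=', ' read -r used total; do
        echo "GPU memory in use: $(mem_used_percent "$used" "$total")%"
    done
}
```

Calling report_gpu_memory on the node during the run prints the usage per GPU. A value close to 100% may leave little room for anything the profiler has to allocate on the device.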

Yes, sorry, the code uses OpenMP offloading. Unfortunately, I do not have a small reproducer.

Does your app use most or all of the GPU memory?

The entire application is not profiled by ncu, but this kernel uses about 87% of the GPU memory. Attached is a screenshot from ncu, obtained by profiling one particular instance of the kernel with the following command:

srun ncu -o profile -f --target-processes all --kernel-name-base=function --kernel-regex axhelm_omp --launch-skip 297 --launch-count 1 "./nek5000"

Do you think this is the root of my problem? If so, how can I solve it?

Sorry, so what is the difference between the failing case in your first comment, and the successful case here?
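In the meantime, the warning in your log itself suggests application replay, so that may be worth a try. The following is only a sketch, assuming your ncu build accepts --replay-mode application and that nek5000 launches the same kernels in the same order on every run, since this mode reruns the whole application once per pass. All other flags are copied from your first command; NCU_ARGS and RUN are hypothetical variable names.

```shell
# Flags copied from the first command in this thread;
# "--replay-mode application" is the only addition (assumed to be supported).
NCU_ARGS="-o profile -f --target-processes all --kernel-name-base=function --kernel-regex axhelm_omp --launch-count 1 --replay-mode application"

# Print the full command first so it can be checked before launching.
echo "srun ncu $NCU_ARGS ./nek5000"

# Set RUN=1 to actually launch it on a GPU node; by default nothing is run.
if [ "${RUN:-0}" = "1" ]; then
    # NCU_ARGS is intentionally unquoted so it splits into separate arguments.
    srun ncu $NCU_ARGS ./nek5000
fi
```

With application replay, ncu does not need to save and restore device memory between passes, which is the step the warning complains about. I cannot say whether it avoids the cuStreamSynchronize failure in your case.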