Using Nsight Compute (ncu) alongside srun

miguelzavalan · July 3, 2021, 3:34am

Hi, I would like to use Nsight Compute (ncu) with srun, but it does not seem to target the application.

I have used nsys with srun and it works fine. Is running ncu together with srun not supported?

A sample command I have used is:
nsys profile --stats=true -o $outputfile srun ./myapp

nsys works fine and generated a report covering all kernels and API calls.

However when I replace nsys with ncu:
ncu --target-processes all --kernel-id ::regex:^.*${kernel}.*$:1 --set full -o ${outputfile}_ncu srun ./myapp

It manages to run the application but does not detect any kernels.
“==WARNING== No kernels were profiled.”

I tested the same command with a simple OpenACC vecadd C++ program, and ncu was also unable to detect the kernel when the program was launched with srun. When I removed srun and ran the application directly with ./, ncu generated the report file fine.

This leads me to believe that ncu does not support applications launched with srun. Is that correct?

felix_dt · July 5, 2021, 2:42pm

I am not aware of any specific issues between srun and ncu, but I have some general questions about the workflow you are trying to achieve. Given that nsys works for you, it appears you are allocating a resource from the local node with srun and hence executing both the host and target processes on the same system? I haven't been able to test this flow with srun locally so far, but launching mpirun under ncu worked without problems:

ncu --target-processes all mpirun ./CudaApp

Independent of your specific issue, you can very likely make it work by simply inverting the order of the ncu and srun commands, i.e. use

srun ncu --target-processes all --kernel-id ::regex:^.*${kernel}.*$:1 --set full -o ${outputfile}_ncu ./myapp

This command would work even if srun allocated resources on a remote node, as the ncu host process would then also be launched on that node. In your example, the ncu host process could end up on a different node than the target process, which would not work.
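A minimal sketch of the inverted order, assuming a Slurm job in which every task sees its own SLURM_PROCID. The function name run_ncu_per_rank, the NCU override variable and the report_rank<N> naming are hypothetical, not part of ncu or Slurm. Giving each rank its own report name keeps the ncu processes started by srun from competing for a single output file.

```shell
#!/usr/bin/env bash
# Sketch: run ncu *inside* each Slurm task (srun ... ncu ... ./myapp), so the
# ncu host process always starts on the same node as the process it profiles.
# Hypothetical names: run_ncu_per_rank and the NCU override variable.

run_ncu_per_rank() {
  # SLURM_PROCID is exported by Slurm to each task; fall back to 0 so the
  # function can also be tried outside a job.
  local rank="${SLURM_PROCID:-0}"
  # NCU allows pointing at a specific ncu binary (defaults to ncu on PATH).
  "${NCU:-ncu}" --target-processes all -o "report_rank${rank}" "$@"
}
```

In an sbatch script this could be invoked as `srun bash -c 'source ./ncu_per_rank.sh && run_ncu_per_rank ./myapp'`, where ncu_per_rank.sh is a hypothetical file holding the function. The ncu CLI documentation also describes file macros for -o such as %q{ENV_NAME}, so `srun ncu -o 'report_%q{SLURM_PROCID}' ./myapp` may achieve the same without a wrapper; check the File Macros section for your ncu version.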

miguelzavalan · July 5, 2021, 4:22pm

Hi Felix,

Thank you for your response!

Yes, I am using a supercomputer. I submit sbatch scripts from the login node to allocate SLURM resources and use srun to run my application across those resources on compute nodes, each equipped with A100 GPUs.

From examples online, I have seen that ncu works fine with mpirun; unfortunately, the system only allows srun as the launch command.

When I contacted their IT support, they also suggested inverting the order. My command now looks like this:

srun -n $ranks ncu --kernel-id ::regex:'^.*update_top.*$':1 --set full -o update_top_ncu $command

I think the command is close to working; however, I now receive this error:
“srun -n $ranks ncu --kernel-id ::regex:‘^.*update_top.*$’:1 --set full -o update_top_ncu $command’ resulted in 100 recursions!”

The application starts and ncu no longer reports “No kernels were profiled”, but the application exits immediately and produces no further output.

Is this an error message from ncu?

–

A bit more background information if it helps:
My application is an OpenACC (GPU offload) + MPI application. I am trying to run it across multiple nodes, each with 4 GPUs. I am using SLURM to allocate multiple nodes with --ntasks-per-node=4 and --gres=gpu:4, and I submit the sbatch script from the login node.

I have also tried using srun in interactive mode to get a session directly on a compute node and running ncu ./myapp there. However, that results in a different error, which I assume is specific to the system:
[cli_0]: write_line error; fd=8 buf=:cmd=init pmi_version=1 pmi_subversion=1
:
system msg for write_line failure : Bad file descriptor
[cli_0]: Unable to write to PMI_fd
[cli_0]: write_line error; fd=8 buf=:cmd=get_appnum
:
system msg for write_line failure : Bad file descriptor
Error:PMI: PMI_Get_appnum(&appnum) = -1

felix_dt · July 5, 2021, 5:37pm

I think in this configuration you still need to run your app under mpirun on the local system, e.g. mpirun -n X ncu ./myapp, so that the necessary MPI environment is set up. The error message suggests to me that this is not the case.

“resulted in 100 recursions”

No, this doesn’t come from ncu. You can tell because srun is part of the error message, and ncu wouldn’t know about srun. Perhaps you used the wrong kind of quotes? The error shows backtick-like quotes around the regex, which the shell can interpret differently from plain single or double quotes. Maybe try using double quotes for the regex?
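One way to take quoting out of the picture is to build the --kernel-id value once, with plain ASCII quotes, and pass it as a single quoted argument. The helper below is a hypothetical sketch (make_kernel_id is not an ncu feature); it reproduces the ::regex:<pattern>:<invocation> form from the commands in this thread, leaving the context and stream fields empty as the original does.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the --kernel-id filter used in this thread,
#   ::regex:^.*<name>.*$:<invocation>
# The printf format is in single quotes, so '^', '.*' and '$' stay literal and
# only the kernel name and invocation number are substituted.
make_kernel_id() {
  local kernel="$1"
  local invocation="${2:-1}"   # default: first matching invocation
  printf '::regex:^.*%s.*$:%s\n' "$kernel" "$invocation"
}
```

For example: `srun -n "$ranks" ncu --kernel-id "$(make_kernel_id update_top)" --set full -o update_top_ncu $command`, with $command left unquoted as in the original so it still splits into the program and its arguments.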

yuri_S · August 16, 2022, 8:48am

Hi,

Did you solve this problem? I want to use Nsight Compute with Slurm, and I am using an sbatch script too, but I can’t get any results. I would appreciate it if you could let me know how you used it.

I used the following command line in my .sh file:

srun ncu --target-processes all -o report_$OMPI_COMM_WORLD_RANK python parallel.py

But when it reaches the send kernel, Nsight Compute keeps running and never profiles the kernel.
How can I use Nsight Compute with Slurm in a multi-node environment?
Thank you.
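In the srun command from the post above, $OMPI_COMM_WORLD_RANK is expanded by the shell running the batch script before srun starts any task, so it is probably empty there and every rank would get the same report name. A sketch that reads the rank inside each task instead and profiles only one rank (PROFILE_RANK, default 0), while the other ranks run the application unmodified. profile_one_rank, PROFILE_RANK and NCU are hypothetical names; SLURM_PROCID is the task rank that Slurm exports. Whether this avoids the hang on the send kernel is not established in this thread.

```shell
#!/usr/bin/env bash
# Sketch: run ncu on a single Slurm task; all other tasks run the app directly.
# Hypothetical names: profile_one_rank, PROFILE_RANK, NCU.
profile_one_rank() {
  # Read the rank inside the task, not in the batch shell.
  local rank="${SLURM_PROCID:-0}"
  local target="${PROFILE_RANK:-0}"
  if [ "$rank" = "$target" ]; then
    # Only the selected rank is profiled, into its own report file.
    "${NCU:-ncu}" --target-processes all -o "report_rank${rank}" "$@"
  else
    # Unprofiled ranks execute the command unchanged.
    "$@"
  fi
}
```

It could be launched as `srun bash -c 'source ./profile_one_rank.sh && profile_one_rank python parallel.py'`, where profile_one_rank.sh is a hypothetical file holding the function.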

1941910456 · April 22, 2023, 1:43am

Hi,
I have the same problem. Did you find a solution?
Nsight Compute just hangs when the application invokes the CUDA library. It keeps running without any output, while Nsight Systems works fine.
Thank you.

Sanjiv.Satoor · April 24, 2023, 5:38am

Please provide more details, such as the ncu version, the ncu command-line options used, and whether you see any errors or warnings when profiling. Are you also using srun?

Refer to the Multi-Process Support section in the Nsight Compute CLI documentation.
