Nsight Compute cannot profile when using Open MPI and NVSHMEM on multiple GPUs

728882065 · November 10, 2023, 1:11pm

Hello, I want to use ncu with my CUDA program.
I run the program with Open MPI and NVSHMEM. My environment is:

Docker container created from nvidia/cuda:11.3.1-cudnn8-devel-ubuntu20.04
Two GA100 GPUs without NVLink
NCCL 2.9.8
Open MPI 4.1.1
NVSHMEM 2.0.3

My program is a deep learning program that uses data parallelism. I use NVSHMEM, running on top of MPI, to exchange data. The application is launched like this:

mpirun -np 2 application args

I tried to use ncu-ui 2023.2.2 on Windows to profile it remotely, but the remote launch does not inherit my environment.
So I use the ncu binary that ncu-ui 2023.2.2 downloaded to the target machine earlier.
I have tried two ways to run it:

/var/tmp/target/linux-desktop-glibc_2_11_3-x64/ncu --config-file off --export /var/tmp/report%i --force-overwrite --target-processes all --replay-mode application --app-replay-match grid --app-replay-buffer file --app-replay-mode relaxed --launch-count 1 --section-folder /var/tmp/sections mpirun --allow-run-as-root -np 2 application args

and

mpirun --allow-run-as-root -np 2 /var/tmp/target/linux-desktop-glibc_2_11_3-x64/ncu --config-file off --export /var/tmp/report%i --force-overwrite --target-processes all --replay-mode range --launch-count 1 --section-folder /var/tmp/sections application args

But neither of these methods works.
With the first method, where ncu launches mpirun, the following error is reported:

==PROF== Profiling “nvshmemi_init_array_kernel” - 0 (1/1): Application replay pass 5
==ERROR== Failed to profile “nvshmemi_init_array_kernel” in process 3128
==ERROR== Failed to profile “nvshmemi_init_array_kernel” in process 3129
==PROF== Trying to shutdown target application
==ERROR== An error occurred while trying to profile.
==ERROR== Unexpected number of profiled kernels. Application replay requires that the execution, combined with selected filters, guarantees a consistent set of kernels in all passes.
==ERROR== Check the --app-replay-match option for different matching strategies.
==WARNING== No kernels were profiled.

And with the second method, where mpirun launches a separate ncu for each rank, errors occur during replay:

It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here’s some additional information (which may only be relevant to an
Open MPI developer):
getting local rank failed

→ Returned value No permission (-17) instead of ORTE_SUCCESS

*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[24df9172edc5:03014] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!

I also tried the range replay mode, but:

==PROF== Profiling “range” - 0 (1/1): ==PROF== Profiling “range” - 0 (1/1): 0%…50%…100%

==ERROR== LaunchFailed
==ERROR== Failed to profile “range” in process 3163
==PROF== Trying to shutdown target application
0%…50%…100%

==ERROR== LaunchFailed
==ERROR== Failed to profile “range” in process 3162
==PROF== Trying to shutdown target application
==ERROR== The application returned an error code (9).
==ERROR== An error occurred while trying to profile.
==WARNING== No ranges were profiled.

Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.

==ERROR== The application returned an error code (9).
==ERROR== An error occurred while trying to profile.
==WARNING== No ranges were profiled.

My CUDA function is a distributed GCN that uses NVSHMEM to transfer data. By the way, nsys works with this program without any problem.
How can I fix this problem?
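
One launch shape often used for MPI jobs is to let mpirun start one ncu per rank and give each rank its own report file instead of relying on %i. The sketch below only prints such a command line and has not been verified against this NVSHMEM setup. It assumes ncu's %q{ENV} file-name macro and the OMPI_COMM_WORLD_RANK variable that Open MPI sets for each launched process; "./application args" is a placeholder. The replay mode is left at ncu's default (kernel replay), and whether that mode works with NVSHMEM kernels here is untested.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) an mpirun line that starts one ncu per rank.
# The ncu path is copied from the post; "./application args" is a placeholder.

NCU=/var/tmp/target/linux-desktop-glibc_2_11_3-x64/ncu

# ncu expands %q{VAR} itself at run time, so the pattern is kept literal here.
# OMPI_COMM_WORLD_RANK is set by Open MPI for every process it launches.
REPORT='/var/tmp/report_rank%q{OMPI_COMM_WORLD_RANK}'

per_rank_cmd() {
    # "$*" is the application followed by its arguments.
    echo "mpirun --allow-run-as-root -np 2 $NCU --export $REPORT --force-overwrite --target-processes all $*"
}

per_rank_cmd ./application args
```

Printing the line first makes it easy to check the report name pattern before running anything; copy the printed command into the shell to launch it.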

For now I am using the old ncu 2021.1.1.0 installed via apt-get. It works directly when I use the -k option, but it does not seem to show the CPU call stack, and the old version offers fewer metrics than the newer one. The newest version of ncu still cannot be used.
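
For reference, the -k filter mentioned above can be sketched like this. In the failing run the first kernel that ncu picked up was nvshmemi_init_array_kernel, so restricting profiling to one of the application's own kernels keeps NVSHMEM's internal kernels out of the report. The kernel name gcn_aggregate_kernel is hypothetical (the post does not name the real kernel), and the script only prints the command rather than running it.

```shell
#!/bin/sh
# Dry-run sketch: print an ncu invocation restricted to one kernel by name.
# "gcn_aggregate_kernel" is a hypothetical name, not taken from the post.

KERNEL_NAME=gcn_aggregate_kernel

filtered_cmd() {
    # $1 = ncu binary, remaining arguments = application and its arguments
    ncu_bin=$1
    shift
    echo "$ncu_bin --kernel-name $KERNEL_NAME --launch-count 1 --export /var/tmp/report_filtered --force-overwrite $*"
}

filtered_cmd ncu ./application args
```

--kernel-name is the long form of -k, and --launch-count 1 stops after the first matching launch, as in the commands above.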

veraj · November 20, 2023, 8:09am

Hi, @728882065

Thanks for using Nsight Compute!
So you mean you can profile successfully with the old version 2021.1.1.0, but not with 2023.2.2?

Which command succeeds?
Which driver are you using?
Could you provide us with a minimal repro?

By the way, we recently released a new version, 2023.3.1. Can you check whether the issue still exists with this version? Thanks!
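
The version details asked for here can be gathered in one go with something like the following. This is a minimal sketch: the function names are made up for illustration, and it assumes nvidia-smi, ncu, mpirun, and nvcc are on PATH inside the container. Tools that are missing are reported as "not found" instead of aborting the script.

```shell
#!/bin/sh
# Sketch: print driver and tool versions as one text block for a forum reply.
# report_version and collect_env are illustrative helper names.

report_version() {
    # $1 = label, remaining arguments = command to run
    label=$1
    shift
    echo "== $label =="
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1 || echo "(command failed)"
    else
        echo "not found"
    fi
}

collect_env() {
    report_version driver nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
    report_version ncu ncu --version
    report_version mpirun mpirun --version
    report_version nvcc nvcc --version
}

collect_env
```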

veraj · January 17, 2024, 5:25am

Hi, @728882065

Have you tried our latest public version? Is this still an issue for you?
