Cannot profile on slurm environment

yuri_S · August 16, 2022, 4:25am

Hi,

I would like to use Nsight Compute with Slurm in a multi-node environment. I am using an sbatch script, and I added the following command:

srun ncu --target-processes all -o report_$OMPI_COMM_WORLD_RANK python parallel.py

but when execution reaches the send (node0->node1) kernel, that kernel cannot be profiled. I don’t see any error; the job just keeps running, but there is no result for that kernel. Is there a way to profile in a Slurm environment?
(Nsight Systems works when launched the same way.)

Thank you in advance.

felix_dt · August 16, 2022, 11:15am

From your description, it appears that the kernels in your application are communicating across process boundaries and require such communication in order to make forward progress. In your setup, multiple instances of Nsight Compute are launched, one per rank, but these instances won’t be communicating with each other. Since you’ve chosen the default set of metrics to collect, Nsight Compute will replay each kernel multiple times (multiple “passes”) in order to collect all the required metrics. Replaying kernels that require inter-process communication to complete won’t work, as the inter-process state won’t be restored between passes, and the passes won’t be synchronized across processes.

You can try several options, but all have limitations:

- Collect selected GPU metric sampling with Nsight Systems, which doesn’t have the same replay requirements as Nsight Compute.
- Only collect single-pass metrics in Nsight Compute, e.g. gpc__cycles_elapsed.sum. You can use the tool’s section files or its --query-metrics functionality to find available metrics. You will need to try them to see whether they can be collected in a single pass on your GPU architecture.
- If that doesn’t work either, collect single-pass metrics but only for a selected MPI rank by executing a wrapper script rather than your application directly.

To profile a single rank one can use a wrapper script. The following script (called “wrap.sh”) profiles rank 0 only:

#!/bin/bash
if [[ $OMPI_COMM_WORLD_RANK == 0 ]]; then
   ncu -o report_${OMPI_COMM_WORLD_RANK}  --target-processes all "$@"
else
   "$@"
fi
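
For reference, a minimal sketch of how the wrapper is meant to be used. "$@" expands to all arguments passed to the script, each kept as a separate word, so the application command goes on the srun line after the script name (for example `srun ./wrap.sh python parallel.py`) and the script itself is not edited. The `wrap` function name and the `NCU` override variable are assumptions added here so the logic can be dry-run with `echo`; the metric is the one named above and may or may not fit in a single pass on a given GPU.

```shell
#!/bin/bash
# Sketch of wrap.sh written as a function.
# NCU is a hypothetical override so the logic can be dry-run
# without Nsight Compute installed (NCU=echo).
wrap() {
    if [ "${OMPI_COMM_WORLD_RANK:-}" = 0 ]; then
        # Rank 0: run the forwarded command under the profiler,
        # requesting one metric that is hoped to need a single pass.
        "${NCU:-ncu}" --metrics gpc__cycles_elapsed.sum \
            -o "report_${OMPI_COMM_WORLD_RANK}" --target-processes all "$@"
    else
        # Every other rank: run the forwarded command unchanged.
        "$@"
    fi
}

# Dry run for rank 0: prints the ncu arguments followed by the
# forwarded application command instead of launching the profiler.
( OMPI_COMM_WORLD_RANK=0; NCU=echo; wrap python parallel.py )
```

With `NCU` unset, the same function would invoke the real `ncu` binary for rank 0 only, which is the behavior of the script above.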

yuri_S · August 17, 2022, 1:06am

Thank you for your response. I’ll try to collect single-pass metrics in Nsight Compute. I want to collect cache-related metrics, so I hope these are single-pass metrics.

I also have some questions about your answer.
I am wondering what “$@” means in your code. Do I need to write “python filename.py” instead of it?
I used your code in a .sh script, but it couldn’t profile the application. Furthermore, ${OMPI_COMM_WORLD_RANK} does not appear in the report file name (it comes out as report_.ncu-rep).

Also, I tried “srun ncu --target-processes all --replay-mode application -o report python filename.py”. It can profile even past the send kernel, and I get a report.ncu-rep file. Is this the right approach?

Thank you.
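
One possible cause of the empty rank in report_.ncu-rep, offered as an assumption rather than a confirmed diagnosis: OMPI_COMM_WORLD_RANK is set by Open MPI's own launcher and may be unset when tasks are started directly by srun. In the original one-line command it is also expanded by the batch script's shell before srun starts any task, so every task receives the same, already expanded name. Slurm sets SLURM_PROCID for each task it launches, so a wrapper can fall back to it. The function names `resolve_rank` and `report_name` below are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: pick the rank from whichever launcher variable is set.
resolve_rank() {
    # Prefer Open MPI's variable, then Slurm's per-task ID, else "unknown".
    echo "${OMPI_COMM_WORLD_RANK:-${SLURM_PROCID:-unknown}}"
}

# Build the report name inside the wrapper, i.e. per task at run time,
# not on the srun command line in the batch script.
report_name() {
    echo "report_$(resolve_rank)"
}
```

In a wrapper script, `-o "$(report_name)"` would then give each task its own report file. Nsight Compute's documentation also describes a `%q{ENV_NAME}` macro for the `-o` file name (for example `-o report_%q{SLURM_PROCID}`), which is expanded by ncu itself; it is worth checking that your version supports it.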
