Nsys hangs frequently when run in parallel

uday1 · October 21, 2022, 2:26am

After switching to nsight-systems-2022.4.2 (from a 2022.1 version) on an Ubuntu 20.04.5 LTS system, installed via NVIDIA’s apt repo, I’ve noticed that my tests, which run several instances of nsys in parallel (from a typical cmake/make target), now often leave some of the nsys instances hung: they run forever at close to 100% CPU utilization. This happens about 50% of the time:

(output of top)

4078426 user-+ 20 0 5436512 134672 24776 S 100.3 0.1 45:54.47 nsys
3881769 user-+ 20 0 5526624 211676 26144 S 100.0 0.2 100:00.58 nsys

Never had this issue with 2022.1.
Another 20.04 LTS system with a similar configuration and nsys 2022.3.4.34-133b775 doesn’t have this issue either.
I also tried the latest version available as of today from NVIDIA’s “Getting Started with Nsight Systems” page, installed via the run file (NsightSystems-linux-public-2022.4.1.21-0db2c85.run), and it exhibits the same issue.
I noticed that the issue persists even when running a single instance; running concurrently just increases the chances of encountering it, so hangs with a single instance are less frequent.

This appears to be an issue with nsys 2022.4. I like the fact that nsys 2022.4 now supports reporting thread block sizes with GPU kernel summary rows, and I’d like to use it if it’s free of any issues.

Package: nsight-systems-2022.4.2
Version: 2022.4.2.1-df9881f

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |

"Ubuntu 20.04.5 LTS"

Product Name                          : NVIDIA GeForce RTX 3090

hwilper · October 21, 2022, 2:27pm

@liuyis can you take a look at this when you get a chance?

liuyis · October 21, 2022, 3:06pm

Hi @uday1, what’s the Nsys command you used? Do you have the terminal outputs from Nsys when it’s hanging?

uday1 · October 21, 2022, 3:31pm

It’s an nsys profile followed by an nsys stats for each of those instances:

$ nsys profile --force-overwrite=true -o gpu_ <command> && echo "Perf: ... kernel takes `nsys stats --format csv --report gpukernsum --timeunit=msec gpu_.nsys-rep  | grep copy_global_memref_kernel | cut -f 2 -d ','` ms"
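
The one-liner above can be easier to debug when the CSV parsing is split out into its own step. A minimal sketch, using the same flags as in the post; `kernel_time_ms` is a hypothetical helper name, and the field position simply mirrors the `cut -f 2 -d ','` in the original command. Unlike the original, it keeps only the first matching row.

```shell
# Hypothetical helper: read gpukernsum CSV on stdin and print the second
# comma-separated field of the first row that mentions the given kernel name.
# usage: <csv producer> | kernel_time_ms <kernel name>
kernel_time_ms() {
    grep -- "$1" | head -n 1 | cut -d ',' -f 2
}

# Usage (commented out so the snippet can be sourced without nsys installed):
#   nsys profile --force-overwrite=true -o gpu_ <command> &&
#   ms=$(nsys stats --format csv --report gpukernsum --timeunit=msec gpu_.nsys-rep \
#          | kernel_time_ms copy_global_memref_kernel) &&
#   echo "Perf: ... kernel takes ${ms} ms"
```

Keeping `nsys profile` and `nsys stats` as separate steps also makes it clearer which of the two is the one that hangs.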

liuyis · October 21, 2022, 3:39pm

Could you try the following to see whether the issue reproduces (this could help us locate which part went wrong):

Remove the subsequent nsys stats
nsys profile -t cuda -s none --cpuctxsw=none --force-overwrite=true -o gpu_ <command>
nsys profile -t osrt -s none --cpuctxsw=none --force-overwrite=true -o gpu_ <command>
nsys profile -t nvtx,opengl -s none --cpuctxsw=none --force-overwrite=true -o gpu_ <command>
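
Because the hang is intermittent, each variant above may need to be run several times before drawing a conclusion. A rough sketch, assuming GNU coreutils `timeout` is available; `count_hangs` is a hypothetical helper, and the run count and time limit are arbitrary values to adjust to the workload:

```shell
# Hypothetical helper: run a command N times under a time limit and print
# how many runs had to be stopped by timeout.
# usage: count_hangs <runs> <limit_seconds> <command...>
count_hangs() {
    runs="$1"; limit="$2"; shift 2
    hangs=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        status=0
        # -k 5: send SIGKILL if the process is still alive 5 s after SIGTERM.
        timeout -k 5 "$limit" "$@" >/dev/null 2>&1 || status=$?
        # 124 = stopped by timeout; 137 = had to be SIGKILLed (this can also
        # mean something else killed the process, so treat it as approximate).
        if [ "$status" -eq 124 ] || [ "$status" -eq 137 ]; then
            hangs=$((hangs + 1))
        fi
        i=$((i + 1))
    done
    echo "$hangs"
}

# Usage (one variant from the list above; <command> is the profiled application):
#   count_hangs 20 300 nsys profile -t cuda -s none --cpuctxsw=none \
#       --force-overwrite=true -o gpu_ <command>
```

The time limit should be comfortably longer than a normal profiling run, otherwise slow but healthy runs are counted as hangs.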

Also, is it possible to share a report (nsys-rep file) that you captured on a successful run?

uday1 · October 21, 2022, 3:59pm

Sure, I am happy to share the report. I’ll have to switch back to the 2022.4 version and experiment with your suggestions - I should be able to get back in a day.

uday1 · October 23, 2022, 12:05am

Tracing the CUDA APIs is all I’m interested in, and I can confirm that:

The issue is reproducible with -t cuda -s none added as well.
The issue is not reproducible with -t cuda -s none --cpuctxsw=none.

I assume the latter set of flags that work are sufficient for my purpose.

A report of a successful run is attached.
gpu_.nsys-rep (312.8 KB)

liuyis · October 24, 2022, 3:38am

Thanks for sharing the information, glad we’ve got a workaround for your use case. For further investigation, is it possible to set up the application on our side, so we can reproduce the issue and debug?

If that’s not possible, could you help collect debugging logs with the following steps:

Save the following content to nvlog.config:

+ 75iwef global
- quadd_verbose_
$ /tmp/nsight-sys.log
ForceFlush
Format $sevc$time|${name:0}|${tid:5}|${file:0}:${line:0}[${sfunc:0}]:$text

Add NVLOG_CONFIG_FILE=<path to 'nvlog.config'> to your Nsys CLI command line, for example NVLOG_CONFIG_FILE=/tmp/nvlog.config nsys profile --force-overwrite=true -o gpu_ <command>.
Run the command as usual, and if it works as expected, there should be a log file at /tmp/nsight-sys.log. Share the file with us and we will try to figure out why it could hang.
If you are running multiple instances, it is best to add NVLOG_CONFIG_FILE=<path to 'nvlog.config'> to only one of the instances, otherwise the logs will be mixed and harder to investigate. Also, make sure the log is collected from an instance where the hang actually happened.
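
The steps above could be scripted roughly as follows. This is a sketch: the file contents are copied from this post (assuming the blank lines between entries are only forum formatting), and `write_nvlog_config` is a hypothetical helper name. The quoted heredoc delimiter keeps the shell from expanding `$sevc`, `${name:0}` and the other placeholders.

```shell
# Hypothetical helper: write the logging config from this post to the given path.
# usage: write_nvlog_config <path>
write_nvlog_config() {
    cat > "$1" <<'EOF'
+ 75iwef global
- quadd_verbose_
$ /tmp/nsight-sys.log
ForceFlush
Format $sevc$time|${name:0}|${tid:5}|${file:0}:${line:0}[${sfunc:0}]:$text
EOF
}

# Usage: set the variable for a single instance only, so logs are not mixed.
#   write_nvlog_config /tmp/nvlog.config
#   NVLOG_CONFIG_FILE=/tmp/nvlog.config nsys profile --force-overwrite=true -o gpu_ <command>
```

Setting the variable inline on one command line, rather than exporting it, keeps it out of the environment of the other parallel instances.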

Thanks!
