nsys profile xxx (xxx.py, vllm command, serve command) hangs forever and produces no output

No matter what I run under nsys profile (any Python file, a vllm command, or an sglang command), the command always gets stuck and produces no output. I hope you can help me solve it. Thank you! Here are some specific examples:

  1. First, I tried to profile model inference with a vLLM engine:

nsys profile vllm serve /mnt/workspace/Qwen

Running vllm serve /mnt/workspace/Qwen directly works without any problem, but as soon as I add nsys profile, it hangs.

  2. To rule out the content of the Python file as the cause of the hang, I created a new main.py that contains only print(1). Profiling it with nsys hangs as well.

  3. In addition, nsys status -e runs normally, but it only outputs CPU information and no GPU information (nvidia-smi displays the GPU information correctly). The output is as follows:

root@notebook-tianhangyao-benchmarksyth-prd-pre:/mnt/workspace# nsys status -e
Timestamp counter supported: Yes
CPU Profiling Environment Check
Root privilege: enabled
Linux Kernel Paranoid Level = 2
Linux Distribution = Ubuntu
Linux Kernel Version = 5.10.134-16.3.al8.x86_64: OK
Linux perf_event_open syscall available: OK
Sampling trigger event available: OK
Intel(c) Last Branch Record support: Available
CPU Profiling Environment (process-tree): OK
CPU Profiling Environment (system-wide): OK
See the product documentation at Nsight Systems — Nsight Systems for more information,

Okay, a few questions.

What version of Nsys are you running? How long does the vllm command run?

OK!

nsys version:

root@notebook-tianhangyao-benchmarksyth-prd-pre:/# nsys version

NVIDIA Nsight Systems version 2025.3.1.90-253135822126v0

vllm running time:

The vllm command I ran is “vllm serve /usr/local/models/Qwen2.5-7B-Instruct/qwen/Qwen2.5-7B-Instruct --tensor-parallel-size 1 --host 127.0.0.1 --port 8000”. In addition, I used a mobile phone stopwatch to record the running time, which was 36.29 seconds from the start of the command to the successful launch.

To improve efficiency, I would like to add some of my configurations as follows:

1. GPU configuration

2. Both vllm and sglang that I am using are the latest branches from their official websites.

@liuyis can you take a look

Hi @2014139571, since it hangs even for a trivial Python script, it might be related to a specific system config.

Just in case, our latest public release is 2025.5.1. Could you check whether it behaves any differently? You can download it from Nsight Systems - Get Started | NVIDIA Developer

If 2025.5.1 still doesn’t work, could you capture logs for us to debug?

  1. Save the following content to /tmp/nvlog.config:

+ 100iwef   global
$ /tmp/nvlogs/nsight-sys-${pid}.log
ForceFlush
Format $sevc$time|${name:0}|PID${pid:0}|TID${tid:0}|${file:0}:${line:0}[${sfunc:0}]:$text

  2. mkdir /tmp/nvlogs

  3. Set the environment variable NVLOG_CONFIG_FILE=/tmp/nvlog.config when running Nsys. For example:

  • NVLOG_CONFIG_FILE=/tmp/nvlog.config nsys profile xxx

  4. Package the folder /tmp/nvlogs and share it back with us.
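
The steps above can also be scripted. The following is a minimal Python sketch, not an official NVIDIA tool: it assumes a Linux system, and the function name `capture_nsys_logs` and its `timeout_s` parameter are hypothetical. It writes the config shown above, creates the log folder, runs the given command with NVLOG_CONFIG_FILE set, gives up after a timeout so that a hang does not block the script, and then packages the log folder.

```python
import os
import shutil
import signal
import subprocess
import tarfile
from pathlib import Path

# Logging config copied from the steps above; LOG_DIR is a placeholder
# that is replaced with the real log folder before the file is written.
NVLOG_CONFIG_TEMPLATE = """\
+ 100iwef   global
$ LOG_DIR/nsight-sys-${pid}.log
ForceFlush
Format $sevc$time|${name:0}|PID${pid:0}|TID${tid:0}|${file:0}:${line:0}[${sfunc:0}]:$text
"""


def capture_nsys_logs(cmd, config_path="/tmp/nvlog.config",
                      log_dir="/tmp/nvlogs", timeout_s=120):
    """Run cmd with NVLOG_CONFIG_FILE set; return (timed_out, archive_path)."""
    config_path, log_dir = Path(config_path), Path(log_dir)
    # Step 1: save the logging config.
    config_path.write_text(NVLOG_CONFIG_TEMPLATE.replace("LOG_DIR", str(log_dir)))
    # Step 2: create the log folder.
    log_dir.mkdir(parents=True, exist_ok=True)
    # Step 3: run the command with the environment variable set.
    env = dict(os.environ, NVLOG_CONFIG_FILE=str(config_path))
    proc = subprocess.Popen(cmd, env=env, start_new_session=True)
    timed_out = False
    try:
        proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        timed_out = True
        try:
            # Kill the command's process group (assumption: POSIX only).
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the process exited just before we tried to kill it
        proc.wait()
    # Step 4: package the log folder next to it, e.g. /tmp/nvlogs.tar.gz.
    archive = log_dir.with_name(log_dir.name + ".tar.gz")
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(log_dir, arcname=log_dir.name)
    return timed_out, archive


if __name__ == "__main__":
    if shutil.which("nsys"):
        print(capture_nsys_logs(["nsys", "profile", "python3", "./main.py"]))
```

The timeout only stops the command that was launched; any helper processes that moved to a different session may need to be cleaned up separately.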

@liuyis Hello, I followed your instructions and ran the command `NVLOG_CONFIG_FILE=/tmp/nvlog.config nsys profile ./main.py`. The `main.py` simply outputs `1` when executed. I have packaged the log files and would appreciate your help in reviewing and resolving this issue.

nvlogs.tar.gz (2.8 KB)

My node configuration is as follows:

My installation method is as follows:

(1) Download from the official website

(2) sudo dpkg -i nsight-systems-2025.5.1_2025.5.1.121-1_amd64.deb

Hi @2014139571 ,

The `main.py` simply outputs `1` when executed.

Does it mean the hanging issue does not reproduce when you enable logs? Because your original post mentioned that even this simple script will get stuck under Nsys.

Or does it mean that 2025.5.1 has fixed the hanging issue you were hitting with 2025.3.1?

@liuyis Oh no, my mistake: I described it wrong (it was actually a problem with the translation software). I meant that main.py contains only print(1), and it still blocks. The logs are already packaged, and I hope you can take a look. Yesterday I pasted the logs into GPT, which told me to reinstall, but after many attempts it still doesn't work.

Thanks for the information. The logs you shared seem to be incomplete: I can see only 2 log files in the package, but normally there should be more. Could you repeat the steps and double-check whether more logs are generated?

Also, there are a few more experiments to try to locate the root cause:

  1. nsys profile echo 0
  2. nsys profile -t osrt python3 ./main.py
  3. nsys profile -t cuda python3 ./main.py
  4. nsys profile -t nvtx python3 ./main.py

Could you try them and share if each of them causes hanging?
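
To run these experiments without waiting indefinitely on each one, a small wrapper with a timeout can help. This is a minimal Python sketch that assumes Linux; the helper name `command_hangs` and the 60-second limit are hypothetical choices, and the command list simply mirrors the four experiments above.

```python
import os
import shutil
import signal
import subprocess

# The four experiments listed above, as argument lists.
EXPERIMENTS = [
    ["nsys", "profile", "echo", "0"],
    ["nsys", "profile", "-t", "osrt", "python3", "./main.py"],
    ["nsys", "profile", "-t", "cuda", "python3", "./main.py"],
    ["nsys", "profile", "-t", "nvtx", "python3", "./main.py"],
]


def command_hangs(cmd, timeout_s=60.0):
    """Return True if cmd is still running after timeout_s seconds."""
    proc = subprocess.Popen(cmd, stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL, start_new_session=True)
    try:
        proc.wait(timeout=timeout_s)
        return False
    except subprocess.TimeoutExpired:
        try:
            # Kill the command's process group (assumption: POSIX only).
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the process exited just before we tried to kill it
        proc.wait()
        return True


if __name__ == "__main__":
    if shutil.which("nsys") is None:
        print("nsys not found on PATH; nothing to run")
    else:
        for cmd in EXPERIMENTS:
            result = "HANGS" if command_hangs(cmd) else "finished"
            print(" ".join(cmd), "->", result)
```

A command that finishes inside the limit is reported as finished regardless of its exit code, so this only separates hanging from non-hanging runs.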

@liuyis hello!

  1. NVLOG_CONFIG_FILE=/tmp/nvlog.config nsys profile python3 ./main.py still blocks after I run it, and only 2 log files are generated in /tmp/nvlogs:

logFile

  2. Results of the four experiments:

    1. root@notebook-tianhangyao-benchmarksyth-prd-pre:/tmp# nsys profile echo 0 keeps blocking and no logs are generated.

    2. root@notebook-tianhangyao-benchmarksyth-prd-pre:/tmp# nsys profile -t osrt python3 ./main.py keeps blocking and no logs are generated.

    3. root@notebook-tianhangyao-benchmarksyth-prd-pre:/mnt/workspace# nsys profile -t cuda python3 ./main.py keeps blocking and no logs are generated.

    4. nsys profile -t nvtx python3 ./main.py keeps blocking and no logs are generated.

I waited about 1 minute for each of the above commands.

  3. In addition, I saw error messages in the logs stating that File 'nsys config.ini' is not found, and so on. Is this the cause of the problem? How can I solve it?

Thanks!

That’s very strange. What about nsys profile -t none python3 ./main.py? If it also blocks, what about nsys profile -t none -s none --cpuctxsw=none python3 ./main.py?

In addition, I saw error messages in the logs stating that File 'nsys config.ini' is not found, and so on. Is this the cause of the problem? How can I solve it?

Not really, that’s some internal issue but should not be causing the hanging you are observing.

@liuyis Excuse me!

nsys profile -t none -s none --cpuctxsw=none python3 ./main.py is still blocked (I waited for over a minute) and no logs are generated. Is it a problem with my node? I am currently using nsys in a company container (pod node).

Thanks for the additional information, that’s helpful. By analyzing the experiments you’ve done and checking the logs you shared, I found that the issue is related to boost::child. Specifically, an on_exit handler (Global on_exit - 1.66.0) that we registered was never invoked on your system, and that prevented Nsys from moving forward.

Apparently the issue only happens on the specific system you are using. I’m not very sure why it’s happening.

I created a simple test app for the boost::child and on_exit issue:

boost_child_on_exit_test.tar.gz (307.2 KB)

Could you run it on your system and share the results? That can help us understand if the same issue could reproduce with the simple app, and if that reproduces, you can possibly use it to debug why it’s not working on this particular system.
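
For readers without the attached archive, the pattern being tested can be illustrated in a few lines. The sketch below is only a Python analogy, not the Boost code used by Nsys or by the test app: it starts a child process, registers a callback that should fire when the child exits, and reports whether the callback was invoked within a time limit. All names (`child_exit_callback_fires`, `on_exit`, `timeout_s`) are hypothetical.

```python
import subprocess
import sys
import threading


def child_exit_callback_fires(cmd, timeout_s=10.0):
    """Start cmd and report whether an on-exit callback ran within timeout_s.

    Returns (fired, exit_code); exit_code is None if the callback never ran.
    """
    exited = threading.Event()
    exit_codes = []

    def on_exit(code):
        # Analogous to an on_exit handler: record the code and signal the waiter.
        exit_codes.append(code)
        exited.set()

    proc = subprocess.Popen(cmd)

    def watcher():
        # Blocks until the child terminates, then invokes the callback.
        on_exit(proc.wait())

    threading.Thread(target=watcher, daemon=True).start()

    # The main flow only moves forward once the callback has signalled.
    fired = exited.wait(timeout_s)
    code = exit_codes[0] if fired else None
    if not fired:
        proc.kill()  # clean up the child if the callback never arrived
    return fired, code


if __name__ == "__main__":
    print(child_exit_callback_fires([sys.executable, "-c", "print(1)"]))
```

If the callback were never delivered and the wait had no timeout, the main flow would block forever, which matches the symptom described above.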

@liuyis Hello, I ran boost_child_on_exit_test. Results:

Thank you. Unfortunately that means the issue isn’t reproducing with the simple test app, so it’s not a generic issue with boost::child and could be specific to the usage in Nsys.

Is it possible for us to access the system to debug further?