Whenever I run nsys profile on anything, whether a Python file, a vllm command, or an sglang command, it hangs when the profiled command returns and produces no output. I hope you can help solve this. Thank you! Here are some specific examples:
First, profiling model inference with a vLLM engine:
nsys profile vllm serve /mnt/workspace/Qwen (Running vllm serve /mnt/workspace/Qwen directly works without problems, but as soon as I add nsys profile, it hangs.)
To rule out the Python file's content as the cause of the hang, I created a new main.py containing only print(1).
In addition, nsys status -e runs normally, but it only reports CPU information and no GPU information (nvidia-smi displays the GPU information correctly). The output is as follows:
root@notebook-tianhangyao-benchmarksyth-prd-pre:/mnt/workspace# nsys status -e
Timestamp counter supported: Yes
CPU Profiling Environment Check
Root privilege: enabled
Linux Kernel Paranoid Level = 2
Linux Distribution = Ubuntu
Linux Kernel Version = 5.10.134-16.3.al8.x86_64: OK
Linux perf_event_open syscall available: OK
Sampling trigger event available: OK
Intel(c) Last Branch Record support: Available
CPU Profiling Environment (process-tree): OK
CPU Profiling Environment (system-wide): OK
See the product documentation at Nsight Systems — Nsight Systems for more information,
root@notebook-tianhangyao-benchmarksyth-prd-pre:/# nsys version
NVIDIA Nsight Systems version 2025.3.1.90-253135822126v0
vllm running time:
The vllm command I ran is “vllm serve /usr/local/models/Qwen2.5-7B-Instruct/qwen/Qwen2.5-7B-Instruct --tensor-parallel-size 1 --host 127.0.0.1 --port 8000”. I timed it with a phone stopwatch: it took 36.29 seconds from starting the command to a successful launch.
To save time, here is some of my configuration:
If 2025.5.1 still doesn’t work, could you capture logs for us to debug?
Save the following content to /tmp/nvlog.config
+ 100iwef global
$ /tmp/nvlogs/nsight-sys-${pid}.log
ForceFlush
Format $sevc$time|${name:0}|PID${pid:0}|TID${tid:0}|${file:0}:${line:0}[${sfunc:0}]:$text
mkdir /tmp/nvlogs
Set the environment variable NVLOG_CONFIG_FILE=/tmp/nvlog.config when running Nsys. For example:
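The steps above can also be scripted. The following is a minimal Python sketch, assuming the config text and paths given above; the helper names `nvlog_config_text` and `prepare_nvlog` are made up for illustration and are not part of Nsys.

```python
import os
from pathlib import Path


def nvlog_config_text(log_dir):
    """Return the logging config from the steps above, pointing at log_dir.

    The ${pid}, ${name:0}, ... placeholders are left untouched here; they are
    meant to be expanded by the logger, not by Python.
    """
    return (
        "+ 100iwef global\n"
        "$ " + str(log_dir) + "/nsight-sys-${pid}.log\n"
        "ForceFlush\n"
        "Format $sevc$time|${name:0}|PID${pid:0}|TID${tid:0}"
        "|${file:0}:${line:0}[${sfunc:0}]:$text\n"
    )


def prepare_nvlog(config_path="/tmp/nvlog.config", log_dir="/tmp/nvlogs"):
    """Create the log directory, write the config, and return an environment
    dict with NVLOG_CONFIG_FILE set (hypothetical helper, for illustration)."""
    Path(log_dir).mkdir(parents=True, exist_ok=True)
    Path(config_path).write_text(nvlog_config_text(log_dir))
    env = dict(os.environ)
    env["NVLOG_CONFIG_FILE"] = str(config_path)
    return env
```

Passing the returned dict as `env=` to `subprocess.run(["nsys", "profile", "python3", "./main.py"], env=...)` would then be equivalent to prefixing the command with the environment variable on the shell.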
@liuyis Hello, I followed your instructions and ran the command `NVLOG_CONFIG_FILE=/tmp/nvlog.config nsys profile ./main.py`. The `main.py` simply outputs `1` when executed. I have packaged the log files and would appreciate your help in reviewing and resolving this issue.
Does it mean the hanging issue does not reproduce when you enable logs? Because your original post mentioned that even this simple script will get stuck under Nsys.
Or does it mean that 2025.5.1 has fixed the hanging issue you were hitting with 2025.3.1?
@liuyis Oh no, sorry, I described it wrong (the translation software is to blame). I meant that main.py contains only print(1), but it still hangs. The logs are already packaged, and I hope you can take a look. Yesterday I pasted the logs into GPT, which told me to reinstall; after many attempts, it still doesn’t work.
Thanks for the information. Somehow the logs you shared seem incomplete: I can only see 2 log files in the package, but normally there should be more. Could you repeat the steps and double check whether more logs are generated?
Also, there are a few more experiments to try to locate the root cause:
nsys profile echo 0
nsys profile -t osrt python3 ./main.py
nsys profile -t cuda python3 ./main.py
nsys profile -t nvtx python3 ./main.py
Could you try them and share whether each of them hangs?
NVLOG_CONFIG_FILE=/tmp/nvlog.config nsys profile python3 ./main.py still blocks after running this command, and only 2 log files are generated in /tmp/nvlogs.
root@notebook-tianhangyao-benchmarksyth-prd-pre:/tmp# nsys profile echo 0 keeps blocking, and no logs are generated.
root@notebook-tianhangyao-benchmarksyth-prd-pre:/tmp# nsys profile -t osrt python3 ./main.py keeps blocking, and no logs are generated.
root@notebook-tianhangyao-benchmarksyth-prd-pre:/mnt/workspace# nsys profile -t cuda python3 ./main.py keeps blocking, and no logs are generated.
nsys profile -t nvtx python3 ./main.py keeps blocking, and no logs are generated.
I waited about 1 minute for each of the above commands.
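To make the "wait about a minute" check repeatable, the experiments can be wrapped in a timeout. This is a minimal Python sketch, not an Nsys feature: `classify`, `run_all`, and the 60-second default are assumptions for illustration, and the command list simply mirrors the commands suggested in this thread.

```python
import subprocess

# Commands taken from the experiments suggested in this thread.
EXPERIMENTS = [
    ["nsys", "profile", "echo", "0"],
    ["nsys", "profile", "-t", "osrt", "python3", "./main.py"],
    ["nsys", "profile", "-t", "cuda", "python3", "./main.py"],
    ["nsys", "profile", "-t", "nvtx", "python3", "./main.py"],
    ["nsys", "profile", "-t", "none", "python3", "./main.py"],
    ["nsys", "profile", "-t", "none", "-s", "none", "--cpuctxsw=none",
     "python3", "./main.py"],
]


def classify(cmd, timeout_s=60):
    """Run cmd and report whether it finished within timeout_s seconds.

    Returns "hung" on timeout, "not found" if the executable is missing,
    otherwise "exited <return code>". On timeout only the direct child is
    killed; processes it spawned may need to be cleaned up by hand.
    """
    try:
        proc = subprocess.run(
            cmd,
            timeout=timeout_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        return "hung"
    except FileNotFoundError:
        return "not found"
    return f"exited {proc.returncode}"


def run_all(experiments=EXPERIMENTS, timeout_s=60):
    """Print one result line per experiment."""
    for cmd in experiments:
        print(" ".join(cmd), "->", classify(cmd, timeout_s))
```

Calling `run_all()` on the affected node gives one line per experiment, which is easier to paste into a reply than separate terminal transcripts.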
In addition, I saw error messages in the logs stating that ‘File 'nsys config.ini' is not found’ and so on. Is this the cause of the hang? How can I solve it?
That’s very strange. What about nsys profile -t none python3 ./main.py? If it also blocks, what about nsys profile -t none -s none --cpuctxsw=none python3 ./main.py?
In addition, I saw error messages in the logs stating that ‘File 'nsys config.ini' is not found’ and so on. Is this the cause of the hang? How can I solve it?
Not really, that’s some internal issue but should not be causing the hanging you are observing.
nsys profile -t none -s none --cpuctxsw=none python3 ./main.py still blocks (I waited for over a minute), and no logs are generated. Is it a problem with my node? I am currently using nsys in a company container (a pod node).
Thanks for the additional information, that’s helpful. By analyzing the experiments you’ve done and checking the logs you shared, I found that the issue is related to boost::child. Specifically, an on_exit handler (Global on_exit - 1.66.0) that we registered never got invoked on your system, which prevented Nsys from moving forward.
Apparently the issue only happens on the specific system you are using, and I’m not sure why.
I created a simple test app for the boost::child and on_exit issue:
Could you run it on your system and share the results? That will help us understand whether the same issue reproduces with the simple app. If it does, you can possibly use it to debug why it’s not working on this particular system.
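As a separate, rough cross-check of the same idea (spawn a child and see whether an exit callback ever fires), here is a minimal Python sketch. It is not the test app referred to above and does not use Boost, so a pass here says nothing definitive about boost::child; the function name `exit_callback_fires` and the 10-second default are assumptions for illustration.

```python
import subprocess
import sys
import threading


def exit_callback_fires(cmd, timeout_s=10.0):
    """Spawn cmd and check whether an exit callback runs within timeout_s.

    Returns (True, return_code) if the callback fired in time, otherwise
    kills the child and returns (False, None).
    """
    fired = threading.Event()
    result = {}

    def on_exit(code):
        # Stand-in for an exit handler: record the code and signal the parent.
        result["code"] = code
        fired.set()

    child = subprocess.Popen(cmd, stdout=subprocess.DEVNULL)

    def watcher():
        # Blocks until the child exits, then invokes the callback.
        on_exit(child.wait())

    threading.Thread(target=watcher, daemon=True).start()

    if not fired.wait(timeout_s):
        child.kill()
        return False, None
    return True, result["code"]
```

For example, `exit_callback_fires([sys.executable, "-c", "print(1)"])` spawns a child equivalent to the main.py used in this thread and reports whether its exit was observed.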
Thank you. Unfortunately that means the issue isn’t reproducing with the simple test app, so it’s not a generic issue with boost::child and could be specific to the usage in Nsys.
Is it possible for us to access the system to debug further?