When I run
nsys profile --gpu-metrics-device=0 python3 examples/test_ngram.py
I can capture GPU and CPU information, but CUDA information is not captured.
When I run
nsys profile python3 examples/test_ngram.py
I can capture CUDA and CPU information, but GPU information is not captured.
What is the issue here?
Can you tell me a little more about the system you are running on?
What is the result of running
nsys status -e
Are you running in a container?
Also, could you share the two report files?
Thank you very much for your reply. I investigated this issue and found a solution. I am analyzing the performance of vLLM v1. After adding
--trace-fork-before-exec=true --cuda-graph-trace=node
I was able to capture CUDA information. However, I have run into a new problem. Please advise.
The report contains no Python information, and I would like to correlate CUDA calls with Python calls. This is the command I used:
nsys profile --trace-fork-before-exec=true --cuda-graph-trace=node --force-overwrite true --cudabacktrace=kernel --python-backtrace=cuda python3 test_ngram.py
report5 (4).nsys-rep.zip (13.7 MB)
Hello, thank you for your reply. I have updated the question. Please take a look.
nsys status -e
Timestamp counter supported: Yes
CPU Profiling Environment Check
Root privilege: enabled
Linux Kernel Paranoid Level = 2
Linux Distribution = Ubuntu
Linux Kernel Version = 5.4.54-1.0.0.std7c.el7.2.x86_64: OK
Linux perf_event_open syscall available: OK
Sampling trigger event available: OK
Intel(c) Last Branch Record support: Not Available
CPU Profiling Environment (process-tree): OK
CPU Profiling Environment (system-wide): OK
Yes, I am running in a container.
Intel(c) Last Branch Record support: Not Available
This line looks suspicious. On further inspection, my CPU, an Intel(R) Xeon(R) Platinum 8458P, does not support LBR. Maybe that is the key to the issue.
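For anyone checking their own environment the same way, here is a minimal sketch that scans text in the format of the nsys status -e output above and lists the checks reported as "Not Available". The function name unavailable_checks and the sample string are illustrative only (the sample is copied from the output quoted in this thread), and the parsing assumes the "name: value" layout shown there. Note that the same output still reports both CPU Profiling Environment checks as OK, so the missing LBR support may not be the root cause.

```python
# Sample lines copied from the `nsys status -e` output quoted in this thread.
STATUS_OUTPUT = """\
Timestamp counter supported: Yes
Root privilege: enabled
Linux Kernel Paranoid Level = 2
Linux perf_event_open syscall available: OK
Sampling trigger event available: OK
Intel(c) Last Branch Record support: Not Available
CPU Profiling Environment (process-tree): OK
CPU Profiling Environment (system-wide): OK
"""


def unavailable_checks(status_text):
    """Return the names of checks whose value is 'Not Available'.

    Hypothetical helper: assumes each check is printed as 'name: value'.
    Lines without a colon (for example 'key = value' lines) are skipped.
    """
    flagged = []
    for line in status_text.splitlines():
        # Split on the last colon so names containing ':' stay intact.
        name, sep, value = line.rpartition(":")
        if sep and value.strip() == "Not Available":
            flagged.append(name.strip())
    return flagged


if __name__ == "__main__":
    for check in unavailable_checks(STATUS_OUTPUT):
        print(check)
```

To use it on a live system, the sample string could be replaced with the captured output of the real command.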
To get Python call information, I think you'll need to enable the Python sampling feature. See the following two related CLI options:
--python-sampling=
Possible values are 'true' or 'false'.
Sample Python backtrace.
Default is 'false'.
Note: This feature provides meaningful backtraces for Python processes.
When profiling Python-only workflows, consider disabling the CPU sampling option to reduce overhead.
--python-sampling-frequency=
Specify Python sampling frequency.
Minimum supported frequency is '1' (Hz).
Maximum supported frequency is '2000' (Hz).
Default is '1000' (Hz).
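Putting the pieces of this thread together, a minimal sketch of the combined command might look like the following. It only assembles and prints the command line; it does not run nsys. The helper name build_nsys_command is hypothetical, the flags are the ones already quoted in this thread, and whether this combination actually produces the desired Python/CUDA correlation for vLLM is an assumption that has not been verified here.

```python
import shlex


def build_nsys_command(script, python_sampling=True, sampling_hz=1000):
    """Assemble an `nsys profile` command line as a list of arguments.

    Hypothetical helper for illustration. The frequency limits mirror the
    help text quoted above (minimum 1 Hz, maximum 2000 Hz, default 1000 Hz).
    """
    if not 1 <= sampling_hz <= 2000:
        raise ValueError("Python sampling frequency must be within 1..2000 Hz")

    # Flags taken from the command posted earlier in the thread.
    cmd = [
        "nsys", "profile",
        "--trace-fork-before-exec=true",
        "--cuda-graph-trace=node",
        "--force-overwrite", "true",
        "--cudabacktrace=kernel",
        "--python-backtrace=cuda",
    ]

    # Python sampling options suggested in the reply above.
    if python_sampling:
        cmd += [
            "--python-sampling=true",
            f"--python-sampling-frequency={sampling_hz}",
        ]

    cmd += ["python3", script]
    return cmd


if __name__ == "__main__":
    # Print a shell-ready command line for copy and paste.
    print(shlex.join(build_nsys_command("test_ngram.py")))
```

Lowering sampling_hz is one way to reduce overhead if the default rate slows the workload too much.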