Nsys cannot capture CUDA information

When I run nsys profile --gpu-metrics-device=0 python3 examples/test_ngram.py, I can capture GPU metrics and CPU information, but no CUDA information.
When I run nsys profile python3 examples/test_ngram.py, I can capture CUDA and CPU information, but no GPU metrics.
What is the issue here?

Can you tell me a little more about the system you are running on?

What is the result of running

nsys status -e

Are you running in a container?

@liuyis

Also, could you share the two report files?

Thank you very much for your reply. I investigated this issue and found a solution. I am profiling the V1 version of vLLM, and once I added --trace-fork-before-exec=true --cuda-graph-trace=node, I was able to capture CUDA information. However, I have run into a new problem. Please advise.

I am not getting any Python information. I want to correlate CUDA calls with Python calls. This is the command I ran:
nsys profile --trace-fork-before-exec=true --cuda-graph-trace=node --force-overwrite true --cudabacktrace=kernel --python-backtrace=cuda python3 test_ngram.py


report5 (4).nsys-rep.zip (13.7 MB)


The Python backtrace is the information I hope to obtain.

Hello, thank you for your reply. I have updated the question. Please take a look.

nsys status -e

Timestamp counter supported: Yes

CPU Profiling Environment Check
Root privilege: enabled
Linux Kernel Paranoid Level = 2
Linux Distribution = Ubuntu
Linux Kernel Version = 5.4.54-1.0.0.std7c.el7.2.x86_64: OK
Linux perf_event_open syscall available: OK
Sampling trigger event available: OK
Intel(c) Last Branch Record support: Not Available
CPU Profiling Environment (process-tree): OK
CPU Profiling Environment (system-wide): OK

Yes, I am running in a container.

Intel(c) Last Branch Record support: Not Available
This line looks suspicious. On further inspection, my CPU is an Intel(R) Xeon(R) Platinum 8458P, which does not seem to support LBR. Maybe that is the key to the issue.

To get this, I think you'll need to enable the Python sampling feature. See the following two related CLI options:

        --python-sampling=

           Possible values are 'true' or 'false'.
           Sample Python backtrace.
           Default is 'false'.
           Note: This feature provides meaningful backtraces for Python processes.
           When profiling Python-only workflows, consider disabling the CPU sampling option to reduce overhead.

        --python-sampling-frequency=

           Specify Python sampling frequency.
           Minimum supported frequency is '1' (Hz).
           Maximum supported frequency is '2000' (Hz).
           Default is '1000' (Hz).
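
As a rough sketch of how those two options could be added to the command posted earlier in the thread: the script below builds the option list, prints the resulting command, and runs it only if nsys and the target script are found. It assumes the installed nsys version accepts --python-sampling together with the other options. That combination has not been verified on the poster's system, and the variable names NSYS_ARGS and TARGET are just illustrative.

```shell
#!/bin/sh
# Options copied from the command posted earlier in the thread.
NSYS_ARGS="--trace-fork-before-exec=true --cuda-graph-trace=node --force-overwrite true --cudabacktrace=kernel --python-backtrace=cuda"

# Python sampling options from the help text above (1000 Hz is the documented default).
# Assumption: these can be combined with the options already in use.
NSYS_ARGS="$NSYS_ARGS --python-sampling=true --python-sampling-frequency=1000"

# Script name taken from the thread; adjust the path for your checkout.
TARGET="test_ngram.py"

echo "nsys profile $NSYS_ARGS python3 $TARGET"

# Only run the profiler when both nsys and the script are actually present.
if command -v nsys >/dev/null 2>&1 && [ -f "$TARGET" ]; then
    # NSYS_ARGS is left unquoted on purpose so it splits into separate options.
    nsys profile $NSYS_ARGS python3 "$TARGET"
else
    echo "nsys or $TARGET not found; command printed above but not run"
fi
```

If the sampling overhead turns out to be noticeable, lowering --python-sampling-frequency toward the documented minimum of 1 Hz is one knob to try.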