Missing CUDA runtime events from nsys report

rajeshshashikumar · April 14, 2025, 10:05pm

I’m using nsys (2025.2) to profile an application within apptainer (a containerization framework).

I can see that the GPU clock, DRAM bandwidth and other key metrics are being collected but there is no CUDA kernel information being tracked. What could be the reason for this?

My command line is as follows:

nsys profile \
  --cpu-core-metrics=0,2 \
  --gpu-metrics-devices=all \
  --cuda-um-cpu-page-faults=true \
  --cuda-um-gpu-page-faults=true \
  --event-sample=system-wide \
  -- \
  python benchmark_serving.py \
    --backend vllm \
    --dataset-name sharegpt \
    --dataset-path ./ShareGPT_V3_unfiltered_cleaned_split.json \
    --model neuralmagic/Meta-Llama-3.1-70B-Instruct-FP8 \
    --num-prompts 1000 \
    --endpoint /v1/completions \
    --tokenizer neuralmagic/Meta-Llama-3.1-70B-Instruct-FP8 \
    --save-result

hwilper · April 15, 2025, 5:07pm

@liuyis can you take a look at this?

liuyis · April 15, 2025, 6:53pm

Hi @rajeshshashikumar, you mentioned apptainer (a containerization framework). Does that mean a container is being spawned and the actual application runs inside the container? If that is the case, then Nsys needs to run inside the container as well in order to get CUDA trace data, because Nsys needs to inject into the target application process to get that data.

If that’s not the case, could you share the report file?
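For illustration, running Nsys inside the container could look roughly like the sketch below. The image name `my_image.sif` and the report name are hypothetical placeholders, not taken from this thread, and the `--nv` option is Apptainer's way of exposing the host NVIDIA driver to the container. The script only prints the command unless `RUN=1` is set.

```shell
#!/usr/bin/env bash
# Sketch: start nsys *inside* the Apptainer container so it can inject into
# the application process it launches there.

# HYPOTHETICAL placeholder: path to your container image.
IMAGE="${IMAGE:-my_image.sif}"

# Application to profile (shortened from the command earlier in the thread).
APP=(python benchmark_serving.py --backend vllm)

# apptainer exec --nv makes the host GPUs and driver visible in the container;
# nsys then launches the application as its child process.
CMD=(apptainer exec --nv "$IMAGE"
     nsys profile --trace=cuda,osrt -o report_in_container "${APP[@]}")

# Print the command by default; set RUN=1 to actually execute it.
if [[ "${RUN:-0}" == "1" ]]; then
  "${CMD[@]}"
else
  echo "${CMD[*]}"
fi
```

This assumes nsys is installed in the image (or bind-mounted into it), which the thread does not confirm.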

rajeshshashikumar · April 17, 2025, 6:53pm

@liuyis, yes, I am running nsys inside the container, not from the outside. I am still not able to view the CUDA trace data. Here’s the setup file for the container

Here is the attached nsys-report file
report2.nsys-rep.zip (75.8 MB)

liuyis · April 17, 2025, 7:17pm

Thanks for sharing the report. From the report, I can see that there were CUDA activities happening in process 3349650. However, this process was not launched by Nsys. The process launched by Nsys was PID 3350175, marked in green on the timeline.

In order for Nsys to capture the CUDA API and kernel traces from a process, the process has to be launched by Nsys, because Nsys needs to inject into it at launch time.

I assume process 3349650 is a background server process, and 3350175 sends commands to it and triggers the CUDA workload in it. Is there a way you can launch the background process 3349650 with Nsys as well?
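One way to double-check which process actually holds the GPU, independent of the Nsys report, is to compare PIDs against the compute-process list from `nvidia-smi`. The helper below is a hypothetical sketch (the function name `pid_uses_gpu` is made up for illustration); it reads the CSV produced by `nvidia-smi --query-compute-apps` from stdin and reports whether a given PID is listed.

```shell
#!/usr/bin/env bash
# Sketch: check whether a PID appears in nvidia-smi's list of compute processes.
# pid_uses_gpu is a HYPOTHETICAL helper name, not part of any NVIDIA tool.

# Reads "pid, process_name, used_memory" CSV lines on stdin.
# Exit status 0 if the PID given as $1 is listed, 1 otherwise.
pid_uses_gpu() {
  local target="$1"
  awk -F', *' -v pid="$target" '$1 == pid { found = 1 } END { exit !found }'
}

# Usage on a machine with a GPU (commented out so the sketch has no side effects):
# nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv,noheader \
#   | pid_uses_gpu 3349650 && echo "PID 3349650 is running CUDA work"
```

If the PID that Nsys launched is not in that list while another PID is, the CUDA work is happening in a process Nsys did not start.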

rajeshshashikumar · April 17, 2025, 7:57pm

Does the --event-sample=system-wide flag above capture all system activity?

Thank you, I will try to do that. But is there a way to attach nsys to a specific PID? I could not find that in the documentation

liuyis · April 17, 2025, 10:39pm

Not really. This only enables the “event sampling” feature, which indirectly enables the “CPU sampling” feature, and that’s why you can see the background Python process and the callstack in my screenshot. However, trace features like CUDA trace and OSRT trace have no system-wide support, so the process has to be launched by Nsys.

But is there a way to attach nsys to a specific PID? I could not find that in the documentation

Nsys does not support attaching to running processes. You’ll need to launch the process through Nsys, i.e. something like nsys profile --trace=osrt,cuda <the background app that runs the CUDA workload>
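As a concrete sketch of that suggestion, the script below wraps the server launch in Nsys. It assumes the background process is a vLLM server started with `vllm serve`, which the thread never states explicitly; substitute whatever command actually starts your server. The report name `vllm_server` is also just a placeholder. The script prints the command unless `RUN=1` is set.

```shell
#!/usr/bin/env bash
# Sketch: profile the process that runs the CUDA work (the server), rather
# than the benchmark client that only sends requests to it.

# Model name taken from the command earlier in the thread.
MODEL="neuralmagic/Meta-Llama-3.1-70B-Instruct-FP8"

# ASSUMPTION: the background process is a vLLM server started this way.
# The thread does not show how it was launched; replace with your own command.
SERVER_CMD=(vllm serve "$MODEL")

# --trace=osrt,cuda follows the reply above; -o names the output report.
NSYS_CMD=(nsys profile --trace=osrt,cuda -o vllm_server "${SERVER_CMD[@]}")

# Print the command by default; set RUN=1 to actually launch the server.
if [[ "${RUN:-0}" == "1" ]]; then
  "${NSYS_CMD[@]}"
else
  echo "${NSYS_CMD[*]}"
fi
```

With the server running under Nsys, the benchmark client can then be started from a second shell without Nsys, since the kernels of interest execute in the server process.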

system · May 1, 2025, 10:39pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
