"Missing Data" Issues in Nsight Systems Profiling


I have been using Nsight Systems for profiling my application, but recently I’ve encountered a problem where I get frequent “Missing Data” segments in my profiling timeline, which didn’t occur before. I’ve attached a screenshot for reference:

Details:

  • GPU: NVIDIA RTX 6000
  • Nsight Systems Version: Version 2023.2.2.0
  • OS: Ubuntu 22.04.3 LTS
  • Application: Python-based workload

Steps Taken So Far:

  1. Updated Drivers and Nsight Systems: Ensured both are the latest versions.
  2. Checked System Resources: Verified that there is no significant CPU, memory, or disk I/O contention during profiling.

Despite these steps, the issue persists. The frequent “Missing Data” makes it challenging to analyze the performance of my application effectively.

What could be causing this, and how can I fix it?

Any insights or suggestions would be greatly appreciated.

Thank you!

@pkovalenko for the gpu metrics question.

When was the last time you didn’t encounter this issue? What has changed since then: Nsys, the GPU driver, or the workload? Can you share a minimal workload that reproduces the issue?

2023.2.2.0 is not the latest Nsys version, but I believe using the latest one won’t be enough.
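
To help answer the "what changed" question, it can be useful to record the Nsys and driver versions next to each report. The following is only a minimal sketch, not an official diagnostic: the function name report_versions and the NSYS variable are made up for illustration, and it assumes nsys and nvidia-smi are reachable (set NSYS=/usr/local/cuda/bin/nsys if nsys is not on PATH).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the Nsys version and the GPU name / driver version.
# NSYS is an illustrative variable; it defaults to "nsys" found on PATH.
NSYS="${NSYS:-nsys}"

report_versions() {
    if command -v "$NSYS" >/dev/null 2>&1; then
        echo "nsys: $("$NSYS" --version)"
    else
        echo "nsys: not found ($NSYS)"
    fi

    if command -v nvidia-smi >/dev/null 2>&1; then
        # One line per GPU: product name and driver version.
        nvidia-smi --query-gpu=name,driver_version --format=csv,noheader | sed 's/^/gpu: /'
    else
        echo "gpu: nvidia-smi not found on PATH"
    fi
}

report_versions
```

Saving this output alongside each .nsys-rep file makes it easier to tell later whether a regression lines up with an Nsys or driver update.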

A week ago, I ran the summarization stage using Mamba’s inference, and it worked fine.
However, this week I tried running the generation stage and encountered an error.

Now, even when I run the summarization stage again, the problem persists.
It seems like there might be an issue with the sampling buffer or some leftover data somewhere. Is there a way to check for this?

For additional information, this is the command I use to run profiling with nsys:
/usr/local/cuda/bin/nsys profile -t cuda,nvtx --sample=none --cpuctxsw=none --gpuctxsw=false --cuda-graph-trace=graph --gpu-metrics-device $GPU --force-overwrite=true --stats=true --output=$O_FILE /home/members/mxcim/miniconda3/envs/mamba_IJ/bin/python '/home/members/mxcim/mamba/benchmarks/benchmark_generation_mamba_simple.py' --model-name "state-spaces/mamba2-2.7b" --batch 1

Could you please confirm if the issue is reproducible with the latest Nsys from here? Nsight Systems - Get Started | NVIDIA Developer

If it is, please collect and upload a self report ({O_FILE}.nvtx):

$ sudo nsys profile -s cpu -t nvtx,osrt --cpuctxsw=system-wide -o ${O_FILE} -f true nsys profile -s none -t none --cpuctxsw=none --gpu-metrics-device=$GPU -o ${O_FILE}.nvtx -f true /home/members/mxcim/miniconda3/envs/mamba_IJ/bin/python '/home/members/mxcim/mamba/benchmarks/benchmark_generation_mamba_simple.py' --model-name "state-spaces/mamba2-2.7b" --batch 1