Nsight Systems Version: 2025.3.1 (CLI and GUI)
Platform: Windows 11 Pro, 23H2, Build 26100
GPU: RTX 3090
CUDA Version: 12.8.93
Python Version: 3.12 (venv)
Profiler Target: Python script launching CUDA kernels via PyTorch and CuPy, plus a compiled .exe version of the same kernel
Storage Target: Samsung 970 PRO NVMe used as system cache/temp drive
Issue:
Profiling consistently stalls at 99% in both CLI and GUI mode. The .nsys-rep file is generated, but the workflow halts indefinitely before finalization. Reproduced across:
- Different Python environments
- Reduced cases with time.sleep() only
- Tracing with --trace none, --trace cuda, or --trace nvtx
- Explicit --stop-on-exit=true, --duration, and --force-overwrite
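One way to make the 99% hang reproducible and measurable, instead of waiting on it interactively, is to drive `nsys profile` from a script with a hard timeout. This is a minimal sketch under stated assumptions, not an NVIDIA tool: `build_nsys_command` and `run_with_timeout` are hypothetical helper names, the output name `sleep_repro` is illustrative, and the only nsys switches used are the ones listed above plus `-o`.

```python
import subprocess
import sys


def build_nsys_command(output, trace="none", duration_s=None, target=None):
    """Build an `nsys profile` command line (hypothetical helper)."""
    if target is None:
        # Reduced case: no CUDA at all, just time.sleep(), like the sleep-only repro.
        target = [sys.executable, "-c", "import time; time.sleep(2)"]
    cmd = [
        "nsys", "profile",
        f"--trace={trace}",
        "--stop-on-exit=true",
        "--force-overwrite=true",
        "-o", output,
    ]
    if duration_s is not None:
        cmd.append(f"--duration={duration_s}")
    return cmd + list(target)


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return 'ok', 'failed (rc=N)', 'timeout', or 'not found'.

    stdout/stderr are inherited (not captured) so the nsys progress output
    stays visible and no pipes can keep the wait alive after a kill.
    """
    try:
        proc = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the direct child (nsys) when the timeout fires.
        return "timeout"
    except FileNotFoundError:
        return "not found"
    return "ok" if proc.returncode == 0 else f"failed (rc={proc.returncode})"


if __name__ == "__main__":
    cmd = build_nsys_command("sleep_repro", trace="none")
    print(" ".join(cmd))
    print(run_with_timeout(cmd, timeout_s=120))
```

A "timeout" result on Windows but "ok" under WSL2 with the same script would confirm the hang is in finalization rather than in the workload. Note that the timeout only kills the nsys process itself; any profiled child it spawned may need to be cleaned up separately.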
What I Tried:
- Reinstalling Nsight Systems (tested 2025.2.1, 2025.3.1)
- Switching output paths to different drives (F:\ vs C:\)
- Forcing CUDA syncs and GC cleanup
- WSL2 workaround (works flawlessly under Ubuntu 24.04 with the same script)
- Checking file locks, flushes, and syscalls via ProcMon
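For reference, the "forcing CUDA syncs and GC cleanup" step can be written as a small teardown function called just before the script exits. This is a minimal sketch assuming PyTorch and/or CuPy may or may not be installed; `clean_cuda_shutdown` is a hypothetical name, and nothing here is known to fix the hang. It only rules out pending GPU work or live allocations as the cause.

```python
import gc


def clean_cuda_shutdown() -> list[str]:
    """Synchronize outstanding GPU work and collect garbage before exit.

    Returns the steps that actually ran, which is useful to log when
    comparing profiled and unprofiled runs.
    """
    steps = []
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        torch.cuda.synchronize()   # wait for all queued PyTorch kernels
        torch.cuda.empty_cache()   # release blocks held by the caching allocator
        steps.append("torch")

    try:
        import cupy
    except ImportError:
        cupy = None
    if cupy is not None:
        try:
            cupy.cuda.runtime.deviceSynchronize()             # wait for CuPy work on the current device
            cupy.get_default_memory_pool().free_all_blocks()  # return pooled device memory
            steps.append("cupy")
        except cupy.cuda.runtime.CUDARuntimeError:
            pass  # CuPy is installed but no usable GPU/driver is present

    gc.collect()  # drop Python references that may still own GPU buffers
    steps.append("gc")
    return steps
```

It can be called explicitly at the end of the script or registered with `atexit.register(clean_cuda_shutdown)` so it also runs on a normal interpreter exit.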
Expected:
Clean exit and full report generation, even on trivial workloads.
Observed:
Indefinite hang at 99%, with no error messages.
Please advise if there is a known workaround or environment variable for this.
So this works on a Linux target (or a “Linux” target, under WSL2) but fails on a Windows target.
@dofek is our Windows expert, so hopefully he will comment.
While nsys is running, data is written to the .qdstrm file. After analysis is complete, a finalization step creates the .nsys-rep file. Are you saying that the .nsys-rep is created but does not load into the GUI on Windows, or that you are not getting a valid .nsys-rep at all?
I have seen GUI loading hang at 99% and never complete before, usually when the .nsys-rep file was quite large and the system did not have much available RAM. How big is your result file? Can you give me the exact command line you are using?
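To answer both questions with concrete numbers, a quick check of what is actually on disk after the hang can help. This is a minimal sketch assuming the report base name passed to `-o` (for example a hypothetical `sleep_repro`); `describe_report_files` and `classify` are hypothetical helpers that only look at file presence and size, so a non-empty .nsys-rep is not proof that the report is valid.

```python
from pathlib import Path


def describe_report_files(base: str) -> dict:
    """Return the size in bytes of <base>.qdstrm and <base>.nsys-rep, or None if absent."""
    info = {}
    for ext in (".qdstrm", ".nsys-rep"):
        path = Path(base + ext)
        info[ext] = path.stat().st_size if path.is_file() else None
    return info


def classify(info: dict) -> str:
    """Rough state of a profiling run, based only on which files exist and are non-empty."""
    qdstrm, rep = info[".qdstrm"], info[".nsys-rep"]
    if rep:      # a non-empty .nsys-rep exists (it may still fail to load)
        return "report present"
    if qdstrm:   # raw stream exists but finalization produced no usable report
        return "raw data only (finalization did not finish)"
    return "no usable output"


if __name__ == "__main__":
    info = describe_report_files("sleep_repro")
    print(info, classify(info))
```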
Thanks for the very detailed issue report, by the way.
I’m not getting a valid .nsys-rep file at all. The process hangs at 99% during the finalization step that creates the .nsys-rep file. The .qdstrm file is generated successfully, but the conversion to .nsys-rep never completes.
The .qdstrm files are relatively small, as my test applications are simple CUDA examples. I’ve tried both minimal examples (a sleep script) and our actual application exe.
Apologies for the late reply. @fake-news, can you share the .qdstrm file? If you’re uncomfortable uploading it to this open forum, you can email it to devtools-support@nvidia.com.