Nsight Systems 2025.3.1 Hangs at 99% on Windows When Using Python or directly with CUDA

Nsight Systems Version: 2025.3.1 (CLI and GUI)
Platform: Windows 11 Pro, 23H2, Build 26100
GPU: RTX 3090
CUDA Version: 12.8.93
Python Version: 3.12 (venv)
Profiler Target: Python script launching CUDA kernel via PyTorch and CuPy, also a compiled exe version of that kernel
Storage Target: Samsung 970 PRO NVMe used as system cache/temp drive

Issue:
Profiling consistently stalls at 99% in both CLI and GUI modes. A .nsys-rep file is generated, but the workflow halts indefinitely before finalization. Reproduced across:

  • Different Python environments
  • Reduced cases with time.sleep() only
  • Tracing with --trace none, --trace cuda, or --trace nvtx
  • Explicit --stop-on-exit=true, --duration, and --force-overwrite

What I Tried:

  • Reinstalling Nsight Systems (tested 2025.2.1, 2025.3.1)
  • Switching output paths to different drives (F:\ vs C:)
  • Forcing CUDA syncs and GC cleanup
  • WSL2 workaround (works flawlessly under Ubuntu 24.04 w/ same script)
  • Checking file locks, flushes, and syscalls via ProcMon

Expected:
Clean exit and full report generation, even on trivial workloads.

Observed:
Hang at 99%, indefinite, no error messages.

Please advise if there is a known workaround or an environment variable that avoids this.

So basically this works on a Linux target (or a “Linux” target) but fails on a Windows target.

@dofek is our Windows expert, so hopefully he will comment.

When nsys is running, data is written to the .qdstrm file. After analysis is complete, a finalization step creates the .nsys-rep file. Are you saying that the .nsys-rep is created but does not load into the GUI on Windows, or that you are not getting a valid .nsys-rep at all?

I have seen GUI loading hang at 99% and never complete before, usually when the .nsys-rep file was quite large and the system had little available RAM. How big is your result file? Can you give me the exact command line you are using?
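To answer the two questions above quickly (is there a .nsys-rep at all, and how big are the files), a small sketch like the following can help. The function name `file_sizes` and the example paths in the comment are hypothetical; substitute whatever paths your own run produced.

```python
from pathlib import Path


def file_sizes(*paths):
    """Map each path to its size in bytes, or None if the file is missing."""
    sizes = {}
    for raw in paths:
        path = Path(raw)
        try:
            sizes[str(path)] = path.stat().st_size if path.is_file() else None
        except OSError:
            # The file vanished or is inaccessible; treat it as missing.
            sizes[str(path)] = None
    return sizes


def format_report(sizes):
    """Render one line per file, suitable for pasting into a forum reply."""
    lines = []
    for name, size in sizes.items():
        lines.append(f"{name}: {'missing' if size is None else f'{size} bytes'}")
    return "\n".join(lines)


# Example call (paths are placeholders, not taken from a real run):
# print(format_report(file_sizes(r"F:\temp\test_min.qdstrm",
#                                r"F:\temp\test_min.nsys-rep")))
```

A missing or zero-byte .nsys-rep next to a non-empty .qdstrm points at the finalization step rather than at GUI loading.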

Thanks for the very detailed issue report, by the way.

Thank you for your response! To clarify:

  1. I’m not getting a valid .nsys-rep file at all. The process hangs at 99% during the finalization step that creates the .nsys-rep file. The .qdstrm file is generated successfully, but the conversion to .nsys-rep never completes.
  2. The .qdstrm files are relatively small as my test applications are simple CUDA examples. I’ve tried with both minimal examples (a sleep script) and our actual application exe.
  3. Here are the exact command lines I’ve tried:

     a. Minimal test with just sleep:
nsys.exe profile --force-overwrite true -o F:\temp\test_min --trace none --duration 1 --stop-on-exit=true -- C:\projects\new_evaluator\.venv\Scripts\python.exe -c "import time; time.sleep(0.5)"

     b. Trying with different capture options:
nsys.exe profile -o profiling_results\test_profile --trace cuda --duration 5 --stop-on-exit=true -- C:\projects\new_evaluator\examples\stream_batch_example.exe

nsys.exe profile -o profiling_results\test_profile --trace cuda,nvtx --duration 5 --stop-on-exit=true -- C:\projects\new_evaluator\examples\stream_batch_example.exe

nsys.exe profile -o profiling_results\test_profile --trace cuda,nvtx,osrt --sample=process-tree --duration 5 --stop-on-exit=true -- C:\projects\new_evaluator\examples\stream_batch_example.exe

     c. Trying with direct export to different formats:
nsys.exe profile -o profiling_results\test_profile --export=sqlite --force-overwrite=true --trace cuda --duration 5 --stop-on-exit=true -- C:\projects\new_evaluator\examples\stream_batch_example.exe

nsys.exe profile -o profiling_results\test_profile --export=json --force-overwrite=true --trace cuda --duration 5 --stop-on-exit=true -- C:\projects\new_evaluator\examples\stream_batch_example.exe

     d. Trying to manually export the .qdstrm file after profiling:
nsys.exe export --force-overwrite true --type sqlite --output profiling_results\test_profile.sqlite profiling_results\test_profile.qdstrm
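Since every variant above hangs the same way, it can be easier to run them from a script with a timeout instead of killing each one by hand. The following is only a sketch: it assumes `nsys.exe` is on PATH, the helper names (`build_command`, `run_with_timeout`) are hypothetical, and it uses only the flags already shown in the commands above.

```python
import subprocess

NSYS = "nsys.exe"  # assumption: resolvable via PATH


def build_command(trace, output, target, duration=5):
    """Assemble one 'nsys profile' command line using flags from this thread."""
    return [
        NSYS, "profile",
        "--force-overwrite", "true",
        "-o", output,
        "--trace", trace,
        "--duration", str(duration),
        "--stop-on-exit=true",
        "--", target,
    ]


def run_with_timeout(cmd, timeout_s):
    """Run cmd and classify the outcome as 'ok', 'exit N', 'hang', or 'not found'."""
    try:
        proc = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the direct child on timeout; grandchildren
        # (for example the profiled application) may need manual cleanup.
        return "hang"
    except FileNotFoundError:
        return "not found"
    return "ok" if proc.returncode == 0 else f"exit {proc.returncode}"


# Example sweep (target path copied from the commands above):
# target = r"C:\projects\new_evaluator\examples\stream_batch_example.exe"
# for trace in ("none", "cuda", "cuda,nvtx"):
#     out = "profiling_results\\sweep_" + trace.replace(",", "_")
#     print(trace, "->", run_with_timeout(build_command(trace, out, target), 120))
```

The timeout value is arbitrary; pick something comfortably longer than a normal run takes on your machine so that a slow finalization is not misreported as a hang.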

Can you try converting by hand using the qdstrmimporter utility?

There are details on that at User Guide — nsight-systems 2025.3 documentation (I know it looks like a general link, but that is direct to the CLI Troubleshooting section).

I’ve done some additional testing and found that the QdstrmImporter utility also hangs, but at 100% instead of 99%.

I confirmed that I can find the .qdstrm files in my temp directory (F:\temp). These files are being generated successfully during profiling.

I tried using the QdstrmImporter utility directly with the command:

"C:\Program Files\NVIDIA Corporation\Nsight Systems 2025.3.1\host-windows-x64\QdstrmImporter.exe" -i "F:\temp\nsys-report-09da.qdstrm" -o "profiling_results\converted_09da.nsys-rep" -f

The QdstrmImporter process starts and shows progress, but then hangs at 100% completion. I tried multiple .qdstrm files with the same result.
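One thing that might narrow this down is whether the output file is still growing when the importer stalls at 100%, or whether it has already stopped changing and only the process exit is stuck. The sketch below samples the output size while the command runs. The function name `run_and_watch` is hypothetical, and the importer flags in the commented example are the ones from the command above.

```python
import subprocess
import time
from pathlib import Path


def _size_or_zero(path):
    """Current size of path in bytes, or 0 if it does not exist yet."""
    try:
        return path.stat().st_size
    except OSError:
        return 0


def run_and_watch(cmd, output_path, timeout_s=300, poll_s=1.0):
    """Run cmd and sample the output file size until it exits or times out.

    Returns (returncode, sizes); returncode is None if the process had to
    be killed after timeout_s seconds.
    """
    output_path = Path(output_path)
    sizes = []
    proc = subprocess.Popen(cmd)
    deadline = time.monotonic() + timeout_s
    while proc.poll() is None and time.monotonic() < deadline:
        sizes.append(_size_or_zero(output_path))
        time.sleep(poll_s)
    if proc.poll() is None:
        proc.kill()
        proc.wait()
        return None, sizes
    return proc.returncode, sizes


# Example (paths copied from the command above):
# importer = (r"C:\Program Files\NVIDIA Corporation\Nsight Systems 2025.3.1"
#             r"\host-windows-x64\QdstrmImporter.exe")
# out = r"profiling_results\converted_09da.nsys-rep"
# code, sizes = run_and_watch(
#     [importer, "-i", r"F:\temp\nsys-report-09da.qdstrm", "-o", out, "-f"], out)
# print("exit code:", code, "last sizes:", sizes[-5:])
```

If the size is flat for the last several samples and the exit code is None, the file may already be fully written; whether such a file actually opens in the GUI is not something this thread has confirmed.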

@ushomroni - Doron is out on vacation now, do you have any thoughts here?

@fake-news - Does the qdstrmimporter failure happen on Linux as well? It would be a stupid workaround, but it might work.