Nsight Systems fails to generate report file

I’m encountering a problem with the Nsight Systems CLI (version 2025.6.1.1-ubuntu22.04). When I profile my application, one or two of the profiled processes fail to generate their report files (.nsys-rep) on every run.

Has anyone encountered this issue?

The log output is as follows:

[1/1] [0% ] profile_0_13623.nsys-repProcessing events…
[1/1] [0% ] profile_0_13621.nsys-repProcessing events…
[1/1] [0% ] profile_0_13621.nsys-repProcessing events…
[1/1] [0% ] profile_0_13622.nsys-repProcessing events…
[1/1] [===22% ] profile_0_13623.nsys-repProcessing events…
[1/1] [0% ] profile_0_13618.nsys-repProcessing events…
[1/1] [0% ] profile_0_13624.nsys-repProcessing events…
[1/1] [========================100%] profile_0_13623.nsys-rep
Generated:
/mnt/data/profiler/20260107/profile_0_13623.nsys-rep
[1/1] [========================100%] profile_0_13621.nsys-rep
Generated:
/mnt/data/profiler/20260107/profile_0_13621.nsys-rep
[1/1] [6% ] profile_0_13619.nsys-repProcessing events…
[1/1] [========================100%] profile_0_13617.nsys-rep
Generated:
/mnt/data/profiler/20260107/profile_0_13617.nsys-rep
[1/1] [========================100%] profile_0_13619.nsys-rep
Generated:
/mnt/data/profiler/20260107/profile_0_13619.nsys-rep
[1/1] [========================100%] profile_0_13622.nsys-rep
Generated:
/mnt/data/profiler/20260107/profile_0_13622.nsys-rep
[1/1] [========================100%] profile_0_13618.nsys-rep
Generated:
/mnt/data/profiler/20260107/profile_0_13618.nsys-rep
W0107 10:24:32.007000 140200798426240 torch/distributed/elastic/multiprocessing/api.py:858] Sending process 13618 closing signal SIGTERM
W0107 10:24:32.008000 140200798426240 torch/distributed/elastic/multiprocessing/api.py:858] Sending process 13620 closing signal SIGTERM
W0107 10:24:32.008000 140200798426240 torch/distributed/elastic/multiprocessing/api.py:858] Sending process 13624 closing signal SIGTERM
/data/develop/quadd_int/QuadD/Common/AgentAPI/Src/SessionImpl.cpp(21): rpc Shutdown(.Agent.ShutdownRequest) returns (.Agent.EmptyMessage);
is canceled because the timeout period is expired (30 sec).
/data/develop/quadd_int/QuadD/Common/AgentAPI/Src/SessionImpl.cpp(21): rpc Shutdown(.Agent.ShutdownRequest) returns (.Agent.EmptyMessage);
is canceled because the timeout period is expired (30 sec).
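Reading the log above by hand, PIDs 13617 to 13624 appear to be the eight ranks, but only six of them print a Generated: line. Below is a small stdlib-only sketch that does the same comparison on a saved log. It collects every profile_*.nsys-rep name that appears and subtracts the ones that reach Generated:. The function name missing_reports and the optional expected argument are hypothetical, and the name pattern is taken from this log.

```python
import re
from pathlib import PurePosixPath
from typing import Iterable, Optional, Set

# Any report name that shows up in a progress line, e.g. "profile_0_13623.nsys-rep".
_SEEN = re.compile(r"(profile_\w+)\.nsys-rep")
# The path printed right after a "Generated:" line.
_GENERATED = re.compile(r"Generated:\s+(\S+\.nsys-rep)")


def missing_reports(log: str, expected: Optional[Iterable[str]] = None) -> Set[str]:
    """Return report names (without .nsys-rep) that never reached "Generated:".

    If `expected` is given (e.g. one name per rank), names that never even
    started processing are reported too; otherwise only names seen in the
    log are considered.
    """
    seen = set(_SEEN.findall(log))
    # PurePosixPath(...).stem drops the ".nsys-rep" suffix and the directory.
    generated = {PurePosixPath(p).stem for p in _GENERATED.findall(log)}
    pool = set(expected) if expected is not None else seen
    return pool - generated
```

Applied to the full log above with expected set to profile_0_13617 through profile_0_13624, this returns {"profile_0_13620", "profile_0_13624"}. Without expected, only profile_0_13624 is reported, because 13620 never prints a progress line at all.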

What is the exact command line you are using?

I’m running 8 processes on an H20 machine, each profiled with nsys using the following command:

nsys profile --start-later=false --capture-range=cudaProfilerApi \
  --trace cuda,nvtx,osrt --pytorch=autograd-nvtx,functions-trace \
  --trace-fork-before-exec=true --nic-metrics=true python test.py

In test.py, I use torch.cuda.profiler.start() and torch.cuda.profiler.stop() to control when profiling data is collected.
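For reference, one way to make sure every torch.cuda.profiler.start() is paired with a stop(), even if the profiled code raises, is a small context manager. This is a sketch, not the poster's actual test.py. The name cuda_profiler_range and the optional sync hook are hypothetical, and the start/stop callables are passed in so the helper can be tried without a GPU.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, Optional


@contextmanager
def cuda_profiler_range(
    start: Callable[[], None],
    stop: Callable[[], None],
    sync: Optional[Callable[[], None]] = None,
) -> Iterator[None]:
    """Open a capture range and always close it, even if the body raises.

    `sync` (hypothetical hook) runs just before `stop`, e.g. to wait for
    queued GPU work; `stop` still runs if `sync` itself raises.
    """
    start()
    try:
        yield
    finally:
        try:
            if sync is not None:
                sync()
        finally:
            stop()
```

In test.py this would look like `with cuda_profiler_range(torch.cuda.profiler.start, torch.cuda.profiler.stop, sync=torch.cuda.synchronize):` around the steps to be captured. With --capture-range=cudaProfilerApi, nsys collects between the cudaProfilerStart/cudaProfilerStop calls that these functions issue.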

@Guy_Sz, can you please help with this?

The issue looks very similar to the thread "Nsight Systems profiler causes application crash after running for a while".
I will update here once I can suggest a fix or workaround for that one.

@ywsample, it would be helpful if you could check whether the issue persists without --nic-metrics=true.
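To test that suggestion with everything else held constant, it can help to build both variants of the command from one list. A minimal sketch: the flags are copied from the command posted earlier in the thread, and the helper name nsys_argv is hypothetical.

```python
from typing import List

# Flags copied from the posted command; only --nic-metrics is toggled.
BASE_FLAGS = [
    "--start-later=false",
    "--capture-range=cudaProfilerApi",
    "--trace", "cuda,nvtx,osrt",
    "--pytorch=autograd-nvtx,functions-trace",
    "--trace-fork-before-exec=true",
]


def nsys_argv(script: str, nic_metrics: bool) -> List[str]:
    """Return the nsys argv for `script`, with or without NIC metrics."""
    argv = ["nsys", "profile", *BASE_FLAGS]
    if nic_metrics:
        argv.append("--nic-metrics=true")
    return argv + ["python", script]
```

For example, subprocess.run(nsys_argv("test.py", nic_metrics=False)) launches a run without NIC metrics. Comparing several runs of each variant would show whether the missing reports follow that flag.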