Nsys can't capture anything (cuda programs only)

chenhongyu2048 · July 4, 2025, 6:22pm

Hello everyone, I'd like to ask for help with a problem I've run into:
I wrote some CUDA code, compiled it with nvcc, and wanted to profile it with nsys. However, nsys did not capture anything, and the analysis column showed:

No NVTX events collected. Does the process use NVTX?
No CUDA events collected. Does the process use CUDA?
No OS runtime libraries events collected. Does the process use OS runtime libraries?

I know for certain that my code launches CUDA kernels successfully, because the cudaEvent timer works, and I have also successfully profiled the kernel with ncu.
Strangely, the nsys timeline is completely blank, without even a CPU trace.

But when I changed my execution program to python xxxx.py (which does some GPU operations through pytorch), profiling succeeded.

I’m currently using ubuntu24, cuda12.8, nsight-systems-2024.6.2, Driver Version: 570.133.20.
I can confirm that nsys was still working fine (for the exact same cuda program) just a day ago. But since I’m sharing a server with others, I have no way of knowing if they’ve made some changes to the system.

nsys status --all’s output:

Timestamp counter supported: Yes

CPU Profiling Environment Check
Root privilege: disabled
Linux Kernel Paranoid Level = 4
Linux Distribution = Ubuntu
Linux Kernel Version = 6.11.0-26-generic: OK
Linux perf_event_open syscall available: Fail
Sampling trigger event available: Fail
Intel(c) Last Branch Record support: Not Available
CPU Profiling Environment (process-tree): Fail
CPU Profiling Environment (system-wide): Fail

Network Profiling Environment Check
OFED version: Not Available
Network features' library dependencies: Fail

How should I go about diagnosing this problem? In short, nsys can profile a Python program but cannot profile a CUDA executable.

chenhongyu2048 · July 7, 2025, 3:13am

Supplement: in the diagnostics summary of the nsys-rep file, I don’t see anything like this:

Injection 177939 00:00.136 Common injection library initialized successfully.
Injection 177939 00:00.142 OS runtime libraries injection initialized successfully.
Analysis 00:02.465 Scheduling information is absent. The thread activity is deduced based on OS runtime libraries traces. This is inaccurate and does not take into account asynchronous interrupts and exception faults.
Analysis 177939 00:02.465 Number of NVTX events collected: 21.
Analysis 177939 00:02.465 Number of CUDA events collected: 2,360.
Analysis 177939 00:02.465 Number of OS runtime libraries events collected: 5,287.
Injection 177939 00:03.995 Buffers holding CUDA trace data will be flushed on CudaProfilerStop() call. See --flush-on-cudaprofilerstop to control this behavior.
Injection 177939 00:04.006 Loaded CUPTI library: /usr/local/cuda-12.8/nsight-systems-2024.6.2/target-linux-x64/libcupti.so.12.8
Injection 177939 00:04.245 CUDA injection initialized successfully.
Injection 177939 00:05.051 NVTX injection initialized successfully.
Injection 177939 00:06.464 Number of CUPTI events produced: 2,478, CUPTI buffers: 50.

The above is from a python process that I successfully profiled, but none of the above injection content appears in the profile of a cuda-compiled executable.
So I wonder if it is possible that there is no injection when executing the cuda executable? Given the lack of relevant information on the Internet, I don’t know how to check this problem.

hwilper · July 8, 2025, 9:03pm

Can you give me the nsys command line you ran?

One thing that I notice is that the system has a Linux kernel paranoid level of 4, which is going to stop essentially all of the CPU profiling information that we get from the Linux perf subsystem. Do you know if that was a recent change?
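
For anyone following along, here is a minimal sketch of how to read the setting that nsys reports as "Linux Kernel Paranoid Level". The helper name classify_paranoid_level is made up for illustration, and the threshold of 2 is an assumption based on my reading of the Nsight Systems requirements for process-tree CPU sampling, so treat it as a hint rather than a definitive rule.

```shell
#!/bin/sh
# Hypothetical helper: classify a perf_event_paranoid value.
# Assumption: process-tree CPU sampling needs a level of 2 or lower.
classify_paranoid_level() {
    if [ "$1" -le 2 ]; then
        echo "ok"
    else
        echo "too-restrictive"
    fi
}

# The kernel exposes the current value in this file.
paranoid_file=/proc/sys/kernel/perf_event_paranoid
if [ -r "$paranoid_file" ]; then
    level="$(cat "$paranoid_file")"
    echo "perf_event_paranoid = $level ($(classify_paranoid_level "$level"))"
fi
```

Changing the value needs root (for example sudo sysctl kernel.perf_event_paranoid=2), so on a shared server that is a question for the admin.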

chenhongyu2048 · July 9, 2025, 12:28am

Thanks for your reply, here is the nsys command line I used:

nsys profile --trace=osrt,cuda,nvtx --trace-fork-before-exec=true --cuda-graph-trace=node ./grouped_gemm

with the below output:

WARNING: CPU IP/backtrace sampling not supported, disabling.
Try the 'nsys status --environment' command to learn more.

WARNING: CPU context switch tracing not supported, disabling.
Try the 'nsys status --environment' command to learn more.

Collecting data...
Average time for 10 runs: 0.548506 ms
Memory bandwidth: 3439.141061 GB/s
Generating '/home/shixuan/HLSS/.tmp/nsys-report-ef3a.qdstrm'
[1/1] [========================100%] report1.nsys-rep
Generated:
    /home/shixuan/HLSS/test/test_mlp_computation/report1.nsys-rep

Unfortunately, the diagnostic summary of nsys is:

Furthermore, if I run a Python program (torch-based) that does some GPU operations, the nsys-rep output is normal. The command looks like this:

nsys profile --trace=osrt,cuda,nvtx --trace-fork-before-exec=true --cuda-graph-trace=node python run.py

This makes me somewhat confident that the issue is not caused by the Linux kernel paranoid level. Thanks though, I’ll ask the admin about this.

hwilper · July 9, 2025, 2:29pm

Am I reading that correctly? Your runs are averaging 1/2 a millisecond? Can you do a longer run?

I’m wondering if the run is so short that the CUPTI library is not having time to fully initialize.

Because you aren’t specifically turning off the CPU-side sampling, Nsys is trying to run it. However, the paranoid level prevents that. But that isn’t your problem.
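
A rough sketch of what turning off the CPU-side collection explicitly could look like, using the --sample and --cpuctxsw options of nsys profile. The helper name nsys_cmd_without_cpu_sampling is invented for this example, and ./grouped_gemm is simply the executable from this thread.

```shell
#!/bin/sh
# Hypothetical helper: print an nsys command line for the given target with
# CPU IP/backtrace sampling and context-switch tracing explicitly disabled.
nsys_cmd_without_cpu_sampling() {
    echo "nsys profile --trace=osrt,cuda,nvtx --sample=none --cpuctxsw=none $*"
}

# ./grouped_gemm is the executable from this thread.
cmd="$(nsys_cmd_without_cpu_sampling ./grouped_gemm)"
echo "$cmd"

# Run it only when nsys and the target are actually present.
if command -v nsys >/dev/null 2>&1 && [ -x ./grouped_gemm ]; then
    $cmd
fi
```

This should only silence the two "not supported, disabling" warnings. It is not expected to fix the empty report.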

chenhongyu2048 · July 9, 2025, 2:41pm

Hi, the code actually runs the kernel 10 times, 0.5 ms each time. (5ms in total).
And I just re-tested and ran the kernel 1000 times, but the situation did not change.

I think the initialization of the CUPTI library should not be a problem? Because from my understanding, nsys will wait until the initialization is complete before starting to execute the real command.

hwilper · July 9, 2025, 2:57pm

@liuyis am I off base here?

liuyis · July 9, 2025, 3:14pm

Hi @chenhongyu2048, could you share the report file? Also, could you try a more recent Nsys version from Nsight Systems - Get Started | NVIDIA Developer just in case it’s something already fixed?

chenhongyu2048 · July 9, 2025, 3:29pm

Of course. The nsys-rep file is attached below. I generated it with nsys 2025.1.1.
report1.zip (141.7 KB)

liuyis · July 9, 2025, 3:37pm

Thank you. The report does look strange in that everything is empty.

Could you collect logs from Nsys for us to take a deeper look?

Save the following to /tmp/nvlog.config

+ 100iwef   global
$ /tmp/nsight-sys.log
ForceFlush
Format $sevc$time|${name:0}|PID${pid:0}|TID${tid:0}|${file:0}:${line:0}[${sfunc:0}]:$text

Add environment variable NVLOG_CONFIG_FILE=/tmp/nvlog.config when running Nsys. E.g.

export NVLOG_CONFIG_FILE=/tmp/nvlog.config 
nsys profile ...

Run the collection.
There should be a log file at /tmp/nsight-sys.log. Please share it with us.
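
The steps above can be scripted roughly as follows. This is only a sketch: write_nvlog_config is a made-up helper name, and the config content is copied verbatim from this post.

```shell
#!/bin/sh
# Hypothetical helper: write the logging config from this post to the given
# path (default: /tmp/nvlog.config).
write_nvlog_config() {
    config_path="${1:-/tmp/nvlog.config}"
    # The quoted 'EOF' keeps the shell from expanding the $... placeholders.
    cat > "$config_path" <<'EOF'
+ 100iwef   global
$ /tmp/nsight-sys.log
ForceFlush
Format $sevc$time|${name:0}|PID${pid:0}|TID${tid:0}|${file:0}:${line:0}[${sfunc:0}]:$text
EOF
}

write_nvlog_config /tmp/nvlog.config
export NVLOG_CONFIG_FILE=/tmp/nvlog.config
echo "NVLOG_CONFIG_FILE=$NVLOG_CONFIG_FILE"
# Next step, as described above: nsys profile ...
```

The quoted heredoc delimiter matters here, because an unquoted one would let the shell try to expand the $-placeholders in the Format line.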

chenhongyu2048 · July 9, 2025, 3:48pm

Thank you for your patience. The file is as follows:

nsight-sys.zip (135.1 KB)

liuyis · July 9, 2025, 3:58pm

Thank you. One thing I noticed from the log is that your system has TMPDIR environment variable set to “/home/shixuan/HLSS/.tmp”. Is that intentional?

Nsys should be able to handle even a non-default TMPDIR path like this, and I’m still checking if there’s anything wrong in our logic, but just sharing this initial finding in case that helps anything.

chenhongyu2048 · July 9, 2025, 4:03pm

Yes, I set TMPDIR intentionally. However, nsight-sys.log was placed in the /tmp folder when I generated it.

liuyis · July 9, 2025, 6:19pm

I’m wondering if there’s some permission issue with the /home/shixuan/HLSS/.tmp folder that prevented the intermediate profiling files from being written and/or read. Could you try creating a different folder and setting it as TMPDIR to see if there’s any difference? Or, if possible, could you try running Nsys with sudo and see if that changes anything?
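
A quick way to sanity-check a candidate TMPDIR, sketched under the assumption that "usable" means the current user can create, read back, and delete a file in it. The helper name check_tmpdir_usable and the probe file name are hypothetical, and this does not reproduce whatever checks Nsys itself performs.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the directory exists and the current
# user can create, read back, and delete a file in it.
check_tmpdir_usable() {
    dir="$1"
    [ -d "$dir" ] || { echo "not a directory: $dir"; return 1; }
    probe="$dir/.nsys_tmpdir_probe.$$"
    if ( echo probe > "$probe" ) 2>/dev/null && [ -r "$probe" ]; then
        rm -f "$probe"
        echo "usable: $dir"
    else
        rm -f "$probe" 2>/dev/null
        echo "cannot write or read in: $dir"
        return 1
    fi
}

check_tmpdir_usable "${TMPDIR:-/tmp}" || echo "consider pointing TMPDIR elsewhere"
```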

If the above doesn’t help, could you try another experiment:

1. Run the following Nsys command:

nsys profile -t osrt -w false yes

2. In a different terminal, run the following command, replacing <your TMPDIR path for Nsys> with the actual path:

ls -lR --time-style=full-iso <your TMPDIR path for Nsys>/nvidia/nsight_systems/quadd_session_*

3. Wait for 10 seconds and repeat step 2.
4. Attach the outputs from steps 2 and 3. You can then kill the Nsys command.

The reason I ask is that the log says the intermediate profiling files stored at <your TMPDIR path for Nsys>/nvidia/nsight_systems/quadd_session_* are older than the beginning of the collection time and are therefore discarded. I’m trying to figure out whether they are actually too old or whether there’s a bug in Nsys.
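
As a complement to the ls output, here is a small sketch of comparing file timestamps against a reference point. It assumes you create a marker file right before starting the collection. The helper name list_files_not_newer_than and the marker path are invented for illustration, and the comparison only approximates the age check the log mentions; it is not Nsys's actual logic.

```shell
#!/bin/sh
# Hypothetical helper: list regular files under a directory whose modification
# time is NOT newer than a reference file (the "collection start" marker).
list_files_not_newer_than() {
    dir="$1"
    marker="$2"
    find "$dir" -type f ! -newer "$marker"
}

# Example use (paths are placeholders):
#   touch /tmp/collection_start        # right before starting nsys
#   list_files_not_newer_than "$TMPDIR/nvidia/nsight_systems" /tmp/collection_start
# Any file printed was last modified at or before the marker.
```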

chenhongyu2048 · July 10, 2025, 3:36am

I tried this again:

echo $TMPDIR
/home/shixuan/HLSS/test_tmp
echo $NVLOG_CONFIG_FILE
/tmp/nvlog.config

test_tmp is my newly created folder.
The content of nvlog.config is unchanged.

Then I ran:

/home/shixuan/nsight-systems-2025.1.1/bin/nsys profile --trace=osrt,cuda,nvtx --trace-fork-before-exec=true --cuda-graph-trace=node ./grouped_gemm

I got the nsight-sys.log as below.
nsight-sys.zip (98.1 KB)

Then, since no quadd_session_* files or folders are generated in the /home/shixuan/HLSS/test_tmp/nvidia/nsight_systems folder, I cannot perform subsequent operations.
