I am trying to profile my application using ncu with the SourceCounters section enabled, as I am interested in the stall reasons. However, running the command sudo ncu --section SourceCounters myApplication shows the following errors:
ERR Rule PC sampling data returned an error: Metric smsp__pcsamp_sample_count not found
----- --------------------------------------------------------------------------------------------------------------
ERR <built-in function IAction_metric_by_name> returned a result with an exception set
/home/myuser/Documents/NVIDIA Nsight Compute/2023.2.1/Sections/PCSamplingData.py:47
/usr/local/NVIDIA-Nsight-Compute-2023.2/target/linux-desktop-glibc_2_11_3-x64/../../sections/NvRules.py:2017
----- --------------------------------------------------------------------------------------------------------------
ERR Rule Uncoalesced Global Accesses returned an error: Metric memory_l2_theoretical_sectors_global not found
----- --------------------------------------------------------------------------------------------------------------
ERR <built-in function IAction_metric_by_name> returned a result with an exception set
/home/myuser/Documents/NVIDIA Nsight Compute/2023.2.1/Sections/UncoalescedAccess.py:70
/usr/local/NVIDIA-Nsight-Compute-2023.2/target/linux-desktop-glibc_2_11_3-x64/../../sections/NvRules.py:2017
----- --------------------------------------------------------------------------------------------------------------
ERR Rule Uncoalesced Shared Accesses returned an error: Metric memory_l1_wavefronts_shared not found
----- --------------------------------------------------------------------------------------------------------------
ERR <built-in function IAction_metric_by_name> returned a result with an exception set
/home/myuser/Documents/NVIDIA Nsight Compute/2023.2.1/Sections/UncoalescedSharedAccess.py:70
/usr/local/NVIDIA-Nsight-Compute-2023.2/target/linux-desktop-glibc_2_11_3-x64/../../sections/NvRules.py:2017
Interestingly, the command sudo ncu --section SourceCounters myApplication --print-summary per-gpu results in a segmentation fault (of ncu itself). In addition, sudo ncu --section SourceCounters myApplication --print-summary per-gpu --graph-profiling graph also segfaults, but only after a while (around 80 kernel launches). My application does not use CUDA graphs.
I am running ncu Version 2023.2.1.0 (build 33050884) (public-release) on Linux Mint 21 x86_64 with kernel 5.15.0-78-generic. The device is an RTX 3070.
I have attached the complete output of the first command. output_of_ncu.txt (4.9 KB)
Any tips on how to resolve this issue? I have already tried reinstalling ncu a few times, but I fear some files are still lingering on my system and getting mixed up.
The errors you are showing in your description are only a symptom of the problems encountered while profiling the third kernel in your log, which subsequently caused several metrics expected by these rules to be unavailable. The underlying error is
==ERROR== An error was reported by the driver
which indicates that the kernel hit a GPU exception (like an illegal memory access) while being replayed by the tool. There can be multiple reasons for this, with the most likely being:
An existing bug in the kernel that is triggered when it is run under the profiler. You would want to run the different tools provided by compute-sanitizer on this kernel to make sure that's not the case (see the example commands after this list).
An issue introduced by the software-patching metrics that are part of the SourceCounters section.
Another problem in the combination of GPU, CUDA driver and tool. It would be useful if you could let us know your exact CUDA driver version (e.g. by providing the output of nvidia-smi).
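For reference, the individual compute-sanitizer sub-tools could be run along these lines (the application name is taken from your command line; adjust paths and arguments as needed):

compute-sanitizer --tool memcheck myApplication    # default: invalid/out-of-bounds memory accesses
compute-sanitizer --tool racecheck myApplication   # shared-memory data race hazards
compute-sanitizer --tool initcheck myApplication   # use of uninitialized device global memory
compute-sanitizer --tool synccheck myApplication   # invalid use of synchronization primitives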
Since you are looking for the stall reasons, which are collected using a different internal metric provider and not using software patching, you could request these independently from the rest to work around (WAR) the issue, e.g. by creating a new section file in the user's documents directory at /home/user/Documents/NVIDIA Nsight Compute/<version>/Sections with the following content, and collect only that:
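As an illustration only, such a section file could look roughly like the following (the identifier, labels and metric group names are assumptions modeled on the PC sampling metrics from your log, not the exact file from this reply):

Identifier: "PCSamplingOnly"
DisplayName: "PC Sampling Only"
Metrics {
  Metrics {
    Label: "Sampled Warp Stall Reasons"
    Name: "group:smsp__pcsamp_warp_stall_reasons"
  }
  Metrics {
    Label: "Sampled Warp Stall Reasons (Not Issued)"
    Name: "group:smsp__pcsamp_warp_stall_reasons_not_issued"
  }
}

The section could then be collected on its own into a report file, e.g. with ncu --section PCSamplingOnly -o myReport myApplication (using the hypothetical identifier from the sketch above).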
You would then open this report in the UI. This will give you sampled warp stalls, correlated with your source code. Since you used --print-summary in your original command, it's also possible you are only looking for values aggregated across the runtime of your kernel, in which case you would collect
ncu --section WarpStateStats …
instead. If you want to see the chart generated by this section on the command line, you’ll also have to use --print-details all.
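Putting those options together, the summary workflow from your original command line could look something like this (same application invocation as before, with all options placed before the application name):

sudo ncu --section WarpStateStats --print-summary per-gpu --print-details all myApplication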
If your application doesn't use CUDA graphs, the option --graph-profiling graph has no impact and is unlikely to trigger any errors. It's possible, however, that the issue is non-deterministic and occurs at different kernel instances on different runs.
The application returns no errors when run without the profiler. I check all API call return values and compute sanitizer returns 0 errors. Is it possible to have Nsight Compute produce a core dump of the failed application? It does not appear to do so by default, or coredumpctl cannot find it.
Thank you for your help with getting the counters out regardless. I can at least move forward :)
compute sanitizer returns 0 errors
Please make sure to also check its --tool racecheck sub-tool (by default, only memcheck is run, which checks for a different problem class).
This also does not return any errors or warnings. So the error must be introduced by the profiler, but not by the sanitizer? I assume they work in somewhat similar ways to observe the execution.