I’ve built my code with -lineinfo and run with --import-source yes.
The code I’m profiling makes HEAVY use of C++ templates, so the function names are very long. I’ve run some very simple use cases with templates that aren’t very long, and the source shows up. Is there a trick to getting source for long function names? The only option in the source menu is SASS.
Running version 2021.2.1
My command line is below.
ncu --export TEST --force-overwrite --target-processes application-only --replay-mode kernel --kernel-name-base function --launch-skip-before-match 0 --section ComputeWorkloadAnalysis --section InstructionStats --section LaunchStats --section MemoryWorkloadAnalysis --section MemoryWorkloadAnalysis_Chart --section MemoryWorkloadAnalysis_Deprecated --section MemoryWorkloadAnalysis_Tables --section Occupancy --section SchedulerStats --section SourceCounters --section SpeedOfLight --section SpeedOfLight_RooflineChart --section WarpStateStats --sampling-interval auto --sampling-max-passes 5 --sampling-buffer-size 33554432 --profile-from-start 1 --disable-profiler-start-stop --cache-control all --clock-control base --apply-rules yes --import-source yes --check-exit-code yes
Doesn’t look like there is a workaround for this problem.
Is there a way to annotate the code somehow so that markers show up in the assembly code? Can I use something like nvtxRangePush() to mark where I am in my code? The kernel I have is very large, and so knowing the subroutine that ncu is highlighting would be extremely useful.
You can’t use nvtx to create such correlation information. Can you check with the cuobjdump tool that is part of the CUDA toolkit if you have lineinfo embedded in the ELF binary for your kernels?
/usr/local/cuda/bin/cuobjdump -elf <app-binary or -cubin>
You should be seeing .nv_debug_line_sass sections being reported for the respective kernels. If that’s not the case, this needs to be addressed in your build. Can you share more details on how you are building your application, specifically your CUDA kernels?
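As a rough way to run that check without scrolling through the full dump, the cuobjdump output can be piped through grep. This is only a sketch: the helper name count_lineinfo_sections and the path ./my_app are placeholders, not part of any NVIDIA tool, and it assumes the section name appears literally in the output as described above.

```shell
# Hypothetical helper: reads `cuobjdump -elf` output on stdin and prints
# how many lines mention the .nv_debug_line_sass section.
count_lineinfo_sections() {
  # grep -c exits non-zero when the count is 0, so `|| true` keeps the
  # function usable in scripts that run with `set -e`.
  grep -c -F '.nv_debug_line_sass' || true
}

# Usage (./my_app is a placeholder for your application binary or cubin):
#   /usr/local/cuda/bin/cuobjdump -elf ./my_app | count_lineinfo_sections
```

A count of 0 suggests the build is not passing -lineinfo to the compilation of the CUDA code. A non-zero count only shows that some lineinfo is embedded, not that every kernel is covered.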
I would also recommend using the latest Nsight Compute version, 2022.1, to get the latest bug fixes and features. It is backwards-compatible with older drivers and CUDA toolkits.
Yes, I asked the sys admins to install the latest ncu and I get the same problem.
I ran cuobjdump and it looks like I’m getting the .nv_debug_line_sass info for the kernels. A screen shot of some of the output is included below. Does this look reasonable?
Yes, that looks like your binary is built with lineinfo included. Unfortunately, one can’t tell from this whether the specific kernels in question are included in this lineinfo table or not. It seems that at least one of the kernels is from CUB. Are the kernels for which lineinfo is missing also using CUB?