[nsight-compute][source view] misaligned "live registers" with sources

Hello,

I am profiling a kernel with Nsight Compute. This kernel is heavily latency bound as it requires a lot of registers. I am trying to use the Live register count to optimize it.

However, some of the per-line register counts don’t make sense to me. For instance, see the attached image, where:

On line 551 the register count is 48 and a brace opens. Inside the braces the register count increases, which makes sense, and the brace closes on line 559. Then, on line 561, which is a comment, the register count suddenly jumps to 112.

This does not make sense to me, and I wonder whether the live register counts are misaligned with the source because of the call to the inlined function “calcul_xg_tetra”. Could that be the case?

Is there something specific I need to do so that these inlined functions are taken into account?

Thanks in advance,
Rémi

I am compiling the code with nvc++ from the NVIDIA HPC SDK 25.3, with CUDA 12.8 and driver 570.133.07, on an Ubuntu 24.04 machine with an NVIDIA RTX 6000 Ada GPU. The kernel is written using the Kokkos framework. My compilation flags include -lineinfo and my ncu run uses --import-source. The ncu version is 2025.1.0.0 (build 35237751) (public-release).

We fixed an issue related to correlation on the Source page that occurred when some but not all of the source files were resolved. Please check whether the correlation changes when you open this report with Nsight Compute 2025.2.

Hi Felix, thanks for your answer.

I have downloaded ncu 2025.2, loaded up my ncu file, and the issue remains.

Should I generate the report with ncu 2025.2 too?

Thanks,
Rémi

Should I generate the report with ncu 2025.2 too?

I would not expect it to make a difference, no.

Further things you can try:

  • Share the report with us if that is acceptable to you, and we can have a look at it.
  • Inspect the SASS correlated with these high-level lines, to understand at which instructions the Live Registers metric changes/increases. You normally see spikes at CALL instructions, as the ABI requires the caller registers to be saved, which is included in the metric.
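If the SASS listing is long, the second suggestion can be partly automated. The following is only a minimal sketch: it assumes you have copied the SASS rows out of the Source page into a list of (address, instruction, live registers) tuples. The row layout, the helper name `find_register_jumps`, and the sample values are hypothetical and are not taken from the report discussed here.

```python
from typing import List, Tuple

# Hypothetical row layout: (sass_address, sass_instruction, live_registers).
# The values further down are illustrative only, not real profiler output.
Row = Tuple[str, str, int]


def find_register_jumps(rows: List[Row], threshold: int = 16) -> List[Tuple[str, str, int]]:
    """Return (address, instruction, delta) for every row whose live register
    count is at least `threshold` higher than that of the previous row."""
    jumps = []
    for prev, cur in zip(rows, rows[1:]):
        delta = cur[2] - prev[2]
        if delta >= threshold:
            jumps.append((cur[0], cur[1], delta))
    return jumps


# Made-up sample: the count rises sharply at a CALL instruction.
sample_rows = [
    ("0x0100", "FMUL R4, R2, R3", 48),
    ("0x0110", "FADD R5, R4, R6", 50),
    ("0x0120", "CALL.REL.NOINC 0x2000", 112),
    ("0x0130", "MOV R7, R4", 60),
]

for address, instruction, delta in find_register_jumps(sample_rows):
    print(f"{address}: +{delta} live registers at {instruction}")
```

If the reported jumps land on CALL instructions, the spike is more likely explained by the function not actually being inlined than by a source correlation problem.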

Hi!

What is the best way to send my report? It is too large for the forum (60 MB).

Thanks