Long cycles on stall_long_sb

I’ve used Nsight Compute to understand my kernel’s performance and bottlenecks. The Details page points to a latency issue and warp stalls. The warp state statistics further indicate “stall long scoreboard” (80+ cycles per instruction). I then switched to the Source page and sorted by “stall_long_sb” in the navigation. I noticed that neither of the first two “stall_long_sb” hot spots involves a global memory access. I’ve attached the source code for these two hot spots. Does anyone have insight into why these two lines of code have high “stall_long_sb”? Thanks.

For long scoreboard stalls, the stall is not associated with the instruction issuing the load, but with the one that consumes the result of the load (i.e. the register the value is written into). That’s because the load instruction doesn’t wait until the result is transferred from e.g. global memory; it is the next instruction consuming the result that has to wait until the transfer finishes.
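To make the attribution concrete, here is a toy model in Python of a single warp issuing instructions in order. It is only a sketch and not how the hardware or Nsight Compute actually accounts for stalls: the address register R2, the predicate P0, and both latency values are assumptions chosen for illustration.

```python
# Toy model (not real hardware): one warp issuing instructions in order.
# Each instruction is (name, dest_register, source_registers, latency_in_cycles).
# R2, P0, and the latencies are illustrative assumptions.
def attribute_stalls(program):
    ready_at = {}   # register -> cycle at which its value becomes available
    stalls = {}     # instruction name -> cycles spent waiting on operands
    cycle = 0
    for name, dest, srcs, latency in program:
        # An instruction cannot issue until all of its source registers are ready.
        issue = max([cycle] + [ready_at.get(r, 0) for r in srcs])
        stalls[name] = issue - cycle
        if dest is not None:
            ready_at[dest] = issue + latency
        cycle = issue + 1   # the next instruction can issue one cycle later
    return stalls

program = [
    ("LDG R19", "R19", ["R2"], 400),   # load: issues immediately, result arrives later
    ("ISETP P0", "P0", ["R19"], 4),    # consumer: has to wait for R19
]
print(attribute_stalls(program))  # {'LDG R19': 0, 'ISETP P0': 399}
```

In this model the load itself records no stall, and the whole wait shows up on the instruction that reads R19, which matches the pattern described above.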

For example, in your first screenshot, you can see that the stalls are associated with ISETP, which reads register R19. R19 is written by the previous LDG (load from global memory) instruction, so to find what is causing the stall, select that LDG instruction and inspect the correlated high-level source.
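The same backward search can be sketched in Python. The listing below is a hypothetical, simplified stand-in for a SASS view: only the LDG writing R19 and the ISETP reading it come from this thread, and the other instructions and registers are made up. The helper `find_producer` is likewise just an illustrative name, and it only handles straight-line code (no branches or predication).

```python
# Hypothetical, simplified listing: (opcode, destination registers, source registers).
# Only the LDG -> R19 -> ISETP relationship comes from the thread; the rest is made up.
sass = [
    ("IMAD",  ["R2"],  ["R0", "R5"]),
    ("LDG",   ["R19"], ["R2"]),
    ("IADD3", ["R7"],  ["R7", "R4"]),
    ("ISETP", ["P0"],  ["R19", "R8"]),
]

def find_producer(listing, consumer_index, register):
    """Return the index of the closest earlier instruction that writes `register`, or None."""
    for i in range(consumer_index - 1, -1, -1):
        if register in listing[i][1]:
            return i
    return None

idx = find_producer(sass, 3, "R19")   # start from the stalled ISETP at index 3
print(idx, sass[idx][0])  # 1 LDG
```

Note that the producer does not have to be the immediately preceding instruction; unrelated instructions may sit between the load and its first use.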

Since version 2021.2, Nsight Compute can visualize register dependencies on the Source page. See the corresponding metric columns for your kernel.

Many thanks for the explanation. Now I understand both hotspot cases. I just downloaded the 2021.2 version. Can you point me to where I can visualize register dependencies on the Source page? I can’t seem to find it.

Also, could you suggest a good resource to learn about SASS? I’ve long wanted to learn it, but didn’t find a good resource.

Register dependencies are available when the Options > Report Source Page > Enable Register Dependencies setting is enabled, which is the default. On the Source page, the columns are likely at the end of the table, named Register Dependencies, Predicate Dependencies, etc. You can either scroll there or select them from the metrics drop-down, which should bring that column into focus, too. If they aren’t visible, right-click any column header and enable them via the Column Chooser.

Also, could you suggest a good resource to learn about SASS?

We only provide the CUDA Binary Utilities section of the CUDA Toolkit Documentation.

Now I can see it. The register dependency columns are in the SASS section (right part of the page), not in the source section (left part of the page). Thanks a lot. This is quite useful.