I vaguely remember that ‘stall long scoreboard’ refers to stalls caused by global memory, and ‘stall short scoreboard’ seems to refer to stalls caused by shared memory. Is there a clearer description? And why?
You should be able to find detailed descriptions of both in the tooltips that appear when you hover over the labels in the chart. You can find similar information in the metrics reference documentation. The rule output above also has more details for Long Scoreboard (and Barrier Stalls).
I got it!
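Long scoreboard stalls happen when a warp waits on an L1TEX operation (global, local, surface, or texture memory), most commonly a global memory access. The solution is to increase data locality and cache hit rates, or to move frequently used data to shared memory.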
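Short scoreboard stalls happen when a warp waits on an MIO operation that does not go through L1TEX, most commonly a shared memory access. Other causes are frequent special math instructions (e.g., MUFU) and dynamic branching (e.g., BRX, JMX). The solution is to reduce shared memory bank conflicts, or to cut down on those instructions.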
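To make the bank conflict part concrete, here is a small host-side sketch (plain C++, no GPU needed) that models how a warp's column read of a shared memory tile maps onto banks. It assumes 32 banks that are each 4 bytes wide and 32 threads per warp, which matches current NVIDIA GPUs as far as I know. The function name `max_bank_conflict_degree` is made up for illustration and is not part of any NVIDIA API.

```cpp
#include <algorithm>
#include <array>

// Assumed hardware model (not queried from a device):
// 32 shared memory banks, each 4 bytes wide, and 32 threads per warp.
constexpr int kBanks = 32;
constexpr int kWarpSize = 32;

// Hypothetical helper: thread t of a warp reads tile[t][column] from a
// float tile that has `row_width` floats per row. Returns the largest
// number of threads that land in the same bank (1 means conflict-free).
int max_bank_conflict_degree(int row_width, int column = 0) {
    std::array<int, kBanks> hits{};  // per-bank access counts, zero-initialized
    for (int t = 0; t < kWarpSize; ++t) {
        // Index of the 4-byte word inside the tile, in row-major order.
        int word_index = t * row_width + column;
        // Successive 4-byte words map to successive banks.
        ++hits[word_index % kBanks];
    }
    return *std::max_element(hits.begin(), hits.end());
}
```

Under this model, `max_bank_conflict_degree(32)` returns 32 (every thread hits the same bank) and `max_bank_conflict_degree(33)` returns 1. That is why a tile that is read column-wise is often declared with one padding column, e.g. `__shared__ float tile[32][33];`.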