How to understand the long/short scoreboard stalls in SASS code on an RTX 3080 GPU?

For line 1711 I get the following information:
[image]
Q1:
From the instruction on line 1711, we can see that its operands actually come from a distant location in the code, and the instruction itself performs no memory access. It shouldn’t be stalling on a long/short scoreboard, yet the profiler reports this data. How should we understand this?


For line 2099 I get the following information:
[image]
Q2:
Why does line 2099 show a long scoreboard stall? There is actually no global memory access here, and we can see that the operands on this line also come from a distant location.

@Robert_Crovella Could you help me? Thank you very much!

What GPU are you using?

The GPU is an RTX 3080.

In the “line 1711” example, R25 is being used by the SHFL.IDX on line 1708, which is a half-throughput instruction, so the stall on line 1711 is probably waiting for that instruction to complete.

Edit: In the case of the second example, “line 2099”, line 2097 looks like it’s perhaps going to address 0x7f9033c5b360, which is not shown, so maybe R4 or R8 are involved there.
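To look at the “line 1711” pattern in isolation, a small test kernel can help. The following is only a sketch: the kernel name `shuffle_then_use`, the data sizes, and the arithmetic are made up for illustration and are not taken from the code in the screenshots. It issues an indexed warp shuffle and consumes the result right away, so the consuming instruction has no memory access of its own but still depends on the shuffle. Which stall reason the profiler attributes to that instruction is not guaranteed and may differ between GPUs and compiler versions.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: every lane fetches a value from a neighboring lane
// with an indexed shuffle and uses it immediately.
__global__ void shuffle_then_use(const float* in, float* out, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    float v = (tid < n) ? in[tid] : 0.0f;

    // Indexed shuffle (expected to compile to SHFL.IDX): read v from the
    // next lane in the warp. All 32 lanes take part, so the full mask is valid.
    int src_lane = (threadIdx.x + 1) & 31;
    float neighbor = __shfl_sync(0xffffffffu, v, src_lane);

    // This arithmetic touches no memory, but it cannot issue until the
    // shuffle result has arrived in its destination register.
    float result = neighbor * 2.0f + v;

    if (tid < n) out[tid] = result;
}

int main()
{
    const int n = 1 << 20;                 // assumed size, multiple of 128
    const size_t bytes = n * sizeof(float);

    float* h_in = new float[n];
    float* h_out = new float[n];
    for (int i = 0; i < n; ++i) h_in[i] = static_cast<float>(i);

    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);

    shuffle_then_use<<<n / 128, 128>>>(d_in, d_out, n);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);
    printf("out[0] = %f\n", h_out[0]);

    cudaFree(d_in);
    cudaFree(d_out);
    delete[] h_in;
    delete[] h_out;
    return 0;
}
```

Building with `nvcc -arch=sm_86 -lineinfo` and profiling with `ncu --set full` should make it possible to check, on the Source page, which instruction the stall samples land on; `cuobjdump -sass` shows the generated SASS for comparison.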

Would that lead to a long “short scoreboard” stall? It seems that it is shared memory accesses that lead to “short scoreboard” stalls.

The instruction at address 0x7f9033c5b360 is:
[image]

It doesn’t use R4 or R8.

I agree, but the Warp Stall Reasons section of the Profiling Guide mentions other possible causes - perhaps shuffle instructions are included.

No, but a few lines later it branches somewhere else.

Just to be clear, I’m not an expert in this and NVIDIA doesn’t publish much comprehensive documentation on SASS instructions, so I’m just offering my take. : )
