Here is the information I get for line 1711:
Q1:
Looking at the instruction on line 1711, its operands are produced far away from this line, and the instruction itself does not access memory. I would not expect it to show long or short scoreboard stalls, yet the profiler reports them here. How should this be understood?
Here is the information I get for line 2099:
Q2:
Why does line 2099 show a long scoreboard stall? There is no global memory access on this line, and again its operands are produced far away from it.
In the “line 1711” example, R25 is being used by the SHFL.IDX on line 1708, which is a half-throughput instruction, so the delay on line 1711 is probably it waiting for that instruction to complete.
Edit: In the second example, “line 2099”, line 2097 looks like it perhaps branches to address 0x7f9033c5b360, which is not shown, so R4 or R8 may be involved there.
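To see the same pattern in isolation, here is a minimal sketch. The kernel and function names (`broadcast_then_use`, `run_broadcast_demo`) are made up for illustration and have nothing to do with the code in the question. The assumption is that the compiler turns `__shfl_sync` into a SHFL.IDX and that the stall samples land on the instruction that is waiting, not on the one that produced the value. Under that assumption the shuffle line would be expected to collect the long scoreboard samples (it waits on the global load), and the add after it the short scoreboard samples (it waits on the shuffle), even though the add touches no memory itself.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical demo kernel: every lane takes lane 0's value and uses it at once.
__global__ void broadcast_then_use(const int* in, int* out)
{
    int lane = threadIdx.x & 31;
    int v = in[threadIdx.x];                  // global load: produces v
    int b = __shfl_sync(0xffffffffu, v, 0);   // consumes v, typically a SHFL.IDX
    out[threadIdx.x] = b + lane;              // first consumer of the shuffle result
}

// Runs the kernel on one warp and checks the result on the host.
bool run_broadcast_demo()
{
    const int n = 32;
    int h_in[n], h_out[n];
    for (int i = 0; i < n; ++i) h_in[i] = 100 + i;

    int *d_in = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_in, n * sizeof(int)) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess) {
        cudaFree(d_in);
        return false;
    }

    bool ok = cudaMemcpy(d_in, h_in, n * sizeof(int),
                         cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        broadcast_then_use<<<1, n>>>(d_in, d_out);
        // The blocking copy back also waits for the kernel to finish.
        ok = cudaMemcpy(h_out, d_out, n * sizeof(int),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_in);
    cudaFree(d_out);
    if (!ok) return false;

    // Lane 0 holds 100, so each lane should end up with 100 + its lane index.
    for (int i = 0; i < n; ++i)
        if (h_out[i] != 100 + i) return false;
    return true;
}

int main()
{
    bool ok = run_broadcast_demo();
    std::printf("%s\n", ok ? "ok" : "failed");
    return ok ? 0 : 1;
}
```

Build it with `nvcc -o demo demo.cu`, profile with `ncu --set full ./demo`, and compare the stall columns for the shuffle and the following add in the SASS view. With a single warp the sample counts will be very small, so treat it as an illustration of where the stalls get attributed rather than as a measurement.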