The upper left arrow shows the bytes transferred from L2 to Shared Memory by LDGSTS (asynchronous global-to-shared memory copy) instructions. The lower right arrow is its symmetrical counterpart in the opposite direction (from Shared Memory to L2). It measures the bytes transferred between the two units by TMA (Tensor Memory Accelerator) operations. To see exactly which metrics are used to compute either value, hover over the respective labels.