The item circled in red with the question mark is local memory traffic, not global memory traffic (although it appears to me there is some global traffic as well). Are you using any local memory? (i, index.y, index.x, cellCoord.x, cellCoord.y, and cellCoord.z could all be local memory references.) Thread-local data that does not fit in registers is placed in the local memory space, which resides in device memory and is cached in L1/L2. Another possibility is register spilling.
The way I read this graph, [Local] <- [L1 Cache] <- [L2 Cache] <- [Device Memory] represents reads from device memory being loaded into registers (local memory). I thought register spillage goes through [Global RO].
The kernel shows 65 registers used on Kepler GK110, which should be fine, no?
Those blocks in the diagram immediately to the right of the “kernel” block refer to memory spaces and/or transaction types. You’ll note that there is no “register” block. Registers are not a memory space.
If the kernel reads from global memory space, it will show up as a transaction flowing through the “Global” box. If the kernel reads from the local memory space, it will show up as a transaction flowing through the “local” box, which is connected to the link you have circled. Both Global and Local traffic can flow through L1/L2/Devicemem.
You haven’t really answered my question about local memory usage. You told me what the register usage is, but register usage alone does not tell us whether the kernel uses local memory.
One possible example of local memory usage could be something like this:
int localdata[1024];
in kernel code. An array like this is in the “local” memory space; it will not be stored in registers, nor will it have a direct impact on register usage. It is stored in device memory, accesses to it flow through L1/L2 as appropriate, and those accesses are distinct from “global” memory transactions.
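To make this concrete, here is a minimal sketch of a kernel with such an array, plus a host-side check of how much local memory the compiler actually assigned per thread. The kernel name localArrayKernel and its parameters are hypothetical and not taken from your code. cudaFuncGetAttributes is a standard runtime API call, and it reports both register count and local memory size. Whether the array really lands in local memory is up to the compiler, so treat the reported numbers as the answer rather than my comments.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel (not from the original post): each thread fills a
// private array and then reads it back at an index known only at run time.
__global__ void localArrayKernel(const int *indices, int *out, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;

    // Per-thread array. At this size, and with run-time indexing, the
    // compiler will typically place it in the "local" memory space.
    int localdata[1024];
    for (int i = 0; i < 1024; ++i)
        localdata[i] = i * 2 + tid;

    // Run-time index, masked so it always stays within the array bounds.
    out[tid] = localdata[indices[tid] & 1023];
}

int main()
{
    // Ask the runtime what resources the compiled kernel uses per thread.
    cudaFuncAttributes attr;
    cudaError_t err = cudaFuncGetAttributes(&attr, localArrayKernel);
    if (err != cudaSuccess) {
        printf("cudaFuncGetAttributes failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("registers per thread   : %d\n", attr.numRegs);
    printf("local bytes per thread : %zu\n", attr.localSizeBytes);
    return 0;
}
```

A nonzero local byte count means the kernel uses the local memory space, whether from an array like this or from register spills. Compiling with nvcc -Xptxas -v prints similar information at build time (stack frame size, spill stores, spill loads), which can help separate the two cases.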