CUDA constant cache and L2 cache

I am really confused about the relationship between the constant cache and the L2 cache. By L2 cache I mean the one that global loads go through. I know there is also an L2 constant cache that is shared within one TPC. Could someone tell me what steps are involved in accessing constant memory, and which caches the access may go through? Thanks.

It’s definitely confusing, but a shortcut to understanding what’s going on in your particular GPU is to look at the Nsight Performance Analysis output.

The Memory Statistics tab illustrates the target’s architecture. For example, an sm_35 (GK110/GK208) device looks like this:

Although this diagram doesn’t explicitly call out which parts are in the SM* vs. the GPC/TPC, you can find that out by skimming the various architectural whitepapers on Fermi, Kepler and Maxwell.

The only open question is whether the diagram is always perfectly representative of the architecture! :)
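If Nsight isn’t at hand, the CUDA runtime can at least report a few of the relevant sizes for a given device. The following is only a minimal sketch, assuming device index 0 is the GPU of interest. It prints the compute capability, the L2 cache size and the total constant memory from `cudaDeviceProp`. As far as I know the runtime does not report the sizes of the constant caches themselves, so this complements the diagram rather than replacing it.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int device = 0;  // assumption: inspect the first GPU in the system

    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }

    // Compute capability, e.g. sm_35 for GK110/GK208.
    std::printf("Device: %s (sm_%d%d)\n", prop.name, prop.major, prop.minor);

    // Size of the device-wide L2 cache, in bytes.
    std::printf("L2 cache size: %d bytes\n", prop.l2CacheSize);

    // Total constant memory available on the device, in bytes.
    std::printf("Total constant memory: %zu bytes\n", prop.totalConstMem);

    return 0;
}
```

Something like `nvcc query.cu -o query && ./query` should build and run it (the file name is just a placeholder).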

Hi allanmac, thanks very much for your reply.
Nsight looks very cool, but I don’t have it. Could you also provide a diagram for accessing constant memory? Thanks again.