I’ll be teaching a class on parallel computing this fall, and wanted to include GPGPUs in it. So I’m trying to learn the subject myself :-).
I’ve found no information at all about how GPUs connect their memory to their cores. Take Pascal (GP100), for example: you have 8 memory controllers; each controls .5 MB (512 KB) of L2 cache, and each pair of controllers drives one HBM2 stack.
Questions: what does it mean for each memory controller to also control .5 MB of L2? Do the 8 L2 slices each own a non-overlapping part of the GPU physical-address space? Can two slices ever hold the same line at the same time? Is there an equivalent of MESI somewhere, or is this all non-coherent?
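For concreteness, here's the kind of scheme I'm imagining when I ask whether the slices partition the address space: physical addresses are interleaved across slices at cache-line granularity by some hash. The 128-byte line size and the simple modulo hash below are my assumptions, not NVIDIA's documented scheme.

```python
# Hypothetical sketch: 8 L2 slices each owning a disjoint, interleaved
# share of the physical address space. Line size and hash are assumptions.
LINE_BYTES = 128
NUM_SLICES = 8

def l2_slice(phys_addr: int) -> int:
    """Map a physical address to the L2 slice that owns its cache line."""
    line = phys_addr // LINE_BYTES
    return line % NUM_SLICES

# Every address maps to exactly one slice, so two slices can never hold
# the same line -- and no MESI-style protocol between slices is needed.
assert l2_slice(0x0000) == 0
assert l2_slice(0x0080) == 1   # next 128-B line goes to the next slice
assert l2_slice(0x0400) == 0   # wraps around after 8 lines
```

If something like this is what's actually happening, then "each controller controls .5 MB of L2" would just mean each controller sits behind the slice that caches its share of the address space.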
In addition to the 8 memory controllers, there are 60 streaming multiprocessors (SMs).
Questions: how are the memory controllers, caches and SMs interconnected? Is it a ring? A crossbar (or some subset of one)? A link-based network? Something else entirely? Or is this information purposely not disclosed?