Where does the PCIe interconnect sit in the GPU architecture?

Hello, I have a question about GPU architecture.

In Figure 3 of a paper co-authored by NVIDIA (MICRO18), NVLink is shown connected to the GPU's local interconnect (core ↔ memory partition). So we can assume that the GPUs in a multi-GPU node can communicate directly through the external interconnect (NVLink).

My question extends that assumption.
What about the PCIe interconnect?
Is the PCIe interconnect also connected to the GPU's local interconnect?
If so, does data copied (received) from the CPU or another GPU bypass the L2 cache banks and get stored directly in off-chip memory? Or can the cores use it directly?

No, typically it does not bypass the L2. Simple tests can be constructed using the profiler to confirm this.
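
One way such a test might look is sketched below. This is only an illustration under several assumptions, not a definitive method: the helper name `copy_then_read`, the 1 MiB buffer size (assumed to be smaller than the L2 on the GPU under test), and the 256 MiB scratch buffer used to push the copied data out of L2 are all choices made for this example. The idea is to copy a small buffer from pinned host memory, read it once with a kernel, and compare the kernel's L2 hit rate with a second run in which the L2 is deliberately overwritten between the copy and the kernel.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                     \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,          \
                    cudaGetErrorString(err_));                          \
            exit(EXIT_FAILURE);                                         \
        }                                                               \
    } while (0)

// Each element is read once with coalesced accesses, so L2 hits on the
// data buffer should mostly come from lines resident before the launch.
__global__ void sum_kernel(const int* data, size_t n, unsigned long long* out) {
    size_t stride = (size_t)gridDim.x * blockDim.x;
    unsigned long long local = 0;
    for (size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        local += data[i];
    atomicAdd(out, local);
}

// Hypothetical helper: copy n ints host-to-device, optionally try to evict
// them from L2, then read them once. Returns the sum (n, since all are 1).
unsigned long long copy_then_read(size_t n, bool evict) {
    int* h_data = nullptr;
    int* d_data = nullptr;
    unsigned long long* d_sum = nullptr;
    unsigned long long h_sum = 0;

    CHECK(cudaMallocHost((void**)&h_data, n * sizeof(int)));  // pinned host memory
    for (size_t i = 0; i < n; ++i) h_data[i] = 1;
    CHECK(cudaMalloc((void**)&d_data, n * sizeof(int)));
    CHECK(cudaMalloc((void**)&d_sum, sizeof(*d_sum)));
    CHECK(cudaMemset(d_sum, 0, sizeof(*d_sum)));

    // The host-to-device transfer whose path through L2 we want to observe.
    CHECK(cudaMemcpy(d_data, h_data, n * sizeof(int), cudaMemcpyHostToDevice));

    if (evict) {
        // Assumption: writing a scratch buffer much larger than L2 pushes
        // the copied lines out of the cache. 256 MiB is an arbitrary choice.
        const size_t scratch_bytes = 256u << 20;
        void* d_scratch = nullptr;
        CHECK(cudaMalloc(&d_scratch, scratch_bytes));
        CHECK(cudaMemset(d_scratch, 0, scratch_bytes));
        CHECK(cudaDeviceSynchronize());
        CHECK(cudaFree(d_scratch));
    }

    sum_kernel<<<64, 256>>>(d_data, n, d_sum);
    CHECK(cudaGetLastError());
    CHECK(cudaMemcpy(&h_sum, d_sum, sizeof(h_sum), cudaMemcpyDeviceToHost));

    CHECK(cudaFree(d_sum));
    CHECK(cudaFree(d_data));
    CHECK(cudaFreeHost(h_data));
    return h_sum;
}

int main() {
    const size_t n = 1 << 18;  // 1 MiB of ints, assumed to fit in L2
    unsigned long long warm = copy_then_read(n, false);
    unsigned long long cold = copy_then_read(n, true);
    assert(warm == n && cold == n);  // sanity check on the data path only
    printf("sums: %llu (no eviction), %llu (eviction)\n", warm, cold);
    return 0;
}
```

To use it, build with nvcc and run it under Nsight Compute with the profiler's own cache flushing disabled, for example `ncu --cache-control none --metrics lts__t_sector_hit_rate.pct ./a.out`, then compare the two `sum_kernel` launches. The metric name may differ between Nsight Compute versions and GPU architectures, so check what your tool reports. A clearly higher L2 hit rate in the launch without eviction would be consistent with the copied data passing through L2 rather than bypassing it; the program itself only verifies that the data arrived correctly.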

I won’t be able to answer your other questions.