P2P GPU Direct Communication

I’m having trouble understanding UVA on a granular level. At a high level, I understand that it provides a shared virtual address space for all devices. When used with P2P, does that mean that GPU-GPU communication does not involve the CPU? Is the memory mapping done on each device so that data can actually be copied or accessed directly? Or is the CPU involved?

Yes, that is what it means. One GPU issues a PCIe transaction (or an NVLink transaction, if the GPUs are connected that way). That transaction flows over the PCIe fabric to the other GPU, which generates a PCIe response.

For the communication itself, the CPU may be completely uninvolved, for example if the PCIe fabric connects the GPUs via PCIe switches that are not part of the CPU, or if the fabric in question is NVLink. Even when the CPU is “involved”, that only means the PCIe switching capability built into the CPU’s PCIe root complex is being used to forward traffic. The CPU is not otherwise “involved” at that point.
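As a concrete illustration, here is a minimal sketch of setting up P2P between two GPUs and doing a direct device-to-device copy. It assumes devices 0 and 1 are P2P-capable (same PCIe root complex or NVLink-connected); the buffer size and error handling are simplified for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int canAccess = 0;
    cudaDeviceCanAccessPeer(&canAccess, 0, 1);  // can GPU 0 reach GPU 1?
    if (!canAccess) {
        printf("P2P not supported between GPU 0 and GPU 1\n");
        return 0;
    }

    cudaSetDevice(1);
    float *buf1;
    cudaMalloc(&buf1, 1 << 20);        // allocation resides on GPU 1

    cudaSetDevice(0);
    cudaDeviceEnablePeerAccess(1, 0);  // map GPU 1's allocations into GPU 0's view
    float *buf0;
    cudaMalloc(&buf0, 1 << 20);

    // This copy moves data directly over the PCIe/NVLink fabric between the
    // two GPUs; because of UVA, cudaMemcpyDefault lets the runtime infer the
    // source and destination devices from the pointer values alone.
    cudaMemcpy(buf1, buf0, 1 << 20, cudaMemcpyDefault);

    cudaFree(buf0);
    cudaSetDevice(1);
    cudaFree(buf1);
    return 0;
}
```

Once `cudaDeviceEnablePeerAccess` has succeeded, a kernel running on GPU 0 can also dereference `buf1` directly, without any explicit copy.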

The CPU is certainly involved in things like memory allocation requests (e.g. cudaMalloc) and the harmonization of address spaces that the CUDA runtime performs at CUDA initialization.
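One visible consequence of that address-space harmonization: under UVA every allocation lives in a single virtual address space, so the runtime can tell from a bare pointer which device owns it. A small sketch (assumes at least one CUDA device):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaSetDevice(0);
    float *p;
    cudaMalloc(&p, 256);

    // With UVA, the pointer value alone identifies the owning device.
    cudaPointerAttributes attr;
    cudaPointerGetAttributes(&attr, p);
    // Expect attr.type == cudaMemoryTypeDevice and attr.device == 0 here,
    // since the allocation was made on device 0.
    printf("memory type %d on device %d\n", (int)attr.type, attr.device);

    cudaFree(p);
    return 0;
}
```

This is the same mechanism that lets `cudaMemcpy` with `cudaMemcpyDefault` route a P2P transfer without being told which devices are involved.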