Question about P2P & DMA

I’m looking to get a multi-GPU server with 8 GPUs, mainly for testing and learning. As I understand it, most GPU servers use PEX chips as PCIe switches. I’m only starting to learn how software like TensorFlow utilises the hardware, hence the following questions:

When doing P2P communication between GPUs that are connected to a single switch chip (e.g. PEX8796), is the DMA engine on the PEX chip used at all? Or is that kind of traffic handled by the GPUs’ own DMA engines?

Is the DMA engine on the chip needed to enable GPUDirect RDMA?


Given that a) most PCIe switches don’t even have DMA engines, and b) I can’t see any benefit over using the engine on one of the connected GPUs, I’d be surprised if NVIDIA spent any effort writing a driver for it.

GPUDirect P2P and RDMA don’t use DMA engines on external switches.