GPUDirect RDMA PCIe Topology

I am working on a hardware design with a CPU connected to a PCIe 3.0 switch over 4 PCIe lanes (x4) and an FPGA and GPU connected to the same PCIe switch over 16 lanes (x16). When performing a GPUDirect RDMA transfer between the FPGA and GPU, will the two devices use all 16 lanes, or will the CPU's x4 link to the switch affect the transfer speed or the number of lanes used between the FPGA and GPU?

Looking at the block diagram here:

If the hardware is GPUDirect compliant, the CPU should have no significant involvement.
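
To put numbers on the links in the question, here is a minimal sketch that computes the theoretical per-direction bandwidth of each link from the published PCIe per-lane rates and line encodings. The function name `link_bandwidth_gbytes` and the table `GEN_PARAMS` are made up for illustration, and the results ignore TLP/DLLP protocol overhead, so real transfers will come in somewhat lower.

```python
# Theoretical per-direction PCIe bandwidth, before packet/protocol overhead.
# Maps PCIe generation -> (transfer rate in GT/s per lane, encoding efficiency).
GEN_PARAMS = {
    1: (2.5, 8 / 10),      # 8b/10b encoding
    2: (5.0, 8 / 10),      # 8b/10b encoding
    3: (8.0, 128 / 130),   # 128b/130b encoding
    4: (16.0, 128 / 130),  # 128b/130b encoding
}


def link_bandwidth_gbytes(gen: int, lanes: int) -> float:
    """Return the theoretical one-way bandwidth of a link in GB/s."""
    rate_gt, efficiency = GEN_PARAMS[gen]
    return rate_gt * efficiency * lanes / 8  # divide by 8: bits -> bytes


if __name__ == "__main__":
    # Link widths taken from the question: x4 CPU uplink, x16 FPGA, x16 GPU.
    for name, lanes in [("CPU uplink", 4), ("FPGA", 16), ("GPU", 16)]:
        print(f"{name}: x{lanes} -> {link_bandwidth_gbytes(3, lanes):.2f} GB/s")
```

The narrower CPU uplink only limits traffic that actually crosses it. Peer-to-peer traffic that the switch routes directly between the FPGA and GPU ports is bounded by the two x16 links instead.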

I have a similar design.
I have x8 from the CPU to the switch and x16 between the GPU and FPGA on a 48-lane PCIe switch.
I get full bandwidth for RDMA between FPGA and GPU.
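
One way to confirm that each endpoint has trained at its full width is to read the link attributes that Linux exposes under sysfs. The sketch below assumes a Linux host. The helper names `read_link_status` and `is_full_width` are hypothetical, as are the bus addresses in the usage example, which need to be replaced with the real ones (for example from `lspci`).

```python
from pathlib import Path

LINK_ATTRS = (
    "current_link_width",
    "max_link_width",
    "current_link_speed",
    "max_link_speed",
)


def read_link_status(bdf: str, sysfs_root: str = "/sys/bus/pci/devices") -> dict:
    """Read the PCIe link attributes for one device from sysfs.

    bdf is the device address, e.g. "0000:03:00.0".
    Attributes that are missing are returned as None.
    """
    dev = Path(sysfs_root) / bdf
    status = {}
    for attr in LINK_ATTRS:
        path = dev / attr
        status[attr] = path.read_text().strip() if path.exists() else None
    return status


def is_full_width(status: dict) -> bool:
    """True if the negotiated width equals the device's maximum width."""
    current = status["current_link_width"]
    maximum = status["max_link_width"]
    return current is not None and current == maximum


if __name__ == "__main__":
    # Hypothetical addresses for the GPU and the FPGA.
    for bdf in ("0000:03:00.0", "0000:04:00.0"):
        status = read_link_status(bdf)
        print(bdf, status, "full width:", is_full_width(status))
```

Some GPUs lower their link speed while idle, so it may be worth checking `current_link_speed` while a transfer is running rather than at rest.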
