Peer-to-Peer Memory Access can support a system-wide max of 8 peer connections

According to the CUDA documentation:

3.2.6.4. Peer-to-Peer Memory Access

Peer-to-peer memory access must be enabled between two devices by calling cudaDeviceEnablePeerAccess() as illustrated in the following code sample. Each device can support a system-wide maximum of eight peer connections.

What does “Each device can support a max of eight peer connections” mean?
Are these 8 simultaneous connections?

From this article,
https://www.servethehome.com/single-root-or-dual-root-for-deep-learning-gpu-to-gpu-systems/

The author gave results for P2P enabled on a 10x GPU single-root system. They ran the p2pBandwidthLatencyTest.

Does this mean that P2P is possible on systems with more than 8 GPUs?

Thanks!
Simon.

Yes, it is 8 simultaneous connections to a particular device. If you make a peer-to-peer association between devices A and B, and then disable that association but enable it between A and C, you can repeat this process for an arbitrarily large number of devices. But at any given moment, A cannot be peer-enabled to more than 8 other devices.
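The enable/disable/re-enable pattern described above can be sketched roughly as follows. This is an untested illustration, not code from the thread; it assumes at least three CUDA devices and abbreviates error checking:

```cpp
// Sketch: device 0 peers with device 1, then swaps that slot to device 2.
// Each cudaDeviceEnablePeerAccess() call consumes one of the current
// device's eight peer-connection slots until it is disabled.
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    int canAccess = 0;

    // First verify the hardware/topology allows P2P from device 0 to device 1.
    cudaDeviceCanAccessPeer(&canAccess, 0, 1);
    if (!canAccess) { printf("P2P 0->1 not supported\n"); return 0; }

    cudaSetDevice(0);                  // peer-access calls apply to the current device
    cudaDeviceEnablePeerAccess(1, 0);  // enable 0 -> 1 (flags must be 0)

    // ... do peer-to-peer work between devices 0 and 1 ...

    cudaDeviceDisablePeerAccess(1);    // free one of device 0's eight slots
    cudaDeviceEnablePeerAccess(2, 0);  // reuse that slot for 0 -> 2
    return 0;
}
```

Note that peer access is directional: enabling 0 -> 1 does not enable 1 -> 0; that requires a separate call with device 1 current.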

Hi txbob,

So in that article, the NVIDIA p2pBandwidthLatencyTest which the author ran is only using one GPU pair at a time?

Is there any Nvidia supplied sample that can test 8 simultaneous peer-connected bandwidth and latency?

I haven’t double-checked the source code lately, but I suspect that is what you would see if you looked at it.

I’m not aware of one. Such an app would have several levels of complexity:

  • simultaneous communication could have a lot of possible permutations
  • simultaneous communication will stress different PCIE topologies in different ways, leading to less “predictability” (or probably a better word would be “consistency”) in the results. Of course the results may be predictable given sufficient knowledge of the PCIE topology, but a great many users of this technology don’t really understand PCIE topology ramifications in great depth, so trying to interpret the results might be difficult.

Having said that, the CUDA sample apps first and foremost are designed to be teaching tools, not test or validation utilities (although they obviously serve that purpose to some degree as well). If you wanted to design your own simultaneous communication test app, the p2pBandwidthLatencyTest app should be a pretty good roadmap.
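A skeleton of such a simultaneous test might look something like the following. This is a hypothetical sketch only (not an NVIDIA sample): it enables peer access in a ring, then launches one peer copy per device on independent streams so all transfers are in flight at once. Buffer size and the ring pattern are arbitrary choices, and error checking is omitted:

```cpp
// Hypothetical simultaneous P2P transfer sketch, loosely modeled on the
// structure of p2pBandwidthLatencyTest: one stream per device, all
// device -> next-device copies launched before any synchronization.
#include <cuda_runtime.h>
#include <vector>
#include <cstdio>

int main() {
    int n = 0;
    cudaGetDeviceCount(&n);
    const size_t bytes = 64 << 20;  // 64 MiB per transfer (arbitrary)

    std::vector<void*> buf(n);
    std::vector<cudaStream_t> stream(n);

    for (int d = 0; d < n; ++d) {
        cudaSetDevice(d);
        cudaMalloc(&buf[d], bytes);
        cudaStreamCreate(&stream[d]);
        // Enable access from device d to every reachable peer
        // (subject to the per-device limit of eight connections).
        for (int p = 0; p < n; ++p) {
            int ok = 0;
            if (p != d && cudaDeviceCanAccessPeer(&ok, d, p) == cudaSuccess && ok)
                cudaDeviceEnablePeerAccess(p, 0);
        }
    }

    // Launch all copies concurrently; a real test would wrap this region
    // in timers to measure aggregate bandwidth under simultaneous load.
    for (int d = 0; d < n; ++d) {
        int dst = (d + 1) % n;
        cudaSetDevice(d);
        cudaMemcpyPeerAsync(buf[dst], dst, buf[d], d, bytes, stream[d]);
    }
    for (int d = 0; d < n; ++d) {
        cudaSetDevice(d);
        cudaStreamSynchronize(stream[d]);
    }
    printf("all simultaneous copies complete\n");
    return 0;
}
```

As txbob notes above, the interesting (and hard) part is choosing which permutations of simultaneous traffic to run, since the results will depend heavily on the PCIE topology.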

Thanks txbob!