Question about P2P transfer bandwidth between two RTX 2080s

I have a workstation with two RTX 2080 GPUs running on Windows 10.
Initially, I used the CUDA bandwidthTest.exe program to conduct separate bandwidth tests on the GPUs. Here are the results:

Next, I used the simpleP2P.exe sample program to test the P2P functionality between the GPUs, without an NVLink connection. The test results are displayed below:

Lastly, I employed the p2pBandwidthLatencyTest.exe to conduct the following test:

Now, I have a few questions:

  1. The simpleP2P test results indicate that Peer-to-Peer access is not supported between the GPUs. So how is the data behind the GPU-to-GPU bandwidth reported by p2pBandwidthLatencyTest.exe actually being transferred? Is it through PCIe?

  2. If the transfer is indeed through PCIe and the GPUs do not support P2P, is the data transfer between them accomplished using cudaMemcpyPeerAsync, or is it staged through system memory, i.e., first Device-to-Host (D2H) and then Host-to-Device (H2D)? If so, does system memory performance also affect the transfer bandwidth?

  3. The p2pBandwidthLatencyTest.exe results show that the Unidirectional results are nearly identical to the Bidirectional results. Why isn’t the Bidirectional result roughly twice the Unidirectional result?

Yes, it is going over PCIe.
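
If you want to confirm what simpleP2P is reporting, you can query the runtime directly. Here is a minimal sketch, assuming the two RTX 2080s are device ordinals 0 and 1:

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// Minimal check of peer accessibility between two GPUs, assuming they are
// device ordinals 0 and 1. This mirrors what simpleP2P reports.
int main() {
    int can01 = 0, can10 = 0;
    cudaDeviceCanAccessPeer(&can01, 0, 1);   // can device 0 access device 1's memory?
    cudaDeviceCanAccessPeer(&can10, 1, 0);   // can device 1 access device 0's memory?
    printf("GPU0 -> GPU1 peer access: %s\n", can01 ? "supported" : "not supported");
    printf("GPU1 -> GPU0 peer access: %s\n", can10 ? "supported" : "not supported");
    return 0;
}
```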

The test may use cudaMemcpyPeerAsync. That call still works in a non-peer setting; it just uses a non-peer transfer path.
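
To illustrate, here is a minimal sketch of that kind of call, with an arbitrary 64 MiB buffer and device ordinals 0 and 1 assumed. The copy succeeds whether or not peer access is available; without it, the driver falls back to the non-peer path:

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 64 << 20;   // 64 MiB, arbitrary size for illustration
    void *src = nullptr, *dst = nullptr;

    cudaSetDevice(0);
    cudaMalloc(&src, bytes);         // source buffer on GPU 0
    cudaSetDevice(1);
    cudaMalloc(&dst, bytes);         // destination buffer on GPU 1

    cudaSetDevice(0);
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // dst lives on device 1, src on device 0; no cudaDeviceEnablePeerAccess
    // call is made here, so on a system without peer support the driver
    // routes the copy through a non-peer path.
    cudaMemcpyPeerAsync(dst, 1, src, 0, bytes, stream);
    cudaStreamSynchronize(stream);
    printf("Device-to-device copy complete\n");

    cudaStreamDestroy(stream);
    cudaFree(src);
    cudaSetDevice(1);
    cudaFree(dst);
    return 0;
}
```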

Yes, typically a non-peer device-to-device transfer flows through a system memory buffer.
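
Conceptually, the non-peer path is similar to doing the staging yourself through a pinned host buffer, roughly as in the sketch below. The driver’s actual implementation is internal and pipelines the transfer in chunks, so treat this only as an illustration of the D2H-then-H2D idea:

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 64 << 20;       // 64 MiB, arbitrary size for illustration
    void *d_src = nullptr, *d_dst = nullptr, *h_staging = nullptr;

    cudaSetDevice(0);
    cudaMalloc(&d_src, bytes);           // source on GPU 0
    cudaSetDevice(1);
    cudaMalloc(&d_dst, bytes);           // destination on GPU 1

    // Pinned (page-locked) host buffer used as the staging area in system memory.
    cudaMallocHost(&h_staging, bytes);

    // Step 1: device-to-host copy from GPU 0 into system memory.
    cudaSetDevice(0);
    cudaMemcpy(h_staging, d_src, bytes, cudaMemcpyDeviceToHost);

    // Step 2: host-to-device copy from system memory into GPU 1.
    cudaSetDevice(1);
    cudaMemcpy(d_dst, h_staging, bytes, cudaMemcpyHostToDevice);

    printf("Staged copy through system memory complete\n");

    cudaFreeHost(h_staging);
    cudaFree(d_dst);
    cudaSetDevice(0);
    cudaFree(d_src);
    return 0;
}
```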

It can, but most modern systems have enough CPU memory bandwidth that the effect on a single transfer like this, with nothing else going on, is usually not very noticeable.

There could be a number of factors affecting this, including WDDM batching in the case of GPUs in WDDM mode. System topology could also be a factor. I won’t be able to offer a specific answer.