I have a Windows 10 workstation with two RTX 2080 GPUs.
First, I ran the CUDA bandwidthTest.exe sample separately on each GPU. Here are the results:
Next, I used the simpleP2P.exe sample to test P2P functionality between the two GPUs, which are not connected by NVLink. The test results are shown below:
Lastly, I ran p2pBandwidthLatencyTest.exe and obtained the following results:
Now, I have a few questions:
- The simpleP2P test reports that Peer-to-Peer access is not supported between the two GPUs. In that case, over what path is the GPU-to-GPU bandwidth shown in the p2pBandwidthLatencyTest.exe results actually being transferred? Is it going over PCIe? (See the first sketch after these questions.)
- If the transfer does go over PCIe and the GPUs do not support P2P, is the copy between them still performed with cudaMemcpyPeerAsync, or is it staged through system memory, i.e., Device-to-Host (D2H) followed by Host-to-Device (H2D)? If it is staged through system memory, does system memory performance also limit the transfer bandwidth? (The second sketch below shows the two paths I am trying to distinguish.)
- In the p2pBandwidthLatencyTest.exe results, the Unidirectional numbers are nearly identical to the Bidirectional numbers. Why isn't the Bidirectional bandwidth roughly twice the Unidirectional bandwidth? (The third sketch below shows how I picture the bidirectional case.)
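For the first question, this is roughly the capability check I have in mind; my assumption (not something I have verified in the sample source) is that simpleP2P does something equivalent before attempting direct peer copies. Device IDs 0 and 1 are assumed:

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // Ask the runtime whether each GPU can directly address the other's memory.
    int canAccess01 = 0, canAccess10 = 0;
    cudaDeviceCanAccessPeer(&canAccess01, 0, 1);
    cudaDeviceCanAccessPeer(&canAccess10, 1, 0);
    printf("P2P 0 -> 1: %d, P2P 1 -> 0: %d\n", canAccess01, canAccess10);

    // Only if both directions report 1 can direct peer access be enabled.
    if (canAccess01 && canAccess10) {
        cudaSetDevice(0);
        cudaDeviceEnablePeerAccess(1, 0);  // flags must be 0
        cudaSetDevice(1);
        cudaDeviceEnablePeerAccess(0, 0);
    }
    return 0;
}
```

On my system this kind of check reports no peer access, which is what prompts the question about the path the data actually takes.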
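For the second question, these are the two copy paths I am trying to tell apart (a minimal sketch; srcOnGpu0 and dstOnGpu1 are hypothetical buffers already allocated with cudaMalloc on GPU 0 and GPU 1, and error checking is omitted):

```cpp
#include <cuda_runtime.h>

// Path A: a single runtime call. Whether this goes directly over PCIe or is
// staged through host memory internally when P2P is unavailable is exactly
// what I am unsure about.
void copyViaPeerApi(void* dstOnGpu1, const void* srcOnGpu0, size_t bytes) {
    cudaMemcpyPeer(dstOnGpu1, /*dstDevice=*/1, srcOnGpu0, /*srcDevice=*/0, bytes);
}

// Path B: explicit staging through pinned system memory,
// i.e. D2H from GPU 0 followed by H2D to GPU 1.
void copyViaHostStaging(void* dstOnGpu1, const void* srcOnGpu0, size_t bytes) {
    void* hostStage = nullptr;
    cudaMallocHost(&hostStage, bytes);   // pinned host staging buffer

    cudaSetDevice(0);
    cudaMemcpy(hostStage, srcOnGpu0, bytes, cudaMemcpyDeviceToHost);

    cudaSetDevice(1);
    cudaMemcpy(dstOnGpu1, hostStage, bytes, cudaMemcpyHostToDevice);

    cudaFreeHost(hostStage);
}
```

If, without P2P support, Path A internally behaves like Path B, that is why I am asking whether system memory performance also affects the measured bandwidth.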
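For the third question, this is how I picture the bidirectional case being measured: copies in both directions issued on separate streams so they can overlap, timed together with wall-clock time (a rough sketch with hypothetical, pre-allocated device buffers buf0/tmp0 on GPU 0 and buf1/tmp1 on GPU 1; error checking omitted):

```cpp
#include <chrono>
#include <cuda_runtime.h>

double bidirectionalGBs(void* buf0, void* tmp0, void* buf1, void* tmp1, size_t bytes) {
    cudaStream_t s0, s1;
    cudaSetDevice(0); cudaStreamCreate(&s0);
    cudaSetDevice(1); cudaStreamCreate(&s1);

    auto t0 = std::chrono::high_resolution_clock::now();

    cudaSetDevice(0);
    cudaMemcpyPeerAsync(tmp1, 1, buf0, 0, bytes, s0);   // GPU 0 -> GPU 1
    cudaSetDevice(1);
    cudaMemcpyPeerAsync(tmp0, 0, buf1, 1, bytes, s1);   // GPU 1 -> GPU 0

    cudaStreamSynchronize(s0);                           // wait for both directions
    cudaStreamSynchronize(s1);

    auto t1 = std::chrono::high_resolution_clock::now();
    double seconds = std::chrono::duration<double>(t1 - t0).count();

    cudaStreamDestroy(s0);
    cudaStreamDestroy(s1);

    // Total bytes moved in both directions divided by wall-clock time.
    return (2.0 * bytes) / seconds / 1e9;
}
```

My naive expectation is that if the two directions really do overlap, the reported bidirectional bandwidth should approach twice the unidirectional number, which is why the near-identical results surprise me.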