I want to use multi-GPU P2P access between a set of 4 Tesla K80s, along with UVA. The thing is that my program needs P2P access between GPU0 and all the other ones. Unfortunately, this can’t be done and I don’t know why… If someone could explain what’s happening here, that would be helpful.
Here is my execution of the simpleP2P example.
This shows that the first two K80 boards (GPUs 0,1,2,3) have P2P access among themselves, and that the other two K80 boards (GPUs 4,5,6,7) have P2P access among themselves too. But not across the whole group! Which is strange considering that these 4 cards are plugged into the same server… (Each K80 carries two GPUs, which is why 4 cards show up as 8 devices.)
$ ./simpleP2P
[./simpleP2P] - Starting...
Checking for multiple GPUs...
CUDA-capable device count: 8
> GPU0 = " Tesla K80" IS capable of Peer-to-Peer (P2P)
> GPU1 = " Tesla K80" IS capable of Peer-to-Peer (P2P)
> GPU2 = " Tesla K80" IS capable of Peer-to-Peer (P2P)
> GPU3 = " Tesla K80" IS capable of Peer-to-Peer (P2P)
> GPU4 = " Tesla K80" IS capable of Peer-to-Peer (P2P)
> GPU5 = " Tesla K80" IS capable of Peer-to-Peer (P2P)
> GPU6 = " Tesla K80" IS capable of Peer-to-Peer (P2P)
> GPU7 = " Tesla K80" IS capable of Peer-to-Peer (P2P)
Checking GPU(s) for support of peer to peer memory access...
> Peer access from Tesla K80 (GPU0) -> Tesla K80 (GPU1) : Yes
> Peer access from Tesla K80 (GPU0) -> Tesla K80 (GPU2) : Yes
> Peer access from Tesla K80 (GPU0) -> Tesla K80 (GPU3) : Yes
> Peer access from Tesla K80 (GPU0) -> Tesla K80 (GPU4) : No
> Peer access from Tesla K80 (GPU0) -> Tesla K80 (GPU5) : No
> Peer access from Tesla K80 (GPU0) -> Tesla K80 (GPU6) : No
> Peer access from Tesla K80 (GPU0) -> Tesla K80 (GPU7) : No
> Peer access from Tesla K80 (GPU1) -> Tesla K80 (GPU0) : Yes
> Peer access from Tesla K80 (GPU1) -> Tesla K80 (GPU2) : Yes
> Peer access from Tesla K80 (GPU1) -> Tesla K80 (GPU3) : Yes
> Peer access from Tesla K80 (GPU1) -> Tesla K80 (GPU4) : No
> Peer access from Tesla K80 (GPU1) -> Tesla K80 (GPU5) : No
> Peer access from Tesla K80 (GPU1) -> Tesla K80 (GPU6) : No
> Peer access from Tesla K80 (GPU1) -> Tesla K80 (GPU7) : No
> Peer access from Tesla K80 (GPU2) -> Tesla K80 (GPU0) : Yes
> Peer access from Tesla K80 (GPU2) -> Tesla K80 (GPU1) : Yes
> Peer access from Tesla K80 (GPU2) -> Tesla K80 (GPU3) : Yes
> Peer access from Tesla K80 (GPU2) -> Tesla K80 (GPU4) : No
> Peer access from Tesla K80 (GPU2) -> Tesla K80 (GPU5) : No
> Peer access from Tesla K80 (GPU2) -> Tesla K80 (GPU6) : No
> Peer access from Tesla K80 (GPU2) -> Tesla K80 (GPU7) : No
> Peer access from Tesla K80 (GPU3) -> Tesla K80 (GPU0) : Yes
> Peer access from Tesla K80 (GPU3) -> Tesla K80 (GPU1) : Yes
> Peer access from Tesla K80 (GPU3) -> Tesla K80 (GPU2) : Yes
> Peer access from Tesla K80 (GPU3) -> Tesla K80 (GPU4) : No
> Peer access from Tesla K80 (GPU3) -> Tesla K80 (GPU5) : No
> Peer access from Tesla K80 (GPU3) -> Tesla K80 (GPU6) : No
> Peer access from Tesla K80 (GPU3) -> Tesla K80 (GPU7) : No
> Peer access from Tesla K80 (GPU4) -> Tesla K80 (GPU0) : No
> Peer access from Tesla K80 (GPU4) -> Tesla K80 (GPU1) : No
> Peer access from Tesla K80 (GPU4) -> Tesla K80 (GPU2) : No
> Peer access from Tesla K80 (GPU4) -> Tesla K80 (GPU3) : No
> Peer access from Tesla K80 (GPU4) -> Tesla K80 (GPU5) : Yes
> Peer access from Tesla K80 (GPU4) -> Tesla K80 (GPU6) : Yes
> Peer access from Tesla K80 (GPU4) -> Tesla K80 (GPU7) : Yes
> Peer access from Tesla K80 (GPU5) -> Tesla K80 (GPU0) : No
> Peer access from Tesla K80 (GPU5) -> Tesla K80 (GPU1) : No
> Peer access from Tesla K80 (GPU5) -> Tesla K80 (GPU2) : No
> Peer access from Tesla K80 (GPU5) -> Tesla K80 (GPU3) : No
> Peer access from Tesla K80 (GPU5) -> Tesla K80 (GPU4) : Yes
> Peer access from Tesla K80 (GPU5) -> Tesla K80 (GPU6) : Yes
> Peer access from Tesla K80 (GPU5) -> Tesla K80 (GPU7) : Yes
> Peer access from Tesla K80 (GPU6) -> Tesla K80 (GPU0) : No
> Peer access from Tesla K80 (GPU6) -> Tesla K80 (GPU1) : No
> Peer access from Tesla K80 (GPU6) -> Tesla K80 (GPU2) : No
> Peer access from Tesla K80 (GPU6) -> Tesla K80 (GPU3) : No
> Peer access from Tesla K80 (GPU6) -> Tesla K80 (GPU4) : Yes
> Peer access from Tesla K80 (GPU6) -> Tesla K80 (GPU5) : Yes
> Peer access from Tesla K80 (GPU6) -> Tesla K80 (GPU7) : Yes
> Peer access from Tesla K80 (GPU7) -> Tesla K80 (GPU0) : No
> Peer access from Tesla K80 (GPU7) -> Tesla K80 (GPU1) : No
> Peer access from Tesla K80 (GPU7) -> Tesla K80 (GPU2) : No
> Peer access from Tesla K80 (GPU7) -> Tesla K80 (GPU3) : No
> Peer access from Tesla K80 (GPU7) -> Tesla K80 (GPU4) : Yes
> Peer access from Tesla K80 (GPU7) -> Tesla K80 (GPU5) : Yes
> Peer access from Tesla K80 (GPU7) -> Tesla K80 (GPU6) : Yes
Enabling peer access between GPU0 and GPU1...
Checking GPU0 and GPU1 for UVA capabilities...
> Tesla K80 (GPU0) supports UVA: Yes
> Tesla K80 (GPU1) supports UVA: Yes
Both GPUs can support UVA, enabling...
Allocating buffers (64MB on GPU0, GPU1 and CPU Host)...
Creating event handles...
cudaMemcpyPeer / cudaMemcpy between GPU0 and GPU1: 7.42GB/s
Preparing host buffer and memcpy to GPU0...
Run kernel on GPU1, taking source data from GPU0 and writing to GPU1...
Run kernel on GPU0, taking source data from GPU1 and writing to GPU0...
Copy data back to host from GPU0 and verify results...
Disabling peer access...
Shutting down...
Test passed
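For reference, the Yes/No matrix that simpleP2P prints comes from `cudaDeviceCanAccessPeer`. A stripped-down sketch that reproduces just that matrix (error checking omitted for brevity):

```cuda
// Sketch: print the pairwise P2P capability matrix, as simpleP2P does.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int n = 0;
    cudaGetDeviceCount(&n);
    for (int src = 0; src < n; ++src) {
        for (int dst = 0; dst < n; ++dst) {
            if (src == dst) continue;
            int canAccess = 0;
            // Reports 1 only if direct peer access is possible between
            // the two devices (same PCIe root complex, among other things).
            cudaDeviceCanAccessPeer(&canAccess, src, dst);
            printf("GPU%d -> GPU%d : %s\n", src, dst, canAccess ? "Yes" : "No");
        }
    }
    return 0;
}
```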
Peer-to-peer requires that the participating GPUs are on the same PCIe root complex. Each x86 CPU provides its own PCIe root complex, so I would hypothesize that this is a dual-CPU machine, where GPUs 0-3 are coupled to one CPU, and GPUs 4-7 are coupled to the other CPU.
I have no experience with your kind of hardware setup, but I believe that if direct communication between GPUs across PCIe is not possible, there is a fallback path that moves the data through a host buffer (so GPUx → CPU → GPUy). Obviously that results in lower performance.
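For what it’s worth, `cudaMemcpyPeer` already exposes that fallback: the call is legal between any two devices, and the driver stages the transfer through host memory when no direct P2P path exists. A minimal sketch (error checking omitted; GPU indices 0 and 4 chosen to match the output above):

```cuda
// Sketch: copy between GPUs on different root complexes. The copy is
// still correct -- it just goes GPU0 -> host -> GPU4 internally.
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 64 << 20;  // 64 MB, as in simpleP2P
    float *d0 = nullptr, *d4 = nullptr;
    cudaSetDevice(0); cudaMalloc(&d0, bytes);
    cudaSetDevice(4); cudaMalloc(&d4, bytes);

    // Works whether or not P2P was (or could be) enabled for this pair.
    cudaMemcpyPeer(d4, 4, d0, 0, bytes);
    cudaDeviceSynchronize();

    cudaFree(d4);
    cudaSetDevice(0); cudaFree(d0);
    return 0;
}
```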
Some of the forum participants here have experience with high-end systems such as yours, my recommendation would be to wait for knowledgeable comments from them.
No, it’s not correct to say “the only way to work with all the GPUs would be using MPI”.
There are no significant issues with working with 2, 4, or 8 GPUs in your setup.
Yes, P2P will not work between arbitrary pairs of GPUs. But if you equate that with an inability to work with the GPUs, then you simply don’t understand one or both of the following:
How to work with multiple GPUs (e.g. see the simpleMultiGPU or cudaOpenMP sample codes, neither of which depends on P2P)
How P2P works, and what it means.
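A minimal sketch of that first point, in the spirit of the simpleMultiGPU sample — each GPU processes its own slice of host data, with no P2P involved. The `scale` kernel and the sizes are placeholders; error checking is omitted:

```cuda
// Sketch: drive all visible GPUs without any peer-to-peer access.
#include <cuda_runtime.h>

__global__ void scale(float *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= 2.0f;
}

int main() {
    int nDev = 0;
    cudaGetDeviceCount(&nDev);
    const int nPerDev = 1 << 20;
    float *h = new float[nDev * nPerDev]();
    float **d = new float*[nDev];

    for (int g = 0; g < nDev; ++g) {
        cudaSetDevice(g);  // everything below targets GPU g
        cudaMalloc(&d[g], nPerDev * sizeof(float));
        cudaMemcpy(d[g], h + g * nPerDev, nPerDev * sizeof(float),
                   cudaMemcpyHostToDevice);
        scale<<<(nPerDev + 255) / 256, 256>>>(d[g], nPerDev);
        cudaMemcpy(h + g * nPerDev, d[g], nPerDev * sizeof(float),
                   cudaMemcpyDeviceToHost);
        cudaFree(d[g]);
    }

    delete[] d;
    delete[] h;
    return 0;
}
```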
Since these topics have been covered extensively elsewhere, I’m not going to cover that ground. Feel free to use your google-fu.
So P2P won’t work between some GPUs, but between others it will…
I know how to work with multiple GPUs; in fact I’m using P2P+UVA and OpenMP already. And yes, there is a significant difference between working with 2 GPUs and with 8, because in my program the speedup improves as more GPUs are put to work. P2P also makes both the coding and the communication between GPUs easier. That said, it would be easier to work with P2P and UVA if all the GPUs were on the same PCIe root complex.
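One pattern that keeps such code portable across both halves of a machine like this is to enable peer access only where the topology allows it, and let `cudaMemcpyPeer` fall back to host staging elsewhere. A sketch, with `src`/`dst` as placeholder device indices:

```cuda
#include <cuda_runtime.h>

// Sketch: enable peer access from 'src' to 'dst' only if the topology
// supports it. cudaMemcpyPeer between the pair stays valid either way;
// it simply stages through the host when no direct path exists.
void enablePeerIfPossible(int src, int dst) {
    int can = 0;
    cudaDeviceCanAccessPeer(&can, src, dst);
    if (can) {
        cudaSetDevice(src);
        cudaDeviceEnablePeerAccess(dst, 0);  // flags must currently be 0
    }
}
```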
I don’t really know what you’re trying to say there, but I would agree that many programs will benefit by using more GPUs, and there shouldn’t be much preventing you (no significant issues) from doing that in your setup.
That particular issue is not something that you’re going to be able to solve with software. As njuffa said already, it’s a hardware (topology) issue associated with the platform that you have these GPUs plugged into.
The test you’ve run is already a good one for demonstrating that. If they are on the same root complex, they will be able to establish P2P access with each other.
You could also study your motherboard documentation. It may provide such topology information.
And there are other tools you can use to discover it such as lspci, lstopo (part of hwloc), and nvidia-smi
For nvidia-smi try:
nvidia-smi topo -h
to get started. A possible command option might be:
nvidia-smi topo -m
Connections labelled SOC indicate that the path between those GPUs traverses a socket-level link, which means those GPUs are on separate PCIe root complexes. The other connection types (PHB, PXB, PIX) all indicate connections that should support P2P.
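A related check can be done from inside a CUDA program: `cudaDeviceGetPCIBusId` returns each device’s PCI address, which you can match against the lspci/lstopo output to see which root complex each GPU hangs off. A sketch (error checking omitted):

```cuda
// Sketch: print each CUDA device's PCI bus ID for topology inspection.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int n = 0;
    cudaGetDeviceCount(&n);
    for (int dev = 0; dev < n; ++dev) {
        char busId[32];
        // Fills busId with a "domain:bus:device.function" style string.
        cudaDeviceGetPCIBusId(busId, sizeof(busId), dev);
        printf("GPU%d : %s\n", dev, busId);
    }
    return 0;
}
```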