P2P access hangs the system (simpleP2P doesn't work)

Hello everyone,
I have a multi-GPU program whose data transfers are implemented via P2P access. Both GPUs are GTX 1080s (driver version: 396.45), the OS is Arch Linux (with i3), and the CUDA version is 9.2.148. However, the program hangs without making any progress. I see similar behavior with the CUDA simpleP2P sample - it doesn't work either:

alexander@server /o/c/s/b/x/l/release> ./simpleP2P
[./simpleP2P] - Starting...
Checking for multiple GPUs...
CUDA-capable device count: 2
> GPU0 = "GeForce GTX 1080" IS  capable of Peer-to-Peer (P2P)
> GPU1 = "GeForce GTX 1080" IS  capable of Peer-to-Peer (P2P)

Checking GPU(s) for support of peer to peer memory access...
> Peer access from GeForce GTX 1080 (GPU0) -> GeForce GTX 108
> Peer access from GeForce GTX 1080 (GPU1) -> GeForce GTX 108
Enabling peer access between GPU0 and GPU1...
Checking GPU0 and GPU1 for UVA capabilities...
> GeForce GTX 1080 (GPU0) supports UVA: Yes
> GeForce GTX 1080 (GPU1) supports UVA: Yes
Both GPUs can support UVA, enabling...
Allocating buffers (64MB on GPU0, GPU1 and CPU Host)...
Creating event handles...

Then it hangs. Waiting for it to resume does not help (in my case I waited about half an hour). More interestingly, if I kill the process, i3 crashes after some further time of hanging.

Output of nvidia-smi:

alexander@server ~> nvidia-smi
Mon Aug  6 18:55:18 2018       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.45                 Driver Version: 396.45                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080    Off  | 00000000:0A:00.0  On |                  N/A |
| 26%   52C    P2    44W / 200W |    289MiB /  8116MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 1080    Off  | 00000000:0B:00.0 Off |                  N/A |
|  7%   53C    P2    45W / 180W |    187MiB /  8119MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0       698      G   /usr/lib/Xorg                                102MiB |
|    0      1700      C   ./simpleP2P                                  175MiB |
|    1      1700      C   ./simpleP2P                                  175MiB |
+-----------------------------------------------------------------------------+

The topology is provided below:

alexander@server ~> nvidia-smi topo -m
        GPU0    GPU1    CPU Affinity
GPU0     X      PHB     0-15
GPU1    PHB      X      0-15

What can I try in this situation?
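For reference, my program's transfers follow the standard runtime-API peer-access pattern. This is a simplified sketch rather than my actual code (the buffer size, device IDs, and missing error checks are illustrative):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // Verify that each GPU can access the other's memory.
    int can01 = 0, can10 = 0;
    cudaDeviceCanAccessPeer(&can01, 0, 1);
    cudaDeviceCanAccessPeer(&can10, 1, 0);
    if (!can01 || !can10) {
        printf("P2P not supported between these devices\n");
        return 1;
    }

    // Enable peer access in both directions.
    cudaSetDevice(0);
    cudaDeviceEnablePeerAccess(1, 0);
    cudaSetDevice(1);
    cudaDeviceEnablePeerAccess(0, 0);

    // Allocate a buffer on each GPU and copy device 0 -> device 1.
    const size_t bytes = 64u * 1024 * 1024;
    float *d0 = nullptr, *d1 = nullptr;
    cudaSetDevice(0); cudaMalloc(&d0, bytes);
    cudaSetDevice(1); cudaMalloc(&d1, bytes);

    cudaMemcpyPeer(d1, 1, d0, 0, bytes);
    cudaDeviceSynchronize();
    printf("peer copy finished\n");

    cudaFree(d1);
    cudaSetDevice(0);
    cudaFree(d0);
    return 0;
}
```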

This sometimes happens when the motherboard/system BIOS is not correctly designed to support CUDA P2P traffic, or when it is misconfigured by the BIOS or OS. You've provided no details about the motherboard these GPUs are plugged into, but that probably wouldn't help me much anyway (although I'm curious whether it is Skylake or not).

There are some somewhat similar threads, like this one:

https://devtalk.nvidia.com/default/topic/1029538/cuda-programming-and-performance/system-hangs-after-executing-p2p-bandwidth-test-on-tesla-k40-nvidia-gpus/

however your case does not appear to be identical - your motherboard does not appear to use PLX bridges, AFAICT.

Two possible suggestions to consider:

  1. If your motherboard is not running the latest system BIOS, I would try upgrading to the latest BIOS. If that doesn't fix it, there is probably not much you can do.

  2. Arch Linux is not an officially supported distro for CUDA. To have the best chance of success with CUDA features/capabilities, I would always recommend using a supported configuration. You can find the supported OS configurations in the CUDA Linux Installation Guide.

Tesla systems in OEM servers that are designed to support the Tesla products generally don't have this issue. Otherwise, if you are assembling your own system, your mileage may vary.
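One additional diagnostic you could try: run the same peer copy *without* calling cudaDeviceEnablePeerAccess. In that case the runtime stages the transfer through host memory rather than sending direct P2P traffic across the PCIe host bridge. If the staged copy completes but the direct copy hangs, that points at the PCIe routing/BIOS rather than a general driver problem. A minimal sketch (device IDs 0/1, 64MB size, and abbreviated error handling are assumptions):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 64u * 1024 * 1024;
    float *d0 = nullptr, *d1 = nullptr;
    cudaSetDevice(0); cudaMalloc(&d0, bytes);
    cudaSetDevice(1); cudaMalloc(&d1, bytes);

    // Deliberately NO cudaDeviceEnablePeerAccess() calls here:
    // the runtime will stage this copy through host memory.
    cudaError_t err = cudaMemcpyPeer(d1, 1, d0, 0, bytes);
    cudaDeviceSynchronize();
    printf("staged peer copy: %s\n", cudaGetErrorString(err));

    cudaFree(d1);
    cudaSetDevice(0);
    cudaFree(d0);
    return 0;
}
```

If this version completes promptly, the hang is specific to direct P2P traffic over the host bridge (the PHB path shown in your topology matrix), which is consistent with a system BIOS/board issue.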