Hello everyone,
I have a multi-GPU program whose data transfers are implemented via P2P access. Both GPUs are GTX 1080s (Driver Version: 396.45), the OS is Arch Linux (with i3), and the CUDA version is 9.2.148. However, the program hangs without making any progress. I see similar behavior with the CUDA simpleP2P sample - it doesn't work either:
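For context, the transfer pattern in my program looks roughly like the following (a simplified sketch, not my exact code; buffer size and variable names are placeholders, and error checking is omitted for brevity):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 64 * 1024 * 1024;  // 64 MB, same size simpleP2P uses

    // Enable peer access in both directions.
    cudaSetDevice(0);
    cudaDeviceEnablePeerAccess(1, 0);
    cudaSetDevice(1);
    cudaDeviceEnablePeerAccess(0, 0);

    // Allocate a buffer on each GPU.
    float *d0 = nullptr, *d1 = nullptr;
    cudaSetDevice(0);
    cudaMalloc(&d0, bytes);
    cudaSetDevice(1);
    cudaMalloc(&d1, bytes);

    // Copy GPU0 -> GPU1 over P2P; this is roughly where things stop progressing.
    cudaMemcpyPeer(d1, 1, d0, 0, bytes);
    cudaDeviceSynchronize();

    printf("done\n");
    cudaFree(d1);
    cudaSetDevice(0);
    cudaFree(d0);
    return 0;
}
```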
alexander@server /o/c/s/b/x/l/release> ./simpleP2P
[./simpleP2P] - Starting...
Checking for multiple GPUs...
CUDA-capable device count: 2
> GPU0 = "GeForce GTX 1080" IS capable of Peer-to-Peer (P2P)
> GPU1 = "GeForce GTX 1080" IS capable of Peer-to-Peer (P2P)
Checking GPU(s) for support of peer to peer memory access...
> Peer access from GeForce GTX 1080 (GPU0) -> GeForce GTX 108
> Peer access from GeForce GTX 1080 (GPU1) -> GeForce GTX 108
Enabling peer access between GPU0 and GPU1...
Checking GPU0 and GPU1 for UVA capabilities...
> GeForce GTX 1080 (GPU0) supports UVA: Yes
> GeForce GTX 1080 (GPU1) supports UVA: Yes
Both GPUs can support UVA, enabling...
Allocating buffers (64MB on GPU0, GPU1 and CPU Host)...
Creating event handles...
Then it hangs. Waiting for it to resume does not help (in my case I waited about half an hour). Even more interesting: if I kill the process while it hangs, i3 crashes after some time.
The output of nvidia-smi:
alexander@server ~> nvidia-smi
Mon Aug 6 18:55:18 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.45 Driver Version: 396.45 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 1080 Off | 00000000:0A:00.0 On | N/A |
| 26% 52C P2 44W / 200W | 289MiB / 8116MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 1080 Off | 00000000:0B:00.0 Off | N/A |
| 7% 53C P2 45W / 180W | 187MiB / 8119MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 698 G /usr/lib/Xorg 102MiB |
| 0 1700 C ./simpleP2P 175MiB |
| 1 1700 C ./simpleP2P 175MiB |
+-----------------------------------------------------------------------------+
The topology is provided below:
alexander@server ~> nvidia-smi topo -m
GPU0 GPU1 CPU Affinity
GPU0 X PHB 0-15
GPU1 PHB X 0-15
What can I try in this situation?