Hi, I’m new to NCCL.
Lately I’ve been testing the examples given in https://docs.nvidia.com/deeplearning/sdk/nccl-developer-guide/docs/examples.html#example-2-one-device-per-process-or-thread
This is how I run two processes:
mpirun -n 2 ./my_nccl
in which my_nccl is the executable file compiled from the example.
I added sleep(600) right after the ncclAllReduce call so that I could monitor the network status (a rough sketch of the modification is shown after the netstat output below). Then I discovered this:
tcp 0 0 <IB ip>:50928 0.0.0.0:* LISTEN 42060/my_nccl
tcp 0 0 0.0.0.0:1024 0.0.0.0:* LISTEN 42059/my_nccl
tcp 0 0 <IB ip>:56193 0.0.0.0:* LISTEN 42059/my_nccl
tcp 0 0 0.0.0.0:1025 0.0.0.0:* LISTEN 42060/my_nccl
tcp 0 0 127.0.0.1:40708 127.0.0.1:35753 ESTABLISHED 42060/my_nccl
tcp 0 0 <IB ip>:51550 <IB ip>:56193 ESTABLISHED 42060/my_nccl
tcp 0 0 <IB ip>:36210 <IB ip>:50928 ESTABLISHED 42059/my_nccl
tcp 0 0 <IB ip>:56193 <IB ip>:51550 ESTABLISHED 42059/my_nccl
tcp 0 0 127.0.0.1:40710 127.0.0.1:35753 ESTABLISHED 42059/my_nccl
tcp 0 0 <IB ip>:50928 <IB ip>:36210 ESTABLISHED 42060/my_nccl
unix 2 [ ACC ] SEQPACKET LISTENING 3185032501 42059/my_nccl @cuda-uvmfd-4026531836-42059
unix 2 [ ACC ] SEQPACKET LISTENING 3185036696 42060/my_nccl @cuda-uvmfd-4026531836-42060
unix 3 [ ] STREAM CONNECTED 3185033392 42060/my_nccl
unix 3 [ ] STREAM CONNECTED 3185032487 42059/my_nccl
unix 3 [ ] STREAM CONNECTED 3185033393 42060/my_nccl
unix 3 [ ] STREAM CONNECTED 3185032488 42059/my_nccl
unix 3 [ ] STREAM CONNECTED 3185033388 42060/my_nccl
unix 3 [ ] STREAM CONNECTED 3185032494 42059/my_nccl
unix 3 [ ] STREAM CONNECTED 3185032495 42059/my_nccl
unix 3 [ ] STREAM CONNECTED 3185033385 42060/my_nccl
unix 3 [ ] STREAM CONNECTED 3185033389 42060/my_nccl
unix 3 [ ] STREAM CONNECTED 3185032490 42059/my_nccl
unix 3 [ ] STREAM CONNECTED 3185032491 42059/my_nccl
unix 3 [ ] STREAM CONNECTED 3185033386 42060/my_nccl
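For reference, the modification I mentioned above looks roughly like this (just the example's AllReduce call with my sleep added; setup and error checking omitted):

#include <unistd.h>   /* for sleep() */

/* sendbuff, recvbuff, size, comm and stream are set up exactly as in the example */
ncclAllReduce((const void*)sendbuff, (void*)recvbuff, size, ncclFloat, ncclSum,
              comm, stream);
cudaStreamSynchronize(stream);

sleep(600);   /* added by me so I can run netstat while both processes are still alive */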
By deleting the NCCL functions from the example code, I can tell that the sockets listening on 0.0.0.0 and 127.0.0.1 both belong to Open MPI.
And by reading the NCCL code in ncclGetUniqueId and ncclCommInitRank, I think the sockets on <IB ip> are used by NCCL for transporting the uniqueId, process info, etc. (I haven't read the code of ncclAllReduce because I can't fully understand it right now.)
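To be specific, this is the bootstrap pattern from the example that I'm referring to (paraphrased from the example code):

ncclUniqueId id;
ncclComm_t comm;

/* rank 0 generates the unique ID and MPI broadcasts it to every other rank;
   this part rides on the MPI sockets */
if (myRank == 0) ncclGetUniqueId(&id);
MPI_Bcast((void*)&id, sizeof(id), MPI_BYTE, 0, MPI_COMM_WORLD);

/* each process then joins the NCCL communicator using that ID; as far as I can
   tell, this is where NCCL opens its own listening sockets */
cudaSetDevice(localRank);
ncclCommInitRank(&comm, nRanks, id, myRank);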
[b]1. So here's my first question: I can't confirm whether NCCL is doing the AllReduce through the IB interface or over PCIe between the GPUs on the same host.
According to the doc https://docs.nvidia.com/deeplearning/sdk/nccl-developer-guide/docs/env.html#nccl-socket-ifname, I can control which network interface NCCL uses.
But I can't fully understand https://docs.nvidia.com/deeplearning/sdk/nccl-developer-guide/docs/env.html#nccl-p2p-level about using PCIe.
What's the difference between these two settings in NCCL?
2. If the AllReduce is done over PCIe within the same host, is there any evidence of this, or any way I can monitor it?
3. Can I make NCCL use the IB interface for AllReduce or AllGather via environment variables (a sketch of what I have in mind is below)?[/b]
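For questions 1 and 3, what I had in mind is something like the commands below (ib0 is just a placeholder for my IB interface name, and I'm not sure whether these are the right settings):

# pin NCCL's sockets to the IB interface
mpirun -n 2 -x NCCL_SOCKET_IFNAME=ib0 ./my_nccl

# if I read the NCCL_P2P_LEVEL doc correctly, 0 should stop NCCL from using P2P between GPUs
mpirun -n 2 -x NCCL_P2P_LEVEL=0 ./my_nccl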
Here’s my environment and device:
Ubuntu 16.04.6 LTS
NCCL version 2.4.8+cuda10.0
CUDA Driver Version: 410.48
GeForce GTX 1080 Ti
Thank you!