Does NCCL 2.0 support inter-node communication using sockets?

Hi
I'm trying to compare NCCL and NCCL 2 in my project, and NCCL 2 does not work correctly.
I'm running the example program from http://docs.nvidia.com/deeplearning/sdk/nccl-developer-guide/index.html#onedevprothrd
When I call commInitRank() in my program, the following error occurs:

*server name* [0] include/socket.h:185 WARN Call to connect failed : Connection refused
Failed, NCCL error nvidia-sample.cu:88 'unhandled system error'

This error occurs only when I try to create a communicator across multiple nodes. When I create a communicator on a single node, the error does not occur.

My question is: does NCCL 2.0 support inter-node communication over sockets, or only over InfiniBand?
I don't have an InfiniBand environment, so I haven't tested my program with InfiniBand. As a result, I'm not sure whether my program is wrong or my test environment simply doesn't support inter-node communication.
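"Connection refused" means the connection attempt reached the target host but nothing accepted it on that address and port. Before suspecting NCCL itself, it can help to confirm plain TCP reachability between the nodes. The following is a minimal Python sketch using only the standard library; the helper name `can_connect` is hypothetical, and the check is only meaningful against a port where you know a listener is running on the other node.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and tries each returned address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # ConnectionRefusedError, socket.timeout and "no route to host"
        # are all subclasses of OSError.
        return False
```

For example, `can_connect("172.31.10.218", 9218)` run on another node would test the rendezvous address that appears in the PyTorch log later in this thread. A `False` result points to a network, firewall, or interface-selection problem rather than a bug in the program.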

This error was caused by the NCCL 2 environment settings.
NCCL 2 was trying to use the virtual network interface created for Docker, which made it impossible for the nodes to communicate with each other.

I added

NCCL_SOCKET_IFNAME=^docker0

to the system environment variables, and it worked.
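The fix above relies on the filter syntax of `NCCL_SOCKET_IFNAME`. As described in current NCCL documentation, the value is a comma-separated list of interface-name prefixes; a leading `^` excludes matching interfaces instead of selecting them, and a leading `=` switches from prefix matching to exact-name matching. The sketch below is a small Python model of those rules, not NCCL's actual code, and the function name `select_interfaces` is hypothetical. It illustrates one consequence worth knowing: an exclusion-only filter such as `^docker0` still leaves the loopback interface `lo` eligible.

```python
def select_interfaces(interfaces: list[str], ifname_filter: str) -> list[str]:
    """Illustrative model of NCCL_SOCKET_IFNAME matching (not NCCL's implementation)."""
    exclude = ifname_filter.startswith("^")  # '^' means "everything except these"
    if exclude:
        ifname_filter = ifname_filter[1:]
    exact = ifname_filter.startswith("=")    # '=' means exact names, not prefixes
    if exact:
        ifname_filter = ifname_filter[1:]
    patterns = [p for p in ifname_filter.split(",") if p]

    def matches(name: str) -> bool:
        if exact:
            return name in patterns
        return any(name.startswith(p) for p in patterns)

    # Keep an interface when it matches an include list, or fails to match an exclude list.
    return [name for name in interfaces if matches(name) != exclude]
```

With the interfaces `["lo", "docker0", "ens5"]`, `^docker0` keeps both `lo` and `ens5`, while `^docker,lo` or simply `ens` keeps only `ens5`. Prefix matching also means `eth1` would select `eth10`; `=eth1` would not.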

This works! Thanks pakio!
I was running distributed TensorFlow with Horovod using NCCL 2 and Docker. I saw the same error message and solved it using your method.
Really appreciate you sharing your experience!

I am getting the following error, "include/socket.h:369 NCCL WARN Call to connect timeout : Connection refused", while trying to run distributed PyTorch training across 3 nodes. What is the fix?
Here is the complete log:
| distributed init (rank 20): tcp://172.31.10.218:9218
| distributed init (rank 18): tcp://172.31.10.218:9218
| distributed init (rank 16): tcp://172.31.10.218:9218
| distributed init (rank 22): tcp://172.31.10.218:9218
| distributed init (rank 23): tcp://172.31.10.218:9218
| distributed init (rank 17): tcp://172.31.10.218:9218
| distributed init (rank 19): tcp://172.31.10.218:9218
| distributed init (rank 21): tcp://172.31.10.218:9218
ip-172-31-9-87:17672:17672 [4] NCCL INFO NET : Using interface lo:127.0.0.1<0>
ip-172-31-9-87:17672:17672 [4] NCCL INFO NET/IB : Using interface lo for sideband communication
ip-172-31-9-87:17676:17676 [3] NCCL INFO NET : Using interface lo:127.0.0.1<0>
ip-172-31-9-87:17672:17672 [4] NCCL INFO Using internal Network Socket
ip-172-31-9-87:17672:17672 [4] NCCL INFO rank 20 nranks 24
ip-172-31-9-87:17676:17676 [3] NCCL INFO NET/IB : Using interface lo for sideband communication
ip-172-31-9-87:17676:17676 [3] NCCL INFO Using internal Network Socket
ip-172-31-9-87:17676:17676 [3] NCCL INFO rank 19 nranks 24
ip-172-31-9-87:17672:18605 [4] NCCL INFO comm 0x7f3cdc0551f0 rank 20 nranks 24
ip-172-31-9-87:17676:18606 [3] NCCL INFO comm 0x7fbd600551f0 rank 19 nranks 24
ip-172-31-9-87:17672:18605 [4] NCCL INFO NET : Using interface lo:127.0.0.1<0>
ip-172-31-9-87:17672:18605 [4] NCCL INFO NET : Using interface ens5:172.31.9.87<0>
ip-172-31-9-87:17672:18605 [4] NCCL INFO NET/Socket : 2 interfaces found
ip-172-31-9-87:17676:18606 [3] NCCL INFO NET : Using interface lo:127.0.0.1<0>
ip-172-31-9-87:17676:18606 [3] NCCL INFO NET : Using interface ens5:172.31.9.87<0>
ip-172-31-9-87:17676:18606 [3] NCCL INFO NET/Socket : 2 interfaces found
ip-172-31-9-87:17677:17677 [2] NCCL INFO NET : Using interface lo:127.0.0.1<0>
ip-172-31-9-87:17677:17677 [2] NCCL INFO NET/IB : Using interface lo for sideband communication
ip-172-31-9-87:17677:17677 [2] NCCL INFO Using internal Network Socket
ip-172-31-9-87:17677:17677 [2] NCCL INFO rank 18 nranks 24
ip-172-31-9-87:17677:18607 [2] NCCL INFO comm 0x7fa2400551f0 rank 18 nranks 24
ip-172-31-9-87:17677:18607 [2] NCCL INFO NET : Using interface lo:127.0.0.1<0>
ip-172-31-9-87:17677:18607 [2] NCCL INFO NET : Using interface ens5:172.31.9.87<0>
ip-172-31-9-87:17677:18607 [2] NCCL INFO NET/Socket : 2 interfaces found
ip-172-31-9-87:17671:17671 [0] NCCL INFO NET : Using interface lo:127.0.0.1<0>
ip-172-31-9-87:17671:17671 [0] NCCL INFO NET/IB : Using interface lo for sideband communication
ip-172-31-9-87:17671:17671 [0] NCCL INFO Using internal Network Socket
ip-172-31-9-87:17671:17671 [0] NCCL INFO rank 16 nranks 24
ip-172-31-9-87:17671:18608 [0] NCCL INFO comm 0x7fa3ac0551f0 rank 16 nranks 24
ip-172-31-9-87:17671:18608 [0] NCCL INFO NET : Using interface lo:127.0.0.1<0>
ip-172-31-9-87:17671:18608 [0] NCCL INFO NET : Using interface ens5:172.31.9.87<0>
ip-172-31-9-87:17671:18608 [0] NCCL INFO NET/Socket : 2 interfaces found
ip-172-31-9-87:17674:17674 [1] NCCL INFO NET : Using interface lo:127.0.0.1<0>
ip-172-31-9-87:17674:17674 [1] NCCL INFO NET/IB : Using interface lo for sideband communication
ip-172-31-9-87:17674:17674 [1] NCCL INFO Using internal Network Socket
ip-172-31-9-87:17674:17674 [1] NCCL INFO rank 17 nranks 24
ip-172-31-9-87:17674:18611 [1] NCCL INFO comm 0x7f8fa80551f0 rank 17 nranks 24
ip-172-31-9-87:17674:18611 [1] NCCL INFO NET : Using interface lo:127.0.0.1<0>
ip-172-31-9-87:17674:18611 [1] NCCL INFO NET : Using interface ens5:172.31.9.87<0>
ip-172-31-9-87:17674:18611 [1] NCCL INFO NET/Socket : 2 interfaces found
ip-172-31-9-87:17678:17678 [6] NCCL INFO NET : Using interface lo:127.0.0.1<0>
ip-172-31-9-87:17678:17678 [6] NCCL INFO NET/IB : Using interface lo for sideband communication
ip-172-31-9-87:17678:17678 [6] NCCL INFO Using internal Network Socket
ip-172-31-9-87:17678:17678 [6] NCCL INFO rank 22 nranks 24
ip-172-31-9-87:17678:18612 [6] NCCL INFO comm 0x7f25900551f0 rank 22 nranks 24
ip-172-31-9-87:17678:18612 [6] NCCL INFO NET : Using interface lo:127.0.0.1<0>
ip-172-31-9-87:17678:18612 [6] NCCL INFO NET : Using interface ens5:172.31.9.87<0>
ip-172-31-9-87:17678:18612 [6] NCCL INFO NET/Socket : 2 interfaces found

ip-172-31-9-87:17672:18605 [4] include/socket.h:369 NCCL WARN Call to connect timeout : Connection refused
ip-172-31-9-87:17672:18605 [4] NCCL INFO transport/net_socket.cu:138 -> 2
ip-172-31-9-87:17672:18605 [4] NCCL INFO bootstrap.cu:19 -> 2
ip-172-31-9-87:17672:18605 [4] NCCL INFO bootstrap.cu:195 -> 2
ip-172-31-9-87:17672:18605 [4] NCCL INFO init.cu:446 -> 2
ip-172-31-9-87:17672:18605 [4] NCCL INFO init.cu:593 -> 2
ip-172-31-9-87:17672:18605 [4] NCCL INFO misc/group.cu:69 -> 2 [Async thread]

ip-172-31-9-87:17671:18608 [0] include/socket.h:369 NCCL WARN Call to connect timeout : Connection refused
ip-172-31-9-87:17671:18608 [0] NCCL INFO transport/net_socket.cu:138 -> 2
ip-172-31-9-87:17671:18608 [0] NCCL INFO bootstrap.cu:19 -> 2
ip-172-31-9-87:17671:18608 [0] NCCL INFO bootstrap.cu:195 -> 2
ip-172-31-9-87:17671:18608 [0] NCCL INFO init.cu:446 -> 2
ip-172-31-9-87:17671:18608 [0] NCCL INFO init.cu:593 -> 2
ip-172-31-9-87:17671:18608 [0] NCCL INFO misc/group.cu:69 -> 2 [Async thread]

ip-172-31-9-87:17674:18611 [1] include/socket.h:369 NCCL WARN Call to connect timeout : Connection refused
ip-172-31-9-87:17674:18611 [1] NCCL INFO transport/net_socket.cu:138 -> 2
ip-172-31-9-87:17674:18611 [1] NCCL INFO bootstrap.cu:19 -> 2
ip-172-31-9-87:17674:18611 [1] NCCL INFO bootstrap.cu:195 -> 2
ip-172-31-9-87:17674:18611 [1] NCCL INFO init.cu:446 -> 2
ip-172-31-9-87:17674:18611 [1] NCCL INFO init.cu:593 -> 2
ip-172-31-9-87:17674:18611 [1] NCCL INFO misc/group.cu:69 -> 2 [Async thread]
ip-172-31-9-87:17675:17675 [7] NCCL INFO NET : Using interface lo:127.0.0.1<0>
ip-172-31-9-87:17675:17675 [7] NCCL INFO NET/IB : Using interface lo for sideband communication
ip-172-31-9-87:17675:17675 [7] NCCL INFO Using internal Network Socket
ip-172-31-9-87:17675:17675 [7] NCCL INFO rank 23 nranks 24
ip-172-31-9-87:17675:18614 [7] NCCL INFO comm 0x7fc7500551f0 rank 23 nranks 24
ip-172-31-9-87:17675:18614 [7] NCCL INFO NET : Using interface lo:127.0.0.1<0>
ip-172-31-9-87:17675:18614 [7] NCCL INFO NET : Using interface ens5:172.31.9.87<0>
ip-172-31-9-87:17675:18614 [7] NCCL INFO NET/Socket : 2 interfaces found

ip-172-31-9-87:17678:18612 [6] include/socket.h:369 NCCL WARN Call to connect timeout : Connection refused
ip-172-31-9-87:17678:18612 [6] NCCL INFO transport/net_socket.cu:138 -> 2
ip-172-31-9-87:17678:18612 [6] NCCL INFO bootstrap.cu:19 -> 2
ip-172-31-9-87:17678:18612 [6] NCCL INFO bootstrap.cu:195 -> 2
ip-172-31-9-87:17678:18612 [6] NCCL INFO init.cu:446 -> 2
ip-172-31-9-87:17678:18612 [6] NCCL INFO init.cu:593 -> 2
ip-172-31-9-87:17678:18612 [6] NCCL INFO misc/group.cu:69 -> 2 [Async thread]

ip-172-31-9-87:17675:18614 [7] include/socket.h:369 NCCL WARN Call to connect timeout : Connection refused
ip-172-31-9-87:17675:18614 [7] NCCL INFO transport/net_socket.cu:138 -> 2
ip-172-31-9-87:17675:18614 [7] NCCL INFO bootstrap.cu:19 -> 2
ip-172-31-9-87:17675:18614 [7] NCCL INFO bootstrap.cu:195 -> 2
ip-172-31-9-87:17675:18614 [7] NCCL INFO init.cu:446 -> 2
ip-172-31-9-87:17675:18614 [7] NCCL INFO init.cu:593 -> 2
ip-172-31-9-87:17675:18614 [7] NCCL INFO misc/group.cu:69 -> 2 [Async thread]
ip-172-31-9-87:17673:17673 [5] NCCL INFO NET : Using interface lo:127.0.0.1<0>
ip-172-31-9-87:17673:17673 [5] NCCL INFO NET/IB : Using interface lo for sideband communication
ip-172-31-9-87:17673:17673 [5] NCCL INFO Using internal Network Socket
ip-172-31-9-87:17673:17673 [5] NCCL INFO rank 21 nranks 24
ip-172-31-9-87:17673:18616 [5] NCCL INFO comm 0x7f84ec0551f0 rank 21 nranks 24
ip-172-31-9-87:17673:18616 [5] NCCL INFO NET : Using interface lo:127.0.0.1<0>
ip-172-31-9-87:17673:18616 [5] NCCL INFO NET : Using interface ens5:172.31.9.87<0>
ip-172-31-9-87:17673:18616 [5] NCCL INFO NET/Socket : 2 interfaces found

ip-172-31-9-87:17673:18616 [5] include/socket.h:369 NCCL WARN Call to connect timeout : Connection refused
ip-172-31-9-87:17673:18616 [5] NCCL INFO transport/net_socket.cu:138 -> 2
ip-172-31-9-87:17673:18616 [5] NCCL INFO bootstrap.cu:19 -> 2
ip-172-31-9-87:17673:18616 [5] NCCL INFO bootstrap.cu:195 -> 2
ip-172-31-9-87:17673:18616 [5] NCCL INFO init.cu:446 -> 2
ip-172-31-9-87:17673:18616 [5] NCCL INFO init.cu:593 -> 2
ip-172-31-9-87:17673:18616 [5] NCCL INFO misc/group.cu:69 -> 2 [Async thread]
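In the log above, every process on ip-172-31-9-87 reports "NET : Using interface lo:127.0.0.1", and the later "NET/Socket : 2 interfaces found" lines list both lo and ens5. A 127.x.x.x address is loopback and cannot be reached from the other nodes, so one plausible reading (an inference from the log, not confirmed in the thread) is that the interface filter in use lets lo through. The sketch below is a hypothetical Python helper, `loopback_processes`, that scans pasted NCCL INFO output for processes reporting a loopback interface; the regular expression assumes the exact line format shown in this log.

```python
import re

# Assumes lines shaped like the ones in this log, e.g.
# "ip-172-31-9-87:17672:17672 [4] NCCL INFO NET : Using interface lo:127.0.0.1<0>"
IFACE_LINE = re.compile(
    r"^(?P<host>\S+):(?P<pid>\d+):(?P<tid>\d+) \[(?P<dev>\d+)\] "
    r"NCCL INFO NET : Using interface (?P<iface>[^:\s]+):(?P<addr>[0-9.]+)<\d+>"
)


def loopback_processes(log_text: str) -> list[tuple[str, int, int]]:
    """Return sorted (host, pid, device) tuples for processes that report a 127.x.x.x interface."""
    hits = set()
    for line in log_text.splitlines():
        m = IFACE_LINE.match(line.strip())
        if m and m.group("addr").startswith("127."):
            hits.add((m.group("host"), int(m.group("pid")), int(m.group("dev"))))
    return sorted(hits)
```

If this flags processes on a multi-node job, restricting `NCCL_SOCKET_IFNAME` to the interface that carries the inter-node traffic (for this host, presumably ens5) is a reasonable next thing to try.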