One of the greatest features of NCCL2 is multi-node communication.
I have read the NCCL2 developer guide and its examples.
It seems that the examples only show, in several variations, how NCCL2 works within a single server. They don't show how to communicate across multiple servers.
For example, I have 2 servers, each with 8 GPUs. The 2 servers are connected by 10 Gbps Ethernet.
How can I use NCCL2 to get those 16 GPUs to work together?