How to use NCCL2 to communicate with other servers?

One of the greatest features of NCCL2 is multi-node communication.

I have read the NCCL2 developer guide and its examples.
It seems that the examples only show different ways of using NCCL2 within a single server; they don't show how to communicate across multiple servers.

For example, I have 2 servers, each with 8 GPUs, connected via 10 Gbps Ethernet.
How can I use NCCL2 to get all 16 GPUs working together?

Hi Celebi,

The examples show how to do that using MPI.
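A minimal sketch of the MPI pattern for the 2 x 8 GPU case follows. It assumes one MPI process per GPU (16 processes in total), an MPI-3 library, and that the NCCL and CUDA runtime headers and libraries are installed on both servers. NCCL itself does not start processes across hosts. It only needs every rank to receive the same `ncclUniqueId`, which rank 0 creates with `ncclGetUniqueId`. MPI is used here just to launch the processes and broadcast that ID, so any other out-of-band channel (a socket, a shared file) could play the same role. Picking the GPU with `MPI_Comm_split_type` is this sketch's own choice, not necessarily what the official examples do.

```c
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#include <cuda_runtime.h>
#include <nccl.h>

/* Abort all ranks on any CUDA or NCCL failure. */
#define CUDACHECK(cmd) do { cudaError_t e = (cmd); if (e != cudaSuccess) { \
    fprintf(stderr, "CUDA error %s:%d '%s'\n", __FILE__, __LINE__, cudaGetErrorString(e)); \
    MPI_Abort(MPI_COMM_WORLD, 1); } } while (0)
#define NCCLCHECK(cmd) do { ncclResult_t r = (cmd); if (r != ncclSuccess) { \
    fprintf(stderr, "NCCL error %s:%d '%s'\n", __FILE__, __LINE__, ncclGetErrorString(r)); \
    MPI_Abort(MPI_COMM_WORLD, 1); } } while (0)

int main(int argc, char *argv[]) {
  int world_rank, world_size, local_rank;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
  MPI_Comm_size(MPI_COMM_WORLD, &world_size);   /* 16 for 2 servers x 8 GPUs */

  /* Ranks on the same server form a node-local communicator; the rank
     inside it (0..7) selects this process's GPU. */
  MPI_Comm local_comm;
  MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, world_rank,
                      MPI_INFO_NULL, &local_comm);
  MPI_Comm_rank(local_comm, &local_rank);
  MPI_Comm_free(&local_comm);
  CUDACHECK(cudaSetDevice(local_rank));

  /* Rank 0 creates the unique ID; MPI only carries it to the other ranks. */
  ncclUniqueId id;
  if (world_rank == 0) NCCLCHECK(ncclGetUniqueId(&id));
  MPI_Bcast(&id, sizeof(id), MPI_BYTE, 0, MPI_COMM_WORLD);

  /* Every process joins one NCCL communicator spanning both servers. */
  ncclComm_t comm;
  NCCLCHECK(ncclCommInitRank(&comm, world_size, id, world_rank));

  /* In-place sum all-reduce of a small buffer across all 16 GPUs. */
  const size_t count = 1024;
  float *buf;
  cudaStream_t stream;
  CUDACHECK(cudaMalloc((void **)&buf, count * sizeof(float)));
  CUDACHECK(cudaMemset(buf, 0, count * sizeof(float)));
  CUDACHECK(cudaStreamCreate(&stream));
  NCCLCHECK(ncclAllReduce(buf, buf, count, ncclFloat, ncclSum, comm, stream));
  CUDACHECK(cudaStreamSynchronize(stream));

  printf("rank %d of %d done on local GPU %d\n", world_rank, world_size, local_rank);

  CUDACHECK(cudaStreamDestroy(stream));
  CUDACHECK(cudaFree(buf));
  ncclCommDestroy(comm);
  MPI_Finalize();
  return 0;
}
```

With Open MPI, a launch along the lines of `mpirun -np 16 -H server1:8,server2:8 ./allreduce` should place 8 ranks on each server (`server1`, `server2`, and `./allreduce` are placeholder names). Without InfiniBand, NCCL falls back to TCP sockets between hosts; the `NCCL_SOCKET_IFNAME` environment variable can be used to select the 10 Gbps Ethernet interface.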

However, have you heard anything about how inter-GPU communication with NCCL2 works across different hosts without MPI?

Thanks.