How to configure two HCAs in a compute node?

Hi,

I am trying to install two ConnectX-3 HCAs in a compute node. Each HCA is inserted into a PCIe slot attached to a different CPU, and I have installed Ubuntu 16.04 LTS.

Ubuntu detects the HCAs, and I can see them with sudo lspci | grep "Mellanox".
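If you want to check the HCAs from a script instead of by eye, a minimal sketch is to parse the lspci output. It assumes the usual "<slot> <class>: <description>" line format. The sample slot addresses below are hypothetical, and the match is case-sensitive, just like the grep above.

```python
import re
from typing import List, Tuple

# An lspci line looks like "<bus>:<dev>.<fn> <class>: <description>".
LSPCI_LINE = re.compile(r"^(?P<slot>[0-9a-fA-F:.]+)\s+(?P<cls>[^:]+):\s+(?P<desc>.*)$")

def mellanox_devices(lspci_output: str) -> List[Tuple[str, str]]:
    """Return (PCI slot, description) for every Mellanox entry in lspci output."""
    devices = []
    for line in lspci_output.splitlines():
        match = LSPCI_LINE.match(line.strip())
        if match and "Mellanox" in match.group("desc"):
            devices.append((match.group("slot"), match.group("desc")))
    return devices

# Hypothetical output from a two-socket node with one ConnectX-3 per CPU.
sample = """00:00.0 Host bridge: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 DMI2 (rev 04)
04:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]
82:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]"""

for slot, desc in mellanox_devices(sample):
    print(slot, desc)
```

On the real node you would feed it the output of subprocess.run(["lspci"], capture_output=True, text=True).stdout.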

I am doing this because I would like to install four Xeon Phi coprocessors in the node, with each Phi linked to the HCA attached to the same CPU. Each Phi needs its own IP address in the cluster.
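To confirm which CPU each HCA hangs off, the kernel exposes the NUMA node of every PCI device in sysfs. The sketch below reads /sys/class/infiniband/<dev>/device/numa_node for each InfiniBand device (ConnectX-3 cards appear as mlx4_N). The sysfs_root parameter exists only so the function can be pointed at a test directory. The same numa_node attribute is available under /sys/bus/pci/devices/<slot>/ for any PCI device, including the Phi cards.

```python
import os
from typing import Dict, Optional

def hca_numa_nodes(sysfs_root: str = "/sys") -> Dict[str, Optional[int]]:
    """Map each InfiniBand device (e.g. mlx4_0) to the NUMA node of its PCIe slot."""
    ib_class = os.path.join(sysfs_root, "class", "infiniband")
    result: Dict[str, Optional[int]] = {}
    if not os.path.isdir(ib_class):
        return result  # no InfiniBand driver loaded, or not a Linux sysfs tree
    for dev in sorted(os.listdir(ib_class)):
        path = os.path.join(ib_class, dev, "device", "numa_node")
        try:
            with open(path) as f:
                node: Optional[int] = int(f.read().strip())
        except (OSError, ValueError):
            node = None
        # The kernel reports -1 when it has no NUMA information for the slot.
        result[dev] = node if node is not None and node >= 0 else None
    return result

if __name__ == "__main__":
    for dev, node in hca_numa_nodes().items():
        print(f"{dev}: NUMA node {node}")
```

If the two HCAs report different nodes, each Phi should be placed in a slot that reports the same node as the HCA it is meant to use.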

For example,

HCA1 ib0 192.168.8.7, Phi_0 192.168.8.70, Phi_1 192.168.8.71

HCA2 ib0 192.168.8.8, Phi_2 192.168.8.80, Phi_3 192.168.8.81
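A quick sanity check for an address plan like the one above is to confirm that every address is unique and falls inside the IPoIB subnet. The addresses are taken from the post. The 192.168.8.0/24 netmask is an assumption, since the post does not state one.

```python
import ipaddress
from typing import Dict, List

# Address plan from the post: each host HCA followed by the two Phis on the same CPU.
PLAN: Dict[str, List[str]] = {
    "HCA1": ["192.168.8.7", "192.168.8.70", "192.168.8.71"],
    "HCA2": ["192.168.8.8", "192.168.8.80", "192.168.8.81"],
}

def check_plan(plan: Dict[str, List[str]], subnet: str = "192.168.8.0/24") -> List[str]:
    """Verify every address is unique and inside one subnet; return them in numeric order."""
    net = ipaddress.ip_network(subnet)  # the /24 mask is an assumption
    seen = set()
    for hca, addrs in plan.items():
        for text in addrs:
            ip = ipaddress.ip_address(text)
            if ip not in net:
                raise ValueError(f"{hca}: {ip} is outside {net}")
            if ip in seen:
                raise ValueError(f"{hca}: {ip} is assigned twice")
            seen.add(ip)
    return [str(ip) for ip in sorted(seen)]

print(check_plan(PLAN))
# ['192.168.8.7', '192.168.8.8', '192.168.8.70', '192.168.8.71', '192.168.8.80', '192.168.8.81']
```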

I will install MLNX_OFED_LINUX-3.4-2.0.0.0-ubuntu16.04-x86_64 on the node. What is the best practice for achieving such a configuration?

I have already installed Xeon Phi successfully on Ubuntu 16.04; see:

How to setup 4 Xeon Phi 7120P in a single node via Infiniband? Software - Intel Communities

Thank you,

Rolly

Hi Rolly,

Could you clarify whether something is not working properly, and whether it is Mellanox related?

Hi alkx,

Thank you.

The Xeon Phi problem is Intel related. I am having compatibility issues between the OS, Intel MPSS and Mellanox OFED.

Currently I have Ubuntu 16.04, Intel MPSS 3.8.1 and Mellanox OFED 4.0.1.0.1.0, but this combination does not seem to be fully functional.

I am planning to switch to CentOS 7, Intel MPSS 3.8.1 and OpenFabrics OFED 3.18.

But if I have two HCAs in a node, what should I modify in /etc/hostname so that I can access the node via both HCAs to increase bandwidth?

Best,

Rolly

I would recommend staying with a configuration that is supported by Intel and using the MOFED/OS combination listed in its documentation.

Hi Rolly,

Unfortunately, Intel MPSS is outside the scope of Mellanox support. However, I can help you with HCA/MOFED questions. If you use TCP/IP over Ethernet, you can configure bonding in 802.3ad (LAG) mode to get more bandwidth. InfiniBand is different: only the active-backup configuration is supported, so bonding gives you failover rather than extra bandwidth. You only need the bonding module if you want to use a single IP address.
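For reference, an IPoIB active-backup bond on CentOS/RHEL 7 is normally described by ifcfg files in /etc/sysconfig/network-scripts/. The sketch below only renders those file bodies; it does not write or activate anything. The bond name, slave names, address and prefix in the example are hypothetical, and miimon=100 is a common choice rather than a requirement.

```python
from typing import Dict, List

def render_ifcfg(fields: Dict[str, str]) -> str:
    """Render a network-scripts ifcfg file body as KEY=value lines."""
    return "".join(f"{key}={value}\n" for key, value in fields.items())

def ipoib_active_backup(bond: str, slaves: List[str], ipaddr: str, prefix: int) -> Dict[str, str]:
    """Return {filename: contents} for an IPoIB bond and its slave ports."""
    files = {
        f"ifcfg-{bond}": render_ifcfg({
            "DEVICE": bond,
            "TYPE": "Bond",
            "BONDING_MASTER": "yes",
            # IPoIB bonding supports only active-backup: one port carries traffic.
            "BONDING_OPTS": '"mode=active-backup miimon=100"',
            "IPADDR": ipaddr,
            "PREFIX": str(prefix),
            "BOOTPROTO": "none",
            "ONBOOT": "yes",
        })
    }
    for slave in slaves:
        # Slave ports carry no address of their own; they are enslaved to the bond.
        files[f"ifcfg-{slave}"] = render_ifcfg({
            "DEVICE": slave,
            "TYPE": "InfiniBand",
            "MASTER": bond,
            "SLAVE": "yes",
            "BOOTPROTO": "none",
            "ONBOOT": "yes",
        })
    return files

# Hypothetical example: one host IP shared by the ports of both HCAs.
for name, body in ipoib_active_backup("bond0", ["ib0", "ib1"], "192.168.8.7", 24).items():
    print(f"# {name}\n{body}")
```

Because only one slave is active at a time, this setup gives you a single IP with failover between the HCAs. Throughput stays that of one port.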

Hi alkx,

Thanks, I see what you mean. I will stick to IPoIB and manually assign each IP to the respective HCA port.

So, it will look like:

ib0: 192.168.8.70

ib1: 192.168.8.71

ib2: 192.168.8.80

ib3: 192.168.8.81
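On CentOS 7 (which comes up later in this thread), static per-port IPoIB addresses like these are usually set with one ifcfg-ibN file per interface in /etc/sysconfig/network-scripts/. The sketch below only renders the file bodies for the assignments above. The /24 prefix is an assumption.

```python
from typing import Dict

# Interface-to-address assignments from the post.
ASSIGNMENTS: Dict[str, str] = {
    "ib0": "192.168.8.70",
    "ib1": "192.168.8.71",
    "ib2": "192.168.8.80",
    "ib3": "192.168.8.81",
}

def ipoib_static(assignments: Dict[str, str], prefix: int = 24) -> Dict[str, str]:
    """Return {filename: contents} with one static ifcfg file per IPoIB interface."""
    files = {}
    for dev, addr in assignments.items():
        files[f"ifcfg-{dev}"] = (
            f"DEVICE={dev}\n"
            "TYPE=InfiniBand\n"
            "BOOTPROTO=none\n"
            f"IPADDR={addr}\n"
            f"PREFIX={prefix}\n"  # assumed /24; use the cluster's real netmask
            "ONBOOT=yes\n"
        )
    return files

for name, body in ipoib_static(ASSIGNMENTS).items():
    print(f"# {name}\n{body}")
```

With several interfaces in the same subnet, Linux may by default answer ARP requests for any local address on any port. The net.ipv4.conf.*.arp_ignore and arp_announce sysctls are the usual way to keep each address bound to its own interface.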

However, the host itself will also be connected to the cluster via IPoIB. Will this work with the Intel ofed-mic?

I have switched to CentOS 7.3 in order to install Intel MPSS 3.8.1, but it seems to require MLNX_OFED 2.4, which is outdated for CentOS 7.3.

OFED-3.18.2 also failed to compile on CentOS 7.3.

I am stuck at the moment.

CentOS 7.3 fail to build ofed-mic with MPSS-3.8.1 Software - Intel Communities

Could you please take a look?

Thank you,

Rolly