I am trying to install two ConnectX-3 HCAs in a compute node. Each HCA is inserted into the PCIe slot of its corresponding CPU, and I have installed Ubuntu 16.04 LTS.
Ubuntu detects the HCAs, and I can see them with sudo lspci | grep "Mellanox".
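Because each HCA sits in a slot belonging to a different CPU, it can help to confirm which NUMA node (CPU socket) each HCA is actually attached to before pairing Phis with HCAs. The sketch below is only an illustration, assuming the standard Linux sysfs layout: /sys/class/infiniband/<dev>/device points at the PCI device, which exposes a numa_node file and a net/ directory listing its IPoIB interfaces. ConnectX-3 cards use the mlx4 driver, so the devices typically show up as mlx4_0 and mlx4_1; the function name hca_numa_map is hypothetical.

```python
from pathlib import Path


def hca_numa_map(sysfs_root="/sys/class/infiniband"):
    """Map each RDMA device (e.g. mlx4_0) to its NUMA node and IPoIB interfaces.

    Hypothetical helper. Assumes the usual sysfs layout: <root>/<dev>/device is
    the PCI device, with a 'numa_node' file and a 'net/' directory (ib0, ib1, ...).
    """
    result = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return result  # no RDMA devices registered (or driver not loaded)
    for dev in sorted(root.iterdir()):
        pci = dev / "device"
        numa_file = pci / "numa_node"
        # The kernel reports -1 when the platform gives no locality information.
        numa = int(numa_file.read_text().strip()) if numa_file.exists() else -1
        net_dir = pci / "net"
        ifaces = sorted(p.name for p in net_dir.iterdir()) if net_dir.is_dir() else []
        result[dev.name] = {"numa_node": numa, "interfaces": ifaces}
    return result


if __name__ == "__main__":
    for name, info in hca_numa_map().items():
        print(f"{name}: NUMA node {info['numa_node']}, interfaces {info['interfaces']}")
```

If both HCAs report the same NUMA node (or -1), the "one HCA per CPU" assumption behind the Phi pairing would be worth rechecking against the motherboard's slot layout.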
I am using two HCAs because I would like to install four Xeon Phi coprocessors in the node, with each Phi linked to the HCA attached to the same CPU. Each Phi needs its own IP address in the cluster.
HCA1 ib0 192.168.8.7, Phi_0 192.168.8.70, Phi_1 192.168.8.71
HCA2 ib0 192.168.8.8, Phi_2 192.168.8.80, Phi_3 192.168.8.81
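The addressing plan above can be written down as data and sanity-checked for duplicates and subnet membership before any configuration files are touched. This is a minimal sketch: the /24 prefix (192.168.8.0/24) is an assumption, since the post does not state a netmask, and PLAN and check_plan are hypothetical names.

```python
import ipaddress

# Addressing plan from the post. The /24 prefix is an ASSUMPTION, not stated in the post.
PLAN = {
    "HCA1": {"host_ib": "192.168.8.7", "phis": {"Phi_0": "192.168.8.70", "Phi_1": "192.168.8.71"}},
    "HCA2": {"host_ib": "192.168.8.8", "phis": {"Phi_2": "192.168.8.80", "Phi_3": "192.168.8.81"}},
}
NETWORK = ipaddress.ip_network("192.168.8.0/24")


def check_plan(plan, network):
    """Return a list of problems: addresses outside the subnet or reused addresses."""
    problems = []
    seen = {}  # address -> name of the first endpoint that used it
    for hca, entry in plan.items():
        endpoints = [(f"{hca} ib", entry["host_ib"])] + list(entry["phis"].items())
        for name, addr in endpoints:
            ip = ipaddress.ip_address(addr)
            if ip not in network:
                problems.append(f"{name} {ip} is outside {network}")
            if ip in seen:
                problems.append(f"{name} reuses {ip} from {seen[ip]}")
            else:
                seen[ip] = name
    return problems


if __name__ == "__main__":
    issues = check_plan(PLAN, NETWORK)
    print("plan OK" if not issues else "\n".join(issues))
```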
I will install MLNX_OFED_LINUX-3.4-188.8.131.52-ubuntu16.04-x86_64 on the node. What is the best practice for achieving this configuration?
My Xeon Phi installation on Ubuntu 16.04 was successful.
How to setup 4 Xeon Phi 7120P in a single node via Infiniband? https://software.intel.com/en-us/forums/intel-many-integrated-core/topic/712607
Could you clarify whether something is not working properly, and whether it is Mellanox-related?
The Xeon Phi problem is Intel-related. I am having compatibility issues between the OS, Intel MPSS, and Mellanox OFED.
I currently have Ubuntu 16.04, Intel MPSS 3.8.1, and Mellanox OFED 184.108.40.206.1.0, but this combination does not seem to be fully functional.
I am planning to switch to CentOS 7, Intel MPSS 3.8.1, and OpenFabrics OFED 3.18.
But if I have two HCAs in a node, what should I modify in /etc/hostname so that I can access the node via both HCAs to increase bandwidth?
I would recommend staying with a configuration that is supported by Intel and using the MOFED/OS combination mentioned in its documentation.
Unfortunately, Intel MPSS is outside the scope of Mellanox support. However, I can help you with HCA/MOFED questions. If you use TCP/IP over Ethernet, you can configure bonding in 802.3ad (LAG) mode to get more bandwidth. InfiniBand is different: only the active-backup mode is supported. So if you need to use a single IP address, you need to use the bonding module.
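The bonding advice above can be made concrete by rendering CentOS/RHEL-style ifcfg files for a bond over two ports, choosing the mode by link layer: 802.3ad for Ethernet, active-backup for InfiniBand (IPoIB). This is a sketch under assumptions, not a Mellanox-provided tool: bond_mode and ifcfg_files are hypothetical helpers, and the interface names (bond0, ib0, ib1), the address 192.168.8.7, the /24 prefix and miimon=100 are illustrative values only. Note that active-backup gives a single IP with failover, not aggregated bandwidth.

```python
def bond_mode(link_layer):
    """Pick a bonding mode following the advice in the reply (hypothetical helper)."""
    if link_layer == "Ethernet":
        return "802.3ad"  # LAG: aggregates bandwidth, needs LACP on the switch side
    if link_layer == "InfiniBand":
        return "active-backup"  # the only mode supported for IPoIB bonding
    raise ValueError(f"unknown link layer: {link_layer}")


def ifcfg_files(bond, slaves, ipaddr, prefix, link_layer):
    """Render RHEL/CentOS ifcfg file contents for a bond master and its slaves."""
    mode = bond_mode(link_layer)
    files = {
        f"ifcfg-{bond}": "\n".join([
            f"DEVICE={bond}",
            "TYPE=Bond",
            "BONDING_MASTER=yes",
            f'BONDING_OPTS="mode={mode} miimon=100"',
            "BOOTPROTO=none",
            f"IPADDR={ipaddr}",
            f"PREFIX={prefix}",
            "ONBOOT=yes",
        ]) + "\n"
    }
    for slave in slaves:
        files[f"ifcfg-{slave}"] = "\n".join([
            f"DEVICE={slave}",
            f"TYPE={link_layer}",
            f"MASTER={bond}",
            "SLAVE=yes",
            "BOOTPROTO=none",
            "ONBOOT=yes",
        ]) + "\n"
    return files


if __name__ == "__main__":
    # Illustrative values only; adjust names and addressing to the real cluster.
    for name, body in ifcfg_files("bond0", ["ib0", "ib1"], "192.168.8.7", 24, "InfiniBand").items():
        print(f"# /etc/sysconfig/network-scripts/{name}\n{body}")
```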
Thanks, I see what you mean. I will stick to IPoIB and manually assign each IP to the respective HCA port.
So, it will look like:
However, the host itself will also be connected to the cluster via IPoIB, and I am not sure whether this works with the Intel ofed-mic.
I have switched to CentOS 7.3 in order to install Intel MPSS 3.8.1, but it seems to require MLNX_OFED 2.4, which is outdated for CentOS 7.3.
In addition, OFED-3.18.2 also failed to compile on CentOS 7.3.
I am stuck at the moment.
CentOS 7.3 fail to build ofed-mic with MPSS-3.8.1 https://software.intel.com/en-us/forums/intel-many-integrated-core/topic/715108
Could you please take a look?