Getting eIPoIB to work?

OK, I’ve been trying to set up eIPoIB. My InfiniBand network is up and ib0 is set up for IPoIB. I can see the new eth2 interface (the virtual eIPoIB device), but there is nothing in its vifs file, so I can’t ping from compute node to compute node.

Welcome to Ubuntu 12.04.2 LTS (GNU/Linux 3.2.0-39-generic x86_64)

 * Documentation: https://help.ubuntu.com/

  System information as of Thu May 9 22:31:11 EST 2013

  System load: 1.83             Users logged in: 0
  Usage of /: 2.7% of 59.39GB   IP address for eth0: 192.168.10.101
  Memory usage: 0%              IP address for ib0: 10.10.10.101
  Swap usage: 0%                IP address for eth2: 20.20.20.101
  Processes: 112

  Graph this data and manage this system at https://landscape.canonical.com/

Last login: Thu May 9 21:55:22 2013 from maas.local
ubuntu@blade01:~$ sudo su -
root@blade01:~# cat /sys/class/net/eth2/eth/vifs
root@blade01:~#
root@blade01:~#
root@blade01:~# ibstat
CA 'mlx4_0'
    CA type: MT25418
    Number of ports: 2
    Firmware version: 2.8.0
    Hardware version: a0
    Node GUID: 0x001b78ffff33ee58
    System image GUID: 0x001b78ffff33ee5b
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 20
        Base lid: 5
        LMC: 0
        SM lid: 3
        Capability mask: 0x02510868
        Port GUID: 0x001b78ffff33ee59
        Link layer: InfiniBand
    Port 2:
        State: Active
        Physical state: LinkUp
        Rate: 20
        Base lid: 2
        LMC: 0
        SM lid: 1
        Capability mask: 0x02510868
        Port GUID: 0x001b78ffff33ee5a
        Link layer: InfiniBand
root@blade01:~# ifconfig ib0
ib0       Link encap:UNSPEC  HWaddr A0-00-01-00-FE-80-00-00-00-00-00-00-00-00-00-00
          inet addr:10.10.10.101  Bcast:10.10.10.255  Mask:255.255.255.0
          inet6 addr: fe80::21b:78ff:ff33:ee59/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1
          RX packets:6 errors:0 dropped:0 overruns:0 frame:0
          TX packets:30 errors:0 dropped:8 overruns:0 carrier:0
          collisions:0 txqueuelen:1024
          RX bytes:1515 (1.5 KB)  TX bytes:5752 (5.7 KB)
root@blade01:~# ping 10.10.10.102
PING 10.10.10.102 (10.10.10.102) 56(84) bytes of data.
64 bytes from 10.10.10.102: icmp_req=2 ttl=64 time=2.24 ms
64 bytes from 10.10.10.102: icmp_req=3 ttl=64 time=0.033 ms
^C
--- 10.10.10.102 ping statistics ---
3 packets transmitted, 2 received, 33% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.033/1.141/2.249/1.108 ms
root@blade01:~# ifconfig eth2
eth2      Link encap:Ethernet  HWaddr 00:1b:78:33:ee:59
          inet addr:20.20.20.101  Bcast:20.20.20.255  Mask:255.255.255.0
          inet6 addr: fe80::21b:78ff:fe33:ee59/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
root@blade01:~# ping 20.20.20.102
PING 20.20.20.102 (20.20.20.102) 56(84) bytes of data.
^C
--- 20.20.20.102 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2015ms
root@blade01:~# cat /sys/class/net/eth2/eth/vifs
root@blade01:~# ????
root@blade01:~# ethtool -i eth2
driver: eth_ipoib
version: 1.0.0
firmware-version: 1
bus-info: ib0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
root@blade01:~#

Thanks for the info. I don’t have access to my lab on the weekend, but I’ll check it out when I’m back in the office. Do you have the details of what you changed on your Ubuntu system?

iliyasa@xenon.com.au: Ugh, that sounds like a bug. It might be worth explicitly emailing the Mellanox support team (support@mellanox.com) to let them know about it. (eddie.notz: that’s the correct approach, isn’t it?)

In the meantime, with the ipoibd daemon startup bit that’s rejecting the OS… does it seem like a binary file or does it look like a shell script gets run first? Kind of thinking the OS check might be in a shell script before the proper daemon gets launched (reasonably common). If that’s the case you might be able to edit the script for now to accept your OS version. (make a backup of it first, etc)
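One way to answer that question from a shell, as a rough sketch only. The helpers below just read the file: they report whether it starts with a shebang (so it is a script rather than a binary) and list lines that look like an OS check. The error text "This OS is not supported" comes from a later post in this thread; the function names and the grep pattern are my own guesses, not anything taken from the Mellanox scripts.

```shell
#!/bin/sh
# Hypothetical helpers; nothing here is part of MLNX_OFED.

# is_script FILE: succeed if FILE starts with "#!" (a script, not a binary).
is_script() {
    [ "$(head -c 2 "$1" 2>/dev/null)" = "#!" ]
}

# find_os_check FILE: print numbered lines that look like an OS guard.
# The pattern is a guess based on the error text reported in this thread.
find_os_check() {
    grep -n -i -E 'not supported|/etc/.*release|lsb_release' "$1"
}

# backup_file FILE: keep a copy next to the original before editing it.
backup_file() {
    cp -p "$1" "$1.orig"
}
```

For example, `is_script /etc/init.d/ipoibd && find_os_check /etc/init.d/ipoibd` would show candidate lines, and `backup_file /etc/init.d/ipoibd` should be run before changing any of them.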

To bypass ipoibd service, please follow this document:

https://community.mellanox.com/s/article/eipoib-manual-configuration

Ok I’ll test it on Monday and post back the results, thanks again

OK, thanks to some googling, with another three commands on each node I can now use the eIPoIB interfaces.

The first command creates the ib0 sub-interface, in this case ib0.1:

root@blade01:~# echo .1 > /sys/class/net/ib0/create_child
root@blade01:~# ifconfig ib0.1
ib0.1     Link encap:UNSPEC  HWaddr A0-00-01-10-FE-80-00-00-00-00-00-00-00-00-00-00
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:2044  Metric:1
          RX packets:1986136 errors:0 dropped:0 overruns:0 frame:0
          TX packets:5610181 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1024
          RX bytes:79455932 (79.4 MB)  TX bytes:24752145636 (24.7 GB)
root@blade01:~#

Then the last two commands complete the enslavement:

root@blade01:~# echo +ib0.1 > /sys/class/net/eth2/eth/slaves
root@blade01:~# echo +ib0.1 00:1b:78:33:6e:95 > /sys/class/net/eth2/eth/vifs
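The three commands above can be wrapped into one function so they are easier to repeat on each node. This is only a sketch of what worked in this thread, not the procedure from the Mellanox manual. The function name `eipoib_enslave` and the `SYSFS_NET` override are hypothetical; the sysfs file names are the ones used in the commands above.

```shell
#!/bin/sh
# Sketch of the three manual steps from this thread, wrapped in one function.
# SYSFS_NET is a hypothetical override so the function can be dry-run
# against a scratch directory; on a real node it defaults to /sys/class/net.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

# eipoib_enslave PARENT SUFFIX ETHDEV MAC
#   PARENT  IPoIB parent interface, e.g. ib0
#   SUFFIX  child suffix, e.g. .1 (gives ib0.1)
#   ETHDEV  eIPoIB Ethernet device, e.g. eth2
#   MAC     MAC address to write to the vifs file
eipoib_enslave() {
    parent=$1 suffix=$2 ethdev=$3 mac=$4
    child="${parent}${suffix}"

    # 1. Create the child interface (skip if it already exists).
    if [ ! -e "$SYSFS_NET/$child" ]; then
        echo "$suffix" > "$SYSFS_NET/$parent/create_child" || return 1
    fi
    # 2. Enslave the child to the eIPoIB device.
    echo "+$child" > "$SYSFS_NET/$ethdev/eth/slaves" || return 1
    # 3. Register the child together with the MAC address in vifs.
    echo "+$child $mac" > "$SYSFS_NET/$ethdev/eth/vifs" || return 1
}
```

Run as root, the call matching the session above would be `eipoib_enslave ib0 .1 eth2 00:1b:78:33:6e:95`.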

And speed tests are similar to normal IPoIB without a larger MTU or SDP set up:

root@blade01:~# netperf -H 20.20.20.102 -c -C -- -m 1400
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 20.20.20.102 (20.20.20.102) port 0 AF_INET : demo
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
 87380  65536   1400    10.00      5388.11   21.91    25.73    1.333   1.565
root@blade01:~#

It was basically the initial distribution test, and then I removed the full paths in all of the calls to external commands.

Hi drolfe,

Indeed, the IPoIB daemon (ipoibd) should create and enslave the ibX.Y interfaces automatically. No user intervention should be needed.

Please make sure that it’s running.

You can also try to edit /etc/init.d/ipoibd and enable the debug flag to see if it’s really running, and whether it has any unexpected issues.
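Independently of the daemon's own debug output, its effect can be checked by reading the same sysfs files used elsewhere in this thread. The sketch below is a hypothetical helper, not part of ipoibd. It assumes the `slaves` file is readable in the same way as `vifs`, which the thread only demonstrates for `vifs`.

```shell
#!/bin/sh
# Hypothetical check, not part of ipoibd: report what is currently enslaved.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

# eipoib_status ETHDEV: print the slaves and vifs entries for ETHDEV.
# Returns 0 if at least one VIF is listed, 1 if vifs is empty (the symptom
# from the first post), 2 if ETHDEV has no eth/ directory at all.
eipoib_status() {
    dir="$SYSFS_NET/$1/eth"
    [ -d "$dir" ] || { echo "$1: no eIPoIB sysfs directory" >&2; return 2; }
    # Read the contents rather than testing file size, because sysfs
    # files report a fixed size regardless of what they contain.
    slaves=$(cat "$dir/slaves" 2>/dev/null)
    vifs=$(cat "$dir/vifs" 2>/dev/null)
    echo "slaves: $slaves"
    echo "vifs: $vifs"
    [ -n "$vifs" ]
}
```

If `eipoib_status eth2` prints empty `slaves` and `vifs` lines while the daemon is supposedly running, the automatic enslavement has not happened.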

www.mellanox.com/page/products_dyn?product_family=26

First, start with the Mellanox driver 2.0, which now supports Ubuntu out of the box.

I’m told the binary path issue should be fixed in the next release.

Thanks nldesai,

We’ve captured your feedback, and will fix ipoibd in the next release.

My question now is why the ib0.1 sub-interface wasn’t created and enslaved for me, given that the user manual says the following:

“The IPoIB daemon (ipoibd) detects the new VIFs and creates a new IPoIB instances, as a result number of IPoIB interfaces (ibX.Y) are shown as being created/destroyed, and are being enslaved to the corresponding ethX interface to serve any active VIF in the system according to the set configuration, This process is done automatically by the ipoibd service.”

Or do you have to set up the PIF manually, while only the VIFs are set up automatically?

Unfortunately I have an Infinihost device with chipset MT23108…

From the release notes I understood that it is deprecated from MLNX_OFED 1.5.3 onward; see the release notes for 2.0, 1.5.3 and 1.5.2 below.

Does anyone know if InfiniHost MT23108 devices are really unsupported after 1.5.2?

What can I do to make eth_ipoib work with infinihost devices?

Thanks,

Giovanni.

===============================================================================

http://www.mellanox.com/related-docs/prod_software/Mellanox_OFED_Linux_Release_Notes_2_0-2_0_5.txt

MLNX_OFED_LINUX 2.0 supports the following adapters:

  • Mellanox Technologies HCAs:

  • ConnectX-3 (Rev 2.11.0500 and above)

  • ConnectX-2 (Rev 2.9.1200 and above)

  • Connect-IB (Rev 10.0.2400 and above)

===============================================================================

http://www.mellanox.com/related-docs/prod_software/Mellanox_OFED_Linux_Release_Notes_1_5_3-4_0_35.txt

Mellanox supports the following adapters with MLNX_OFED_LINUX 1.5.3:

  • Mellanox Technologies HCAs (SDR and DDR Modes are Supported):

  • ConnectX-3 (fw-4099 Rev 2.11.0500)

  • ConnectX-2/ConnectX-2 EN (fw-ConnectX2 Rev 2.9.1200)

===============================================================================

Mellanox supports the following adapters with OFED 1.5.2:

  • Mellanox Technologies HCAs (SDR and DDR Modes are Supported):

  • InfiniHost(R) (fw-23108 Rev 3.5.000)

  • InfiniHost(R) III Ex (MemFree: fw-25218 Rev 5.3.000; with memory: fw-25208 Rev 4.8.200)

  • InfiniHost(R) III Lx (fw-25204 Rev 1.2.000)

  • ConnectX(R) and ConnectX EN (fw-25408 Rev 2.8.0600)

  • ConnectX-2 (fw-ConnectX2 Rev 2.8.0600)

  • ConnectX-2 EN (fw-ConnectX2 Rev 2.8.0600)

Note: InfiniHost adapters will be deprecated in the next MLNX_OFED release.

Hi Justin,

CentOS 6.3 is among the accepted OSes. It is also listed in the release notes as a compatible OS.

I have already created a case, so I’ll see what I get there.

Hi drolfe,

I’m also trying to get eIPoIB to work on Ubuntu 12.04.

In the end, did you succeed in getting eIPoIB working? Can you add your InfiniBand HCA as an interface for a bridge device?

I’m stuck compiling the eth_ipoib kernel module…

You started from the OFED download, right? Which version?

Could you please paste the procedure here, or better, your shell history of the commands used to download, edit, and compile both the kernel module and the ipoibd daemon?

Thanks very much in advance,

Giovanni

The ipoibd in the public beta is broken on Ubuntu. I needed to hack it in order to make it run. Even when it is properly configured, the daemon exits immediately because of a check for whether the operating system is supported, and Ubuntu is not. If you comment out this guard, the daemon starts, but it calls a number of executables through hardcoded paths, and those executables live in different locations on Ubuntu than on Red Hat. Once these are fixed, the daemon works properly.
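As an illustration of the path fix described above: a script can look tools up on PATH instead of assuming a Red Hat location. This is a generic sketch, not the actual change made to ipoibd. Which executables the daemon calls is not stated in this thread, so the helper name `resolve_tool` and any tool names passed to it are assumptions.

```shell
#!/bin/sh
# Illustrative only: which tools ipoibd actually calls is not stated here.

# resolve_tool NAME: print where NAME is found on PATH, or fail with a
# message instead of assuming a distribution-specific location.
resolve_tool() {
    path=$(command -v "$1") || {
        echo "required tool not found on PATH: $1" >&2
        return 1
    }
    printf '%s\n' "$path"
}
```

A script would then use something like `IFCONFIG=$(resolve_tool ifconfig) || exit 1` once at startup and call `"$IFCONFIG"` afterwards, which works the same on Ubuntu and Red Hat as long as the tool is installed.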

Hi,

I would also like to get eIPoIB up and running. I have a couple of servers with CentOS 6.4 (2.6.32-358.6.2.el6.x86_64) and ConnectX-3 dual-port adapters. I’ve installed the latest OFED-2 (I had to add my kernel support with “./mlnx_add_kernel_support.sh -m . -v”). IPoIB works fine.

Then I followed the instructions in the manual and enabled eIPoIB by adding “E_IPOIB_LOAD=yes” to /etc/infiniband/openib.conf and restarted InfiniBand drivers by /etc/init.d/openibd restart.
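For anyone repeating this step, here is a small sketch that sets the key without creating duplicate lines. The key `E_IPOIB_LOAD` and the file /etc/infiniband/openib.conf come from the post above; the helper name `set_conf_key` is hypothetical and not part of OFED.

```shell
#!/bin/sh
# set_conf_key KEY VALUE FILE (hypothetical helper)
# Replace an existing KEY=... line in FILE, or append KEY=VALUE if absent.
# Assumes KEY and VALUE contain no characters special to sed (| or &).
set_conf_key() {
    key=$1 value=$2 file=$3
    if grep -q "^${key}=" "$file"; then
        tmp=$(mktemp) || return 1
        # Write via a temp file, then copy back so FILE keeps its
        # owner and permissions.
        sed "s|^${key}=.*|${key}=${value}|" "$file" > "$tmp" &&
            cat "$tmp" > "$file"
        status=$?
        rm -f "$tmp"
        return $status
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}
```

Run as root, `set_conf_key E_IPOIB_LOAD yes /etc/infiniband/openib.conf` followed by `/etc/init.d/openibd restart` matches the steps described in the post.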

Then manual says “When eth_ipoib is loaded,”… ok, how do I know if “eth_ipoib is loaded”? Is this supposed to be a kernel module?

modprobe eth_ipoib

FATAL: Module eth_ipoib not found
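On the question of how to tell whether eth_ipoib is loaded: the driver name reported by `ethtool -i eth2` earlier in this thread is eth_ipoib, so it does appear to be a kernel module. A sketch of two checks follows. The helper names and the `SYS_MODULE` override are hypothetical; /sys/module and `modinfo` are standard on Linux.

```shell
#!/bin/sh
# SYS_MODULE is a hypothetical override for testing; default is /sys/module.
SYS_MODULE="${SYS_MODULE:-/sys/module}"

# module_loaded NAME: succeed if the kernel currently has NAME loaded.
# Loaded modules (and some built-in ones) appear as directories here.
module_loaded() {
    [ -d "$SYS_MODULE/$1" ]
}

# module_available NAME: succeed if modinfo can find NAME for the running
# kernel, i.e. it was built and installed even if it is not loaded yet.
module_available() {
    modinfo "$1" >/dev/null 2>&1
}
```

With these, `module_loaded eth_ipoib` answers "is it loaded", while a failing `module_available eth_ipoib` corresponds to the "Module eth_ipoib not found" error above: the module was never installed for the running kernel.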

Also, OFED didn’t install /etc/init.d/ipoibd, but I found one in /usr/src/ofa_kernel-2.0/ofed_scripts/ipoibd …

Obviously something hasn’t been installed correctly? Do I have to compile&install it manually from /usr/src/ofa_kernel-2.0/drivers/net/eipoib/?

Thanks for help!

Hi,

I’m having the same issue as kenshiro. Have there been any updates? In my case I’m using CentOS 6.3 64-bit with MLNX_OFED_LINUX-2.0-2.0.5-rhel6.3-x86_64.

What I found was that when I try to start the ipoibd daemon manually, it gives me the error “This OS is not supported”.

Thanks,

Iliyas.

Hi Iliyasa, have you found a solution for building the eIPoIB modules and drivers on CentOS 6.3/6.4? I would really like to get this working, so any help would be very much appreciated! Thanks!

Thanks, I’ll check it out :-/