Hello, I’m trying to use the mlx4_en driver instead of mlx4_ib, hoping to increase bandwidth in a filtering device that uses ConnectX-2 MT26428 adapters.
I did a fresh install of CentOS 6.2, since the InfiniBand adapters are running firmware 2.8 and the Mellanox drivers that support this version have been tested with RHEL 6.2 according to the release notes. The kernel used is 2.6.32-358.2.1.el6.x86_64.
With the standard CentOS drivers (mlx4_core 1.1), I saw this in dmesg:
command 0xc failed: fw status = 0x40
And ‘modprobe mlx4_en’ didn’t create any Ethernet devices. Modprobing mlx4_ib, ib_sa, ib_cm, ib_umad, ib_addr, ib_uverbs, ib_ipoib and ib_ipath resulted in ib0 and ib1 showing up.
I downloaded mlnx_en-1.5.8.3.tgz (mlx4_1.5.7.2) from the download archives; mlx4_en still doesn’t create an Ethernet device, but now the error in dmesg is:
mlx4_en: Port: 1, invalid mac burned: 0x0, quiting
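For reference, the MACs/GUIDs the firmware has burned in can be checked with mstflint; a minimal sketch, assuming the adapter sits at PCI address 06:00.0 as the later dmesg lines suggest:
# list the FW version, PSID and the burned GUIDs (and MACs, when present) for the adapter at 06:00.0
mstflint -d 06:00.0 query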
I’ve compiled and installed every version of the drivers the same way (roughly sketched below):
extracting the tgz from Mellanox
rpm2cpio SRPMS/mellanox-mlnx-en-x.y.z.tgz
extracting it and running scripts/mlnx_en_patch.sh, which gave two errors:
kernel_patches/backport/2.6.32-EL6.2/dma_mapping*.patch does not exist
kernel_patches/backport/2.6.32-EL6.2/memtrack*.patch does not exist
running make → no errors, and the .ko files were produced
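In shell terms that works out to roughly the following; a sketch only, and the source RPM name inside SRPMS/ (here a wildcard) is an assumption on my part:
# unpack the Mellanox tarball and enter the source tree
tar xzf mlnx_en-1.5.8.3.tgz
cd mlnx_en-1.5.8.3
# pull the sources out of the bundled source package (exact name assumed)
rpm2cpio SRPMS/mellanox-mlnx-en-*.src.rpm | cpio -idmv
# apply the kernel backport patches, then build the modules
scripts/mlnx_en_patch.sh
make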
I’ve also tried the mlnx_en drivers 1.5.9 and 1.5.10 on an Ubuntu Server 12.04 machine (kernel 3.2) with the same results. Using the mlx4_ib driver, I could run netperf tests across two servers and the devices were functional.
Is there any other setup step required to use the mlx4_en driver?
But ifconfig -a still doesn’t show any additional adapters.
Firmware is still at 2.8 on both adapters; I’ll try updating them next using mstflint, since I can’t figure out which flag makes mlxburn accept a simple .bin firmware image (which I can get from [1]).
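In case it helps anyone later, burning a plain .bin with mstflint needs no special flags; a sketch, using a placeholder image file name and the PCI address from the dmesg lines below:
# burn the raw image onto the adapter at 06:00.0 (fw.bin is a placeholder name)
mstflint -d 06:00.0 -i fw.bin burn
As noted further down in the thread, mstflint warns if the PSID of the new image differs from the one on flash.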
Thanks, the flint -mac hints were spot on. I can now see the two eth* devices in ifconfig, but when doing:
machine1$ ifconfig eth0 192.168.1.1
machine2$ ifconfig eth0 192.168.1.2
this comes up in dmesg:
mlx4_en 0000:06:00.0: Activating port:1
mlx4_en: eth0: Using 16 TX rings
mlx4_en: eth0: Using 16 RX rings
mlx4_en: eth0: Initializing port
ADDRCONF(NETDEV_UP): eth0: link is not ready
And ping doesn’t work between the machines.
When I set the port types to ib and use the ib0 devices, ping works, netperf tests work, etc. (I started an opensm service on one machine). I’ve tried “ifconfig eth0 down” and then up again on both machines with no success. I checked with ibdev2netdev that I was using the right eth device (ib0 <==> eth0).
I did “yum groupinstall Infiniband\ Support” and then configured things in /etc/rdma but got the same result (“link is not ready”).
Is there anything I missed? The MLNX_OFED manual doesn’t list anything besides ifconfig under “4.1.9 A detailed example”.
There could be a few things going on here. Here is my list, ordered with the most likely cause at the top; start there and work down:
Your HCA is configured to work with InfiniBand and not with Eth. You will need to load the Mellanox OFED stack (because the tool we need for flipping this HCA back to Eth is in there), then use the “connectx_port_config” tool to configure both ports to be in Eth mode (see the sketch after this list).
Use the latest FW available for this card. The one you have (2.8.X) is too old and might give you grief later on.
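For reference, a rough sketch of that flip, assuming MLNX_OFED is installed and using the PCI address from the dmesg output above; connectx_port_config is interactive, and writing the mlx4_core sysfs entries is the non-interactive way to do the same thing:
# interactive MLNX_OFED tool that prompts for the desired type of each port
connectx_port_config
# or directly through the mlx4_core sysfs entries (values: ib / eth / auto)
echo eth > /sys/bus/pci/devices/0000:06:00.0/mlx4_port1
echo eth > /sys/bus/pci/devices/0000:06:00.0/mlx4_port2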
Hmmm, it kind of sounds like you just want to run the adapters in native 10 GbE mode instead of IB mode.
That’s super simple to do if you’re using the InfiniBand stack that comes with CentOS (the “Infiniband Support” yum group). It’s just a setting you change in one of the /etc/rdma/ conf files.
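If it helps, this is roughly what that setting looks like; a sketch assuming the /etc/rdma/mlx4.conf file shipped with the CentOS rdma package, where each line names an adapter by PCI address followed by the type to use for port 1 and port 2:
# /etc/rdma/mlx4.conf: put both ports of the adapter at 0000:06:00.0 into Ethernet mode
0000:06:00.0 eth eth
As far as I know the file is read when the rdma service starts, so a service restart or a reboot applies it.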
Time for a dumb question from my side: the IB ports go into an “HP 4X QDR InfiniBand Switch Module for c-Class BladeSystem” (part number 489184-B21). Since I haven’t seen any mention of Ethernet support in the switch’s specs, should I even attempt to use mlx4_en? It seems to me that the switch ports would need special support for that, and the HP QDR switch is (IPo)IB-only.
In any case, the performance over IPoIB curiously went up from 3 Gb/s to 11 Gb/s after I did this:
run netperf via IPoIB → 3 Gb/s (100% CPU)
run netperf via IPoIB with SDP → 18 Gb/s (100% CPU)
run netperf via IPoIB again, no SDP → 11 Gb/s (80% CPU)
So maybe enabling SDP flipped some setting that now gets me 11 Gb/s at ~80% CPU, which is plenty for what we need.
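For the record, a sketch of how those runs can be reproduced, assuming netserver is already running on the other node at a hypothetical IPoIB address 10.0.0.2 and that libsdp.so is in /usr/lib64; SDP is enabled simply by preloading libsdp, which transparently redirects TCP sockets:
# plain TCP over the IPoIB interface
netperf -H 10.0.0.2 -t TCP_STREAM
# the same run over SDP, via the libsdp preload
LD_PRELOAD=/usr/lib64/libsdp.so netperf -H 10.0.0.2 -t TCP_STREAM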
Thanks yairi, Sorin and Justin for your persistence in helping me get this set up.
Just updated the firmware to 2.9.1000. ‘mstflint’ warned that the PSIDs didn’t match (originally I had HP_0160000009; it got burned to MT_0D70110009). I get the same dmesg error about ‘invalid mac burned’ with this firmware and mlx4_en 1.5.8.3.
I can back up the HP_0160000009 firmware from another board and re-burn it if needed.
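A sketch of that backup, again with mstflint and a placeholder file name:
# read the flash contents of the other (still HP_0160000009) board into a file
mstflint -d 06:00.0 ri hp_0160000009.bin
Re-burning it would then be the same ‘-i <file> burn’ invocation as before.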
I did pass that port_type_array option to modprobe when testing.
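For reference, this is roughly how I was passing it; a sketch of the mlx4_core port_type_array parameter, where (as far as I understand) 1 selects IB and 2 selects Eth for each port:
# load mlx4_core with both ports forced to Ethernet
modprobe mlx4_core port_type_array=2,2
# or persistently, via a modprobe.d file, e.g. /etc/modprobe.d/mlx4.conf:
#   options mlx4_core port_type_array=2,2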
Try updating the FW of the card to the latest. I know there should be an option in the newer firmware to create the MAC out of the card’s number when the FW starts.