ConnectX-2: mlx4_en: Port: 1, invalid mac burned: 0x0, quiting

Hello, I’m trying to use the mlx4_en driver instead of mlx4_ib to hopefully increase bandwidth in a filtering device that uses ConnectX-2 MT26428 adapters.

I did a fresh install of CentOS-6.2 since the infiniband adapters are running firmware 2.8 and the mlnx drivers that support this version have been tested with RHEL-6.2 according to the release notes. Kernel used is 2.6.32-358.2.1.el6.x86_64.

With the standard CentOS drivers (mlx4_core 1.1), I saw this in dmesg:

command 0xc failed: fw status = 0x40

And ‘modprobe mlx4_en’ didn’t create any ethernet devices. Modprobing mlx4_ib ib_sa ib_cm ib_umad ib_addr ib_uverbs ib_ipoib ib_ipath resulted in ib0 and ib1 showing up.

I downloaded mlnx_en-1.5.8.3.tgz (mlx4_1.5.7.2) from the download archives and mlx4_en still doesn’t create an ethernet device, but the error in dmesg is:

mlx4_core: Mellanox ConnectX core driver v1.0-mlnx_ofed1.5.3 (November 3, 2011)

mlx4_en: Mellanox ConnectX HCA Ethernet driver v1.5.7.2 (Dec 2011)

mlx4_en: Port: 1, invalid mac burned: 0x0, quiting

I’ve compiled and installed all versions of the drivers by

  • extracting the tgz from mellanox

  • rpm2cpio SRPMS/mellanox-mlnx-en-x.y.z.tgz

  • extract, run scripts/mlnx_en_patch.sh; 2 errors:

  • kernel_patches/backport/2.6.32-EL6.2/dma_mapping*.patch does not exist

  • kernel_patches/backport/2.6.32-EL6.2/memtrack*.patch does not exist

  • make → no errors, .ko files resulted

I’ve also tried mlnx_en drivers 1.5.9 and 1.5.10 from an ubuntu-server 12.04 (kernel 3.2) with the same results. Using the mlx4_ib driver, I could do netperf tests across two servers and the devices were functional.

Is there any other setup step required for using the mlx4_en driver ?

Just to add to yairi’s note, you can also flip from IB to Eth by setting this module parameter:

options mlx4_core port_type_array="2,2"

or write directly to sysfs:

echo eth > /sys/bus/pci/devices/0000:20:00.0/mlx4_port2

echo eth > /sys/bus/pci/devices/0000:20:00.0/mlx4_port1
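The two echo commands above could be wrapped in a small helper, roughly like this. It is only a sketch: the function name set_port_types is made up, the device directory is whatever PCI address your adapter has, and it writes port 2 before port 1 simply because that is the order used above.

```shell
# Hypothetical helper: write a port type ("ib" or "eth") to the
# mlx4_port2 and mlx4_port1 files under a PCI device directory,
# e.g. /sys/bus/pci/devices/0000:20:00.0 (adjust to your adapter).
set_port_types() {
    dev_dir=$1
    port_type=$2
    # Only the two values that appear in this thread are accepted.
    case "$port_type" in
        ib|eth) ;;
        *) echo "unsupported port type: $port_type" >&2; return 1 ;;
    esac
    # Same order as the commands above: port 2 first, then port 1.
    for port in 2 1; do
        f="$dev_dir/mlx4_port$port"
        [ -e "$f" ] || continue
        printf '%s\n' "$port_type" > "$f" || return 1
        # Read the value back so the result is visible.
        printf 'port %s: %s\n' "$port" "$(cat "$f")"
    done
}
```

Usage would be something like: set_port_types /sys/bus/pci/devices/0000:20:00.0 eth (as root).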

Actually, it’s pretty easy to set/change the MAC (page 29 of http://www.mellanox.com/pdf/MFT/MFT_user_manual.pdf):

mst start

Query the mac:

flint -d /dev/mst/mt25418_pci_cr0 -qq q

Change the mac:

flint -d /dev/mst/mt25418_pci_cr0 -mac 02c90abcdef0 sg

Change the GUID:

flint -d /dev/mst/mt25418_pci_cr0 -guid 0002c9000abcdef0 sg

flint -d /dev/mst/mt25418_pci_cr0 -qq q

Thanks for the hint. I just did that on both connected servers and then rmmod mlx4_en mlx4_core, then modprobe mlx4_en.

No mac-related errors in dmesg this time, with these versions:

ConnectX core driver v1.0-mlnx_ofed1.5.3 (November 3, 2011)

ConnectX HCA Ethernet driver v1.5.8.3 (June 2012)

But ifconfig -a still doesn’t show any additional adapters.

Firmware is still at 2.8 on both adapters, will try updating them next using mstflint since I can’t figure out which flag makes mlxburn accept a simple .bin firmware (that I can get from [1]).

[1] Mellanox Technologies: Support Firmware for ConnectX®-2 IB/VPI

Thanks, the flint -mac hints were spot on. I can now see the two eth* devices in ifconfig, but when doing:

machine1$ ifconfig eth0 192.192.168.1.1

machine2$ ifconfig eth0 192.192.168.1.2

this comes up in dmesg:

mlx4_en 0000:06:00.0: Activating port:1

mlx4_en: eth0: Using 16 TX rings

mlx4_en: eth0: Using 16 RX rings

mlx4_en: eth0: Initializing port

ADDRCONF(NETDEV_UP): eth0: link is not ready

And ping doesn’t work between the machines.

When setting port types to ib and using the ib0 devices, ping works, netperf tests work, etc (I started an opensm service on one machine). I’ve tried “ifconfig eth0 down” then back up on the machines with no success. I checked with ibdev2netdev that I was using the right eth device (ib0 <==> eth0).

I did “yum groupinstall Infiniband\ Support” and then configured things in /etc/rdma but got the same result (“link is not ready”).

Is there anything I missed ? The MLNX_OFED manual doesn’t list anything besides ifconfig under “4.1.9 A detailed example”.

Hi,

There could be a few things going on here. Here is my list, ordered from most to least likely:

  1. Your HCA is configured to work with InfiniBand and not with Eth. You will need to load the Mellanox OFED stack (because the tool we need for flipping this HCA back to Eth is there), then use the tool “connectx_port_config” to configure both ports to be in Eth mode.

  2. Use the latest FW available for this card. The one you have (2.8.X) is too old and might give you grief later on.

Mellanox Technologies: Firmware Downloads

  3. I recommend using driver version 1.5.9 or 1.5.10 (but get #1 and #2 above done first).

let me know how things go…

Does this show eth after you reloaded the modules?

cat /sys/bus/pci/devices/0000:20:00.0/mlx4_port1

Hmmm, it kind of sounds like you’re just wanting to run the adapters in native 10 GbE mode instead of in IB mode.

That’s super simple to do if you’re using the Infiniband stuff that comes with CentOS. (the “Infiniband Support” yum group.) It’s just a setting you change in one of the /etc/rdma/ conf files.

and also (persistently) with /etc/infiniband/openib.conf

The correct answer to “mlx4_en: Port: 1, invalid mac burned: 0x0, quiting”

is to write a MAC address into the firmware: flint -d /dev/mst/mt25418_pci_cr0 -mac 02c90abcdef0 sg

Installing the latest firmware does not solve that.

The runtime IPoIB mode is exposed in sysfs and can be changed there; it’s either datagram or connected:

cat /sys/class/net/ib0/mode

connected

echo datagram > /sys/class/net/ib0/mode

Time for a dumb question from my side: the ib ports go into an “HP 4X QDR InfiniBand Switch Module for c-Class BladeSystem” (part number 489184-B21). Since I haven’t seen any mention in the switch’s specs for supporting ethernet mode, should I even attempt to use mlx4_en ? It seems to me that special support would be required in the switch ports for that, and the HP QDR switch is (ipo)ib - only.

In any case, the performance over ipoib went up from 3Gb/s to 11Gb/s curiously after I did this:

  • run a netperf via ipoib → 3Gb/s (100% CPU)

  • run a netperf via ipoib with SDP → 18Gb/s (100% CPU)

  • run a netperf via ipoib, no SDP → 11Gb/s (80% CPU)

so maybe enabling SDP flipped some setting that now gets me 11Gb/s at ~80% CPU. Which is plenty for what we need.

Thanks yairi, Sorin and Justin for your persistence in helping me get this set up.

Apparently the port-type gets reset to ib after I rmmod/modprobe mlx4_core:

echo eth > /sys/bus/pci/devices/0000:06:00.0/mlx4_port2

echo eth > /sys/bus/pci/devices/0000:06:00.0/mlx4_port1

dmesg | tail -n 3

mlx4_en: Mellanox ConnectX HCA Ethernet driver v1.5.8.3 (June 2012)

mlx4_en 0000:06:00.0: Activating port:1

mlx4_en: 0000:06:00.0: Port 1: Port: 1, invalid mac burned: 0x0, quiting

rmmod mlx4_en mlx4_core

modprobe mlx4_core

modprobe mlx4_en

cat /sys/bus/pci/devices/0000:06:00.0/mlx4_port*

ib

ib

Yes, that’s normal. You need to pass that parameter to modprobe:

modprobe mlx4_core port_type_array="2,2"

Otherwise it defaults to “ib”

You can also set that option in /etc/modprobe.d/*.conf (or something similar for CentOS) so you don’t have to specify it every time you use modprobe.
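As a rough sketch of the modprobe.d approach: the file name mlx4_core.conf and the function name below are made up for illustration, and only the option line itself comes from this thread. The directory is a parameter so the helper can be tried in a scratch directory first.

```shell
# Hypothetical helper: write the port_type_array option from this
# thread into a modprobe configuration file. The file name
# mlx4_core.conf is an assumption; any *.conf name in the directory
# should be read by modprobe.
write_port_type_conf() {
    conf_dir=${1:-/etc/modprobe.d}
    printf 'options mlx4_core port_type_array=2,2\n' \
        > "$conf_dir/mlx4_core.conf"
}
```

After writing the file (as root), reload the modules with rmmod mlx4_en mlx4_core and modprobe mlx4_en, then check the mlx4_port* files again.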

But you still get that “invalid mac burned” error, which might be related to firmware.

Just updated firmware to 2.9.1000. ‘mstflint’ warned that PSID’s didn’t match (originally I had HP_0160000009, it got burned to MT_0D70110009). Same dmesg error about ‘invalid mac burned’ with this firmware and mlx4_en 1.5.8.3.

I can backup the HP_0160000009 firmware from another board and re-burn it if needed.

I did pass that port_type_array option to modprobe when testing.

I looked at the source code, and as odd as it sounds it looks like your adapter does not have a MAC address (or it’s set to 0x0)

ofa_kernel-1.5.3/drivers/net/mlx4/mlx4_en.h:

#define ILLEGAL_MAC(addr)(addr == 0xffffffffffffULL || addr == 0x0)

ofa_kernel-1.5.3/drivers/net/mlx4/en_netdev.c

priv->mac = mdev->dev->caps.def_mac[priv->port];

if (ILLEGAL_MAC(priv->mac)) {

mlx4_err(mdev, "Port: %d, invalid mac burned: 0x%llx, quiting\n",

priv->port, priv->mac);
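The same check can be mirrored in shell, for example to sanity-check a MAC before burning it with flint. This is just a sketch: the function name is_illegal_mac is made up, and it assumes the MAC is given as 12 hex digits, with or without ':' or '-' separators.

```shell
# Hypothetical shell counterpart of the ILLEGAL_MAC macro above:
# succeeds (exit status 0) when the MAC is all zeros or all ones.
is_illegal_mac() {
    # Drop ':' and '-' separators and normalise to lowercase.
    mac=$(printf '%s' "$1" | tr -d ':-' | tr 'A-F' 'a-f')
    case "$mac" in
        000000000000|ffffffffffff) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, is_illegal_mac 00:00:00:00:00:00 succeeds, while is_illegal_mac 02c90abcdef0 does not.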

The serial number reported by lspci should include the mac address, I am curious if that is zero too.

lspci -vvv -s 06:00.0 | grep Serial

Capabilities: [148] Device Serial Number 00-02-c9-03-00-12-83-44
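If the serial number does embed the MAC, one way to pull out a candidate could look like the sketch below. It assumes the 64-bit serial is the 48-bit MAC with two extra bytes inserted in the middle (the 03-00 in the output above); that layout is an assumption, not something confirmed in this thread, so compare the result with the label on the card before burning anything. The function name guid_to_mac is made up.

```shell
# Hypothetical helper: derive a candidate MAC from a 64-bit device
# serial number / GUID by dropping bytes 4 and 5 (the middle two).
# ASSUMPTION: the serial is the MAC with two bytes inserted in the
# middle; verify against the card label before using the result.
guid_to_mac() {
    # Drop ':' and '-' separators and normalise to lowercase.
    hex=$(printf '%s' "$1" | tr -d ':-' | tr 'A-F' 'a-f')
    # Reject anything that is not exactly 16 hex digits.
    case "$hex" in
        *[!0-9a-f]*) return 1 ;;
    esac
    [ "${#hex}" -eq 16 ] || return 1
    # Keep hex digits 1-6 and 11-16, skip the middle four.
    printf '%s%s\n' \
        "$(printf '%s' "$hex" | cut -c1-6)" \
        "$(printf '%s' "$hex" | cut -c11-16)"
}
```

With the serial number shown above, guid_to_mac 00-02-c9-03-00-12-83-44 prints 0002c9128344.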

Try updating the FW of the card to the latest. I know that there should be an option in the newer firmware to create the MAC out of the card’s number when the FW starts.

give it a try…

Are the ports connected to an Ethernet switch now?

Can you check the configuration of the switch ports, speed, autonegotiation, and make sure the switchports are “no shutdown”.

You can also crossover connect two machines, and if that works then the switch might be the problem.

Sorry for the dumb question, but why does your IPv4 address contain 5 segments?

Was this a mistake?

machine1$ ifconfig eth0 192.192.168.1.1

machine2$ ifconfig eth0 192.192.168.1.2

Because more is better :)

I edited the commands in the reply, trying to make it clear what I did. Ifconfig doesn’t accept 5-segment ipv4 addresses (it outputs ‘unknown host’).

Thanks for the list of things to try w.r.t. the switch. The issue is most likely there, I’ll let you know how things progress.

EDIT: I compiled earlier a libsdp.so and started netperf with the lib LD_PRELOAD’ed. The result was about 18Gb/s (at 100%CPU), which was encouraging.