ConnectX-En 10G crash under load :-(

Dear sir,

I built a small server for development and office use. It has:

1x Supermicro X9DRD-7LN4F

2x Xeon E5-2620

96 GB RAM

5x LSI SAS2008

4x Intel i350 1 Gbit

1x Mellanox ConnectX EN 10 Gbit

ESXi 5.1

OpenIndiana, CentOS, Ubuntu, and Win2k8 run as VMs.

The server’s uptime was about 27 days.

It hit a PSOD on 4-4-2013, during our developers’ stress-test session.

The crash dump says: mlx4_en: Internal error detected

We don’t have ESXi support, so I can’t raise this issue with VMware.

I’m uploading the crash dump here, hoping someone at Mellanox will come by and give some hints on how to prevent this from happening again.

Thank you.

mlx4_en.7z.zip (70.7 KB)

Dear yairi,

It is MNPH29D-XTR.

I’ve flashed to this firmware:

http://www.mellanox.com/downloads/firmware/fw-ConnectX2-rel-2_9_1200-MNPH29D_A2-A5-FlexBoot-3.3.400.bin.zip

I’m using the latest ESXi driver for the ConnectX-2 card:

https://my.vmware.com/web/vmware/details/dt_esxi50_mellanox_connectx/dHRAYnRqdEBiZHAlZA==

Thanks.
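For anyone comparing firmware levels, the file name in the firmware link above already carries the version details. The snippet below is only a rough sketch: the pattern is an assumption inferred from that single file name, `parse_fw_name` is a made-up helper, and the meaning of the `A2-A5` part is not stated in this thread, so it is kept verbatim.

```python
import re

# Assumed naming pattern, inferred from the one file name quoted in this thread.
FW_NAME = re.compile(
    r"fw-(?P<family>[A-Za-z0-9]+)"
    r"-rel-(?P<fw>\d+_\d+_\d+)"
    r"-(?P<board>[A-Z0-9]+)_(?P<suffix>[A-Z0-9]+(?:-[A-Z0-9]+)*)"
    r"-FlexBoot-(?P<flexboot>[\d.]+?)\.bin"
)


def parse_fw_name(name):
    """Split a firmware file name (or URL) into its parts, or return None."""
    m = FW_NAME.search(name)
    if m is None:
        return None
    return {
        "family": m.group("family"),
        # The file name uses underscores where the version uses dots.
        "firmware": m.group("fw").replace("_", "."),
        "board": m.group("board"),
        # Kept as-is; its meaning is not explained in the thread.
        "suffix": m.group("suffix"),
        "flexboot": m.group("flexboot"),
    }


if __name__ == "__main__":
    url = ("http://www.mellanox.com/downloads/firmware/"
           "fw-ConnectX2-rel-2_9_1200-MNPH29D_A2-A5-FlexBoot-3.3.400.bin.zip")
    print(parse_fw_name(url))
```

The firmware and FlexBoot strings are the ones worth quoting when support asks which FW level the card runs.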

Yes, that should be fine. Use the version I provided to you (1.8.1). It includes the Ethernet adapter driver.

Hi dualamd,

Please refer to the Mellanox Technologies VMware Firmware - Driver Compatibility Matrix (http://www.mellanox.com/page/vmware_matrix?mtag=vmware_drivers) for the latest firmware and driver links.

If you want to use your adapter as an Ethernet adapter, then the version you originally used (1.6.1) is fine. Any PSOD you get with this version should be reported to support@mellanox.com.

If you want to use the adapter as an InfiniBand adapter, then please use version 1.8.1 as yairi suggested (it’s also listed in the matrix URL above). The module responsible for showing the network adapters (to make use of a vmnic uplink over the InfiniBand fabric) is called ipoib. The user manual under Mellanox Products: Mellanox OFED Driver for VMware® ESXi Server (http://www.mellanox.com/page/products_dyn?&product_family=36&mtag=vmware_drivers) has instructions on how to enable and use it. Please also note that you must have a functional InfiniBand fabric with a Subnet Manager entity (more details in the user manual).

The link you provided for ESXi points to an older Mellanox driver from 2011. You should try using the latest one:

Mellanox Products: Mellanox OFED Driver for VMware® ESXi Server http://www.mellanox.com/page/products_dyn?&product_family=36&mtag=vmware_drivers

Hi Dualamd,

What is the Mellanox ConnectX card model number? Do you know what FW level it runs?

Also, what is the Mellanox driver version you are using?

For all of the above, I suggest using the latest versions; that would probably help.

I’m using the ESXi driver from this page:

http://www.mellanox.com/page/products_dyn?product_family=29

Your link is for IB/VPI adapters. Can the IB driver be used for an Ethernet adapter?

Thanks.

I tried installing the 1.8.1 OFED driver on a freshly installed ESXi machine. ESXi shows the MT26448 as a storage adapter, and nothing appears under network adapters.

What should I do now?

Regards.

I had another PSOD last week, so I had to pull all the ConnectX-2 EN cards out of the company servers.

We only have a 10G Ethernet switch, so 1.6.1.2 is my only choice.

I will try to reproduce this issue.

Thank you ali.

Hi Dualamd,

Please open a ticket with Mellanox support (support@mellanox.com) so they can look into the PSOD.

Can you post the PSOD output here?
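If the full log is too long to post, one option is to pull out just the lines that mention the driver, plus a little context. This is a minimal sketch, not an official tool: `lines_around` is a hypothetical helper, and the only string taken from this thread is the `mlx4_en` marker from the reported message.

```python
from collections import deque


def lines_around(lines, marker="mlx4_en", context=2):
    """Return lines containing `marker`, plus `context` lines before and after each hit."""
    before = deque(maxlen=context)  # rolling buffer of the most recent non-matching lines
    out = []
    remaining_after = 0             # how many trailing context lines are still owed
    for line in lines:
        line = line.rstrip("\n")
        if marker in line:
            out.extend(before)      # emit the leading context once
            before.clear()
            out.append(line)
            remaining_after = context
        elif remaining_after > 0:
            out.append(line)
            remaining_after -= 1
        else:
            before.append(line)
    return out


if __name__ == "__main__":
    # Illustrative input only; "before" and "after" stand in for real log lines.
    sample = ["before", "mlx4_en: Internal error detected", "after"]
    print("\n".join(lines_around(sample, context=1)))
```

To run it on a saved log, pass an open file object instead of the sample list. The file name is whatever you saved the log as.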