Trouble with ConnectX-3 VPI adapter card over XenServer 6.2 (Service Pack 1)

Hi all,

We are trying to use a ConnectX-3 VPI adapter card with XenServer 6.2. So far we have followed these steps:

  1. Install XenServer 6.2 on my system (Supermicro 1027GR-TRF)

  2. Install XenServer 6.2 updates (service pack 1) following the directions from http://support.citrix.com/article/CTX138115#XenServer%206.2

  3. Install MLNX_OFED 2.1-1.0.6 and update the firmware (./mlnxofedinstall --force-fw-update). The installation and the update completed without errors. The firmware version is 2.30.8000.

After that, we restarted the system and the openibd service, but the InfiniBand network interfaces were not detected:

[root@xenserver ~]# /etc/init.d/openibd restart

hostname: `Host’ unknown

Unloading HCA driver: [ OK ]

Loading HCA driver and Access Layer: [ OK ]

Setting up InfiniBand network interfaces:

Setting up service network . . . [ done ]

I.e. the ib0 network interface has not been detected by the system.
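For anyone reproducing this, a quick way to confirm the symptom might be the sketch below. It only lists interfaces whose names start with "ib" under /sys/class/net. The function name ipoib_ifaces and its optional directory argument are made up for illustration (the argument exists so the function can be tried without hardware).

```shell
#!/bin/sh
# Sketch: list IPoIB-style interfaces (names starting with "ib").
# ipoib_ifaces is a hypothetical helper, not part of MLNX_OFED.

ipoib_ifaces() {
    # Optional argument: directory to scan (defaults to the real sysfs path).
    net_dir="${1:-/sys/class/net}"
    for path in "$net_dir"/ib*; do
        [ -e "$path" ] || continue   # the glob matched nothing
        basename "$path"
    done
}

# Usage on the host:
#   ipoib_ifaces                                   # prints nothing if no ib* interface exists
#   lsmod | grep -E 'mlx4_core|mlx4_ib|ib_ipoib'   # check whether the modules are loaded
```

If the function prints nothing while the modules are loaded, the driver probably failed during initialization, and dmesg is the next place to look.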

Do you know whether XenServer 6.2 works with ConnectX-3? Could you please give us some information about that? We have spent a lot of time on this and now wonder whether it really works.

Hi iliyasa, thank you for your reply.

I cannot install version 1.5.3-4.0.42. I have installed the Driver Development Kit for XenServer 6.2.0 (Service Pack 1) and am trying to build the OFED modules against the 2.6.32.43-0.4.1.xs1.8.0.847.170785xen kernel, but I get the following error:

./mlnx_add_kernel_support.sh -m /mnt/tmp/MLNX_OFED_LINUX-1.5.3-4.0.42-xenserver-i686/ -t /mnt/tmp/temp --make-tgz -v

Note: This program will create MLNX_OFED_LINUX TGZ for rhel5.7 under /tmp directory.

All Mellanox, OEM, OFED, or Distribution IB packages will be removed.

Do you want to continue?[y/N]:y

See log file /tmp/mlnx_ofed_iso.4237.log

Detected MLNX_OFED_LINUX-1.5.3-4.0.42

Running cp -a /mnt/tmp/MLNX_OFED_LINUX-1.5.3-4.0.42-xenserver-i686/ /mnt/tmp/temp/mlnx_iso.4237/MLNX_OFED_LINUX-1.5.3-4.0.42-rhel5.7-i686

Running tar xzf /mnt/tmp/temp/mlnx_iso.4237/MLNX_OFED_LINUX-1.5.3-4.0.42-rhel5.7-i686/src/MLNX_OFED_SRC-1.5.3-4.0.42.tgz

Building OFED RPMs. Please wait…

Running MLNX_OFED_SRC-1.5.3-4.0.42/install.pl -c /mnt/tmp/temp/mlnx_iso.4237/ofed.conf --kernel 2.6.32.43-0.4.1.xs1.8.0.847.170785xen --kernel-sources /lib/modules/2.6.32.43-0.4.1.xs1.8.0.847.170785xen/build/ --builddir /mnt/tmp/temp/mlnx_iso.4237 --disable-kmp

ERROR: Failed executing “MLNX_OFED_SRC-1.5.3-4.0.42/install.pl -c /mnt/tmp/temp/mlnx_iso.4237/ofed.conf --kernel 2.6.32.43-0.4.1.xs1.8.0.847.170785xen --kernel-sources /lib/modules/2.6.32.43-0.4.1.xs1.8.0.847.170785xen/build/ --builddir /mnt/tmp/temp/mlnx_iso.4237 --disable-kmp”

ERROR: See /tmp/mlnx_ofed_iso.4237.log

The log file does not show any additional information:

cxgb3 is not available on this platform

qib is not available on this platform

knem is not available on this platform

ib-bonding is not available on this platform

Below is the list of OFED packages that you have chosen

(some may have been added by the installer due to package dependencies):

ofed-scripts

kernel-ib

kernel-ib-devel

kernel-mft

Build ofed-scripts RPM

Running rpmbuild --rebuild --define ‘_topdir /mnt/tmp/temp/mlnx_iso.4237/OFED_topdir’ --define ‘dist %{nil}’ --target i386 --define ‘_prefix /usr’ --define ‘_exec_prefix /usr’ --define ‘_sysconfdir /etc’ --define ‘_usr /usr’ /mnt/tmp/temp/mlnx_iso.4237/MLNX_OFED_SRC-1.5.3-4.0.42/SRPMS/ofed-scripts-1.5.3-OFED.1.5.3.4.0.42.src.rpm

Install ofed-scripts RPM:

Running rpm -iv /mnt/tmp/temp/mlnx_iso.4237/MLNX_OFED_SRC-1.5.3-4.0.42/RPMS/centos-release-5-7.el5.centos/i686/ofed-scripts-1.5.3-OFED.1.5.3.4.0.42.i386.rpm

Build ofa_kernel RPM

Running rpmbuild --rebuild --define ‘_topdir /mnt/tmp/temp/mlnx_iso.4237/OFED_topdir’ --define ‘_target_cpu i686’ --nodeps --define ‘_dist .rhel5u7’ --define ‘configure_options --with-core-mod --with-user_mad-mod --with-user_access-mod --with-addr_trans-mod --with-mthca-mod --with-mlx4-mod --with-mlx4_en-mod --with-mlx4_ib-mod --with-mlx4_vnic-mod --with-nes-mod --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-rds-mod’ --define ‘build_kernel_ib 1’ --define ‘build_kernel_ib_devel 1’ --define ‘KVERSION 2.6.32.43-0.4.1.xs1.8.0.847.170785xen’ --define ‘K_SRC /lib/modules/2.6.32.43-0.4.1.xs1.8.0.847.170785xen/build/’ --define ‘network_dir /etc/sysconfig/network-scripts’ --define ‘_prefix /usr’ --define ‘__arch_install_post %{nil}’ /mnt/tmp/temp/mlnx_iso.4237/MLNX_OFED_SRC-1.5.3-4.0.42/SRPMS/ofa_kernel-1.5.3-OFED.1.5.3.4.0.42.g3cb72fe.src.rpm

kernel-ib was not created

Hello Iliyasa,

Yes, I installed the additional packages required by the driver. I used the CentOS Base.repo for this; is that correct?

There seems to be a problem loading the mlx4_core module; dmesg looks like this:

[ 2668.552338] mlx4_core: Mellanox ConnectX core driver v1.1 (Apr 29 2014)

[ 2668.552341] mlx4_core: Initializing 0000:01:00.0

[ 2668.552429] mlx4_core 0000:01:00.0: PCI INT A → GSI 16 (level, low) → IRQ 16

[ 2668.552538] mlx4_core 0000:01:00.0: setting latency timer to 64

[ 2674.974496] mlx4_core 0000:01:00.0: get owner: 7ff0

[ 2674.976285] mlx4_core 0000:01:00.0: irq 1265 (313) for MSI/MSI-X

[ 2674.976287] mlx4_core 0000:01:00.0: get owner: 7ff0

[ 2674.977975] mlx4_core 0000:01:00.0: irq 1264 (312) for MSI/MSI-X

[ 2674.977977] mlx4_core 0000:01:00.0: get owner: 7ff0

[ 2674.979637] mlx4_core 0000:01:00.0: irq 1263 (311) for MSI/MSI-X

[ 2674.979639] mlx4_core 0000:01:00.0: get owner: 7ff0

[ 2674.981309] mlx4_core 0000:01:00.0: irq 1262 (310) for MSI/MSI-X

[ 2674.981311] mlx4_core 0000:01:00.0: get owner: 7ff0

[ 2674.982980] mlx4_core 0000:01:00.0: irq 1261 (309) for MSI/MSI-X

[ 2674.982982] mlx4_core 0000:01:00.0: get owner: 7ff0

[ 2674.984659] mlx4_core 0000:01:00.0: irq 1260 (308) for MSI/MSI-X

[ 2674.984661] mlx4_core 0000:01:00.0: get owner: 7ff0

[ 2674.986341] mlx4_core 0000:01:00.0: irq 1259 (307) for MSI/MSI-X

[ 2674.986343] mlx4_core 0000:01:00.0: get owner: 7ff0

[ 2674.988054] mlx4_core 0000:01:00.0: irq 1258 (306) for MSI/MSI-X

[ 2674.988056] mlx4_core 0000:01:00.0: get owner: 7ff0

[ 2674.989742] mlx4_core 0000:01:00.0: irq 1257 (305) for MSI/MSI-X

[ 2674.989744] mlx4_core 0000:01:00.0: get owner: 7ff0

[ 2674.991413] mlx4_core 0000:01:00.0: irq 1256 (304) for MSI/MSI-X

[ 2735.064254] mlx4_core 0000:01:00.0: command CONF_SPECIAL_QP (0x23) timed out: in_param=0x0, in_mod=0x40, op_mod=0x0, get_status err=0, status_reg=0x23006000, go_bit=0, t_bit=1, toggle=0x0

[ 2735.064266] mlx4_core 0000:01:00.0: Failed to initialize queue pair table (err=1), aborting.

[ 2735.106905] mlx4_core 0000:01:00.0: get owner: 7ff0

[ 2735.108107] mlx4_core 0000:01:00.0: get owner: 7ff0

[ 2735.109302] mlx4_core 0000:01:00.0: get owner: 7ff0

[ 2735.110502] mlx4_core 0000:01:00.0: get owner: 7ff0

[ 2735.111699] mlx4_core 0000:01:00.0: get owner: 7ff0

[ 2735.112901] mlx4_core 0000:01:00.0: get owner: 7ff0

[ 2735.114100] mlx4_core 0000:01:00.0: get owner: 7ff0

[ 2735.115299] mlx4_core 0000:01:00.0: get owner: 7ff0

[ 2735.116498] mlx4_core 0000:01:00.0: get owner: 7ff0

[ 2735.117701] mlx4_core 0000:01:00.0: get owner: 7ff0
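Since most of the log above is repeated "get owner" and IRQ lines, a small filter like the following sketch can make the actual failure easier to spot. It reads kernel log text on stdin and keeps only mlx4_core lines that mention a timeout, a failure, or an abort. The function name mlx4_failures and the keyword list are my own assumptions, chosen to match the two error lines in this log.

```shell
#!/bin/sh
# Sketch: extract mlx4_core error lines from a kernel log read on stdin.
# mlx4_failures is a hypothetical helper; the keywords are taken from the log above.

mlx4_failures() {
    grep 'mlx4_core' | grep -i -E 'timed out|failed|aborting'
}

# Usage on the host:
#   dmesg | mlx4_failures
```

On the log in this post it should leave just the CONF_SPECIAL_QP timeout and the "Failed to initialize queue pair table" line.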

Have you had any similar problem?

Thanks in advance.

Hi,

Actually I meant MLNX_OFED_LINUX-2.2-1.0.1-xenserver6.x-i686. I have tried this driver and XenServer was able to pick up the card, but the driver has put it into Ethernet mode. I’m trying to see why it has done this, but this may be a separate issue or a related one. I’ll try with a different card and follow up.

Edit: I tried to change it to IB, but it tells me that an illegal port configuration was attempted.

Hi !

You’re right. After installing OFED 1.5.4, my system can detect ib0, and it works!

Thanks for your help.

Hi,

I don’t think your card is supported. Please check your card model and PSID, and go to the firmware download page for its latest firmware.

Updating Firmware for Single Port InfiniHost™ III Lx MemFree PCI Express HCA Cards - Mellanox Technologies http://www.mellanox.com/page/firmware_table_IH3Lx
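To check the card model, one option is to pull it out of the lspci output, roughly as in this sketch. It reads lspci text on stdin and prints the PCI address and the MTxxxxx model for each Mellanox device. The function name mellanox_models is made up, and the sed pattern assumes the lspci line format that appears elsewhere in this thread.

```shell
#!/bin/sh
# Sketch: print "<PCI address> <MTxxxxx model>" for each Mellanox device in lspci output.
# mellanox_models is a hypothetical helper; the pattern assumes lines such as
#   07:00.0 InfiniBand: Mellanox Technologies MT25204 [InfiniHost III Lx HCA] (rev a0)

mellanox_models() {
    grep -i 'Mellanox' |
        sed -n 's/^\([0-9a-fA-F:.]*\) .*\(MT[0-9][0-9]*\).*$/\1 \2/p'
}

# Usage on the host:
#   lspci | mellanox_models
# The PSID is reported by "flint -d <mst device> q" when the tool can open the device.
```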

Thanks.

Hi, I have tried MLNX_OFED_LINUX-2.2-1.0.1-xenserver6.x-i686 with the firmware update, but I got the same results as shown in the first post. So I have some questions:

Did you use XenServer 6.2 (Service Pack 1)?

Which command did you use to install OFED? (I used ./mlnxofedinstall --force-firmware-update)

Did you use any additional driver, or only the MLNX_OFED software stack?

Thank you in advance.

OK, I’ve had success with a different card; the one I was using before was an Ethernet card.

[root@xenserver6 ~]# /root/MLNX_OFED_LINUX-2.2-1.0.1-xenserver6.x-i686/mlnxofedinstall --fw-update-only

Logs dir: /tmp/MLNX_OFED_LINUX-2.2-1.0.1.10792.logs

Attempting to perform Firmware update…

Querying Mellanox devices firmware …

Device #1:


Device Type: ConnectX3

Part Number: MCX353A-FCB_A2-A5

Description: ConnectX-3 VPI adapter card; single-port QSFP; FDR IB (56Gb/s) and 40GigE; PCIe3.0 x8 8GT/s; RoHS R6

PSID: MT_1100120019

PCI Device Name: 0000:04:00.0

Versions: Current Available

FW 2.30.8000 2.31.5050

PXE 3.4.0151 3.4.0225

Status: Update required


Found 1 device(s) requiring firmware update…

Device #1: Updating FW … Done

Restart needed for updates to take effect.

Log File: /tmp/MLNX_OFED_LINUX-2.2-1.0.1.10792.logs/fw_update.log

Please reboot your system for the changes to take effect.

[root@xenserver6 ~]#

[root@xenserver6 ~]# /sbin/connectx_port_config -s

--------------------------------

Port configuration for PCI device: 0000:04:00.0 is:

auto (ib)


[root@xenserver6 ~]#

[root@xenserver6 ~]# flint -d /dev/mst/mt4099_pci_cr0 q

Image type: FS2

FW Version: 2.31.5050

FW Release Date: 30.4.2014

Product Version: 02.31.50.50

Rom Info: type=PXE version=3.4.225 devid=4099 proto=VPI

Device ID: 4099

Description: Node Port1 Port2 Sys image

GUIDs: 0002c903001e8a20 0002c903001e8a21 0002c903001e8a22 0002c903001e8a23

MACs: 0002c91e8a20 0002c91e8a21

VSD:

PSID: MT_1100120019

[root@xenserver6 ~]#

[root@xenserver6 ~]# ibstat

CA ‘mlx4_0’

CA type: MT4099

Number of ports: 1

Firmware version: 2.31.5050

Hardware version: 1

Node GUID: 0x0002c903001e8a20

System image GUID: 0x0002c903001e8a23

Port 1:

State: Down

Physical state: Polling

Rate: 10

Base lid: 0

LMC: 0

SM lid: 0

Capability mask: 0x02514868

Port GUID: 0x0002c903001e8a21

Link layer: InfiniBand

[root@xenserver6 ~]#
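The ibstat output above can be condensed to one line per port with a small awk filter, sketched here. It assumes the field labels shown in this post ("State:", "Physical state:", "Link layer:") and prints a line when it reaches "Link layer:", so versions of ibstat that omit that field would print nothing. The function name ibstat_summary is hypothetical.

```shell
#!/bin/sh
# Sketch: summarize ibstat output (read on stdin) as
#   <CA name> port <n> <State> <Physical state> <Link layer>
# ibstat_summary is a hypothetical helper; field labels are taken from the output above.

ibstat_summary() {
    awk '
        /^[ \t]*CA type:/        { next }                     # skip so it is not taken as a CA name
        /^[ \t]*CA /             { gsub(/[^A-Za-z0-9_]/, "", $2); ca = $2 }
        /^[ \t]*Port [0-9]+:/    { sub(/:/, "", $2); port = $2 }
        /^[ \t]*State:/          { state = $2 }
        /^[ \t]*Physical state:/ { phys = $3 }
        /^[ \t]*Link layer:/     { print ca, "port " port, state, phys, $3 }
    '
}

# Usage on the host:
#   ibstat | ibstat_summary
```

For the card above this should give "mlx4_0 port 1 Down Polling InfiniBand". As far as I know, Polling normally just means the port does not see a link partner yet (for example, no cable or switch attached), so the driver side looks fine here.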

[root@xenserver6 ~]# ifconfig -a

eth0 Link encap:Ethernet HWaddr 00:25:90:C9:31:44

UP BROADCAST MTU:1500 Metric:1

RX packets:0 errors:0 dropped:0 overruns:0 frame:0

TX packets:0 errors:0 dropped:0 overruns:0 carrier:0

collisions:0 txqueuelen:1000

RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)

eth1 Link encap:Ethernet HWaddr 00:25:90:C9:31:45

inet6 addr: fe80::225:90ff:fec9:3145/64 Scope:Link

UP BROADCAST RUNNING PROMISC MTU:1500 Metric:1

RX packets:731 errors:0 dropped:0 overruns:0 frame:0

TX packets:382 errors:0 dropped:0 overruns:0 carrier:0

collisions:0 txqueuelen:1000

RX bytes:154168 (150.5 KiB) TX bytes:50319 (49.1 KiB)

ib0 Link encap:InfiniBand HWaddr A0:00:01:00:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00

BROADCAST MULTICAST MTU:4092 Metric:1

RX packets:0 errors:0 dropped:0 overruns:0 frame:0

TX packets:0 errors:0 dropped:0 overruns:0 carrier:0

collisions:0 txqueuelen:128

RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)

lo Link encap:Local Loopback

inet addr:127.0.0.1 Mask:255.0.0.0

inet6 addr: ::1/128 Scope:Host

UP LOOPBACK RUNNING MTU:16436 Metric:1

RX packets:24 errors:0 dropped:0 overruns:0 frame:0

TX packets:24 errors:0 dropped:0 overruns:0 carrier:0

collisions:0 txqueuelen:0

RX bytes:14979 (14.6 KiB) TX bytes:14979 (14.6 KiB)

xenbr0 Link encap:Ethernet HWaddr 00:25:90:C9:31:44

inet6 addr: fe80::225:90ff:fec9:3144/64 Scope:Link

UP BROADCAST RUNNING PROMISC MULTICAST MTU:1500 Metric:1

RX packets:0 errors:0 dropped:0 overruns:0 frame:0

TX packets:6 errors:0 dropped:0 overruns:0 carrier:0

collisions:0 txqueuelen:0

RX bytes:0 (0.0 b) TX bytes:468 (468.0 b)

xenbr1 Link encap:Ethernet HWaddr 00:25:90:C9:31:45

inet addr:192.168.50.1 Bcast:192.168.50.255 Mask:255.255.255.0

inet6 addr: fe80::225:90ff:fec9:3145/64 Scope:Link

UP BROADCAST RUNNING PROMISC MULTICAST MTU:1500 Metric:1

RX packets:732 errors:0 dropped:0 overruns:0 frame:0

TX packets:389 errors:0 dropped:0 overruns:0 carrier:0

collisions:0 txqueuelen:0

RX bytes:154228 (150.6 KiB) TX bytes:52753 (51.5 KiB)

[root@xenserver6 ~]#

Have you tried version 1_5_3-4_0_42 instead? You may need to contact Mellanox support for a XenServer driver.

Yes, XenServer 6.2 SP1 with all updates, installed just by running ./mlnxofedinstall.

Might I ask whether you installed the additional packages required by the driver before installing MLNX_OFED 2.2?

yum install pciutils python libxml2-python libnl expat glib2 tcl bc libstdc++ tk
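To see which of those packages are still missing before running the installer, something like this sketch could help. It simply asks rpm about each name and prints the ones that are not installed. The function name missing_pkgs is made up; the package list is the one from the yum command above.

```shell
#!/bin/sh
# Sketch: print every package from the argument list that rpm does not report as installed.
# missing_pkgs is a hypothetical helper, not part of MLNX_OFED.

missing_pkgs() {
    for pkg in "$@"; do
        rpm -q "$pkg" >/dev/null 2>&1 || echo "$pkg"
    done
}

# Usage on the host (package list copied from the yum command above):
#   missing_pkgs pciutils python libxml2-python libnl expat glib2 tcl bc libstdc++ tk
```

An empty result would suggest the prerequisites are all in place.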

Hi. I have the same problem. Can you help me?

I also use XenServer 6.2 SP1 and installed MLNX_OFED_LINUX-2.2-1.0.1-xenserver6.x-i686, but my server couldn’t find ib0.

Here’s some output:

[root@Epiclesis MLNX_OFED_LINUX-2.2-1.0.1-xenserver6.x-i686]# ./mlnxofedinstall --fw-update-only

Logs dir: /tmp/MLNX_OFED_LINUX-2.2-1.0.1.10272.logs

Attempting to perform Firmware update…

Querying Mellanox devices firmware …

Device #1:

----------

Device Type: InfiniHostIIILx

Part Number: –

Description:

PSID:

PCI Device Name: 0000:07:00.0

Versions: Current Available

FW –

Status: Failed to open device

---------

-E- Failed to query 0000:07:00.0 device, error : No such file or directory MFE_OLD_DEVICE_TYPE

Log File: /tmp/MLNX_OFED_LINUX-2.2-1.0.1.10272.logs/fw_update.log

Failed to update Firmware.

See /tmp/MLNX_OFED_LINUX-2.2-1.0.1.10272.logs/fw_update.log

To load the new driver, run:

/etc/init.d/openibd restart

[root@Epiclesis MLNX_OFED_LINUX-2.2-1.0.1-xenserver6.x-i686]# mst start

Starting MST (Mellanox Software Tools) driver set

[warn] mst_pci is already loaded, skipping

[warn] mst_pciconf is already loaded, skipping

Create devices

-W- Missing “lsusb” command, skipping MTUSB devices detection

[root@Epiclesis MLNX_OFED_LINUX-2.2-1.0.1-xenserver6.x-i686]# mst status

MST modules:

------------

MST PCI module loaded

MST PCI configuration module loaded

MST devices:

------------

/dev/mst/mt25204_pciconf0 - PCI configuration cycles access.

domain:bus:dev.fn=0000:07:00.0 addr.reg=88 data.reg=92

Chip revision is: A0

/dev/mst/mt25204_pci_cr0 - PCI direct access.

domain:bus:dev.fn=0000:07:00.0 bar=0xc4100000 size=0x100000

Chip revision is: A0

[root@Epiclesis MLNX_OFED_LINUX-2.2-1.0.1-xenserver6.x-i686]# flint -d /dev/mst/mt25204_pci_cr0 q

-E- Cannot open Device: /dev/mst/mt25204_pci_cr0. Operation not permitted MFE_OLD_DEVICE_TYPE

[root@Epiclesis MLNX_OFED_LINUX-2.2-1.0.1-xenserver6.x-i686]# hca_self_test.ofed

---- Performing Adapter Device Self Test ----

Number of CAs Detected … 1

PCI Device Check … PASS

Kernel Arch … i686

Host Driver Version … MLNX_OFED_LINUX-2.2-1.0.1 (OFED-2.2-1.0.0): 2.6.32.43-0.4.1.xs1.8.0.847.170785xen

Host Driver RPM Check … PASS

Firmware on CA #0 HCA … v1.2.0

Firmware Check on CA #0 (HCA) … NA

REASON: NO required fw version

Host Driver Initialization … PASS

Number of CA Ports Active … 1

Port State of Port #1 on CA #0 (HCA)… UP 4X (InfiniBand)

Error Counter Check on CA #0 (HCA)… PASS

Kernel Syslog Check … PASS

Node GUID on CA #0 (HCA) … 00:08:f1:04:03:99:2c:d4

------------------ DONE ---------------------

[root@Epiclesis MLNX_OFED_LINUX-2.2-1.0.1-xenserver6.x-i686]# ibstat

CA ‘mthca0’

CA type: MT25204

Number of ports: 1

Firmware version: 1.2.0

Hardware version: a0

Node GUID: 0x0008f10403992cd4

System image GUID: 0x0008f10403992cd7

Port 1:

State: Active

Physical state: LinkUp

Rate: 10

Base lid: 2

LMC: 0

SM lid: 1

Capability mask: 0x02510a68

Port GUID: 0x0008f10403992cd5

Link layer: InfiniBand

[root@Epiclesis MLNX_OFED_LINUX-2.2-1.0.1-xenserver6.x-i686]# lspci|grep Mella

07:00.0 InfiniBand: Mellanox Technologies MT25204 [InfiniHost III Lx HCA] (rev a0)

I would really appreciate any help. Thanks.

Hi all,

The problem is finally fixed. After several tests, it seems that it was due to a hardware incompatibility.

We repeated the same steps on another node (with a different integrated board) and everything seems to be OK.