mlnx_add_kernel_support.sh and magic version errors.

I have successfully built and installed OFED 3.2-2.0.0.0 on an Oracle Enterprise Linux 6.7 system. I tested this OFED installation in Ethernet emulation mode with iperf3, with one ConnectX-4 port looped back to the other ConnectX-4 port on the same system. I disabled the internal loopback and configured NAT to force the data over the physical IB cable.

I used the same build and install sequence on a simulator system with a custom kernel based on Oracle Enterprise Linux 7.1. On this system, “mst start” fails.

I used MLNX_OFED_LINUX-3.2-2.0.0.0-oel7.1-x86_64.tgz and the mlnx_add_kernel_support.sh completed without issue.

Here is the uname info for the custom kernel the system is running:

uname -a

Linux 4.1.12-32.el7uek-axnp.debug.070000.009400 #1 SMP Fri Jan 29 12:18:44 PST 2016 x86_64 x86_64 x86_64 GNU/Linux

The extra modules directory is populated with what seems like the correct directories and files:

ls /usr/lib/modules/4.1.12-32.el7uek-axnp.debug.070000.009400/extra

dmadriver_api_mod.ko knem pdgCommon.ko slbmgr_api_mod.ko

i40e.ko ksimod.ko pdgPm8018.ko slbmgrmod.ko

iser mlnx-ofa_kernel pdgQlSan.ko srp

kernel-mft nvdimm.ko psg_services.ko tdsmod.ko

The mst modules are present:

ls ./usr/lib/modules/4.1.12-32.el7uek-axnp.debug.070000.009400/extra/kernel-mft/

mst_pciconf.ko mst_pci.ko

However, “mst start” fails with a “version magic” mismatch:

mst start

Starting MST (Mellanox Software Tools) driver set

Loading MST PCI modulemodprobe: ERROR: could not insert ‘mst_pci’: Exec format error

  • Failure: 1

Loading MST PCI configuration modulemodprobe: ERROR: could not insert ‘mst_pciconf’: Exec format error

  • Failure: 1

Create devices

mst_pci driver not found

Unloading MST PCI module (unused) - Success

Unloading MST PCI configuration module (unused) - Success

From dmesg:

[ 2264.895260] mst_pci: version magic '4.1.12-32.el7uek-axnp.debug SMP mod_unload ' should be '4.1.12-32.el7uek-axnp.debug.070000.009400 SMP mod_unload '

[ 2265.048972] mst_pciconf: version magic '4.1.12-32.el7uek-axnp.debug SMP mod_unload ' should be '4.1.12-32.el7uek-axnp.debug.070000.009400 SMP mod_unload '

It looks like the .070000.009400 is being truncated off the “version magic” in the mst_pci and mst_pciconf modules.

Is there a string length limit for the kernel versioning in mlnx_add_kernel_support.sh?
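The mismatch can be confirmed before loading anything by comparing the first field of a module's version magic with the running kernel release. The following is only a minimal sketch: the function name vermagic_release_matches is made up for illustration, and the commented usage assumes that `modinfo -F vermagic` is available on the system.

```shell
#!/bin/sh
# Hypothetical helper (not part of MFT or MLNX_OFED): succeeds only if the
# kernel release embedded in a vermagic string, i.e. its first
# space-separated field, equals the release given as the second argument.
vermagic_release_matches() {
    vermagic=$1
    release=$2
    # Drop everything from the first space onward to isolate the release.
    built_for=${vermagic%% *}
    [ "$built_for" = "$release" ]
}

# Example use on a live system (not executed here):
#   for m in /usr/lib/modules/"$(uname -r)"/extra/kernel-mft/*.ko; do
#       vermagic_release_matches "$(modinfo -F vermagic "$m")" "$(uname -r)" \
#           || echo "mismatch: $m"
#   done
```

With the strings from the dmesg output above, the helper reports a mismatch, because the module was built for 4.1.12-32.el7uek-axnp.debug while the running kernel is 4.1.12-32.el7uek-axnp.debug.070000.009400.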

I’m not sure how this issue was marked with a “correct answer.”

I downloaded and tried the standalone MFT 4.3.0 utilities and there is no difference in behavior. “mst start” still fails with the “version magic” mismatch.

[root@co-sanfs2sim-01 bin]# ./mlxup --query

Querying Mellanox devices firmware …

Device #1:


Device Type: ConnectX4

Part Number: MCX456A-ECA_Ax

Description: ConnectX-4 VPI adapter card; EDR IB (100Gb/s) and 100GbE; dual-port QSFP28; PCIe3.0 x16; ROHS R6

PSID: MT_2190110032

PCI Device Name: 0000:90:00.0

Base GUID: 7cfe900300726e9a

Base MAC: 00007cfe90726e9a

Versions: Current Available

FW 12.14.2036 12.14.2036

Status: Up to date

[root@co-sanfs2sim-01 mft-4.3.0-25]# uname -a

Linux co-sanfs2sim-01.us.oracle.com 4.1.12-32.el7uek-axnp.debug.070000.009400 #1 SMP Fri Jan 29 12:18:44 PST 2016 x86_64 x86_64 x86_64 GNU/Linux

[root@co-sanfs2sim-01 mft-4.3.0-25]# ./install.sh

-I- Removing all installed mft packages: mft kernel-mft

-I- Building the MFT kernel binary RPM…

-I- Installing the MFT RPMs…

Preparing… ################################# [100%]

Updating / installing…

1:kernel-mft-4.3.0-4.1.12_32.el7uek################################# [100%]

Preparing… ################################# [100%]

Updating / installing…

1:mft-4.3.0-25 ################################# [100%]

-I- In order to start mst, please run “mst start”.

[root@co-sanfs2sim-01 mft-4.3.0-25]# mst start

Starting MST (Mellanox Software Tools) driver set

Loading MST PCI modulemodprobe: ERROR: could not insert ‘mst_pci’: Exec format error

  • Failure: 1

Loading MST PCI configuration modulemodprobe: ERROR: could not insert ‘mst_pciconf’: Exec format error

  • Failure: 1

Create devices

mst_pci driver not found

Unloading MST PCI module (unused) - Success

Unloading MST PCI configuration module (unused) - Success

From dmesg:

[ 5898.964905] mst_pci: version magic '4.1.12-32.el7uek-axnp.debug SMP mod_unload ' should be '4.1.12-32.el7uek-axnp.debug.070000.009400 SMP mod_unload '

[ 5899.116921] mst_pciconf: version magic '4.1.12-32.el7uek-axnp.debug SMP mod_unload ' should be '4.1.12-32.el7uek-axnp.debug.070000.009400 SMP mod_unload '

I have looked through mlnx_add_kernel_support.sh and mlnxofedinstall to try to get this to install properly. So far I have been unsuccessful.

I have successfully installed ofed on the following kernel:

4.1.12-32.1.2.el7uek.x86_64

This is the kernel that gets the magic number issue and I don’t have control over changing this number format.

4.1.12-32.el7uek-axnp.debug.070000.009400

mlx_compat: version magic '4.1.12-32.el7uek-axnp.debug SMP mod_unload ' should be '4.1.12-32.el7uek-axnp.debug.070000.009400 SMP mod_unload '

It looks like “mlnx_add_kernel_support.sh” has a kernel name format requirement. Can you point to the area of the script that enforces it? I found the uname portion, but it just reads in the kernel version string and uses it. It doesn’t manipulate it.
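One thing that may be worth ruling out first: out-of-tree modules normally take their version magic from the UTS_RELEASE string in the kernel build tree they are compiled against, not from the output of uname directly. If the headers for this custom kernel were generated with the shorter release string, every module built against them would carry that shorter string regardless of what the script does. The sketch below extracts that string so it can be compared with the running kernel. The function name is hypothetical, and the header path in the usage comment is the conventional location, which is an assumption for this kernel.

```shell
#!/bin/sh
# Hypothetical helper: print the UTS_RELEASE value defined in a
# utsrelease.h-style header file given as the first argument.
uts_release_from_header() {
    sed -n 's/^#define[[:space:]]\{1,\}UTS_RELEASE[[:space:]]\{1,\}"\(.*\)"[[:space:]]*$/\1/p' "$1"
}

# Example use on a live system (not executed here; the path is the usual
# location and is an assumption for this custom kernel):
#   hdr=/lib/modules/$(uname -r)/build/include/generated/utsrelease.h
#   [ "$(uts_release_from_header "$hdr")" = "$(uname -r)" ] \
#       || echo "build tree release differs from running kernel"
```

If the two strings differ, the truncation would originate in the kernel development headers rather than in mlnx_add_kernel_support.sh.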

Hi Nathan,

Did you actually download and install the MFT package version 4.3.0 for Linux? It is required if you want to use the MFT utilities.

mlxup - Mellanox Update and Query Utility

NOTE: Version 4.3.0 is the latest.

Thank you,

Sophie.

I didn’t download the MFT package. I must have used the MFT supplied in MLNX_OFED_LINUX-3.2-2.0.0.0-oel7.1-x86_64.tgz.

It looks like I have MFT 4.3:

mst version

mst, mft 4.3.0-25, built on Jan 25 2016, 19:10:21. Git SHA Hash: 7465f26

I followed this doc to set up OFED.

https://community.mellanox.com/s/article/getting-started-with-connectx-4-100gb-s-adapter-for-linux

These are the steps I followed. They worked for standard OEL 6.7 with the proper 6.7 tarball.

  1. Download the OFED 3.2-2.0.0.0 tarball to /tmp:

http://www.mellanox.com/page/products_dyn?product_family=26&mtag=linux_sw_drivers

MLNX_OFED_LINUX-3.2-2.0.0.0-oel7.1-x86_64.tgz for OEL 7.1

  2. Untar MLNX_OFED_LINUX-3.2-2.0.0.0-oel7.1-x86_64.tgz:

tar zxvf MLNX_OFED_LINUX-3.2-2.0.0.0-oel7.1-x86_64.tgz

This creates /tmp/MLNX_OFED_LINUX-3.2-2.0.0.0-oel7.1-x86_64.

  3. Install other required software packages:

yum install rpm-build gcc-gfortran

  4. Generate the OFED binaries for your specific kernel from /tmp/MLNX_OFED_LINUX-3.2-2.0.0.0-oel7.1-x86_64:

./mlnx_add_kernel_support.sh --make-tgz --mlnx_ofed /tmp/MLNX_OFED_LINUX-3.2-2.0.0.0-oel7.1-x86_64 --ofed-sources /tmp/MLNX_OFED_LINUX-3.2-2.0.0.0-oel7.1-x86_64/src/MLNX_OFED_SRC-3.2-2.0.0.0.tgz --skip-repo -t /tmp

SCREEN OUTPUT:

Note: This program will create MLNX_OFED_LINUX TGZ for oel7.1 under /usr/tmp directory.

Do you want to continue?[y/N]:y

See log file /tmp/mlnx_ofed_iso.1651.log

Building OFED RPMS . Please wait…

Created /tmp/MLNX_OFED_LINUX-3.2-2.0.0.0-oel7.1-x86_64-ext.tgz

  5. Untar the OFED package that was just created:

tar zxvf MLNX_OFED_LINUX-3.2-2.0.0.0-oel7.1-x86_64-ext.tgz

  6. Navigate to the directory that holds the package for the specific system and run the install script:

cd MLNX_OFED_LINUX-3.2-2.0.0.0-oel7.1-x86_64-ext

./mlnxofedinstall

SCREEN OUTPUT:

Logs dir: /tmp/MLNX_OFED_LINUX-3.2-2.0.0.0.3782.logs

This program will install the MLNX_OFED_LINUX package on your machine.

Note that all other Mellanox, OEM, OFED, or Distribution IB packages will be removed.

Do you want to continue?[y/N]:

.

.

.

Device (90:00.0):

90:00.0 Infiniband controller: Mellanox Technologies MT27700 Family

Link Width: x16

PCI Link Speed: 8GT/s

Device (90:00.1):

90:00.1 Infiniband controller: Mellanox Technologies MT27700 Family

Link Width: x16

PCI Link Speed: 8GT/s

Installation finished successfully.

Preparing… #################################

Updating / installing…

1:mlnx-fw-updater-3.2-2.0.0.0 #################################

Added 'RUN_FW_UPDATER_ONBOOT=no to /etc/infiniband/openib.conf

Attempting to perform Firmware update…

Querying Mellanox devices firmware …

Device #1: