I have successfully built and installed OFED 3.2-2.0.0.0 on an Oracle Enterprise Linux 6.7 system. I tested this OFED installation in Ethernet emulation mode with iperf3, with one ConnectX-4 port cabled back to the other ConnectX-4 port on the same system. I disabled the internal loopback and configured NAT to force the traffic over the physical IB cable.
I used the same build and install sequence on a simulator system with a custom kernel based on Oracle Enterprise Linux 7.1. On this system, “mst start” fails.
I used MLNX_OFED_LINUX-3.2-2.0.0.0-oel7.1-x86_64.tgz, and mlnx_add_kernel_support.sh completed without issue.
Here is the uname info for the custom kernel the system is running:
uname -a
Linux 4.1.12-32.el7uek-axnp.debug.070000.009400 #1 SMP Fri Jan 29 12:18:44 PST 2016 x86_64 x86_64 x86_64 GNU/Linux
The extra modules directory is populated with what appear to be the correct directories and files:
ls /usr/lib/modules/4.1.12-32.el7uek-axnp.debug.070000.009400/extra
dmadriver_api_mod.ko knem pdgCommon.ko slbmgr_api_mod.ko
i40e.ko ksimod.ko pdgPm8018.ko slbmgrmod.ko
iser mlnx-ofa_kernel pdgQlSan.ko srp
kernel-mft nvdimm.ko psg_services.ko tdsmod.ko
The mst modules are present:
ls ./usr/lib/modules/4.1.12-32.el7uek-axnp.debug.070000.009400/extra/kernel-mft/
mst_pciconf.ko mst_pci.ko
However, “mst start” fails with a “version magic” mismatch:
mst start
Starting MST (Mellanox Software Tools) driver set
Loading MST PCI modulemodprobe: ERROR: could not insert 'mst_pci': Exec format error
- Failure: 1
Loading MST PCI configuration modulemodprobe: ERROR: could not insert 'mst_pciconf': Exec format error
- Failure: 1
Create devices
mst_pci driver not found
Unloading MST PCI module (unused) - Success
Unloading MST PCI configuration module (unused) - Success
From dmesg:
[ 2264.895260] mst_pci: version magic '4.1.12-32.el7uek-axnp.debug SMP mod_unload ' should be '4.1.12-32.el7uek-axnp.debug.070000.009400 SMP mod_unload '
[ 2265.048972] mst_pciconf: version magic '4.1.12-32.el7uek-axnp.debug SMP mod_unload ' should be '4.1.12-32.el7uek-axnp.debug.070000.009400 SMP mod_unload '
It looks like the .070000.009400 suffix is being truncated from the “version magic” string in the mst_pci and mst_pciconf modules.
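For anyone who wants to confirm the same mismatch without loading the module, here is a minimal sketch of the comparison the kernel reports in dmesg. The helper names vermagic_release and check_vermagic are my own, not part of MFT or OFED, and the sketch assumes the kernel release is the first space-separated field of the vermagic string, as in the dmesg lines above.

```shell
#!/bin/sh
# Hypothetical helpers (not part of MFT/OFED) for checking a module's
# vermagic against a kernel release string.

# Print the kernel release embedded in a vermagic string.
# Assumption: vermagic has the form "<release> SMP mod_unload ...",
# so the release is everything before the first space.
vermagic_release() {
    printf '%s\n' "${1%% *}"
}

# Compare a vermagic string ($1) with a kernel release ($2).
# Prints a one-line verdict; returns 0 on match, 1 on mismatch.
check_vermagic() {
    module_release=$(vermagic_release "$1")
    kernel_release=$2
    if [ "$module_release" = "$kernel_release" ]; then
        echo "match: $module_release"
        return 0
    fi
    echo "mismatch: module built for '$module_release', kernel is '$kernel_release'"
    return 1
}
```

On the affected system this could be called as check_vermagic "$(modinfo -F vermagic /path/to/mst_pci.ko)" "$(uname -r)", where /path/to/mst_pci.ko is a placeholder for the module under extra/kernel-mft.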
Is there a string length limit for the kernel versioning in mlnx_add_kernel_support.sh?