Error /hca_self_test.ofed: line 165: [: too many arguments

I am running CentOS 6.5 x86_64 and have installed the Mellanox driver (MLNX_OFED_LINUX-2.2-1.0.1-rhel6.5-x86_64). When running hca_self_test.ofed, I receive the following output:

/hca_self_test.ofed: line 165: [: too many arguments

REASON: no RPMs found for currently booted kernel 2.6.32-431.20.3.el6.x86_64

Below is the output of ibstatus

ibstatus

Infiniband device ‘mlx4_0’ port 1 status:

default gid: fe80:0000:0000:0000:f452:1403:0033:7fc1

base lid: 0x0

sm lid: 0x0

state: 2: INIT

phys state: 5: LinkUp

rate: 56 Gb/sec (4X FDR)

link_layer: InfiniBand

I can restart the driver successfully, but I always receive the error above when running hca_self_test.ofed. I have also made sure SELinux is in permissive mode (getenforce returns Permissive). Let me know if there is anything else about the setup I can post that would help someone help me!

Thanks

In the end, we found out that this customer, without my knowledge, had updated their kernel with errata from Red Hat Network. Once we reconfigured OFED for the new kernel and built a custom ISO, the driver was reinstalled and this command now works well.

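It is possible that the original poster is running a kernel other than the default Red Hat/CentOS kernel shipped on the DVD or ISO, and that the Mellanox OFED drivers were not reconfigured and recompiled for the updated kernel. The modinfo output posted in this thread is consistent with that: mlx4_core reports vermagic 2.6.32-431.el6.x86_64, while the booted kernel is 2.6.32-431.20.3.el6.x86_64 and the module is loaded from the weak-updates directory.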

The installation script shows no symptoms and reports no errors of any kind, which makes this more confusing; the problems only begin once certain features are used.

If the kernel receives updates of any kind after the Linux ISO or DVD installation, Mellanox OFED must be reconfigured for the new kernel, and the resulting custom ISO must be mounted and used for the installation.
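A quick way to spot this condition is to compare the kernel that mlx4_core was built for (the first field of its vermagic string) with the running kernel reported by uname -r. The sketch below assumes bash and modinfo's -F field option; the helper names are hypothetical. A mismatch is not automatically fatal, since RHEL can load kABI-compatible modules from weak-updates, but it is the situation described above.

```shell
#!/bin/bash
# Sketch: detect a kernel module built for a different kernel than the one
# currently booted (e.g. after "yum update" installed an errata kernel).

# Print the kernel release a module was built for: the first field of its
# vermagic string, e.g. "2.6.32-431.el6.x86_64 SMP mod_unload modversions".
module_build_kernel() {
    local vermagic="$1"
    echo "${vermagic%% *}"
}

# Succeed only if the module's build kernel equals the running kernel.
kernel_matches() {
    [ "$(module_build_kernel "$1")" = "$2" ]
}

# Check the live system only if modinfo can find mlx4_core.
if vermagic=$(modinfo -F vermagic mlx4_core 2>/dev/null); then
    running=$(uname -r)
    built=$(module_build_kernel "$vermagic")
    if kernel_matches "$vermagic" "$running"; then
        echo "mlx4_core was built for the running kernel ($running)"
    else
        echo "mlx4_core was built for $built but $running is booted;"
        echo "MLNX_OFED probably needs to be rebuilt for this kernel"
    fi
fi
```

On the system in this thread, the vermagic (2.6.32-431.el6.x86_64) and the booted kernel (2.6.32-431.20.3.el6.x86_64) differ, so this check would take the mismatch branch.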

I will try this in about an hour


Hi All,

Just wanted to mention again that this script simply queries basic configuration from the server. You can run basic IB commands such as ibstat to query the physical and logical state of the link. Regardless, the issue you are describing is fixed in the next MOFED release (2.3).
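ibstat and ibstatus report the port's logical and physical state; the same fields can also be read directly from sysfs. A minimal sketch, assuming the usual /sys/class/infiniband/<device>/ports/<n>/ layout with state, phys_state and rate files (port_summary is a hypothetical helper, not part of MLNX_OFED):

```shell
#!/bin/bash
# Sketch: print each InfiniBand port's logical state, physical state and rate
# from sysfs (assumed layout: /sys/class/infiniband/<dev>/ports/<n>/).

# Summarize one port directory; fails if it has no "state" file.
port_summary() {
    local port_dir="$1" state phys rate
    state=$(cat "$port_dir/state" 2>/dev/null) || return 1
    phys=$(cat "$port_dir/phys_state" 2>/dev/null) || phys="unknown"
    rate=$(cat "$port_dir/rate" 2>/dev/null) || rate="unknown"
    echo "state: $state | phys state: $phys | rate: $rate"
}

for port_dir in /sys/class/infiniband/*/ports/*; do
    [ -d "$port_dir" ] || continue   # glob did not match: no IB devices
    dev=$(basename "$(dirname "$(dirname "$port_dir")")")
    echo "$dev port $(basename "$port_dir"): $(port_summary "$port_dir")"
done
```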

Hi, yes. This script simply checks basic code revisions and link connectivity, which can also be queried with other commands from the driver and infiniband-diags packages.

All:

Exactly the same symptoms here when running MLNX OFED 2.1-1.0.6 on RHEL 6.5 x86. There were no errors during mlnxofedinstall at all, and nothing indicating that any RPMs failed to install.

I am going to attempt updating the adapter to the latest firmware and re-check to see if anything is corrected, but I am seeing the same things as the original poster of this thread.

Output looks like this (ignore the Spanish; "línea 165: [: demasiados argumentos" means "line 165: [: too many arguments"):

[root@master ~]# hca_self_test.ofed

---- Performing Adapter Device Self Test ----

Number of CAs Detected … 1

PCI Device Check … PASS

/usr/bin/hca_self_test.ofed: línea 165: [: demasiados argumentos

Host Driver RPM Check … FAIL

REASON: no RPMs found for currently booted kernel 2.6.32-431.23.3.el6.x86_64

Kernel Arch … x86_64

Host Driver Version … NA

Firmware Check on CA #0 (VPI) … NA

Host Driver Initialization … NA

Number of CA Ports Active … NA

Error Counter Check … NA

Kernel Syslog Check … NA

Node GUID on CA #0 (VPI) … 00:02:c9:03:00:38:ed:60

------------------ DONE ---------------------

Output follows

rpm -qf /lib/modules/2.6.32-431.20.3.el6.x86_64/weak-updates/mlnx-ofa_kernel/drivers/net/ethernet/mellanox/mlx4/mlx4_core.ko

file /lib/modules/2.6.32-431.20.3.el6.x86_64/weak-updates/mlnx-ofa_kernel/drivers/net/ethernet/mellanox/mlx4/mlx4_core.ko is not owned by any package

Changing the line to the following gives the same result:

KER_RPM=`rpm -qf $mlx4_core_ko 2> /dev/null | grep -E "kernel-ib|ofa_kernel"`

Thanks

Hi,

My guess is that it's probably because the script can't find the exact kernel build 2.6.32-431.20.3. I don't think you need to pay attention to it at this moment. If you are trying to verify your fabric health: your ibstatus shows the port in INIT because there is no subnet manager (SM) in the fabric. Unless there is a switch that is supposed to manage the network, you can simply start OpenSM locally: /etc/init.d/opensmd start
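Once opensmd is running, the port should move from INIT to ACTIVE. The sketch below polls a port's sysfs state file until it reads "4: ACTIVE" or a timeout expires; wait_port_active is a hypothetical helper, and the path in the usage comment assumes device mlx4_0, port 1, as in the ibstatus output earlier in this thread.

```shell
#!/bin/bash
# Sketch: wait for an InfiniBand port to become ACTIVE after a subnet manager
# (such as opensmd) has been started. Reads the port's sysfs "state" file.

# wait_port_active STATE_FILE [TIMEOUT_SECONDS]
# Returns 0 once the file reads "4: ACTIVE", 1 if the timeout expires first.
wait_port_active() {
    local state_file="$1" timeout="${2:-30}" waited=0
    while :; do
        # "2: INIT" means the link is up but no SM has configured the port yet.
        if [ "$(cat "$state_file" 2>/dev/null)" = "4: ACTIVE" ]; then
            return 0
        fi
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
}

# Usage (as root; device and port taken from the ibstatus output):
#   /etc/init.d/opensmd start
#   wait_port_active /sys/class/infiniband/mlx4_0/ports/1/state 30 && echo "port ACTIVE"
```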

Output Below

modinfo mlx4_core

filename: /lib/modules/2.6.32-431.20.3.el6.x86_64/weak-updates/mlnx-ofa_kernel/drivers/net/ethernet/mellanox/mlx4/mlx4_core.ko

version: 1.1

license: Dual BSD/GPL

description: Mellanox ConnectX HCA low-level driver

author: Roland Dreier

srcversion: 9A90DAE92A2E75BF5F67A24

alias: pci:v000015B3d00001010svsdbcsci*

alias: pci:v000015B3d0000100Fsvsdbcsci*

alias: pci:v000015B3d0000100Esvsdbcsci*

alias: pci:v000015B3d0000100Dsvsdbcsci*

alias: pci:v000015B3d0000100Csvsdbcsci*

alias: pci:v000015B3d0000100Bsvsdbcsci*

alias: pci:v000015B3d0000100Asvsdbcsci*

alias: pci:v000015B3d00001009svsdbcsci*

alias: pci:v000015B3d00001008svsdbcsci*

alias: pci:v000015B3d00001007svsdbcsci*

alias: pci:v000015B3d00001006svsdbcsci*

alias: pci:v000015B3d00001005svsdbcsci*

alias: pci:v000015B3d00001004svsdbcsci*

alias: pci:v000015B3d00001003svsdbcsci*

alias: pci:v000015B3d00001002svsdbcsci*

alias: pci:v000015B3d0000676Esvsdbcsci*

alias: pci:v000015B3d00006746svsdbcsci*

alias: pci:v000015B3d00006764svsdbcsci*

alias: pci:v000015B3d0000675Asvsdbcsci*

alias: pci:v000015B3d00006372svsdbcsci*

alias: pci:v000015B3d00006750svsdbcsci*

alias: pci:v000015B3d00006368svsdbcsci*

alias: pci:v000015B3d0000673Csvsdbcsci*

alias: pci:v000015B3d00006732svsdbcsci*

alias: pci:v000015B3d00006354svsdbcsci*

alias: pci:v000015B3d0000634Asvsdbcsci*

alias: pci:v000015B3d00006340svsdbcsci*

depends: compat

vermagic: 2.6.32-431.el6.x86_64 SMP mod_unload modversions

parm: set_4k_mtu:(Obsolete) attempt to set 4K MTU to all ConnectX ports (int)

parm: debug_level:Enable debug tracing if > 0 (int)

parm: msi_x:0 - don’t use MSI-X, 1 - use MSI-X, >1 - limit number of MSI-X irqs to msi_x (non-SRIOV only) (int)

parm: enable_sys_tune:Tune the cpu’s for better performance (default 0) (int)

parm: block_loopback:Block multicast loopback packets if > 0 (default: 1) (int)

parm: num_vfs:Either single value (e.g. ‘5’) or triplet (e.g. ‘10,11,12’) to define uniform num_vfs value for all devices functions.

If a single value is given, this value will be used in order to define dual port virtual functions are probed.

Alternatively, a string to map device function numbers to their probe_vf values

(e.g. ‘0000:04:00.0-3,002b:1c:0b.a-13;12;11’) could be given.

Hexadecimal digits for the device function (e.g. 002b:1c:0b.a) and decimal for probe_vf value (e.g. 13 or 1;2;3). (string)

parm: log_num_mgm_entry_size:log mgm size, that defines the num of qp per mcg, for example: 10 gives 248.range: 7 <= log_num_mgm_entry_size <= 12. To activate device managed flow steering when available, set to -1 (int)

parm: high_rate_steer:Enable steering mode for higher packet rate (default off) (int)

parm: fast_drop:Enable fast packet drop when no recieve WQEs are posted (int)

parm: enable_64b_cqe_eqe:Enable 64 byte CQEs/EQEs when the the FW supports this if non-zero (default: 1) (int)

parm: log_num_mac:Log2 max number of MACs per ETH port (1-7) (int)

parm: log_num_vlan:(Obsolete) Log2 max number of VLANs per ETH port (0-7) (int)

parm: log_mtts_per_seg:Log2 number of MTT entries per segment (0-7) (default: 0) (int)

parm: port_type_array:Either pair of values (e.g. ‘1,2’) to define uniform port1/port2 types configuration for all devices functions

or a string to map device function numbers to their pair of port types values (e.g. ‘0000:04:00.0-1;2,002b:1c:0b.a-1;1’).

Valid port types: 1-ib, 2-eth, 3-auto, 4-N/A

In case that only one port is available use the N/A port type for port2 (e.g ‘1,4’). (string)

parm: log_num_qp:log maximum number of QPs per HCA (default: 19) (int)

parm: log_num_srq:log maximum number of SRQs per HCA (default: 16) (int)

parm: log_rdmarc_per_qp:log number of RDMARC buffers per QP (default: 4) (int)

parm: log_num_cq:log maximum number of CQs per HCA (default: 16) (int)

parm: log_num_mcg:log maximum number of multicast groups per HCA (default: 13) (int)

parm: log_num_mpt:log maximum number of memory protection table entries per HCA (default: 19) (int)

parm: log_num_mtt:log maximum number of memory translation table segments per HCA (default: max(20, 2*MTTs for register all of the host memory limited to 30)) (int)

parm: enable_qos:Enable Quality of Service support in the HCA (default: off) (bool)

parm: internal_err_reset:Reset device on internal errors if non-zero (default 0) (int)

Second command

modinfo mlx4_core | grep filename | awk '{print $NF}'

/lib/modules/2.6.32-431.20.3.el6.x86_64/weak-updates/mlnx-ofa_kernel/drivers/net/ethernet/mellanox/mlx4/mlx4_core.ko

Thanks

Thank you. So this error will not prevent the Mellanox adapter from working once the final configuration is in place?

The following did work, as you mentioned:

/etc/init.d/opensmd start

ibstatus now reports the following

ibstatus

Infiniband device ‘mlx4_0’ port 1 status:

default gid: fe80:0000:0000:0000:f452:1403:0033:7fc1

base lid: 0x1

sm lid: 0x1

state: 4: ACTIVE

phys state: 5: LinkUp

rate: 56 Gb/sec (4X FDR)

link_layer: InfiniBand

Thanks

Could you run the following commands

modinfo mlx4_core

modinfo mlx4_core | grep filename | awk '{print $NF}'

and provide the output? The last command probably returns a zero-length string, which causes the script to fail.

And what is the output of:

rpm -qf /lib/modules/2.6.32-431.20.3.el6.x86_64/weak-updates/mlnx-ofa_kernel/drivers/net/ethernet/mellanox/mlx4/mlx4_core.ko

It should return the RPM name.

And could you check whether the script works if you change this line (163)

KER_RPM=`rpm -qf $mlx4_core_ko 2> /dev/null | grep -E "kernel-ib|ofa_kernel"`

to this:

KER_RPM=`rpm -qf $mlx4_core_ko 2> /dev/null | grep -E "kernel-ib\|ofa_kernel"`
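For reference, bash prints "[: too many arguments" when an unquoted variable inside [ ] expands to several words; an empty unquoted variable does not produce that message (depending on the test, it either passes silently or gives a different error such as "unary operator expected"). The sketch below illustrates this with a hypothetical multi-word value; it is not the actual code at line 165 of hca_self_test.ofed, which is not shown in this thread, and has_value is a hypothetical helper.

```shell
#!/bin/bash
# Sketch: why unquoted variables break [ ] tests. The value below is a
# hypothetical multi-word string (like an rpm "not owned" message).
KER_RPM="file /lib/modules/x/mlx4_core.ko is not owned by any package"

# Unquoted: word splitting turns this into [ -z file /lib/... package ],
# so bash reports "[: too many arguments" (hidden here by 2>/dev/null)
# and [ returns exit status 2.
status=0
[ -z $KER_RPM ] 2>/dev/null || status=$?
echo "unquoted test exit status: $status"

# Quoted: the whole value is passed as one argument, so the test is valid.
has_value() {
    [ -n "$1" ]
}

if has_value "$KER_RPM"; then
    echo "KER_RPM is set: $KER_RPM"
else
    echo "KER_RPM is empty"
fi
```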

It is strange; there should be an RPM. Could you reinstall MOFED by running the mlnxofedinstall script and check that there are no errors during the installation?

I reran the install as you asked and saw the following warnings; I am not sure whether they will be a problem. While I had time, I also ran the install on another machine with the same specs: it worked normally after the initial install, but after a yum update it has the same problem.

Device (42:00.0):

42:00.0 Network controller: Mellanox Technologies MT27500 Family

Link Width: 8x

PCI Link Speed: Unknown

Device (42:00.0):

42:00.0 Network controller: Mellanox Technologies MT27500 Family

WARNING - device 42:00.0 The MaxReadRequest size is set too low (512 bytes) and will affect performance.

Please consult your server’s vendor and if possible change BIOS settings or use setpci to configure MaxReadReq to 4096 bytes.

/sbin/setpci -s 42:00.0 68.W

2xxx

Change to 4096 bytes:

/sbin/setpci -s 42:00.0 68.W=5xxx
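The "2xxx" and "5xxx" notation matches the Max_Read_Request_Size field of the PCIe Device Control register (bits 14:12, where a code n means 128 << n bytes: 2 is 512 bytes and 5 is 4096 bytes). The sketch below decodes that field and computes a new register value that changes only those bits. The helper names are hypothetical, offset 0x68 is specific to this adapter as reported by the installer, and writing the register with setpci requires root and does not persist across reboots.

```shell
#!/bin/bash
# Sketch: decode/update Max_Read_Request_Size (bits 14:12 of the PCIe Device
# Control register). Input is the hex word printed by "setpci -s 42:00.0 68.W".

# Print the current max read request size in bytes (128 << field).
mrrs_bytes() {
    local reg=$((16#$1))
    echo $((128 << ((reg >> 12) & 7)))
}

# Print the register value with the field set for BYTES (a power of two,
# 128..4096), leaving all other bits unchanged.
mrrs_set() {
    local reg=$((16#$1)) bytes="$2" code=0
    while [ "$code" -lt 5 ] && [ $((128 << code)) -lt "$bytes" ]; do
        code=$((code + 1))
    done
    printf '%04x\n' $(( (reg & ~(7 << 12)) | (code << 12) ))
}

# Usage (as root, hypothetical; not persistent across reboots):
#   cur=$(/sbin/setpci -s 42:00.0 68.W)
#   mrrs_bytes "$cur"
#   /sbin/setpci -s 42:00.0 68.W="$(mrrs_set "$cur" 4096)"
```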

Installation finished successfully.

Attempting to perform Firmware update…

Querying Mellanox devices firmware …

Device #1: