OS: RHEL/Rocky 8.7
Kernel: 4.18.0-425.10.1.el8_7.x86_64
RPM version: 4.14.3
When building kernel support for a kernel more recent than those supported by default, the installation/compilation script can fail because MLNX_PYTHON_EXECUTABLE defaults to python, whereas policy in RHEL-based distributions is to provide only python2 or python3. This is easily solved without touching the sources by setting:
export MLNX_PYTHON_EXECUTABLE=$(which python3)
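If the same setup has to work on hosts where python3 may be missing, the export can be guarded. The following is only a sketch: first_available is a hypothetical helper name, not something shipped with OFED, and it simply prints the path of the first interpreter it finds in PATH.

```shell
# Hypothetical helper (not part of OFED): print the full path of the first
# command in the argument list that exists in PATH; return 1 if none exists.
first_available() {
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            command -v "$candidate"
            return 0
        fi
    done
    return 1
}
```

It could then be used as MLNX_PYTHON_EXECUTABLE=$(first_available python3 python2) && export MLNX_PYTHON_EXECUTABLE, so that nothing is exported when no interpreter is found.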
However, the phase that runs rpmbuild on mlnx-ofa_kernel then fails when running:
./mlnxofedinstall --add-kernel-support --upstream-libs
(...)
(in mlnxofarpmbuild.log)
mangling shebang in /usr/bin/mlnx_qos from /usr/bin/python3 to #!/usr/libexec/platform-python
mangling shebang in /usr/bin/tc_wrap.py from /usr/bin/python3 to #!/usr/libexec/platform-python
mangling shebang in /usr/bin/mlnx_perf from /usr/bin/python3 to #!/usr/libexec/platform-python
mangling shebang in /usr/bin/mlnx_qcn from /usr/bin/python3 to #!/usr/libexec/platform-python
mangling shebang in /usr/bin/mlnx_dump_parser from /usr/bin/python3 to #!/usr/libexec/platform-python
mangling shebang in /usr/bin/mlx_fs_dump from /usr/bin/python3 to #!/usr/libexec/platform-python
mangling shebang in /usr/sbin/mlnx_tune from /usr/bin/env python2 to #!/usr/bin/python2
*** ERROR: ambiguous python shebang in /usr/sbin/ib2ib_setup: #!/usr/bin/env python. Change it to python3 (or python2) explicitly.
*** WARNING: ./usr/src/ofa_kernel-4.9/backport_includes/2.6.16_sles10_sp3/include/src/idr.c is executable but has no shebang, removing executable bit
*** WARNING: ./usr/src/ofa_kernel-4.9/backport_includes/2.6.18-EL5.2/include/src/idr.c is executable but has no shebang, removing executable bit
*** WARNING: ./usr/src/ofa_kernel-4.9/net/sunrpc/xprtrdma/_makefile_ is executable but has no shebang, removing executable bit
*** WARNING: ./usr/src/ofa_kernel-4.9/drivers/nvme/_makefile_ is executable but has no shebang, removing executable bit
*** WARNING: ./usr/src/ofa_kernel-4.9/drivers/infiniband/ulp/isert/_makefile_ is executable but has no shebang, removing executable bit
*** WARNING: ./usr/src/ofa_kernel-4.9/drivers/infiniband/ulp/iser/_makefile_ is executable but has no shebang, removing executable bit
*** WARNING: ./usr/src/ofa_kernel-4.9/drivers/infiniband/ulp/srp/_makefile_ is executable but has no shebang, removing executable bit
mangling shebang in /usr/src/ofa_kernel-4.9/ofed_scripts/mlnx_tune from /usr/bin/env python2 to #!/usr/bin/python2
*** ERROR: ambiguous python shebang in /usr/src/ofa_kernel-4.9/ofed_scripts/utils/mlnx_dump_parser: #!/usr/bin/python. Change it to python3 (or python2) explicitly.
*** ERROR: ambiguous python shebang in /usr/src/ofa_kernel-4.9/ofed_scripts/utils/mlnx_mcg: #!/usr/bin/python. Change it to python3 (or python2) explicitly.
*** WARNING: ./usr/src/ofa_kernel-4.9/ofed_scripts/utils/setup.py is executable but has no shebang, removing executable bit
*** ERROR: ambiguous python shebang in /usr/src/ofa_kernel-4.9/ofed_scripts/utils/mlnx_qos: #!/usr/bin/python. Change it to python3 (or python2) explicitly.
*** ERROR: ambiguous python shebang in /usr/src/ofa_kernel-4.9/ofed_scripts/utils/mlnx_perf: #!/usr/bin/python. Change it to python3 (or python2) explicitly.
*** ERROR: ambiguous python shebang in /usr/src/ofa_kernel-4.9/ofed_scripts/utils/mlnx_qcn: #!/usr/bin/python. Change it to python3 (or python2) explicitly.
*** ERROR: ambiguous python shebang in /usr/src/ofa_kernel-4.9/ofed_scripts/utils/tc_wrap.py: #!/usr/bin/python. Change it to python3 (or python2) explicitly.
*** ERROR: ambiguous python shebang in /usr/src/ofa_kernel-4.9/ofed_scripts/utils/mlx_fs_dump: #!/usr/bin/python. Change it to python3 (or python2) explicitly.
*** WARNING: ./usr/src/ofa_kernel-4.9/ofed_scripts/mlnx_en/scripts/mlnx_en_uninstall.sh is executable but has no shebang, removing executable bit
*** ERROR: ambiguous python shebang in /usr/src/ofa_kernel-4.9/ofed_scripts/ib2ib/ib2ib_setup: #!/usr/bin/env python. Change it to python3 (or python2) explicitly.
mangling shebang in /usr/src/ofa_kernel/default/ofed_scripts/mlnx_tune from /usr/bin/env python2 to #!/usr/bin/python2
*** ERROR: ambiguous python shebang in /usr/src/ofa_kernel/default/ofed_scripts/utils/mlnx_qos: #!/usr/bin/python. Change it to python3 (or python2) explicitly.
*** ERROR: ambiguous python shebang in /usr/src/ofa_kernel/default/ofed_scripts/utils/mlnx_dump_parser: #!/usr/bin/python. Change it to python3 (or python2) explicitly.
*** ERROR: ambiguous python shebang in /usr/src/ofa_kernel/default/ofed_scripts/utils/mlnx_mcg: #!/usr/bin/python. Change it to python3 (or python2) explicitly.
*** WARNING: ./usr/src/ofa_kernel/default/ofed_scripts/utils/setup.py is executable but has no shebang, removing executable bit
*** ERROR: ambiguous python shebang in /usr/src/ofa_kernel/default/ofed_scripts/utils/mlnx_perf: #!/usr/bin/python. Change it to python3 (or python2) explicitly.
*** ERROR: ambiguous python shebang in /usr/src/ofa_kernel/default/ofed_scripts/utils/mlnx_qcn: #!/usr/bin/python. Change it to python3 (or python2) explicitly.
*** ERROR: ambiguous python shebang in /usr/src/ofa_kernel/default/ofed_scripts/utils/tc_wrap.py: #!/usr/bin/python. Change it to python3 (or python2) explicitly.
*** ERROR: ambiguous python shebang in /usr/src/ofa_kernel/default/ofed_scripts/utils/mlx_fs_dump: #!/usr/bin/python. Change it to python3 (or python2) explicitly.
mangling shebang in /usr/src/ofa_kernel/default/ofed_scripts/utils/build/scripts-3.6/mlnx_qos from /usr/bin/python3 to #!/usr/libexec/platform-python
mangling shebang in /usr/src/ofa_kernel/default/ofed_scripts/utils/build/scripts-3.6/tc_wrap.py from /usr/bin/python3 to #!/usr/libexec/platform-python
mangling shebang in /usr/src/ofa_kernel/default/ofed_scripts/utils/build/scripts-3.6/mlnx_perf from /usr/bin/python3 to #!/usr/libexec/platform-python
mangling shebang in /usr/src/ofa_kernel/default/ofed_scripts/utils/build/scripts-3.6/mlnx_qcn from /usr/bin/python3 to #!/usr/libexec/platform-python
mangling shebang in /usr/src/ofa_kernel/default/ofed_scripts/utils/build/scripts-3.6/mlnx_dump_parser from /usr/bin/python3 to #!/usr/libexec/platform-python
mangling shebang in /usr/src/ofa_kernel/default/ofed_scripts/utils/build/scripts-3.6/mlx_fs_dump from /usr/bin/python3 to #!/usr/libexec/platform-python
*** WARNING: ./usr/src/ofa_kernel/default/ofed_scripts/mlnx_en/scripts/mlnx_en_uninstall.sh is executable but has no shebang, removing executable bit
*** ERROR: ambiguous python shebang in /usr/src/ofa_kernel/default/ofed_scripts/ib2ib/ib2ib_setup: #!/usr/bin/env python. Change it to python3 (or python2) explicitly.
error: Bad exit status from /var/tmp/rpm-tmp.C60wUy (%install)
As per https://fedoraproject.org/wiki/Changes/Make_ambiguous_python_shebangs_error , /usr/lib/rpm/redhat/brp-mangle-shebangs has changed behavior: ambiguous shebangs that used to produce only a warning now produce an error. Since the script exits with a nonzero status, the build terminates early.
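To make the rule concrete, here is a simplified sketch of the kind of check involved. It is not the actual brp-mangle-shebangs code; is_ambiguous_python_shebang is a hypothetical name, and the function only recognises the two forms that appear in the log above.

```shell
# Simplified sketch, NOT the real brp-mangle-shebangs logic.
# Succeeds (exit 0) if the first line of file $1 is a Python shebang that
# names plain "python" with no major version; fails (exit 1) otherwise.
is_ambiguous_python_shebang() {
    local first_line=
    # Read only the first line; tolerate a missing trailing newline.
    IFS= read -r first_line < "$1" || [ -n "$first_line" ] || return 1
    case "$first_line" in
        '#!/usr/bin/python' | '#!/usr/bin/python '* | \
        '#!/usr/bin/env python' | '#!/usr/bin/env python '*)
            return 0 ;;
        *)
            return 1 ;;
    esac
}
```

Under this sketch, a file starting with #!/usr/bin/python or #!/usr/bin/env python is flagged, while #!/usr/bin/python3 and #!/usr/bin/env python2 are not.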
I’m assuming ofed-scripts are shared across all platforms, and that the ambiguous python shebang does not cause trouble on more recent distributions, where python2 support is deprecated, or with other package managers.
The obvious solution is to replace the ambiguous shebangs in the OFED sources, repackage them and run the installation script again. That is:
# unpack sources
tar zxvf MLNX_OFED_SRC-4.9-6.0.6.0.tgz
cd MLNX_OFED_SRC-4.9-6.0.6.0
rpm -ivh mlnx-ofa_kernel-4.9-OFED.4.9.6.0.6.1.src.rpm
cd ~/rpmbuild/SOURCES/
tar zxvf mlnx-ofa_kernel-4.9.tgz
cd mlnx-ofa_kernel-4.9
# find and replace offending files (both ambiguous forms seen in the log)
grep -rnH '^#!/usr/bin/python$' .
grep -rnH '^#!/usr/bin/env python$' .
# double quotes so the variables expand; '|' as delimiter because shebangs contain '/'
sed -i "1s|$OLD_SHEBANG|$NEW_SHEBANG|" $OFFENDING_FILES
# repack sources (from ~/rpmbuild/SOURCES)
cd ../ ; tar zcvf mlnx-ofa_kernel-4.9.tgz mlnx-ofa_kernel-4.9
# rebuild the source RPM (from ~/rpmbuild)
cd ../ ; rpmbuild -bs SPECS/mlnx-ofa_kernel.spec
cp ~/rpmbuild/SRPMS/mlnx-ofa_kernel-4.9-OFED.4.9.6.0.6.1.src.rpm $MLNX_ROOT/MLNX_OFED_SRC-4.9-6.0.6.0
From what I can gather from the logs, the installation script does some shebang manipulation similar to this, but it seems that some files are not corrected. As for which shebang to use, I checked the scripts and my best guess was that python2 was likely to work with all of them, so that’s what I used. This worked and the drivers compiled correctly. A normal installation without adding kernel support yielded a kernel soft lockup that hung the boot process when starting openibd; I assume this is to be expected for an unsupported kernel.
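The find-and-replace step can also be scripted instead of done by hand. This is a minimal sketch, assuming GNU grep, xargs and sed (as on RHEL/Rocky); fix_python_shebangs is a hypothetical helper name, and the python2 default only mirrors the choice described above.

```shell
# Hypothetical helper: rewrite ambiguous Python shebangs under a directory.
# Usage: fix_python_shebangs DIR [INTERPRETER]
# INTERPRETER defaults to /usr/bin/python2 (the choice made in this post).
fix_python_shebangs() {
    local dir=$1
    local interp=${2:-/usr/bin/python2}
    # List files containing a bare "python" shebang line (NUL-separated so
    # unusual file names survive), then rewrite line 1 only. Files where the
    # match is not on line 1 are left with unchanged content.
    grep -rlZE '^#!/usr/bin/(env )?python[[:space:]]*$' "$dir" |
        xargs -0 -r sed -i -E \
            '1s|^#!/usr/bin/(env )?python[[:space:]]*$|#!'"$interp"'|'
}
```

Run from ~/rpmbuild/SOURCES as fix_python_shebangs mlnx-ofa_kernel-4.9 before repacking the tarball. Shebangs that already name python2 or python3 are not touched.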
I’m reporting this because it seems likely to break for other OFED versions and other OSs. I’m not sure whether this is the proper place to suggest a fix or whether there is a specific way to interact with developers (such as a GitHub repository); if I’m in the wrong place I’d appreciate a nudge in the right direction.
Regards,
Joaquin Torres.
Comisión Nacional de Energía Atómica - Centro Atómico Constituyentes
HPC Sysadmin
PS:
It would be nice if the documentation had more info for building the sources. In my experience, latest supported kernels in OFED releases are almost always behind latest kernels provided by the distribution and I’ve had much better results by recompiling than by using KMPs of older kernels… Except that compilation is a lot more likely to fail in unexpected ways.
I understand that this might mean more development/packaging time, but since an RPM build structure already exists, what would really save an awful lot of time is a non-local repo with updated releases synced to Red Hat package versions, like the one the people at ELRepo maintain.
I realize that maintaining a cross-distribution set of packages is incredibly difficult and distributing stable RPMs can be a good compromise in that case (and probably there are a lot more issues that I’m not seeing). But, in the current state of affairs, my experience up to now has been:
- A slow, uphill battle to install OFED.
- Get a working install (luckily with kernel support, else with KMPs or building the drivers).
- Need to update the kernel to keep up with other packages.
- OFED install breaks.
- Try to update OFED but newer kernel is not yet supported.
- Compiling kernel support fails.
- Go back to step 1.
Good documentation on compilation would at least make this process a lot smoother. Currently the only ways I’ve found to diagnose a build issue are to follow the install.pl script or to read the multiple logs generated by the scripts. And since the script chooses the defaults for the corresponding configure/make steps, each new attempt can be tediously slow (the default -j or --with-ncpus values can be extremely slow on multicore processors), and errors are hard to reproduce because of the growing number of options.