Package digest query fails for mft rpm file if run on aarch64 but not x86_64

We have not been using the mlnx_ofed stack. We’re required to mirror packages locally and run ‘createrepo’ against the mirror before provisioning downstream. All compute nodes on all HPC Asset clusters install the Infiniband Support package group. Through last week this did not pull in ‘mft’ as a dependency because the mlnx_ofed repo was not enabled. This week we’re trying to add knem, kmod-knem, and ucx-knem to the compute image, so we enabled the NVidia OFED repodef for the clients. It points to a cluster-local createrepo’d mirror of https://linux.mellanox.com/public/repo/mlnx_ofed/latest/rhel8.7/, with the clients’ repodef referencing $releasever and $basearch as usual.
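For readers unfamiliar with the setup, a client repodef along these lines could be generated as sketched below. The repo id `nvidia-mlnx-ofed` is the one that appears in the dnf output later in this post; the function name `write_repodef`, the mirror URL, and the exact path layout under it are assumptions for illustration, not the actual cluster configuration.

```shell
#!/bin/sh
# Write a dnf repodef that points at a local mirror.
# $1 = directory to write into (e.g. the image's etc/yum.repos.d)
# $2 = base URL of the local mirror (hypothetical)
write_repodef() {
    dest_dir=$1
    mirror=$2
    # \$releasever and \$basearch are escaped so they land in the file
    # literally; dnf expands them on the client at run time.
    cat > "$dest_dir/nvidia-mlnx-ofed.repo" <<EOF
[nvidia-mlnx-ofed]
name=NVIDIA MLNX_OFED (local mirror)
baseurl=$mirror/public/repo/mlnx_ofed/latest/rhel\$releasever/\$basearch
enabled=1
EOF
}
```

Called as `write_repodef /path/to/rootfs/etc/yum.repos.d http://mirror.example`, this leaves the two dnf variables unexpanded in the file, so the same repodef serves both x86_64 and aarch64 clients.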

Enabling dnf searching into the mlnx_ofed stack caused the Infiniband Support package group install to fail with no-digest for the NVidia arm64 mft package. We can easily verify this on an aarch64 system but, if we try to verify it on an x86_64 system against the same package files, the digests are OK.

Why are the SHA256 and MD5 digests of a static aarch64 package file visible on an x86_64 box but not visible, for the same static file, on an aarch64 box running the same base os?

Here’s a chroot session into the image being built on an aarch64 system, starting with the minimal bits required to support chroot and dnf:

bash-4.4# dnf groupinstall 'Infiniband Support'
Last metadata expiration check: 1:07:49 ago on Fri Feb 24 20:16:21 2023.
No match for group package "libibmad"
Dependencies resolved.

 Problem: cannot install both libibverbs-41.0-1.el8.aarch64 and libibverbs-59mlnx44-1.59056.aarch64
  - package perftest-4.5-12.el8.aarch64 requires libefa.so.1()(64bit), but none of the providers can be installed
  - package perftest-4.5-12.el8.aarch64 requires libefa.so.1(EFA_1.1)(64bit), but none of the providers can be installed
  - cannot install the best candidate for the job
======================================================================================================================================================================================================
 Package                                         Architecture                     Version                                                             Repository                                 Size
======================================================================================================================================================================================================
Installing:
 mlnx-ofed-all                                   noarch                           5.9-0.5.6.0.rhel8.7                                                 nvidia-mlnx-ofed                           11 k
Installing group/module packages:
 ibacm                                           aarch64                          59mlnx44-1.59056                                                    nvidia-mlnx-ofed                           84 k
   <...snipsnip...>
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Total                                                                                                                                                                  56 MB/s | 123 MB     00:02     
Running transaction check
Transaction check succeeded.
Running transaction test
The downloaded packages were saved in cache until the next successful transaction.
You can remove cached packages by executing 'dnf clean packages'.
Error: Transaction test error:
  package mft-4.23.0-104.aarch64 does not verify: no digest

Here’s the manual download and digest check, on the same aarch64 system and in the same chroot session, showing inability to see the SHA256 and MD5 digests of the mft package but ability to see them in (arbitrarily chosen) perl-Digest-SHA:

bash-4.4# uname -rm
4.18.0-425.10.1.el8_7.aarch64 aarch64

bash-4.4# dnf clean packages
0 files removed

bash-4.4# dnf download mft
Last metadata expiration check: 1:28:21 ago on Fri Feb 24 20:16:21 2023.
mft-4.23.0-104.arm64.rpm                                                                                                                                               57 MB/s |  34 MB     00:00    

bash-4.4# rpm -Kvv --nosignature mft-4.23.0-104.arm64.rpm 2>&1 |grep -i digest
Header SHA1 digest: OK
Header SHA1 digest: OK
Header SHA1 digest: OK
Header SHA1 digest: OK
Header SHA1 digest: OK
Header SHA1 digest: OK
    Header SHA1 digest: OK
    Payload SHA256 digest: NOTFOUND
    MD5 digest: NOTFOUND

bash-4.4# dnf clean packages
0 files removed

bash-4.4# dnf download perl-Digest-SHA
Last metadata expiration check: 1:28:53 ago on Fri Feb 24 20:16:21 2023.
perl-Digest-SHA-6.02-1.el8.aarch64.rpm                                                                                                                                 19 MB/s |  63 kB     00:00    

bash-4.4# rpm -Kvv --nosignature perl-Digest-SHA-6.02-1.el8.aarch64.rpm 2>&1 |grep -i digest
Header SHA1 digest: OK
Header SHA1 digest: OK
Header SHA1 digest: OK
Header SHA1 digest: OK
Header SHA1 digest: OK
Header SHA1 digest: OK
perl-Digest-SHA-6.02-1.el8.aarch64.rpm:
    Header SHA256 digest: OK
    Header SHA1 digest: OK
    Payload SHA256 digest: OK
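The per-package check above can be repeated over a whole directory to see which packages are affected. This is a minimal sketch: the function names `classify_digests` and `scan_repo_dir` are made up for illustration, and the classification relies only on the `OK` and `NOTFOUND` strings visible in the output above.

```shell
#!/bin/sh
# Read `rpm -Kv --nosignature <file>` output on stdin and print one word:
#   missing  at least one digest line reports NOTFOUND
#   other    a digest line reports something other than OK or NOTFOUND
#   ok       every digest line reports OK
#   none     no digest lines were seen at all
classify_digests() {
    awk '
        /digest:/ {
            seen = 1
            if ($0 ~ /digest: NOTFOUND/) missing = 1
            else if ($0 !~ /digest: OK/) other = 1
        }
        END {
            if (!seen) print "none"
            else if (missing) print "missing"
            else if (other) print "other"
            else print "ok"
        }'
}

# Print "<status><TAB><file>" for every rpm in a directory.
scan_repo_dir() {
    dir=$1
    command -v rpm >/dev/null 2>&1 || { echo "rpm not installed" >&2; return 1; }
    for pkg in "$dir"/*.rpm; do
        [ -e "$pkg" ] || continue
        status=$(rpm -Kv --nosignature "$pkg" 2>&1 | classify_digests)
        printf '%s\t%s\n' "$status" "$pkg"
    done
}
```

Running `scan_repo_dir` against the same directory on both architectures and diffing the two listings would show exactly which packages differ between hosts.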

In case the problem is in the [as of yet minimal] system image or chroot session, here’s the same check on the same system, outside of the chroot and new system image, accessing the same package files:

16:49:21 root@compute104.godzilla:/var/image/godzilla/20230214-compute.aarch64.el8/rootfs/root # uname -rm
4.18.0-425.10.1.el8_7.aarch64 aarch64

16:49:23 root@compute104.godzilla:/var/image/godzilla/20230214-compute.aarch64.el8/rootfs/root # rpm -Kvv --nosignature mft-4.23.0-104.arm64.rpm 2>&1 |grep -i digest
Header SHA1 digest: OK
Header SHA1 digest: OK
Header SHA1 digest: OK
    Header SHA1 digest: OK
    Payload SHA256 digest: NOTFOUND
    MD5 digest: NOTFOUND

16:49:39 root@compute104.godzilla:/var/image/godzilla/20230214-compute.aarch64.el8/rootfs/root # rpm -Kvv --nosignature perl-Digest-SHA-6.02-1.el8.aarch64.rpm 2>&1 |grep -i digest
Header SHA1 digest: OK
Header SHA1 digest: OK
Header SHA1 digest: OK
perl-Digest-SHA-6.02-1.el8.aarch64.rpm:
    Header SHA256 digest: OK
    Header SHA1 digest: OK
    Payload SHA256 digest: OK

And the screwy part: here’s the same check done with the same files on the x86_64 infrastructure node responsible for serving the local package repo (built from a mirror of linux.mellanox.com/public/repo/mlnx_ofed/latest/rhel8.7 with ‘createrepo’ run against it). These are the same package files as before, for aarch64, but with the check run on an x86_64 system:

16:52:17 root@admin.godzilla:/var/repo/nvidia/public/repo/mlnx_ofed/latest/rhel8/aarch64 # uname -rm
4.18.0-425.10.1.el8_7.x86_64 x86_64

16:52:18 root@admin.godzilla:/var/repo/nvidia/public/repo/mlnx_ofed/latest/rhel8/aarch64 # rpm -Kvv --nosignature mft-4.23.0-104.arm64.rpm 2>&1 |grep -i digest
Header SHA1 digest: OK
Header SHA1 digest: OK
Header SHA1 digest: OK
    Header SHA1 digest: OK
    MD5 digest: OK

16:53:41 root@admin.godzilla:/var/repo/rocky-linux/8/AppStream/aarch64/os/Packages/p # rpm -Kvv --nosignature perl-Digest-SHA-6.02-1.el8.aarch64.rpm 2>&1 |grep -i digest
Header SHA1 digest: OK
Header SHA1 digest: OK
Header SHA1 digest: OK
perl-Digest-SHA-6.02-1.el8.aarch64.rpm:
    Header SHA256 digest: OK
    Header SHA1 digest: OK
    Payload SHA256 digest: OK
    MD5 digest: OK
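The comparison between hosts assumes both are reading byte-identical files. A quick way to confirm that is to compare checksums computed independently of rpm. The helper name `same_bytes` below is hypothetical; it is a sketch of the idea rather than something run in this thread.

```shell
#!/bin/sh
# Return 0 if two files have the same SHA-256 checksum, 1 if they differ,
# 2 if either file cannot be read.
same_bytes() {
    [ -r "$1" ] && [ -r "$2" ] || return 2
    a=$(sha256sum < "$1" | cut -d' ' -f1)
    b=$(sha256sum < "$2" | cut -d' ' -f1)
    [ "$a" = "$b" ]
}
```

When the two copies live on different machines, running `sha256sum mft-4.23.0-104.arm64.rpm` on each host and comparing the printed hashes by eye achieves the same thing.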

If I force mft onto the build (rpm --nodigest -i) then it installs and I can at least start the groupinstall for ‘Infiniband Support’, but this then throws unpack / digest mismatch errors for select other packages in the NVidia OFED stack. These are not digest-missing or digest-parsing errors: the installing platform can download and verify digests on each of these packages. They appear to be “real” content-validation errors:

Error unpacking rpm package sharp-3.2.0.MLNX20230122.a97f1d1c-1.59056.aarch64
error: unpacking of archive failed on file /etc/ld.so.conf.d/sharp.conf;63f94125: cpio: Digest mismatch
error: sharp-3.2.0.MLNX20230122.a97f1d1c-1.59056.aarch64: install failed

Error unpacking rpm package hcoll-4.8.3221-1.59056.aarch64
error: unpacking of archive failed on file /etc/ld.so.conf.d/hcoll.conf;63f94125: cpio: Digest mismatch
error: hcoll-4.8.3221-1.59056.aarch64: install failed

Error unpacking rpm package openmpi-4.1.5rc2-1.59056.aarch64
error: unpacking of archive failed on file /usr/mpi/gcc/openmpi-4.1.5rc2/bin/aggregate_profile.pl;63f94125: cpio: Digest mismatch
error: openmpi-4.1.5rc2-1.59056.aarch64: install failed

Error unpacking rpm package mlnx-ofed-all-5.9-0.5.6.0.rhel8.7.noarch
error: unpacking of archive failed on file /usr/share/doc/mlnx-ofed-all/mlnx-ofed-all-release;63f94125: cpio: Digest mismatch
error: mlnx-ofed-all-5.9-0.5.6.0.rhel8.7.noarch: install failed

    <...snipsnip...>
Skipped:
  libibverbs-41.0-1.el8.aarch64                                                                      perftest-4.5-12.el8.aarch64

Skipped in favor of libibverbs-59mlnx44-1.59056.aarch64 and perftest-4.5-0.20.gac7cca5.59056.aarch64, OK and expected.

Failed:
  hcoll-4.8.3221-1.59056.aarch64           mlnx-ofed-all-5.9-0.5.6.0.rhel8.7.noarch           openmpi-4.1.5rc2-1.59056.aarch64           sharp-3.2.0.MLNX20230122.a97f1d1c-1.59056.aarch64

(See previous digest mismatch errors.)
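The cpio errors above come from per-file digest checks during unpacking, which are separate from the package-level digests that ‘rpm -K’ reports. One thing that may be worth comparing between a failing package and a working one is which digest algorithm each declares in its header. The sketch below is an assumption about a useful next step, not something tried in this thread. The helper names are made up; the numeric IDs are the OpenPGP hash algorithm numbers that rpm stores in these tags.

```shell
#!/bin/sh
# Map an OpenPGP hash algorithm ID, as stored in rpm header tags, to a name.
hash_algo_name() {
    case "$1" in
        1)        echo MD5 ;;
        2)        echo SHA1 ;;
        8)        echo SHA256 ;;
        9)        echo SHA384 ;;
        10)       echo SHA512 ;;
        11)       echo SHA224 ;;
        "(none)") echo "not set" ;;   # rpm prints (none) for an absent tag
        *)        echo "unknown($1)" ;;
    esac
}

# Print the file-digest and payload-digest algorithms a package declares.
show_digest_algos() {
    pkg=$1
    command -v rpm >/dev/null 2>&1 || { echo "rpm not installed" >&2; return 1; }
    file_algo=$(rpm -qp --nosignature --qf '%{FILEDIGESTALGO}' "$pkg") || return 1
    payload_algo=$(rpm -qp --nosignature --qf '%{PAYLOADDIGESTALGO}' "$pkg") || return 1
    printf 'file digests:   %s\n' "$(hash_algo_name "$file_algo")"
    printf 'payload digest: %s\n' "$(hash_algo_name "$payload_algo")"
}
```

For example, `show_digest_algos mft-4.23.0-104.arm64.rpm` next to `show_digest_algos perl-Digest-SHA-6.02-1.el8.aarch64.rpm` would show whether the two packages were built with different digest settings.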

Hi,

Thank you for your patience.
Did you try to download the package from our website and install it?

Thanks,
NVIDIA Enterprise Support

Yes, the packages are downloading intact. The local mirror is created with the same recursive wget method you find in RHEL/CentOS/Rocky/Alma for cases where there’s no upstream rsync service. The “createrepo” is only there for when a mirroring pass falls inside an NVidia update of the upstream tree (i.e., the mirroring pass sees empty repodata directories). Besides, as the blockquotes show, the problem isn’t the metadata at all; that’s working just fine. It’s something with visibility of digests, only on the aarch64 side of the fence, and only for a very small number of packages in the mlnx_ofed collection. The mft, hcoll, sharp, and openmpi packages obviously have their digests: the x86_64 systems can see them via ‘rpm -K’ against the static aarch64 rpm files. The aarch64 platform can see the same for all of the other 100+ packages in the collection, but not for mft, and not for hcoll, openmpi, sharp, or mlnx-ofed-all when unpacking the rpm.
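Since the same file behaves differently on two hosts, a side-by-side snapshot of each host's environment may help narrow things down. The sketch below is hedged: the function name `host_snapshot` is made up, and the choice of settings to record is an assumption. Which of them matters, if any, is not established in this thread.

```shell
#!/bin/sh
# Print key=value lines describing the host, suitable for diffing between
# the x86_64 and aarch64 systems. Every probe is guarded so the function
# still runs where a tool or file is absent.
host_snapshot() {
    printf 'kernel=%s\n' "$(uname -r)"
    printf 'machine=%s\n' "$(uname -m)"
    if command -v rpm >/dev/null 2>&1; then
        printf 'rpm=%s\n' "$(rpm --version)"
    else
        printf 'rpm=absent\n'
    fi
    # Kernel FIPS flag (assumption: worth ruling out as a host difference).
    if [ -r /proc/sys/crypto/fips_enabled ]; then
        printf 'fips_enabled=%s\n' "$(cat /proc/sys/crypto/fips_enabled)"
    else
        printf 'fips_enabled=unknown\n'
    fi
    # System-wide crypto policy on EL8, if the tool is installed.
    if command -v update-crypto-policies >/dev/null 2>&1; then
        printf 'crypto_policy=%s\n' "$(update-crypto-policies --show 2>/dev/null)"
    else
        printf 'crypto_policy=unknown\n'
    fi
}
```

Saving the output on each host (for example `host_snapshot > /tmp/snapshot.txt`) and diffing the two files shows at a glance where the hosts differ beyond architecture.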

Hi

We suspect this might be an OS kernel issue on RHEL 8.7. We can see your image version
is 4.18.0-425.10.1.el8_7, but we tested on the newer 4.18.0-425.14.1.el8_7 and it was OK, as shown below:

uname -a
4.18.0-425.14.1.el8_7.aarch64 #1 SMP Mon Feb 13 10:41:20 EST 2023 aarch64 aarch64 aarch64 GNU/Linux

rpm -Kvv --nosignature mft-4.23.0-104.arm64.rpm 2>&1 |grep -i digest
Header SHA1 digest: OK
Header SHA1 digest: OK
Header SHA1 digest: OK
Header SHA1 digest: OK
MD5 digest: OK

Please give it a try.

Thanks,
Samer

It’ll be a bit before I can test against the .14.1 kernel; it’s not yet in the EL mirror as of last night (Rocky 8).

06:42:32 root@admin.godzilla:~ # find /var/repo/rocky-linux/8/ -type f -iname \*4.18.0-425.14.1.el8_7\*

06:42:59 root@admin.godzilla:~ # tail -5 /var/repo/utils/rocky.reposync.log
sent 1,470,212 bytes  received 532,395,971 bytes  5,365,489.28 bytes/sec
total size is 1,249,105,053,159  speedup is 2,339.73
End: Wed 08 Mar 2023 04:07:15 AM EST

With the compute nodes running the latest EL8.7 stable kernel (4.18.0-425.13.1.el8_7.aarch64) the problem persists. I can’t run a later kernel without leaving the Customer “blessed” EL stack and don’t currently have the resources to isolate a node for testing prerelease kernels. I’ll put something together but it won’t happen right away…