Mellanox OFED 5.5 kernel panics on RHEL 7.9 when running sysctl -a

Running the latest RHEL 7.9 with the latest MLNX OFED 5.5, installed via the yum repo at /public/repo/mlnx_ofed/latest

Hardware: ConnectX-5 VPI

Problem: When running sysctl -a, the kernel panics and the machine reboots.

Workaround: Downgrading to MLNX OFED 5.4 fixed the issue and sysctl -a works fine.

Any way to get the driver developers something useful to help fix this problem?

Is there any more information needed to help debug this?

Thanks,
Nick

I did some extra digging and found that on the servers where this was happening, NFS RDMA was enabled and the kernel modules rpcrdma and svcrdma were loaded.

I disabled NFS RDMA by editing /etc/sysconfig/nfs and removing RPCNFSDARGS="--rdma=20049"

and then also editing /etc/nfs.conf and commenting out the rdma setting:

BAD

[nfsd]
  rdma=20049

GOOD

[nfsd]
# rdma = n
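For anyone checking several servers, here is a minimal sketch of how the BAD/GOOD difference above could be detected. It assumes Python 3 is available on the host; the function name active_nfsd_rdma is hypothetical, and the parsing is deliberately simplified rather than a full nfs.conf parser.

```python
def active_nfsd_rdma(conf_text):
    """Return the value of an uncommented rdma key in [nfsd], or None.

    Simplified parsing (an assumption, not the real nfs.conf grammar):
    lines starting with '#' or ';' are treated as comments.
    """
    section = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or line.startswith(";"):
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip().lower()
            continue
        if section == "nfsd" and "=" in line:
            key, value = line.split("=", 1)
            if key.strip().lower() == "rdma":
                return value.strip()
    return None


# The two snippets from this post, used as sample input.
BAD = "[nfsd]\n  rdma=20049\n"
GOOD = "[nfsd]\n# rdma = n\n"

print(active_nfsd_rdma(BAD))   # 20049
print(active_nfsd_rdma(GOOD))  # None
```

On a real server the text would come from reading /etc/nfs.conf; a non-None result means NFS RDMA is still configured there.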

After rebooting the server, lsmod no longer showed rpcrdma and svcrdma, and sysctl -a worked again.
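A quick way to confirm the modules are really gone before trying sysctl -a again is to look at /proc/modules, which is where lsmod gets its list. The sketch below is only an illustration: it assumes Python 3, the helper name loaded_rdma_modules is hypothetical, and the sample text is made up rather than taken from the affected servers.

```python
def loaded_rdma_modules(proc_modules_text, names=("rpcrdma", "svcrdma")):
    """Return which of the given module names appear in /proc/modules text.

    The module name is the first whitespace-separated field on each line.
    """
    loaded = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            loaded.add(fields[0])
    return [name for name in names if name in loaded]


# Made-up sample in /proc/modules layout (sizes and addresses are invented).
SAMPLE = (
    "rpcrdma 100000 0 - Live 0x0000000000000000\n"
    "sunrpc 300000 1 rpcrdma, Live 0x0000000000000000\n"
)

print(loaded_rdma_modules(SAMPLE))  # ['rpcrdma']
```

On a live system the input would be the contents of /proc/modules; an empty list means neither module is loaded.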

Hopefully this helps you all debug the issue.

Hello,

Thank you for sharing this information with us. We are glad to hear that you were able to resolve the issue.

If you require further support with this issue and have a current support entitlement, please submit a new support case through the customer support portal.

-Nvidia Network Support

Hilaryn,

While I was able to create a workaround, the OFED driver still has a bug in it that causes the system to kernel panic. You might want to tell your engineers about it.

Thanks,
Nick

Hilaryn,

Also, on the topic of support: I’ve tried twice now to get an active support contract for our NICs, but the sales folks said they weren’t able to sell me a contract for an existing NIC. I would need to purchase a new NIC in order to get a contract.

Thanks,
Nick

Could you share the panic log from the console?
As an additional check, does adding the ‘nosmap’ parameter to grub.conf and restarting the servers resolve the issue?
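When testing the ‘nosmap’ suggestion, it helps to verify after the reboot that the parameter actually reached the running kernel, since /proc/cmdline shows the command line the kernel was booted with. This is a rough sketch assuming Python 3; the helper name has_kernel_param and the sample command line are hypothetical.

```python
def has_kernel_param(cmdline_text, param="nosmap"):
    """Return True if param appears as a standalone kernel boot parameter."""
    return param in cmdline_text.split()


# Invented sample in the style of /proc/cmdline.
SAMPLE_CMDLINE = "BOOT_IMAGE=/vmlinuz ro quiet nosmap"

print(has_kernel_param(SAMPLE_CMDLINE))         # True
print(has_kernel_param("BOOT_IMAGE=/vmlinuz"))  # False
```

On the server itself the input would be the contents of /proc/cmdline. If the result is False after the reboot, the grub change did not take effect and the sysctl -a test says nothing about nosmap.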
