Running the latest RHEL 7.9 with the latest MLNX OFED 5.5, installed via the yum repo at /public/repo/mlnx_ofed/latest
Hardware: ConnectX-5 VPI
Problem: When running sysctl -a, the kernel panics and the machine reboots.
Workaround: Downgrading to MLNX OFED 5.4 fixed the issue, and sysctl -a works fine.
Any way to get the driver developers something useful to help fix this problem?
Is there any more information needed to help debug this?
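One way to narrow this down for the developers might be to find out which sysctl entry triggers the panic. The sketch below is only an illustration of that idea, not an official diagnostic: it walks a directory tree (by default /proc/sys, which is what sysctl -a reads) and records each path in a log file, synced to disk, before reading it. After a panic, the last line of the log should point at the entry that was being read. The function name and log file name are made up for this example, and the log needs to live on a persistent local filesystem for the fsync to be of any use.

```python
import os


def read_sysctl_tree(root="/proc/sys", log_path="sysctl_walk.log"):
    """Read every file under root, logging each path before it is read.

    Returns a dict mapping path -> value (or an "<unreadable: ...>" marker).
    """
    results = {}
    with open(log_path, "a") as log:
        for dirpath, dirnames, filenames in os.walk(root):
            dirnames.sort()  # make the walk order deterministic
            for name in sorted(filenames):
                path = os.path.join(dirpath, name)
                # Write the path and force it to disk BEFORE the read, so the
                # last logged line survives if the read panics the kernel.
                log.write(path + "\n")
                log.flush()
                os.fsync(log.fileno())
                try:
                    with open(path, "r", errors="replace") as f:
                        results[path] = f.read().strip()
                except OSError as exc:
                    # Some entries are write-only or restricted; that is normal.
                    results[path] = "<unreadable: %s>" % exc.strerror
    return results
```

Calling read_sysctl_tree() as root on an affected server is expected to reproduce the panic, so it should only be run where a reboot is acceptable. After the reboot, the last line of sysctl_walk.log is the candidate entry to include in a bug report.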
Did some extra digging and found that on the servers where this was happening, NFS RDMA was enabled and had loaded its kernel modules.
I disabled NFS RDMA by editing /etc/sysconfig/nfs to remove the RDMA setting and by editing /etc/nfs.conf to comment out
# rdma = n
After rebooting the server, lsmod no longer showed the NFS RDMA modules, and sysctl -a worked again.
Hopefully this helps you all debug the issue.
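For anyone wanting to check whether a server is in the same state before running sysctl -a, a rough sketch of the lsmod check follows. It parses text in the format of /proc/modules (the file lsmod reads) and reports which candidate modules are loaded. The module names in the default list are an assumption on my part, since the exact names from the post above were not preserved; adjust them to whatever lsmod shows on your system. The function names are hypothetical.

```python
# ASSUMPTION: these names are illustrative guesses for NFS RDMA transport
# modules; the original post's module list was not preserved.
ASSUMED_NFS_RDMA_MODULES = ("rpcrdma", "svcrdma", "xprtrdma")


def loaded_modules(proc_modules_text):
    """Return the set of module names from /proc/modules-style text."""
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            names.add(fields[0])  # the first column is the module name
    return names


def nfs_rdma_modules_present(proc_modules_text,
                             candidates=ASSUMED_NFS_RDMA_MODULES):
    """Return the sorted list of candidate modules that are loaded."""
    return sorted(loaded_modules(proc_modules_text) & set(candidates))
```

On a live system the input would come from reading /proc/modules, for example nfs_rdma_modules_present(open("/proc/modules").read()). An empty list would suggest the workaround took effect.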
Thank you for sharing this information with us. We are glad to hear that you were able to resolve the issue.
If you require further support with this issue and have a current support entitlement, please submit a new support case through the customer support portal.
-Nvidia Network Support
While I was able to create a workaround, the OFED driver still has a bug in it that causes the system to kernel panic. You might want to tell your engineers about it.
Also, on the topic of support, I’ve tried twice now to get an active support contract for our NICs, but the sales folks said they weren’t able to sell me a contract for an existing NIC; I would need to purchase a new NIC in order to get a contract.
Could you share the panic log from the console?
As an additional check, does adding the ‘nosmap’ parameter to grub.conf and restarting the servers resolve the issue?
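After the restart, it may be worth confirming that the parameter actually reached the kernel before testing sysctl -a again. A minimal sketch, assuming the running kernel's command line is read from /proc/cmdline; the helper name is made up for this example:

```python
def has_kernel_param(cmdline, name):
    """Return True if a boot parameter appears in a kernel command line.

    Matches either a bare flag ("nosmap") or a "name=value" form.
    """
    for token in cmdline.split():
        if token == name or token.startswith(name + "="):
            return True
    return False
```

For example, has_kernel_param(open("/proc/cmdline").read(), "nosmap") should return True once the server has booted with the parameter set.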