Cannot restart openibd

I am currently using ConnectX-6 cards and trying to install the drivers on Ubuntu 20.04. I have already done this successfully on two other identical systems. On this system, the driver install completes successfully, but when I run sudo /etc/init.d/openibd restart, I get the following error:

Unloading ib_core [FAILED]
rmmod: ERROR: Module ib_core is in use

I then run sudo lsmod | grep ib_core to find which modules depend on it and get the following output:

ib_core 348160 2
mlx_compat 65536 1 ib_core
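In lsmod output, the number after the size is the module's reference count, followed by the names of any modules holding references to it. The same information is exposed in sysfs under /sys/module/<name>/refcnt and /sys/module/<name>/holders/. Below is a minimal sketch that reads those two entries directly instead of grepping lsmod. The function name module_holders and the SYSFS_MODULE_ROOT override are hypothetical. The override exists only so the function can be pointed at a test directory.

```shell
# Sketch: report a loaded module's reference count and the modules holding it,
# read from sysfs rather than parsed out of lsmod.
# SYSFS_MODULE_ROOT is a hypothetical override (default /sys/module) for testing.
module_holders() {
    mod=$1
    root=${SYSFS_MODULE_ROOT:-/sys/module}
    dir="$root/$mod"

    if [ ! -d "$dir" ]; then
        echo "$mod: not loaded" >&2
        return 1
    fi

    # refcnt holds the count lsmod shows in its "Used by" column
    refcnt=$(cat "$dir/refcnt" 2>/dev/null || echo "?")

    # holders/ has one entry per module that depends on $mod
    holders=""
    for h in "$dir"/holders/*; do
        [ -e "$h" ] || [ -L "$h" ] || continue   # glob matched nothing
        holders="$holders ${h##*/}"
    done

    echo "$mod refcnt=$refcnt holders:${holders:- (none)}"
}
```

On the affected system this would be run as module_holders ib_core. A non-zero refcnt with no holder modules (as in the ib_core line above, which shows a count of 2 but names no modules) suggests the references come from something other than a dependent module. Read the same way, the mlx_compat line lists ib_core as its holder: ib_core depends on mlx_compat, so mlx_compat cannot be removed while ib_core is loaded.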

After attempting to run sudo modprobe -r mlx_compat, I get the error:

modprobe: FATAL: Module mlx_compat is in use.

I then run sudo lsmod | grep mlx_compat and get:

mlx_compat 65536 1 ib_core

It seems that ib_core and mlx_compat depend on each other. Even when I try rmmod -f, I am never able to unload either module. Because I can’t restart openibd, I cannot see the network interfaces either. I did add the recommended blacklists and regenerated my initramfs as well. Any advice on what to try next?

Hello wk10,

Thank you for posting your inquiry to the NVIDIA Developer Forums.

Has the system been rebooted since the installation was completed?

If so, check # modinfo ib_core (the filename: field shows the module's path) and see whether any processes (maybe ibacm?) are holding onto that module:
# fuser -m (path to module) -v
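Another check that can complement fuser uses a different technique from the one suggested above and does not replace it. The minimal sketch below scans /proc for processes with open file descriptors on RDMA device files under /dev/infiniband. Such descriptors keep the user-space RDMA modules loaded, and those modules in turn can keep ib_core in use. The function name rdma_fd_holders and the PROC_ROOT override are hypothetical.

```shell
# Sketch: list processes that have RDMA device files (/dev/infiniband/*) open.
# Output format: "<pid> <command name> <device file>".
# PROC_ROOT is a hypothetical override (default /proc) for testing.
rdma_fd_holders() {
    proc=${PROC_ROOT:-/proc}
    for fd in "$proc"/[0-9]*/fd/*; do
        [ -L "$fd" ] || continue                # skip unmatched globs
        target=$(readlink "$fd") || continue    # skip unreadable descriptors
        case $target in
            /dev/infiniband/*)
                pid_dir=${fd%/fd/*}
                comm=$(cat "$pid_dir/comm" 2>/dev/null || echo "?")
                echo "${pid_dir##*/} $comm $target"
                ;;
        esac
    done
}
```

Run it as root. As an unprivileged user, the descriptors of other users' processes cannot be read, so those processes are silently skipped.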

Were inbox/upstream tools or drivers in use on this system before MLNX_OFED was installed? If so, are any applications or daemons still running that use the inbox/upstream drivers?

Are any iWARP drivers or adapters in use?
Is anything dependent on libibverbs?
If so, remove them.
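One way to find running programs that depend on libibverbs is to look for the library in each process's memory map. The minimal sketch below assumes the mapped file name contains "libibverbs", as with the usual libibverbs.so.1. The names libibverbs_users and PROC_ROOT are hypothetical. It covers only running processes. Installed packages that depend on libibverbs would still need to be checked with the package manager.

```shell
# Sketch: list running processes that have libibverbs mapped into memory.
# Output format: "<pid> <command name>".
# PROC_ROOT is a hypothetical override (default /proc) for testing.
libibverbs_users() {
    proc=${PROC_ROOT:-/proc}
    for maps in "$proc"/[0-9]*/maps; do
        [ -r "$maps" ] || continue
        # grep errors (e.g. permission denied on another user's process) are ignored
        if grep -q 'libibverbs' "$maps" 2>/dev/null; then
            pid_dir=${maps%/maps}
            comm=$(cat "$pid_dir/comm" 2>/dev/null || echo "?")
            echo "${pid_dir##*/} $comm"
        fi
    done
}
```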

Does this reproduce on a clean system?

Barring that, if you have NVIDIA Enterprise Support entitlement, we highly recommend opening a ticket with our NVIDIA Enterprise Support team for further assistance:

Best regards,
NVIDIA Enterprise Experience
