How long should it take for an IB interface’s State to go from Down/Polling to Active/LinkUp?

For a small, relatively new IB network, we are waiting anywhere from 20 to 120 seconds for the HCAs on our RHEL 7 hosts (Mellanox Technologies MT28908 Family [ConnectX-6]) to become active.

I’m just getting started with IB, but it seems this should take only a few seconds. What should I be expecting?

The network consists of two QM8700 spine switches, each of which is connected to four QM8790 leaf switches. FWIW, there are four connections from each spine switch to each leaf switch. Attached to the four leaf switches are about 81 RHEL 7 nodes.

Initially we were running the SM on the spine switches with the default configuration. This morning I tried running higher-priority SMs on two of the RHEL 7 nodes. The nodes do start using the new SM, but this doesn’t noticeably change the LinkUp times.

So again, is it normal to see a port polling for almost two minutes before becoming active? If not (as I suspect), can we get some suggestions on how best to fix (or at least diagnose) the problem?

Hello Randy,

Thank you for posting your inquiry on the NVIDIA Networking Community.

Based on the information provided, there is a good chance that you are not running the latest f/w and s/w code for HDR in your fabric.

With the latest f/w and s/w code on the ConnectX-6 adapters and the switches, the link-up time from INIT to Active is around 20-30 seconds.
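To compare a host against that 20-30 second figure, you can sample the port's logical state and physical state while the link comes up. What follows is a minimal Python sketch, assuming the standard Linux RDMA sysfs layout (`/sys/class/infiniband/<device>/ports/<port>/state` and `phys_state`, whose contents read like `4: ACTIVE` and `5: LinkUp`). The device name `mlx5_0`, port `1`, and the helper names are hypothetical placeholders, not part of any NVIDIA tool.

```python
import time
from pathlib import Path
from typing import Callable, Optional, Tuple

SYSFS_ROOT = "/sys/class/infiniband"  # assumed standard RDMA sysfs location


def parse_sysfs_state(text: str) -> Tuple[int, str]:
    """Split a sysfs port attribute such as '4: ACTIVE' into (4, 'ACTIVE')."""
    code, _, name = text.strip().partition(":")
    return int(code), name.strip()


def read_port(hca: str, port: int, root: str = SYSFS_ROOT) -> Tuple[str, str]:
    """Return (logical state, physical state) names for one HCA port."""
    base = Path(root) / hca / "ports" / str(port)
    _, state = parse_sysfs_state((base / "state").read_text())
    _, phys = parse_sysfs_state((base / "phys_state").read_text())
    return state, phys


def time_to_active(
    read: Callable[[], Tuple[str, str]],
    timeout: float = 300.0,
    interval: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Poll read() until the logical state is ACTIVE.

    Prints every state change with its offset from the start and returns
    the elapsed seconds, or None if the timeout is reached first.
    """
    start = clock()
    last = None
    while True:
        state, phys = read()
        elapsed = clock() - start
        if (state, phys) != last:  # log transitions only, not every sample
            print(f"{elapsed:7.1f}s  state={state:<8} phys_state={phys}")
            last = (state, phys)
        if state == "ACTIVE":
            return elapsed
        if elapsed >= timeout:
            return None
        sleep(interval)


if __name__ == "__main__":
    # Hypothetical device/port: list /sys/class/infiniband to find yours.
    result = time_to_active(lambda: read_port("mlx5_0", 1))
    print("timed out" if result is None else f"ACTIVE after {result:.1f}s")
```

Start it just before bouncing the link (for example by reseating the cable). Each printed line marks a transition, so the gap between the first `INIT` line and the `ACTIVE` line is the INIT-to-Active time. A long run of `DOWN`/`Polling` lines means the delay is in physical link training, before the SM gets involved at all.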

It does not matter where you run the SM; just make sure you have the latest code installed on the switches and adapters.
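To confirm which firmware each host adapter is actually running before and after an upgrade, the kernel exposes it per RDMA device as a `fw_ver` attribute in sysfs (`ibstat` reports the same value as "Firmware version"). The Python sketch below assumes that standard `/sys/class/infiniband/<device>/fw_ver` layout, and `hca_firmware` is a hypothetical helper name. It covers only the HCAs. The QM87xx switch firmware has to be checked on the switches themselves.

```python
from pathlib import Path
from typing import Dict


def hca_firmware(root: str = "/sys/class/infiniband") -> Dict[str, str]:
    """Map each RDMA device under root to the firmware version it reports."""
    versions: Dict[str, str] = {}
    base = Path(root)
    if not base.is_dir():  # no RDMA devices or driver not loaded
        return versions
    for dev in sorted(base.iterdir()):
        fw = dev / "fw_ver"  # assumed per-device sysfs attribute
        if fw.is_file():
            versions[dev.name] = fw.read_text().strip()
    return versions


if __name__ == "__main__":
    for dev, ver in hca_firmware().items():
        print(f"{dev}: firmware {ver}")
```

Running it on every node makes it easy to spot hosts still on an older firmware release.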

If you are still experiencing link issues after upgrading, please open an NVIDIA Networking Technical Support ticket by sending an email to support@mellanox.com.

Latest f/w and s/w code:

Thank you and regards,

~NVIDIA Networking Technical Support