On a small, relatively new IB network, we are waiting anywhere from 20 to 120 seconds for the HCAs in our RHEL 7 hosts (Mellanox Technologies MT28908 Family [ConnectX-6]) to become active.
I’m just getting started with IB, but it seems this should take only a few seconds. What should I be expecting?
The network consists of two QM8700 spine switches, each of which is connected to four QM8790 leaf switches. FWIW, there are four connections from each spine switch to each leaf switch. Attached to the four leaf switches are about 81 RHEL 7 nodes.
Initially we were running the SM on the spine switches, with the default config. This morning I tried running higher-priority SMs on two of the RHEL 7 nodes. The nodes do start using the new SM, but this doesn’t seem to change much in terms of LinkUp times.
So again, is it normal to see a port polling for almost two minutes before becoming active? If not (as I suspect), can we get some suggestions on how best to fix (or at least diagnose) the problem?
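
In case it helps with diagnosis, here is a minimal sketch of how the delay could be measured on one node. It polls the port's `state` and `phys_state` files under `/sys/class/infiniband` and records when each transition happens, which should show whether the time is spent with the physical link still in Polling or with the link up but the logical state not yet ACTIVE (the step that depends on the SM). The device name `mlx5_0` and port `1` are assumptions (check `ls /sys/class/infiniband` on the host), and the function names are made up for this example.

```python
import time
from pathlib import Path


def parse_state(text):
    """Split a sysfs value such as '4: ACTIVE' into (4, 'ACTIVE')."""
    num, _, name = text.strip().partition(":")
    return int(num), name.strip()


def read_port(device="mlx5_0", port=1, root="/sys/class/infiniband"):
    """Return ((state_num, state), (phys_num, phys_state)) for one port.

    'mlx5_0' and port 1 are assumed defaults; adjust for the actual host.
    """
    base = Path(root) / device / "ports" / str(port)
    state = parse_state((base / "state").read_text())
    phys = parse_state((base / "phys_state").read_text())
    return state, phys


def time_until_active(read=read_port, timeout=180.0, interval=0.5,
                      clock=time.monotonic, sleep=time.sleep):
    """Poll until the logical state is ACTIVE or the timeout expires.

    Returns (elapsed_seconds or None, list of (elapsed, reading) transitions).
    """
    start = clock()
    last = None
    transitions = []
    while True:
        elapsed = clock() - start
        current = read()
        if current != last:
            # Record only changes, so the log shows each transition once.
            transitions.append((elapsed, current))
            last = current
        if current[0][1] == "ACTIVE":
            return elapsed, transitions
        if elapsed >= timeout:
            return None, transitions
        sleep(interval)


if __name__ == "__main__":
    elapsed, transitions = time_until_active()
    for t, ((_, state), (_, phys)) in transitions:
        print(f"{t:7.1f}s  state={state:<8} phys_state={phys}")
    if elapsed is None:
        print("port never became ACTIVE within the timeout")
    else:
        print(f"ACTIVE after {elapsed:.1f}s")
```

Started just before the link is brought up (for example, right after reseating a cable or restarting the driver), the transition log gives a per-phase breakdown that can be compared across nodes and leaf switches.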