IS5025 switches stopped working (yellow LED on all ports)

Dear Colleagues,

Until recently we were using our three InfiniBand IS5025 switches in production, interconnected to provide 3x 36 ports. This week we had a power shutdown in our server room; the three switches were on a UPS, so they did not lose power. When the power was restored, however, our machines could no longer access the InfiniBand network. Strangely, on each of the three switches there seemed to be two “blocks” of ports with a yellow LED, while the rest of the ports had a green LED. I tried to recover the connections by restarting one of the switches (power off/on), but after that all the ports showed a yellow LED, and of course there was no connection between the nodes.

As a side note, we have a second network on the same computing cluster, with two switches of the same type, and that network seems to be working fine. Some nodes have two InfiniBand cards: on those, the card on the first network does not work, while the card on the second network does.

Does anyone have an idea what I could try next? I must admit that I do not have much experience maintaining InfiniBand networks; I became the administrator here after the cluster was already operational.

Thank you in advance for any suggestions!

Hi Ari,

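It sounds like there is no SM (subnet manager) running. Without an active SM the links can come up physically, but the ports are never configured and brought to the Active state, so no traffic passes between the nodes.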

Can you run the command “sminfo” (with sudo) on one of the Linux servers?

If there is no response, try starting opensm on one of the servers:

opensm -B
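If you want to script the check above, a minimal shell sketch could look like the one below. It assumes the infiniband-diags tools (which provide sminfo) and opensm are installed, that it is run as root, and that sminfo exits with a non-zero status when no SM answers; the function name ensure_sm is just a hypothetical name for illustration.

```shell
#!/bin/sh
# Minimal sketch: start OpenSM only if no subnet manager answers on the fabric.
# Assumes infiniband-diags (sminfo) and opensm are installed and that this
# runs as root. ensure_sm is a hypothetical helper name, not a standard tool.

ensure_sm() {
    # sminfo queries the active SM through the local HCA; we assume it exits
    # non-zero when no SM responds (check this on your own system).
    if sminfo >/dev/null 2>&1; then
        echo "A subnet manager is already active:"
        sminfo
    else
        echo "No subnet manager responded; starting opensm in the background"
        # -B runs opensm in the background, as in the command above
        opensm -B
    fi
}
```

Define the function in a root shell (or source the file with `.`) and then call `ensure_sm`. Keep in mind that an opensm started by hand like this only runs until that node reboots, so it is worth making sure it is started at boot on whichever node is meant to host the SM.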

Hello Eddie,

Thank you very much for your reply! Indeed, you were right: the old master node, which had been removed from all other services, was still running ‘opensm’ for the networks, but we had disconnected that machine from the InfiniBand network during the power shutdown. So when I started ‘opensm’ on the new front-end node of the cluster, the green lights and the connections came back. :) Thank you again; I very much appreciate your time and effort!