Dear Colleagues,
Until recently we had been using our three InfiniBand IS5025 switches, connected together to provide 3 x 36 ports, in production. This week we had a power shutdown in our server room. The three switches were on UPS, so they did not lose power. When power was restored, however, our machines could no longer access the InfiniBand network. Strangely, on the three switches there seemed to be two “blocks” of ports showing a yellow LED, while the remaining ports showed green. I tried to recover the connections by restarting one of the switches (power off/on), but since then all of its ports show a yellow LED, and of course there is still no connection between the nodes. As a side note, we have a second network, with two switches of the same type, on the same computing cluster, and that network seems to continue working fine (some nodes have two InfiniBand cards; on those, the card on the first network does not work, while the card on the second network does).
Would anyone have an idea what I could try next? I must admit that I do not have much experience in maintaining InfiniBand networks; I became the administrator here after the cluster was already operational.
Thank you in advance for any suggestions!