I’m having speed issues that I’ve isolated to my IS5030 switch.
System: I have two InfiniBand switches:
- A Mellanox IS50XX with 36 ports enabled and the FabricIT internal subnet manager running, which makes it an IS5030, with the latest firmware (IBM P/N: 98Y3756)
- A Sun 36-port QDR InfiniBand switch with an internal subnet manager
I also have three types of HCAs:
- Sun 375-3696 X4242A (a rebranded Mellanox ConnectX-2 card)
- HP 544FLR-QDR (based on Mellanox ConnectX-3)
- Intel/QLogic QLE7340
I haven’t figured out how to update the firmware on the Sun products. The HP HCAs have the latest firmware. The QLE7340s don’t use firmware. All nodes run CentOS 7.
I’ve tried the following tests:
- Two HP HCAs, back-to-back, opensm: 40 Gb/s
- Sun HCA, HP HCA, back-to-back, opensm: 40 Gb/s
- Two QLE7340, back-to-back, opensm: 40 Gb/s
- Sun and HP HCAs, Sun switch: 40 Gb/s
- Sun and HP HCAs, IS5030 switch: 20 Gb/s
- Two QLE7340, Sun switch: 10 Gb/s
- Two QLE7340, IS5030 switch: 20 Gb/s
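
For anyone who wants to reproduce the numbers above, here is a minimal sketch of one way to read the negotiated link rate on each node. It assumes the Linux sysfs layout `/sys/class/infiniband/<device>/ports/<port>/rate`, whose contents look like `40 Gb/sec (4X QDR)`. The helper names `parse_rate` and `port_rates` are made up for this illustration, and this is not necessarily how the figures in the list were collected.

```python
import re
from pathlib import Path

# Matches sysfs rate strings such as "40 Gb/sec (4X QDR)".
# The speed suffix is treated as optional, since some kernels omit it for SDR.
RATE_RE = re.compile(r"^\s*([\d.]+)\s*Gb/sec\s*\((\d+)X(?:\s+(\w+))?\)")


def parse_rate(text):
    """Return (gbps, lane_width, speed_name), or None if the text does not match."""
    m = RATE_RE.match(text)
    if m is None:
        return None
    return float(m.group(1)), int(m.group(2)), m.group(3) or "unknown"


def port_rates(root="/sys/class/infiniband"):
    """Map (device, port) to the parsed rate for every port found under root."""
    results = {}
    for rate_file in sorted(Path(root).glob("*/ports/*/rate")):
        port = rate_file.parent.name                  # e.g. "1"
        device = rate_file.parent.parent.parent.name  # e.g. "mlx4_0"
        results[(device, port)] = parse_rate(rate_file.read_text())
    return results


if __name__ == "__main__":
    for (dev, port), parsed in port_rates().items():
        print(dev, port, parsed)
```

Seeing the lane width and the speed name separately shows whether a 20 Gb/s link lost lanes or fell back to a slower per-lane speed.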
I had a long discussion with an Intel rep, and it turns out that these two InfiniBand switches are not compatible with the QLE7340s (some of the later Mellanox X series switches are, though), which is probably why those cards aren’t reaching QDR speeds through either switch. So let’s ignore the QLE7340s for now.
The really weird thing is that the Sun and HP HCAs should negotiate 40 Gb/s with the IS5030, and I have no idea why they don’t. The HCAs, software, and cables (I tested all of these) are clearly fine, since they reach 40 Gb/s back-to-back and through the Sun switch.
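
To make the rates above easier to interpret, here is a small sketch that lists which lane width and per-lane speed combinations produce a given nominal link rate. It assumes the standard nominal signaling rates per lane (SDR 2.5 Gb/s, DDR 5 Gb/s, QDR 10 Gb/s), that the figures above are nominal link rates rather than measured throughput, and that the ports in question are at most 4X wide. The name `combos_for` is made up for this illustration.

```python
# Nominal per-lane signaling rates in Gb/s.
LANE_GBPS = {"SDR": 2.5, "DDR": 5.0, "QDR": 10.0}

# Assumption: only 1X and 4X link widths are possible on these ports.
WIDTHS = (1, 4)


def combos_for(rate_gbps):
    """Return every (lane_width, speed_name) pair whose nominal rate equals rate_gbps."""
    return [
        (width, speed)
        for speed, per_lane in LANE_GBPS.items()
        for width in WIDTHS
        if width * per_lane == rate_gbps
    ]


if __name__ == "__main__":
    for rate in (40, 20, 10):
        print(rate, combos_for(rate))
```

Under those assumptions, 20 Gb/s can only be a 4X link running at DDR, which would point at the speed negotiation with the IS5030 rather than at lost lanes, while 10 Gb/s could be either 4X SDR or 1X QDR.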