I have two sets of servers, each with its own tower workstation as the head node. One set has 6 servers and the other has 8. The two towers are identical hardware-wise, and all machines use the same dual-port ConnectX-4 cards, with one port in use on each card. Each server set is connected to its own Mellanox SX6018 switch with QSFP+ cables, and the head node towers are connected to an SX6036G with QSFP+ cables.

This configuration has been in use for over a year, and we’ve switched between high-speed Ethernet with RDMA/RoCE and FDR10 InfiniBand fabric multiple times with no issues. We recently switched to FDR InfiniBand for testing and everything worked fine, but when we switched back to FDR10 the head node towers would no longer pass data (MPI) to the servers. We can ping from tower to server over InfiniBand, and ib_send_bw runs successfully between them at 38 Gb/s, but MPI can’t establish a connection from tower to server. MPI works fine from server to server.

The MPI software has not changed since it last worked at FDR10, and this configuration works flawlessly at FDR; it only fails at the slower FDR10 setting. The switches are set to auto-negotiate the fabric speed, and rebooting the switches has not helped. Our customer dictates that we use FDR10, so we need to get this working at FDR10 again. Any suggestions?
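For anyone wanting to compare what each end actually negotiated, here is a minimal sketch that reads the link rate of every InfiniBand port on a Linux host. It assumes the kernel exposes the usual `/sys/class/infiniband/<device>/ports/<port>/rate` files and that they use the `40 Gb/sec (4X FDR10)` style of string; the helper names `parse_rate` and `port_rates` are made up for illustration. Running it on a tower and on a server should show whether both sides really came up as FDR10.

```python
import re
from pathlib import Path

# Assumed format of the sysfs "rate" file, for example
# "40 Gb/sec (4X FDR10)" or "56 Gb/sec (4X FDR)".
RATE_RE = re.compile(r"^\s*([\d.]+)\s+Gb/sec\s+\((\d+)X\s*([A-Z0-9]*)\)\s*$")


def parse_rate(text):
    """Split a rate string into (gigabits per second, lane width, speed name)."""
    match = RATE_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognised rate string: {text!r}")
    return float(match.group(1)), int(match.group(2)), match.group(3)


def port_rates(root="/sys/class/infiniband"):
    """Return {"device/port": (gbps, width, speed)} for every port found under root."""
    rates = {}
    for rate_file in sorted(Path(root).glob("*/ports/*/rate")):
        device = rate_file.parts[-4]  # e.g. a name like mlx5_0
        port = rate_file.parts[-2]    # e.g. 1
        rates[f"{device}/{port}"] = parse_rate(rate_file.read_text())
    return rates


if __name__ == "__main__":
    for name, (gbps, width, speed) in port_rates().items():
        print(f"{name}: {gbps:g} Gb/s, {width}X, {speed or 'unknown speed'}")
```

This only reports the negotiated link rate; it does not say anything about why MPI fails to connect.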