ConnectX-5 EDR / ConnectX-6 HDR with SX6036 FDR switch incompatibility(?)

Hi all. I would love your feedback on a weird problem where I am trying to mix HDR/EDR adapters with an FDR switch. I have tried every cable type available to me (FDR, EDR and HDR, both copper and optical), but I can only get the combination to really work when I force the ports on the switch to QDR. Any other setting results in extremely poor performance (mere megabytes per second and huge latencies).

From my understanding, IB is backwards compatible, so I should be able to get a working FDR line speed, right?
Am I missing something?
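(For reference, this is how I am checking what each link actually negotiates; these are the standard infiniband-diags tools, and the comments only describe what I would expect to see, not captured output.)

# on a host: show the local HCA port state, physical state and negotiated rate
ibstat

# from the subnet manager node: fabric-wide view of every link's
# negotiated width and speed (e.g. "4X QDR" vs "4X FDR")
iblinkinfo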

Hello schonewille,

Thank you for posting your inquiry in the NVIDIA Developer Forum - Infrastructure and Networking section.

To a certain extent you are right regarding the backwards compatibility of IB.

In the firmware release notes for the adapters and switches (HDR and NDR), we provide a connectivity matrix that lists what is supported and at which speed.

If the correct combination is used, you do not have to force any port speed.

You can find the connectivity matrix through the following link → https://docs.nvidia.com/networking/display/NVIDIAQuantum2Firmwarev3120102110/Firmware+Compatible+Products

Or through the main page → NVIDIA Documentation Center | NVIDIA Developer, searching for the adapter or switch release notes.

For a mixed fabric to function without any issues, everything needs to be on the latest code (switch and HCA firmware as well as the driver).
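(As a quick sanity check on a host, something along these lines will show the versions involved; ofed_info ships with MLNX_OFED and mlxfwmanager with the MFT tools, so adapt to whatever stack you actually have installed.)

# installed OFED/driver stack version
ofed_info -s

# HCA firmware version as reported by the driver
ibstat | grep -i firmware

# query the adapter firmware directly (MFT tools)
mlxfwmanager --query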

If you have a fully supported combination and are still experiencing issues, please do not hesitate to open an NVIDIA Networking Support ticket (a valid support entitlement is needed) so we can assist you further.

Thank you and regards,
~NVIDIA Networking Technical Support

Hi MvB

Thanks for the heads-up. Yes, all HCAs and the switch are at the latest available firmware level. The matrix more or less confirmed that it should work, and it sort of does, but not all the way:

#-----------------------------------------------------------------------------
# Benchmarking Sendrecv
# #processes = 2
#-----------------------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]   Mbytes/sec
            0         1000         2.12         2.12         2.12         0.00
            1         1000         2.02         2.02         2.02         0.99
            2         1000         2.01         2.01         2.01         1.99
            4         1000         2.02         2.02         2.02         3.95
            8         1000         2.02         2.02         2.02         7.90
           16         1000         2.03         2.03         2.03        15.78
           32         1000         2.13         2.13         2.13        30.00
           64         1000         2.56         2.56         2.56        49.91
          128         1000         2.53         2.53         2.53       101.13
          256         1000         3.01         3.01         3.01       169.99
          512         1000         3.23         3.23         3.23       317.49
         1024         1000         3.75         3.75         3.75       545.81
         2048         1000         5.07         5.07         5.07       807.99
         4096         1000         7.57         7.57         7.57      1082.22
         8192         1000      3219.36      3219.36      3219.36         5.09
        16384         1000     10069.49     10069.49     10069.49         3.25
        32768          637     42785.04     42785.06     42785.05         1.53
        65536          637        49.46        49.46        49.46      2649.95
       131072          320        88.63        88.63        88.63      2957.88
       262144          160        49.96        49.96        49.96     10493.18
       524288           80        89.23        89.23        89.23     11751.43
      1048576           40       167.32       167.33       167.33     12532.88
      2097152           20       322.22       322.23       322.23     13016.39
      4194304           10      1138.03      1138.04      1138.03      7371.13

The 8 KB to 32 KB range is extremely poor. 64 KB to 128 KB is not too pretty either.
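(For context, these numbers come from an IMB-style Sendrecv run between two nodes. A typical invocation, assuming Intel MPI's launcher here and with placeholder hostnames, would look something like this.)

# two ranks, one per node, running the Intel MPI Benchmarks Sendrecv test
mpirun -np 2 -ppn 1 -hosts node01,node02 IMB-MPI1 Sendrecv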

regards,
A

Believe it or not, kernel 5.17.5-1 seems to have solved our problem. The Mellanox stack 5.4 and 5.5 did not, and neither did other recent kernels, but the very latest kernel worked. Magic…
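(In case anyone wants to compare setups: this is roughly how I check which kernel and which mlx5 driver build are actually in use; the grep pattern is just what works on my distribution.)

# running kernel
uname -r

# which mlx5_core module is loaded and where it comes from
# (an MLNX_OFED build typically lives under an "extra"/"updates" module path,
#  the inbox driver under .../kernel/drivers/net/ethernet/mellanox/...)
modinfo mlx5_core | grep -E '^(filename|version)'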

Unfortunately, only intelmpi/psm3 performs well. openmpi4/ucx is still way below par, even when using the ud or ud_x transports.
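(To be explicit about what I mean by forcing ud/ud_x: I am selecting the UCX PML and restricting the transport list roughly as below. The pml and UCX_TLS knobs are standard Open MPI/UCX settings; the rest of the command line is just an example invocation.)

# force the UCX PML and restrict UCX to UD transports
# (ud_x is the accelerated mlx5 UD transport, plus shared memory and self)
mpirun --mca pml ucx -x UCX_TLS=ud_x,sm,self \
       -np 2 --map-by node IMB-MPI1 Sendrecv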

I am running out of options here, other than reducing the port rates to QDR speed.
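(For completeness, the rate forcing I am referring to can also be done from the subnet manager node with the standard infiniband-diags tools instead of the switch CLI. The LID and port number below are placeholders, and the exact speed/espeed values depend on the infiniband-diags version, so check the ibportstate man page before using them.)

# inspect the currently enabled/active speeds on a switch port
ibportstate <switch_lid> <port_num> query

# limit the enabled classic speeds (7 = SDR|DDR|QDR as a bitmask),
# disable extended speeds (FDR/EDR), then bounce the link to renegotiate
ibportstate <switch_lid> <port_num> speed 7
ibportstate <switch_lid> <port_num> espeed 30
ibportstate <switch_lid> <port_num> reset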