MXM - only up to 2 devices are supported?

Hello,

I recently installed a test system that uses multirail InfiniBand. It consists of two fat nodes, each with four Xeon E5-4650s. Each CPU has a ConnectX-2 card directly attached. The latest MLNX-OFED is installed and working. Both nodes are connected to an IS5025 switch. Is there a way to force MXM to use all four cards?

If I run, for example, the osu_alltoall benchmark with

/usr/mpi/gcc/openmpi-1.8.4/bin/mpirun --mca btl self,openib -n 64 --hostfile test /usr/mpi/gcc/openmpi-1.8.4/tests/osu-micro-benchmarks-4.4/osu_alltoall

I get a lot of warnings like this:

[1421687772.785621] [linux-3e34:12178:0] ib_dev.c:405 MXM WARN Skipping IB device 'mlx4_2' - up to 2 devices are supported
[1421687772.785640] [linux-3e34:12178:0] ib_dev.c:405 MXM WARN Skipping IB device 'mlx4_1' - up to 2 devices are supported
[1421687772.785647] [linux-3e34:12178:0] ib_dev.c:405 MXM WARN Skipping IB device 'mlx4_0' - up to 2 devices are supported

OSU MPI All-to-All Personalized Exchange Latency Test v4.4

Size        Avg Latency(us)
1                     66.02
2                     64.99
4                     66.64
8                     76.15
16                    81.32
32                    88.54
64                   137.83
128                  186.70
256                  294.64
512                  558.72
1024                1287.07
2048                2418.95
4096                3637.48
8192                5647.53
16384               9947.06
32768              19036.50
65536              38769.77
131072             71470.19
262144            141088.41
524288            294086.71
1048576           600280.88

Disabling MXM via --mca mtl ^mxm makes the warnings disappear, and the latency also drops dramatically:

Size        Avg Latency(us)
1                     37.48
2                     37.12
4                     38.24
8                     39.59
16                    50.07
32                    47.93
64                    53.22
128                   77.66
256                  116.47
512                  214.17
1024                 335.79
2048                 594.70
4096                1045.58
8192                1334.22
16384               2972.10
32768               4990.44
65536               9215.92
131072             16271.00
262144             31121.92
524288             61814.72
1048576           124195.41

I would be thankful for any suggestions!

Kind regards,

Tobias

Only two devices can be used at the same time. Try adding "-x MXM_IB_PORTS=mlx4_N" (replacing N with the device number) to the mpirun command line and see if it improves the results.
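
For anyone who wants to try each card in turn, here is a minimal sketch that only prints one mpirun command per device so the runs can be compared. It assumes the install paths, hostfile name, and process count from the question and the device names mlx4_0 to mlx4_3; build_cmd is a made-up helper name, and the MXM_IB_PORTS value is written exactly as in the reply above, so check the MXM documentation for your release on whether a port suffix is also expected.

```shell
#!/bin/sh
# Sketch only: prints the commands, does not run them.
# Paths, hostfile and -n value are copied from the original question.
MPI_HOME=/usr/mpi/gcc/openmpi-1.8.4
BENCH="$MPI_HOME/tests/osu-micro-benchmarks-4.4/osu_alltoall"

# build_cmd N: print an mpirun command that exports MXM_IB_PORTS=mlx4_N
# to all ranks via mpirun's -x option (hypothetical helper).
build_cmd() {
    printf '%s\n' "$MPI_HOME/bin/mpirun -x MXM_IB_PORTS=mlx4_$1 -n 64 --hostfile test $BENCH"
}

# Assumed device numbering: four cards, mlx4_0 through mlx4_3.
for n in 0 1 2 3; do
    build_cmd "$n"
done
```

Paste the printed lines into a shell one at a time and compare the latency tables for each device.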