3 compute nodes, switchless

Hello, I have three compute nodes, each equipped with a dual-port Mellanox ConnectX-4 card.

Each compute node is directly connected to the other two nodes, forming a full-mesh (triangle) topology.

If I bring up a subnet between two nodes, I’m able to run an MPI (RDMA) job on those two nodes. If I bring up two subnets and try to execute my application on all three nodes, the MPI processes start on all compute nodes, but after a few seconds the job fails.
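
For context, each “subnet” here is a single point-to-point link with its own opensm instance bound to the local port GUID, started roughly like this (the GUIDs are placeholders; the real values come from ibstat):

opensm -B -g 0x0002c90300001234   # port 1: link to the first peer
opensm -B -g 0x0002c90300005678   # port 2: link to the second peer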

I tried to follow this suggestion:

https://community.mellanox.com/s/feed/0D51T00006Sn2QqSAJ

but it doesn’t seem to work in my case.

Can anyone help me understand how to configure this kind of setup?

Thank you!

Emanuele

Hi Emanuele,

Can you please provide the error that you are seeing when the job fails?

Regards,

Chen

Hi Chen,

Sorry for the late reply.

I have 3 nodes, called DUMBO, TIMOTEO and JIMCORVO.

This is the relevant part of /etc/hosts on TIMOTEO:

10.10.3.2 TIMOTEO21 TIMOTEO tim-ib
10.10.5.2 TIMOTEO23
10.10.3.1 jimcorvo12 JIMCORVO jim-ib
10.10.4.1 jimcorvo13
10.10.5.3 DUMBO32 DUMBO dumbo-ib
10.10.4.3 DUMBO31
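
For what it’s worth, each individual link looks healthy when I check it from TIMOTEO, e.g.:

ibstat | grep -E 'State|Rate'   # both ports report Active
ping -c 3 jimcorvo12            # 10.10.3.x link to JIMCORVO
ping -c 3 DUMBO32               # 10.10.5.x link to DUMBO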

I try to execute the job with the command:

mpirun -genvall -genv I_MPI_HYDRA_DEBUG 1 -genv I_MPI_FABRICS=shm:ofi -n 24 -ppn 8 -hostfile hostfile ./wrf.exe

“hostfile” contains the names of the 3 nodes, one per line.
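
i.e., something like:

TIMOTEO
JIMCORVO
DUMBO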

The application reports errors like:

Abort(1014056975) on node 7 (rank 7 in comm 0): Fatal error in PMPI_Comm_dup: Other MPI error, error stack:
PMPI_Comm_dup(179)…: MPI_Comm_dup(MPI_COMM_WORLD, new_comm=0x7ffe1f481868) failed
PMPI_Comm_dup(164)…:
MPIR_Comm_dup_impl(57)…:
MPII_Comm_copy_with_info(702)…:
MPIR_Get_contextid_sparse_group(498): Failure during collective
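
If it helps, I can re-run with more verbose diagnostics, e.g. (I_MPI_DEBUG and FI_LOG_LEVEL are standard Intel MPI / libfabric knobs):

mpirun -genvall -genv I_MPI_DEBUG 5 -genv FI_LOG_LEVEL warn -genv I_MPI_FABRICS=shm:ofi -n 24 -ppn 8 -hostfile hostfile ./wrf.exe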

I’m also linking the console output from mpiexec and the pcap file collected by ibdump on one of the two ports on TIMOTEO:

https://www.dropbox.com/s/z3e1wm9r4njyqfl/mpiexec_output.txt?dl=0

https://www.dropbox.com/s/gmgsjcjdfugx896/sniffer.pcap?dl=0

Please let me know if further details are required.

Thanks in advance!

Emanuele