3 compute nodes, switchless InfiniBand setup

Hello, I have three compute nodes equipped with dual-port Mellanox ConnectX-4 cards.

Each compute node is directly connected to the other two (a full-mesh triangle of point-to-point links).

If I bring up a subnet between two nodes (one subnet manager on that link), I can run an MPI (RDMA) job on those two nodes. If I bring up two subnets and try to execute my application on all three nodes, the MPI processes start on every compute node, but the job fails after a few seconds.
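
For context: each cable in this setup is its own InfiniBand subnet, and a subnet manager has to run on each link that is in use. I start one opensm instance per link, bound to the GUID of the local port on that link; roughly like this on a node that manages both of its links (the GUIDs below are placeholders, not my real ones):

ibstat | grep "Port GUID"        # list the GUIDs of the two local ports
opensm -B -g 0xe41d2d0300aaaaa1  # subnet manager for the link on port 1
opensm -B -g 0xe41d2d0300aaaaa2  # subnet manager for the link on port 2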

I tried to follow this suggestion:

https://community.mellanox.com/s/feed/0D51T00006Sn2QqSAJ

but it doesn’t seem to work in my case.

Can anyone help me understand how to configure this kind of setup?

Thank you!

Emanuele

Hi Emanuele,

Can you please provide the error that you are seeing when the job fails?

Regards,

Chen

Hi Chen,

Sorry for the late reply.

I have three nodes, called DUMBO, TIMOTEO, and JIMCORVO.

This is the relevant part of /etc/hosts on TIMOTEO:

10.10.3.2 TIMOTEO21 TIMOTEO tim-ib
10.10.5.2 TIMOTEO23
10.10.3.1 jimcorvo12 JIMCORVO jim-ib
10.10.4.1 jimcorvo13
10.10.5.3 DUMBO32 DUMBO dumbo-ib
10.10.4.3 DUMBO31
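
As a quick sanity check (my own test; hostnames as in the /etc/hosts above), each node should be able to reach its two neighbors over the IPoIB address of the shared link. For example, from TIMOTEO:

ping -c 3 jimcorvo12    # JIMCORVO over the 10.10.3.x link
ping -c 3 DUMBO32       # DUMBO over the 10.10.5.x link
ibstat                  # both local ports should report State: Active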

I try to execute the job with the command:

mpirun -genvall -genv I_MPI_HYDRA_DEBUG 1 -genv I_MPI_FABRICS=shm:ofi -n 24 -ppn 8 -hostfile hostfile ./wrf.exe

The “hostfile” contains the names of the three nodes.
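
In case it helps, I can also re-run with more verbose fabric diagnostics, along these lines (I_MPI_DEBUG and FI_LOG_LEVEL are the standard Intel MPI and libfabric logging knobs):

mpirun -genvall -genv I_MPI_DEBUG 5 -genv FI_LOG_LEVEL debug -genv I_MPI_FABRICS shm:ofi -n 24 -ppn 8 -hostfile hostfile ./wrf.exe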

The application reports errors like:

Abort(1014056975) on node 7 (rank 7 in comm 0): Fatal error in PMPI_Comm_dup: Other MPI error, error stack:
PMPI_Comm_dup(179)..................: MPI_Comm_dup(MPI_COMM_WORLD, new_comm=0x7ffe1f481868) failed
PMPI_Comm_dup(164)..................:
MPIR_Comm_dup_impl(57)..............:
MPII_Comm_copy_with_info(702).......:
MPIR_Get_contextid_sparse_group(498): Failure during collective

I’m also linking the console output I get from mpiexec and the pcap file collected by ibdump on one of the two ports on TIMOTEO.
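
(For reference, the capture was taken along these lines; the device, port, and file name are examples for my setup:)

ibdump -d mlx5_0 -i 1 -w timoteo_port1.pcap    # sniff IB traffic on port 1 of the HCA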

Please let me know if further details are required.

Thanks in advance!

Emanuele