map-by dist not working in HPC-X?

I have noticed that HPC-X MPI no longer supports -map-by dist:<hca_name>.

Even after rebuilding it, I cannot get the "dist" value compiled in for this option.

I suspect, but am not sure, that it has to do with how the hwloc library is built. Right now I cannot see what makes it fail, though. The configure and build steps both look as if they work as expected.

Has anyone else noticed this behaviour?

It works as expected in HPC-X 1.2.0-325 but not in 1.3.0-326 or 1.3.331 (RHEL 6.5/6.6 versions used).

mpirun -help shows:

--map-by Mapping Policy [slot | hwthread | core | socket (default) | numa | board | node]

The man page lists:

Supported options include slot, hwthread, core, L1cache, L2cache, L3cache, socket, numa, board, node, sequential, distance, and ppr.

Version 1.2.0-325 runs fine on my system, but with -map-by dist:xxxx,span on the 1.3 versions I get the following error:

The mapping request contains an unrecognized modifier:

Request: dist:mlx5_0,span

Please check your request and try again.


[nxt0225:27836] [[2496,0],0] ORTE_ERROR_LOG: Bad parameter in file …/…/…/…/…/openmpi-gitclone/orte/mca/ess/hnp/ess_hnp_module.c at line 523

/Nils

Interestingly enough, the following does not cause an error, although I think it is equivalent:

-mca rmaps_base_mapping_policy dist:span -mca rmaps_dist_device ${hca_dev}

As I understand it, this should be the same as -map-by dist:${hca_dev},span?

It is a bit hard to find out whether the binding is correct, because on the system I have available this mapping is the same as the natural mapping.
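
One way to inspect where each rank ends up is to launch a small script in place of the application and have every rank print the CPUs it is allowed to run on. The following is only a sketch under a few assumptions: show_binding is a made-up helper name, the script relies on the OMPI_COMM_WORLD_RANK variable that Open MPI sets for each launched process, and it reads the Linux-specific /proc/self/status file (it prints "unknown" where that file is not available).

```shell
#!/bin/sh
# show_binding: hypothetical helper, not part of HPC-X or Open MPI.
# Prints this process's rank, its host, and the CPUs it is allowed to run on.
show_binding() {
    # Open MPI exports OMPI_COMM_WORLD_RANK to every process it launches.
    rank="${OMPI_COMM_WORLD_RANK:-unknown}"
    cpus="unknown"
    # Linux only: the affinity mask is inherited by awk from this shell.
    if [ -r /proc/self/status ]; then
        cpus=$(awk '/^Cpus_allowed_list:/ {print $2}' /proc/self/status)
    fi
    echo "rank ${rank} on $(uname -n): cpus ${cpus}"
}

show_binding
```

Saved as, say, show_binding.sh and made executable, it could be started with the same options as the real job, for example: mpirun -mca rmaps_base_mapping_policy dist:span -mca rmaps_dist_device ${hca_dev} ./show_binding.sh. Open MPI's own --report-bindings option to mpirun is another way to get this information. On a system where the dist mapping coincides with the natural mapping, the output will of course look the same for both.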

Further googling shows that the syntax I used initially is apparently no longer the documented way to specify this mapping, although I found no obvious pointers as to when it changed or why.

The suggested syntax in the new README (http://bgate.mellanox.com/products/hpcx/v1.3/README.txt) is:

-map-by dist:span -mca rmaps_dist_device ${hca_dev}
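
For job scripts that were written against the old dist:<hca_name>,span form, a small wrapper can keep the device name in one place and emit the arguments in the README form. This is a minimal sketch, not anything shipped with HPC-X: dist_map_args is a hypothetical helper name, and -np 4 and ./my_app are placeholders. It only prints the command line, so it can be tried without MPI installed.

```shell
#!/bin/sh
# dist_map_args: hypothetical helper, not part of HPC-X or Open MPI.
# Builds the mapping arguments in the form the v1.3 README suggests.
dist_map_args() {
    # $1: HCA device name, e.g. mlx5_0
    if [ -z "$1" ]; then
        echo "usage: dist_map_args <hca_dev>" >&2
        return 1
    fi
    printf '%s' "-map-by dist:span -mca rmaps_dist_device $1"
}

hca_dev=mlx5_0
# Print the command instead of running it; -np 4 and ./my_app are placeholders.
echo "mpirun -np 4 $(dist_map_args "$hca_dev") ./my_app"
```

With hca_dev set to mlx5_0, the device from the error message above, the printed line carries -map-by dist:span -mca rmaps_dist_device mlx5_0 between the placeholders.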

Hi Nils,

The syntax you found in the README is the correct one. The motivation and some details can be found here:

https://github.com/open-mpi/ompi/pull/494

M

Thanks.