MPI Jetson Nano

Hi,
I have a minor problem running mpiexec with the following command:
mpiexec --hostfile clusterfile ./simpleMPI

When clusterfile contains only one IP address, i.e. one node, mpiexec still says it will start 4 processes.
When I add another node (the IP address of the second Nano), it starts 8 processes.
Using jtop on the two Nano boards, I can see that both GPUs start processing, but the run ends with an error whether I use 1 or 2 Nano boards.

I can add the option -np 1 when running on just one of the boards, and I can use MPI to connect to the other board individually, but this does not work when I want to run on both boards at the same time.

I followed the setup from (examples/how_to_build_nvidia_jetson_gpu_cluster.md at master · garyexplains/examples · GitHub)
and I have seen a previous post that partially solved the connection to the other node
(Open MPI network setup).

What I would like to know is why the hostfile starts four processes for each node; that seems to be where the problem is.
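A likely explanation, offered as a hedged sketch rather than a confirmed diagnosis: when a hostfile line gives no slot count, recent Open MPI versions (4.x) set the number of slots to the number of processor cores they detect, and the Jetson Nano has a quad-core CPU. Each listed node then contributes 4 slots, and mpiexec without -np starts one process per slot, which matches the 4 and 8 processes seen above. A minimal clusterfile that declares one slot per board might look like this, assuming two Nanos at the hypothetical addresses 192.168.1.101 and 192.168.1.102:

```text
# clusterfile: one line per Jetson Nano (addresses are placeholders)
# slots=1 limits each board to a single MPI process
192.168.1.101 slots=1
192.168.1.102 slots=1
```

With a file like this, mpiexec --hostfile clusterfile ./simpleMPI should start two processes, one per board, without needing -np.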

Kind regards

Magnus

I can't answer this; perhaps other developers can share their experience or suggestions.

Hi,
I found a workaround (not a good one); I will let my IT team look at this.

time mpiexec --mca btl_tcp_if_include eth0 --pernode --hostfile clusterfile ./simpleMPI

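Breakdown:
--mca btl_tcp_if_include eth0
restricts MPI's TCP traffic to the eth0 interface; for some reason this is needed, probably because of the board's internal IP interfaces.
--pernode
launches one process per node; it seems to be needed, otherwise 4 CPU cores all get the same information.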
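If eth0 is the interface that links the boards, the interface restriction could be stored once instead of typed on every run. This is a sketch assuming Open MPI's per-user parameter file $HOME/.openmpi/mca-params.conf, placed on each Nano; the oob_tcp_if_include line, which applies the same restriction to Open MPI's runtime (out-of-band) traffic, is an assumption not present in the original command. For reference, in Open MPI 4.x --pernode is a deprecated synonym for --map-by ppr:1:node.

```text
# $HOME/.openmpi/mca-params.conf (one copy on each Nano)
# Use only eth0 for MPI point-to-point TCP traffic
btl_tcp_if_include = eth0
# Assumption: also keep Open MPI's runtime traffic on eth0
oob_tcp_if_include = eth0
```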

Kind regards
Magnus
