I’m trying to use a cluster of S1070s but not having much success. I can use MPI over 4 devices within one S1070, but here’s what happens when I try to use 8 devices over two S1070s of a cluster.
I launch the MPI job with
mpirun -hostfile hostsfile -np 8
and hostsfile is
node0
node1
node0
node1
node0
node1
node0
node1
and I map the MPI processes to the devices with the following switch statement:
// configuration for 2x4 (two nodes, four devices each)
switch (rank)
{
    case 0:
    case 1: DEVICE = 0;
            break;
    case 2:
    case 3: DEVICE = 1;
            break;
    case 4:
    case 5: DEVICE = 2;
            break;
    case 6:
    case 7: DEVICE = 3;
            break;
}
I assumed that this would map
Process 0 → node 0 device 0
Process 1 → node 1 device 0
Process 2 → node 0 device 1
Process 3 → node 1 device 1
Process 4 → node 0 device 2
Process 5 → node 1 device 2
Process 6 → node 0 device 3
Process 7 → node 1 device 3
Apparently that is not the case. What appears to be happening is that processes 0 and 1 are both being mapped onto node 0, device 0. As a consequence, that device runs out of memory, many of the allocations in process 1 fail, and I get invalid device pointers when I try to use them.
The same happens for processes 2 and 3, processes 4 and 5, and processes 6 and 7.
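To make the two possibilities concrete, here is a minimal sketch in plain C (no MPI or CUDA calls) that tabulates where each rank would land. It compares the placement I assumed, with ranks dealt round-robin across the nodes in hostfile order, against a placement in which the launcher fills all four slots of node0 before moving on to node1. The second placement is only my guess at what the MPI installation is doing, and all helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

#define NUM_NODES        2
#define DEVICES_PER_NODE 4
#define NUM_RANKS        (NUM_NODES * DEVICES_PER_NODE)

/* Same rank -> device mapping as the switch statement:
   ranks {0,1} -> 0, {2,3} -> 1, {4,5} -> 2, {6,7} -> 3. */
static int device_for_rank(int rank)
{
    assert(rank >= 0 && rank < NUM_RANKS);
    return rank / 2;
}

/* Placement I assumed: ranks alternate between the nodes. */
static int node_round_robin(int rank)
{
    return rank % NUM_NODES;
}

/* Alternative placement (an assumption, not confirmed):
   the launcher fills every slot of one node before using the next. */
static int node_fill_first(int rank)
{
    return rank / DEVICES_PER_NODE;
}

/* Print one row per rank so collisions are easy to spot. */
static void print_placements(void)
{
    printf("rank  device  assumed-node  fill-first-node\n");
    for (int rank = 0; rank < NUM_RANKS; ++rank)
        printf("%4d  %6d  %12d  %15d\n",
               rank, device_for_rank(rank),
               node_round_robin(rank), node_fill_first(rank));
}
```

Calling print_placements() from main prints the table. Under the fill-first placement, ranks 0 and 1 both end up on node 0 with device 0, which matches the out-of-memory symptom, whereas under the round-robin placement every (node, device) pair is used exactly once.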
Does anyone have any suggestions as to what is going wrong and how to fix it?
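One workaround I am considering, shown here only as an untested sketch, is to pick the device from a node-local rank instead of the global rank, so that the choice no longer depends on how the launcher distributes ranks. The sketch assumes each rank's host name has already been collected into one flat buffer of fixed-width, NUL-terminated entries (in the real program that would be done with MPI_Get_processor_name and MPI_Allgather). The function names and the buffer layout are my own and hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Node-local rank: the number of lower-numbered ranks that report the
   same host name as `rank`.
   names  : flat buffer holding nranks entries of `stride` bytes each,
            every entry NUL-terminated (assumed layout)
   stride : width of one entry in bytes */
static int local_rank(const char *names, int stride, int nranks, int rank)
{
    assert(rank >= 0 && rank < nranks);
    const char *mine = names + (size_t)rank * stride;
    int local = 0;
    for (int r = 0; r < rank; ++r) {
        if (strcmp(names + (size_t)r * stride, mine) == 0)
            ++local;
    }
    return local;
}

/* Device index derived from the node-local rank; the modulo wraps around
   if more ranks than devices end up on one node. */
static int device_for_local_rank(int local, int devices_per_node)
{
    assert(local >= 0 && devices_per_node > 0);
    return local % devices_per_node;
}
```

With four ranks per node and four devices per node, the ranks sharing a host would get local ranks 0 to 3 and therefore devices 0 to 3, whichever way the global ranks are laid out. The resulting index would then replace DEVICE in the call that selects the GPU.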