I need to run a program that uses all the GPUs in a two-computer cluster. There are five GPUs in total: 3 in one machine, 2 in the other. My plan for implementing such a program is the following:
1. Initialize a driver thread for each device (on both computers)
2. Send data to the secondary computer
3. Send data to all GPUs
4. Perform the required operations
5. Terminate the GPU threads
6. Send the results back to the primary computer
My question is: will this work? Also, I am having trouble finding documentation on how to run an MPI program across a cluster. Does the program binary need to be present on each computer in the cluster, or will MPI transfer the program into each computer's local memory? I will be using an SPMD paradigm.
This is being done on a cluster of two computers, both running CentOS x64 (most recent version). I am using MPICH as my MPI implementation and CUDA 2.3 as my CUDA implementation. Any help or advice would be greatly appreciated.