CUDA and MPI Cluster Computing Implementation. Need advice for setting up MPI and CUDA over a cluster

I need to be able to run a program using all the GPUs in a two-computer cluster. There are five GPUs in total: 3 in one machine, 2 in the other. My idea for implementing such a program is the following:

    Start MPI

    Initialize device driver threads for each device (on both computers)

    Send data to secondary computer

    Send data to all GPUs

    Do required operations

    Terminate GPU threads

    Send data back to primary computer

    Stop MPI
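
The outline above might be sketched roughly as follows, assuming the one-process-per-GPU approach rather than separate driver threads (all names other than the MPI/CUDA API calls are illustrative, and the kernel launch is elided):

```c
/* Hypothetical SPMD sketch: one MPI process per GPU.
 * Compile with mpicc and link against the CUDA runtime. */
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size, ndev;
    MPI_Init(&argc, &argv);                 /* 1. start MPI */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* 2. pick a local GPU; the modulo mapping is a simplification
     * and would need to match how ranks are laid out by the hostfile */
    cudaGetDeviceCount(&ndev);
    cudaSetDevice(rank % ndev);

    const int N = 1 << 20;                  /* per-rank chunk size */
    float *h = malloc(N * sizeof *h), *d;
    float *root = NULL;
    if (rank == 0)
        root = malloc((size_t)N * size * sizeof *root);

    /* 3./4. rank 0 distributes a chunk of data to every rank */
    MPI_Scatter(root, N, MPI_FLOAT, h, N, MPI_FLOAT, 0, MPI_COMM_WORLD);

    cudaMalloc((void **)&d, N * sizeof *d);
    cudaMemcpy(d, h, N * sizeof *d, cudaMemcpyHostToDevice);

    /* 5. do the required operations -- kernel launch elided */
    /* my_kernel<<<blocks, threads>>>(d, N); */

    /* 6./7. copy results back and gather them on the primary node */
    cudaMemcpy(h, d, N * sizeof *d, cudaMemcpyDeviceToHost);
    MPI_Gather(h, N, MPI_FLOAT, root, N, MPI_FLOAT, 0, MPI_COMM_WORLD);

    cudaFree(d);
    free(h);
    free(root);
    MPI_Finalize();                         /* 8. stop MPI */
    return 0;
}
```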

My question is, will this work? Also, I am having trouble finding documentation on how to implement MPI over a cluster. Does the program need to be present on each computer in the cluster, or will MPI transfer the program into the computer’s local memory? I will be using an SPMD paradigm.

This is being done on a cluster with both computers running CentOS x64 (most recent version). I am using MPICH as my MPI implementation, and CUDA 2.3 as my CUDA implementation. Any help or advice would be greatly appreciated.

~Alex

This thread has some good links on parallel programming: http://forums.nvidia.com/index.php?showtopic=107375

You may get more interest if you describe in some detail the nature of your application.

That is a bit of a curious statement. MPI is primarily intended for use on distributed memory machines and clusters. The interweb is literally overflowing with tutorials and documentation describing how to use it on clusters. LLNL maintains a very useful set of introductions to many of the APIs used in HPC, for example. Their material on MPI can be read here.
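
To the cluster-launch part of your question: mpiexec does not ship the binary for you, so the executable has to be present at the same path on every node, either copied there or on a shared filesystem such as NFS. Launching then looks something like this (the hostfile name and host names below are made up, and the exact flags vary with the MPICH version; check `mpiexec --help`):

```shell
# Hypothetical hostfile "hosts": one process slot per GPU,
# 3 on the first machine, 2 on the second:
#   node1:3
#   node2:2

# Launch 5 processes across the two nodes:
mpiexec -machinefile hosts -n 5 ./my_mpi_cuda_prog
```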

It would seem you have two basic design choices: either use host threads to run the multiple GPUs on each cluster node and use MPI only for inter-node communications (so the number of members in the MPI communicator equals the number of nodes), or just use MPI and run one process per GPU (so the number of members in the MPI communicator equals the number of GPUs). The latter will be simpler, because it requires only one API rather than two, but it potentially won’t perform as well, because MPI processes are considerably “heavier” than host threads, and things which happen naturally between threads within a shared memory space require explicit data exchange in MPI, which increases communication overhead.