Question about CUDA+MPI

Hi, I have a question about inter-node computation using CUDA+MPI.
I succeeded in writing a CUDA Fortran code that uses MPI,
but I can only run it on a single node.
How can I run it on multiple nodes?

I set up MPI and the GPU device with the following code.

use cudafor
use mpi
integer :: myrank, nprocs, tag, ierr, localRank
character(len=10) :: localRankStr

! Select the GPU from the node-local rank before MPI_init
! (OMPI_COMM_WORLD_LOCAL_RANK is set by Open MPI's mpirun)
call get_environment_variable('OMPI_COMM_WORLD_LOCAL_RANK', localRankStr)
read(localRankStr, '(i10)') localRank
ierr = cudaSetDevice(localRank)

! MPI initialization
call MPI_init(ierr)
call MPI_comm_rank(MPI_COMM_WORLD, myrank, ierr)
call MPI_comm_size(MPI_COMM_WORLD, nprocs, ierr)

Secondly, can I choose which GPUs are assigned when I run multiple processes?
I built and ran the code with the following commands.

mpif90 -O3 -ta=tesla,cuda8.0 -o aaa.out aaa.cuf
mpirun -np 4 ./aaa.out

Then devices 0, 1, 2, and 3 were used automatically.
Can I assign GPUs 1, 3, 5, and 7 instead?

Hello,

I think what you are asking for is answers to the following
(remember, the goal is to speed up the code):

  1. With MPI, can I speed up my program by running two processes on the
     same platform? Yes you can: if your computation cost is high but your
     memory use is low, a multi-core machine may be able to run two or more
     MPI processes at full speed. A multi-user operating system is needed
     for this.

  2. With CUDA+MPI, can I speed up my program by running two processes on
     the same platform and have them share the resources of one GPU?

Probably not. Two processes sharing one GPU are generally serialized, so
this could be slower than a single process using the GPU alone. Having only
one process use the GPU may be the best option (but a little more
complicated).

  3. With CUDA+MPI, can I run two processes on a single multi-core platform
     with two GPUs, with each process selecting and using one of the GPUs?

This may work, but the separate memory spaces may be a pain to manage. It
will probably also be slower than running a single process on the platform
with a multi-threaded OpenMP parallel section, where each thread of the
single process selects a GPU and runs the kernel on it. Since it is
multi-threaded, both GPUs work in the same memory space (see the sketch
below).
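
Here is a minimal sketch of that pattern, assuming CUDA Fortran compiled
with OpenMP support (-mp); the kernel launch is left as a placeholder
comment, and "mykernel" is a hypothetical name:

program multi_gpu
  use cudafor
  use omp_lib
  implicit none
  integer :: ngpus, tid, istat

  istat = cudaGetDeviceCount(ngpus)

  ! One OpenMP thread per GPU; all threads share one address space
  !$omp parallel num_threads(ngpus) private(tid, istat)
    tid = omp_get_thread_num()
    istat = cudaSetDevice(tid)   ! each thread drives its own GPU
    ! ... allocate device arrays and launch the kernel here, e.g.
    ! call mykernel<<<grid, block>>>(...)
    istat = cudaDeviceSynchronize()
  !$omp end parallel
end program multi_gpu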


In general, it is better to run a single MPI process on each platform with
GPUs, and have that process drive the multiple GPUs with OpenMP, so the
GPUs can run simultaneously.

dave

To add to Dave’s suggestions, you can compile with “-ta=host,tesla:cuda8.0” to create a unified binary. In this case, if the node has a GPU, it will be used; otherwise, the code will run sequentially on the host. Compiling with “-ta=multicore,tesla:cuda8.0” will create a unified binary that again runs on the GPU if available, and otherwise runs across all the cores of the host.
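
For instance, reusing aaa.cuf from the original post (illustrative command lines):

mpif90 -O3 -ta=host,tesla:cuda8.0 -o aaa.out aaa.cuf
mpif90 -O3 -ta=multicore,tesla:cuda8.0 -o aaa.out aaa.cuf

The first binary falls back to sequential host execution when no GPU is found; the second falls back to running across all host cores.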

Then devices 0, 1, 2, and 3 were used automatically.
Can I assign GPUs 1, 3, 5, and 7 instead?

Sure, assuming you have 8 GPUs on the system. Just change “localRank” in your call to “cudaSetDevice” to the GPU number you want to assign to that MPI rank.
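
For example, to map four ranks onto devices 1, 3, 5, and 7, one possible mapping is:

ierr = cudaSetDevice(2*localRank + 1)   ! local rank 0,1,2,3 -> device 1,3,5,7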

-Mat

Thank you Dave and thank you Mat!
I solved both problems!

Thanks again!