Mixed Programming combining MPI and CUDA

Praveen_PVS · May 4, 2009, 5:50am

Can anyone help me with details on how to write a program combining both the MPI and CUDA APIs?
I am new to this and need a starting point, and I am very interested in it.

Sarnath · May 4, 2009, 5:57am

It is possible, and I have done it. But I think you should first learn CUDA, then MPI, and then combine the two.

Praveen_PVS · May 4, 2009, 8:09am

Hello sir,
I have learnt both CUDA and MPI.
I have programmed in each of them individually, and I have now heard that the two can be combined.
I request you to give me some guidelines and some direction to achieve this.
Thank you for your kind reply.

Sarnath · May 4, 2009, 11:36am

Then, it is pretty straightforward.

As a first step, just print the CUDA devices found on the various nodes.

So, you run your job as “mpirun” – each node will find CUDA devices and print them.

That’s good enough to start with, isn’t it?

Now, you can write CU files and call kernels in all these nodes…

You need to divide your data-set based on the MPI Rank assigned to each node and so on.

Where you store the data (database, distributed file system, Windows share or NFS share) is all up to you…

There’s nothing that I am aware of that limits you from integrating MPI and CUDA. It’s fairly straightforward.

avidday · May 4, 2009, 11:58am

They can’t be “mixed up”.

In some applications it makes sense to use both. I work on simulations of large domains, where MPI with domain decomposition can distribute the computational domain over many nodes of a distributed memory cluster, and CUDA kernels and/or CUBLAS/CUFFT function calls can then parallelize subdomain level computations on a companion GPU dedicated to each node. In embarrassingly parallel problems, or those with low interprocess communication overheads, like explicit time marching schemes, such an approach makes sense. But even then, the CUDA and MPI elements of the code aren’t “mixed up”; they are effectively totally separate.

MPI and CUDA are basically orthogonal parallel computing paradigms.

Praveen_PVS · May 4, 2009, 12:00pm

Dear sir,
Thank you for your kind and quick reply to my problem.
I don’t find it as straightforward as you do, because I have never written a program like this.
Suppose I call a kernel from each node: how should I compile and run those programs?
Is there any material that would clear up these doubts?
Could you send me some simple code, so that I can understand better from an example?
Thank you once again for your kind help.

Praveen_PVS · May 4, 2009, 12:04pm

What I meant is exactly what you are talking about. I have many nodes, and each node can call the kernel and run the program.
I feel we will get more parallelism this way.
But my doubt is how to go about doing this. I am very new to this. Can you suggest some material or some simple code that would clear up my doubts?
Thank you for your kind help.

Sarnath · May 4, 2009, 12:36pm

	/* Fragment from inside main(); needs <stdio.h> and <mpi.h>.
	   IsCUDACapable() and printCUDADevices() are helpers you write yourself. */
	int myrank, length;
	char computer[MPI_MAX_PROCESSOR_NAME];

	MPI_Init(NULL, NULL);
	MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
	MPI_Get_processor_name(computer, &length);

	if (IsCUDACapable() == true)
	{
		printf("%s: I am CUDA capable\n", computer);
		fflush(stdout);
		printf("%s: I have the following CUDA devices:\n", computer);
		fflush(stdout);
		printCUDADevices();
		fflush(stdout);
	} else {
		printf("%s: I am NOT CUDA capable\n", computer);
	}
	fflush(stdout);

	MPI_Finalize();

In the example above, each node either prints the CUDA devices it has or says that it is NOT CUDA capable.

You just need to implement “printCUDADevices” in the usual way: get the device count and, for each device, get the properties and print the name.

It should be fairly simple to implement the “IsCUDACapable()” function as well. Just get the device count and see if it is >1; if so, the node is certainly capable.

If the count is 1 and the device properties have the string “emulation” in the name, then say NOT capable. Otherwise say capable…

avidday · May 4, 2009, 12:51pm

That almost certainly isn’t what I am talking about. I am talking about a situation where you have:

1. A distributed memory domain decomposition code which uses MPI for interprocess communications

2. Said code contains an embarrassingly parallel subdomain level workload

3. Said code runs on a distributed memory cluster where each node has its own CUDA capable GPU

In such a situation, it may then be feasible to parallelize the subdomain workload using CUDA code. In the resulting code, each node is simply offloading calculations from its local CPU onto its local GPU to increase performance at a subdomain level. The modifications to the existing distributed memory code can be very minimal indeed, and (from an MPI viewpoint) almost nothing changes.

Praveen_PVS · May 4, 2009, 1:18pm

I understand how to implement it, but what really bothers me is which compiler I should use. If I use mpicc, how do I link the .cu file with it, and vice versa?
I will try this simple program, but can you tell me how to compile and run the code?
Please try to understand, I am new to this kind of programming. How do I link the object code of one to the other?
I am very grateful to you; this fruitful discussion has cleared up many of my doubts.
Only this big doubt remains. Can you guide me on this?

avidday · May 4, 2009, 2:06pm

You can use the standard CUDA build process with some slight modifications to the SDK common.mk. All that is required is to provide extra paths and libraries to the C/C++ compiler and linker to add in MPI support. In most flavours of MPI I have used, mpicc is nothing more than a wrapper script which decorates any existing compile and link arguments with the necessary extras for the preprocessor to find the MPI headers, and for the linker to find the MPI libraries. Identical results can be achieved with the standard C/C++/Fortran compilers with the additional arguments to support MPI explicitly added.

Sarnath · May 4, 2009, 2:39pm

I don’t know what mpicc is.

But MPI is just a communication library. Just compile normal CPP files with MPI calls (like the one I pasted above) and link them against the available MPI library. In the same project, use “CU” files and compile them with NVCC. The linking happens automatically, as they are in the same project.

That’s all. I don’t see a problem.

Sarnath · May 4, 2009, 2:45pm

From the mpicc man page, which I found on the net:

"

Environment Variables

By default, the wrappers use the compilers that were selected when Open MPI was configured. These compilers were either found automatically by Open MPI’s “configure” script, or were selected by the user in the CC, CXX, F77, and/or FC environment variables before “configure” was invoked. Additionally, other arguments specific to the compiler may have been selected by configure.

These values can be selectively overridden by either editing the text files containing this configuration information (see the FILES section), or by setting selected environment variables of the form “OMPI_value”.

Valid value names are:

CPPFLAGS

Flags added when invoking the preprocessor (C or C++)

LDFLAGS

Flags added when invoking the linker (C, C++, or Fortran)

LIBS

Libraries added when invoking the linker (C, C++, or Fortran)

CC

C compiler

CFLAGS

C compiler flags

CXX

C++ compiler

CXXFLAGS

C++ compiler flags

F77

Fortran 77 compiler

FFLAGS

Fortran 77 compiler flags

FC

Fortran 90 compiler

FCFLAGS

Fortran 90 compiler flags

"

You may need to set “OMPI_CC” as “nvcc” so that it uses nvcc as the backend compiler.

avidday · May 4, 2009, 4:00pm

That is only applicable to Open MPI. There are at least two other popular open source MPI stacks available (LAM and MPICH2), and several vendor implementations. All are different. Without knowing what operating system, compiler and MPI stack he is using it is impossible to provide specifics.

Even if he is using Open MPI, you definitely don’t want to change the default compiler to nvcc. That will certainly not work.

Praveen_PVS · May 5, 2009, 6:21am

Dear sir,
Good morning. Thank you very much for your help. We compile any MPI program using mpicc; that is what I meant by mpicc.
Any C file that has MPI calls, we compile using mpicc program_name.
I think mpicc is just a wrapper which uses the existing compiler but links the MPI library.
I have written a .cu file with MPI calls, which asks each node to give the count of CUDA devices.
Since it is a .cu file, I have used nvcc program_name.
Since the program has MPI calls, I have to link that library, so can you tell me how I should do that?

Praveen_PVS · May 5, 2009, 6:27am

Dear Sir, thank you very much for your kind help.
This is what I have after reading your reply.
Please suggest what I have to do.

The program name is test.cu:

#include<stdio.h>
#include<cuda.h>
#include "mpi.h"

int main(int argc, char **argv)
{
    int MyRank, NumberOfProcs;
    MPI_Status Status;
    int Root = 0;
    int Count, Device;
    struct cudaDeviceProp Properties;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &MyRank);
    MPI_Comm_size(MPI_COMM_WORLD, &NumberOfProcs);

    if(MyRank == Root)
    {
        cudaGetDeviceCount(&Count);
        cudaGetDevice(&Device);
        cudaGetDeviceProperties(&Properties, Device);
        printf("I am processor with my rank %d has %d number of Cuda Devices and their names are %s \n",MyRank, Count, Properties.name);
    }
    else
        printf("I am processor with My Rank %d and printing HELLO WORLD \n",MyRank);

    MPI_Finalize();
    return(0);

}

I compiled this with nvcc test.cu
and it gave
test.cu:3:17: mpi.h: No such file or directory
Then I realized that I have to include the MPI library.
My mpicc path is
/usr/local/mpich2-1.0.7/bin/mpicc
So I request you to tell me how to link against this path and get the program running successfully.
At present I have only one card, on the root processor, which is why I have written the program like that.
Looking forward to your help.

avidday · May 5, 2009, 6:55am

You don’t need to compile that example with nvcc. Either compile it with the standard C build process in the SDK and add MPI paths and libraries to it - something like

-I/usr/local/mpich2-1.0.7/include -L/usr/local/mpich2-1.0.7/lib -lmpich

should probably do it - or provide the CUDA includes and libraries to mpicc - something like

-I/usr/local/cuda/include -L/usr/local/cuda/lib -lcudart

Praveen_PVS · May 5, 2009, 7:33am

So I compiled using the cc compiler, and this is the result:

cc -I/usr/local/mpich2-1.0.7/include -L/usr/local/mpich2-1.0.7/lib -lmpich -I/usr/local/cuda/include -L/usr/local/cuda/lib -lcudart test.c
test.c: In function ‘main’:
test.c:11: error: storage size of ‘Properties’ isn’t known

When I use the nvcc compiler and link the MPI library,
I get the following error:
nvcc -I/usr/local/mpich2-1.0.7/include -L/usr/local/mpich2-1.0.7/lib -lmpich test.cu
In file included from /usr/local/mpich2-1.0.7/include/mpi.h:1142,
from test.cu:3:
/usr/local/mpich2-1.0.7/include/mpicxx.h:26:2: #error "SEEK_SET is #defined but must not be for the C++ binding of MPI"
/usr/local/mpich2-1.0.7/include/mpicxx.h:30:2: #error "SEEK_CUR is #defined but must not be for the C++ binding of MPI"
/usr/local/mpich2-1.0.7/include/mpicxx.h:35:2: #error "SEEK_END is #defined but must not be for the C++ binding of MPI"

Why can’t I use the nvcc compiler and link the MPI library in the same way that I use the cc compiler and link both libraries?

So can you tell me what I should do to make this program run properly?

avidday · May 5, 2009, 8:07am

Read the MPICH2 documentation. The SEEK_SET conflict between C++ stdio.h and the MPI version 2 standard is well documented, and several workarounds are offered in the documentation.

Praveen_PVS · May 5, 2009, 8:55am

I have found the way:
nvcc -I/usr/local/mpich2-1.0.7/include -L/usr/local/mpich2-1.0.7/lib -lmpich test.cu -DMPICH_IGNORE_CXX_SEEK
I think the extra define is needed because of the conflict between the C++ stdio.h and the MPI C++ binding.
I feel I can now do some programming and get some results.
I thank you wholeheartedly for your kind help.
Thank you very much once more.
Would it be possible to send your mail ID to meetpraveen_18@yahoo.com, so that I can ask you if I have any doubts?
Looking forward to your mail.
