How to execute a CUDA/MPI program

I’ve compiled a mixed CUDA/MPI program successfully, but I don’t know how to launch it so that I can specify the number of processes to create.

I’ve tried mpirun -np X exe but that didn’t seem to work.

Sorry, I should have also mentioned that I compiled with nvcc -lmpi.

How you run it is going to be completely down to which MPI implementation and operating system you are using. Should we guess?

openmpi 1.3.2


For Open MPI you ought to be able to just run any one of the following:

mpirun -np 1 cmd


orterun -np 1 cmd


mpiexec -np 1 cmd

and it should just work. What happens when it doesn’t? (Again, should we guess?)

The error message is as follows:

*** Process received signal ***

signal : segmentation fault (11)

signal code : address not mapped (1)

Failing at address (nil)

[0] /lib64/ […]

[1] /usr/lib64/ […]

[2] /usr/lib64/ […]

[3] /usr/lib64/ […]

[4] /usr/lib64/ […]

[5] /usr/lib64/ […]

[6] /usr/lib64/ […]

[7] /usr/lib64/ (cuCtxCreate…) […]

[8] /opt/cuda/lib/ […]

[9] /opt/cuda/lib/ […]

[10] /opt/cuda/lib/…) […]

[11] main […]

[12] main […]

[13] /lib64/ (__libc_start+0xf4 […]

[14] main(__gxx_personality_v0+0x59( […]

*** End of error message ***

The executable is called main, and each […] stands for hexadecimal numbers.

All the code should do is have each process allocate a small array on its device.

That doesn’t look much like an MPI problem to me. Put a barrier loop at the beginning of main and recompile with host debugging symbols and no optimization. Run it with mpiexec, attach a gdb session to it, break out of the barrier loop, let it fail, and then do a backtrace. It will show you where your code is failing. This page gives a reasonable overview of how to debug MPI programs, if you haven’t done this sort of thing before.
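In case the "barrier loop" idea is unfamiliar, here is a minimal sketch of one way to do it in plain C. It is only an illustration under some assumptions: the function names and the WAIT_FOR_GDB environment variable are made up for this example and are not part of MPI or CUDA. The idea is that each process prints its PID and host, then spins on a flag until you attach gdb and clear it by hand.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 when the given value (taken from the
 * made-up WAIT_FOR_GDB variable) is non-empty and not "0", else 0. */
int debugger_wait_requested(const char *value)
{
    return value != NULL && value[0] != '\0' && strcmp(value, "0") != 0;
}

/* Hypothetical helper: if WAIT_FOR_GDB is set, print where this process
 * lives and spin until a debugger sets `hold` to 0. Otherwise return
 * immediately so normal runs are unaffected. */
void wait_for_debugger(void)
{
    volatile int hold = 1;   /* volatile so the compiler re-reads it each pass */
    char hostname[256];

    if (!debugger_wait_requested(getenv("WAIT_FOR_GDB")))
        return;

    if (gethostname(hostname, sizeof(hostname)) != 0)
        snprintf(hostname, sizeof(hostname), "unknown-host");
    hostname[sizeof(hostname) - 1] = '\0';

    printf("PID %d on %s waiting for debugger attach\n", (int)getpid(), hostname);
    fflush(stdout);

    while (hold)
        sleep(1);            /* in gdb: set var hold = 0, then continue */
}
```

Call wait_for_debugger() as the first thing in main, build with debugging symbols and no optimization, and launch as usual with WAIT_FOR_GDB=1 in the environment. Then run gdb -p with the printed PID, use up until you are in the wait_for_debugger frame, type set var hold = 0 followed by continue, and when the segfault hits, bt gives the backtrace.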

I remember a very odd bug related to mpirun. Try launching the program with the absolute path instead of the relative path.


Thanks for that.