I’ve compiled a mixed CUDA/MPI program successfully but don’t know how to run it to pass the number of processes to be created.
I’ve tried mpirun -np X exe but that didn’t seem to work.
Sorry, I should have also mentioned that I compiled with nvcc -lmpi.
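For what it’s worth, the full compile line was along these lines (the include and library paths here are guesses for a typical OpenMPI install; adjust for yours):

nvcc -I/usr/include/openmpi -L/usr/lib64/openmpi -lmpi -o main main.cu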
How you run it is going to be completely down to which MPI implementation and operating system you are using. Should we guess?
OpenMPI 1.3.2
Red Hat
For OpenMPI you ought to be able to just run
mpirun -np 1 cmd
or
orterun -np 1 cmd
or
mpiexec -np 1 cmd
and it should just work. What happens when it doesn’t? (again, should we guess?)
The error message is as follows:
*** Process received signal ***
Signal: Segmentation fault (11)
Signal code: Address not mapped (1)
Failing at address: (nil)
[0] /lib64/libc.so.6 […]
[1] /usr/lib64/libcuda.so.1 […]
[2] /usr/lib64/libcuda.so.1 […]
[3] /usr/lib64/libcuda.so.1 […]
[4] /usr/lib64/libcuda.so.1 […]
[5] /usr/lib64/libcuda.so.1 […]
[6] /usr/lib64/libcuda.so.1 […]
[7] /usr/lib64/libcuda.so.1 (cuCtxCreate…) […]
[8] /opt/cuda/lib/libcudart.so.2 […]
[9] /opt/cuda/lib/libcudart.so.2 […]
[10] /opt/cuda/lib/libcudart.so.2 (cudaMalloc…) […]
[11] main […]
[12] main […]
[13] /lib64/libc.so.6 (__libc_start_main+0xf4) […]
[14] main (__gxx_personality_v0+0x59) […]
*** End of error message ***
The executable is called main, and the […] stand for hexadecimal addresses I’ve left out.
All the code does is have each process allocate a small array on its device.
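In outline it’s roughly this shape (a minimal sketch; the array size and the rank-to-device mapping are illustrative, not the exact code):

#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);

    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* map each rank to a device (illustrative policy) */
    int ndev = 0;
    cudaGetDeviceCount(&ndev);
    if (ndev > 0)
        cudaSetDevice(rank % ndev);

    /* allocate a small array on the device and check the result */
    float *d_buf = NULL;
    cudaError_t err = cudaMalloc((void **)&d_buf, 256 * sizeof(float));
    if (err != cudaSuccess)
        fprintf(stderr, "rank %d: cudaMalloc failed: %s\n",
                rank, cudaGetErrorString(err));

    cudaFree(d_buf);
    MPI_Finalize();
    return 0;
}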
That doesn’t look much like an MPI problem to me. Put a barrier loop at the beginning of main, recompile with host debugging symbols and no optimization, and run it with mpiexec. Attach a gdb session to the process, break out of the barrier loop, let it fail, and then do a backtrace; it will show you where your code is failing. This page gives a reasonable overview of how to debug MPI programs, if you haven’t done this sort of thing before.
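By “barrier loop” I mean a spin loop you can break out of from the debugger, something like this (a sketch; the hold variable and the message are mine, and it needs to be built with -g -O0):

#include <mpi.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);

    /* print the PID, then spin until a debugger clears `hold` */
    volatile int hold = 1;
    printf("pid %d waiting for debugger\n", (int)getpid());
    fflush(stdout);
    while (hold)
        sleep(1);

    /* ... the rest of the program goes here ... */

    MPI_Finalize();
    return 0;
}

Then from another terminal: gdb -p <pid>, set var hold = 0, continue, and bt once it crashes.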
I remember an odd bug related to mpirun: try launching the program with the absolute path instead of the relative path.
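That is, something like this (the path is just an example):

mpirun -np 2 /home/you/project/main

instead of

mpirun -np 2 ./main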
BINGO!
Thanks for that.