mpi + cuda problems

I’m trying to start a project using MPI + CUDA, but I’m having some problems at execution time.

I found another thread in this forum related to this problem, but in that case two source files were used. In my case there is a single source file, so I can’t use nvcc to compile only the kernel and mpicc to compile the rest.
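(For reference, the split approach from that other thread looks roughly like the commands below, with placeholder file names; it is exactly what I can’t do here, since everything is in a single .cu file.)

nvcc -c kernel.cu -o kernel.o                                     # compile only the CUDA kernel with nvcc
mpicc -c main.c -o main.o                                         # compile the MPI host code with mpicc
mpicc main.o kernel.o -o app -L /usr/local/cuda/lib64 -lcudart    # link against the CUDA runtime (adjust the path to your install)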

The command I’m using to compile:

nvcc -o mpicuda template.cu -I /usr/lib/openmpi/include -L /usr/lib/openmpi/lib -lmpi
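For context, template.cu is a single file along these lines (a trimmed sketch with a dummy kernel, not my real code):

#include <mpi.h>
#include <cstdio>

__global__ void dummy_kernel(int *out) {
    // trivial kernel, just enough to exercise the CUDA toolchain
    *out = 42;
}

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);            // this is the call that blows up at run time

    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int *d_out = NULL, h_out = 0;
    cudaMalloc(&d_out, sizeof(int));
    dummy_kernel<<<1, 1>>>(d_out);
    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);

    printf("rank %d got %d from the GPU\n", rank, h_out);

    MPI_Finalize();
    return 0;
}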

The command I’m using to run:

mpirun -l -np 1 ./mpicuda

I receive the following error:

[ubuntu:02598] [[INVALID],INVALID] ORTE_ERROR_LOG: A system-required executable either could not be found or was not executable by this user in file ../../../../../../orte/mca/ess/singleton/ess_singleton_module.c at line 269
[ubuntu:02598] [[INVALID],INVALID] ORTE_ERROR_LOG: A system-required executable either could not be found or was not executable by this user in file ../../../../../../orte/mca/ess/singleton/ess_singleton_module.c at line 143
[ubuntu:02598] [[INVALID],INVALID] ORTE_ERROR_LOG: A system-required executable either could not be found or was not executable by this user in file ../../../orte/runtime/orte_init.c at line 132
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ess_set_name failed
  --> Returned value A system-required executable either could not be found or was not executable by this user (-127) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_mpi_init: orte_init failed
  --> Returned "A system-required executable either could not be found or was not executable by this user" (-127) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[ubuntu:2598] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!

Probably it is a trivial error in the compilation command.

I would appreciate it if someone could help me.

And sorry about the bad English.

PS: This thread is a copy of one that I started in the wrong place.

The app probably can’t find the CUDA shared libraries. If you run ldd and all the libraries for the application are found, then use the -x option in mpirun to export your LD_LIBRARY_PATH to the new process and see what happens.
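Something along these lines, assuming the binary is the ./mpicuda from your post:

ldd ./mpicuda                              # every libcuda*/libmpi* entry should resolve to a path, none should say "not found"
mpirun -np 1 -x LD_LIBRARY_PATH ./mpicuda  # -x exports the variable to the launched process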

There’s no “-x” or “--x” option in the mpirun command:

mpiexec [-h or -help or --help]    # get this message
mpiexec -file filename             # (or -f) filename contains XML job description
mpiexec [global args] [local args] executable [args]
   where global args may be
      -l                           # line labels by MPI rank
      -bnr                         # MPICH1 compatibility mode
      -machinefile                 # file mapping procs to machines
      -s <spec>                    # direct stdin to "all" or 1,2 or 2-4,6
      -1                           # override default of trying 1st proc locally
      -ifhn                        # network interface to use locally
      -tv                          # run procs under totalview (must be installed)
      -tvsu                        # totalview startup only
      -gdb                         # run procs under gdb
      -m                           # merge output lines (default with gdb)
      -a                           # means assign this alias to the job
      -ecfn                        # output_xml_exit_codes_filename
      -recvtimeout <integer_val>   # timeout for recvs to fail (e.g. from mpd daemon)
      -g<local arg name>           # global version of local arg (below)
   and local args may be
      -n <n> or -np <n>            # number of processes to start
      -wdir <dirname>              # working directory to start in
      -umask <umask>               # umask for remote process
      -path <dirname>              # place to look for executables
      -host <hostname>             # host to start on
      -soft <spec>                 # modifier of -n value
      -arch <arch>                 # arch type to start on (not implemented)
      -envall                      # pass all env vars in current environment
      -envnone                     # pass no env vars
      -envlist <list of env var names> # pass current values of these vars
      -env <name> <value>          # pass this value of this env var
mpiexec [global args] [local args] executable args : [local args] executable...
mpiexec -gdba jobid                # gdb-attach to existing jobid
mpiexec -configfile filename       # filename contains cmd line segs as lines
  (See User Guide for more details)

You are probably mixing MPI implementations.
It looks like you compiled for OpenMPI but are trying to run with MPICH.

Check the output of “which mpirun” and see if it is pointing to the OpenMPI binary.
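For example, something like this should make it obvious which implementation is being picked up (ompi_info only exists in an OpenMPI install):

which mpirun mpicc       # paths of the launcher and wrapper compiler actually on your PATH
ompi_info | head -n 2    # prints the OpenMPI version if OpenMPI is the one installed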

There most certainly is, in the mpirun that comes with OpenMPI (which is what your code appears to have been built with). The output you showed is from the MPICH2 mpirun. I think you have some installation or build issues to sort out…

Thanks avidday and mfatica for the help! As mfatica said, there was a problem with the implementations; I just uninstalled them all and reinstalled only OpenMPI.
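On Ubuntu that was roughly the following (package names from memory, so they may not be exactly what was on the machine):

sudo apt-get remove --purge mpich2 libmpich2-dev    # drop the conflicting MPICH2 packages
sudo apt-get install openmpi-bin libopenmpi-dev     # keep only the OpenMPI runtime, launcher and headers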

Sorry if it was such a trivial thing; I’m just starting my work with CUDA and MPI on Ubuntu!