mpi + cuda problems

I’m trying to start a project using MPI + CUDA, but I’m having some problems at execution time.

I found another thread in this forum related to this problem, but in that case two source files were used. In my case there is a single source file, so I can’t use nvcc to compile only the kernel and mpicc to compile the rest.
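(For reference, the split approach from that other thread looks roughly like the commands below, with placeholder file names; it is exactly what I can’t do here, since everything is in a single .cu file.)

nvcc -c kernel.cu -o kernel.o                                     # compile only the CUDA kernel with nvcc
mpicc -c main.c -o main.o                                         # compile the MPI host code with mpicc
mpicc main.o kernel.o -o app -L /usr/local/cuda/lib64 -lcudart    # link against the CUDA runtime (adjust the path to your install)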

The command I’m using to compile:

nvcc -o mpicuda template.cu -I /usr/lib/openmpi/include -L /usr/lib/openmpi/lib -lmpi
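For context, template.cu is a single file along these lines (a trimmed sketch with a dummy kernel, not my real code):

#include <mpi.h>
#include <cstdio>

__global__ void dummy_kernel(int *out) {
    // trivial kernel, just enough to exercise the CUDA toolchain
    *out = 42;
}

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);            // this is the call that blows up at run time

    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int *d_out = NULL, h_out = 0;
    cudaMalloc(&d_out, sizeof(int));
    dummy_kernel<<<1, 1>>>(d_out);
    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);

    printf("rank %d got %d from the GPU\n", rank, h_out);

    MPI_Finalize();
    return 0;
}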

The command I’m using to run:

mpirun -l -np 1 ./mpicuda

I receive the following error:

[ubuntu:02598] [[INVALID],INVALID] ORTE_ERROR_LOG: A system-required executable either could not be found or was not executable by this user in file ../../../../../../orte/mca/ess/singleton/ess_singleton_module.c at line 269
[ubuntu:02598] [[INVALID],INVALID] ORTE_ERROR_LOG: A system-required executable either could not be found or was not executable by this user in file ../../../../../../orte/mca/ess/singleton/ess_singleton_module.c at line 143
[ubuntu:02598] [[INVALID],INVALID] ORTE_ERROR_LOG: A system-required executable either could not be found or was not executable by this user in file ../../../orte/runtime/orte_init.c at line 132
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_ess_set_name failed
  --> Returned value A system-required executable either could not be found or was not executable by this user (-127) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems.  This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_mpi_init: orte_init failed
  --> Returned "A system-required executable either could not be found or was not executable by this user" (-127) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[ubuntu:2598] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!

Probably it is a trivial error in the compilation command.

I would appreciate it if someone could help me.

And sorry about the bad English.

PS: This thread is a copy of one that I started in the wrong place.

The app probably can’t find the CUDA shared libraries. If you run ldd and all the libraries for the application are found, then use the -x option in mpirun to export your LD_LIBRARY_PATH to the new process and see what happens.
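Something along these lines, assuming the binary is the ./mpicuda from your post:

ldd ./mpicuda                              # every libcuda*/libmpi* entry should resolve to a path, none should say "not found"
mpirun -np 1 -x LD_LIBRARY_PATH ./mpicuda  # -x exports the variable to the launched process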

There’s no “-x” or “--x” option in the mpirun command:

mpiexec [-h or -help or --help]    # get this message
mpiexec -file filename             # (or -f) filename contains XML job description
mpiexec [global args] [local args] executable [args]
   where global args may be
      -l                           # line labels by MPI rank
      -bnr                         # MPICH1 compatibility mode
      -machinefile                 # file mapping procs to machines
      -s <spec>                    # direct stdin to "all" or 1,2 or 2-4,6
      -1                           # override default of trying 1st proc locally
      -ifhn                        # network interface to use locally
      -tv                          # run procs under totalview (must be installed)
      -tvsu                        # totalview startup only
      -gdb                         # run procs under gdb
      -m                           # merge output lines (default with gdb)
      -a                           # means assign this alias to the job
      -ecfn                        # output_xml_exit_codes_filename
      -recvtimeout <integer_val>   # timeout for recvs to fail (e.g. from mpd daemon)
      -g<local arg name>           # global version of local arg (below)
   and local args may be
      -n <n> or -np <n>            # number of processes to start
      -wdir <dirname>              # working directory to start in
      -umask <umask>               # umask for remote process
      -path <dirname>              # place to look for executables
      -host <hostname>             # host to start on
      -soft <spec>                 # modifier of -n value
      -arch <arch>                 # arch type to start on (not implemented)
      -envall                      # pass all env vars in current environment
      -envnone                     # pass no env vars
      -envlist <list of env var names> # pass current values of these vars
      -env <name> <value>          # pass this value of this env var
mpiexec [global args] [local args] executable args : [local args] executable...
mpiexec -gdba jobid                # gdb-attach to existing jobid
mpiexec -configfile filename       # filename contains cmd line segs as lines
  (See User Guide for more details)

You are probably mixing MPI implementations.
It looks like you compiled for OpenMPI but are trying to run with MPICH.

Check the output of “which mpirun” and see if it is pointing to the OpenMPI binary.
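For example, something like this should make it obvious which implementation is being picked up (ompi_info only exists in an OpenMPI install):

which mpirun mpicc       # paths of the launcher and wrapper compiler actually on your PATH
ompi_info | head -n 2    # prints the OpenMPI version if OpenMPI is the one installed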

There most certainly is, in the mpirun that comes with OpenMPI (which is what your code appears to have been built with). The output you showed is from the MPICH2 mpirun. I think you have some installation or build issues to sort out…

Thanks avidday and mfatica for the help! As mfatica said, there was a problem with the implementations; I just uninstalled them all and reinstalled only OpenMPI.
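On Ubuntu that was roughly the following (package names from memory, so they may not be exactly what was on the machine):

sudo apt-get remove --purge mpich2 libmpich2-dev    # drop the conflicting MPICH2 packages
sudo apt-get install openmpi-bin libopenmpi-dev     # keep only the OpenMPI runtime, launcher and headers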

Sorry if it was such a trivial thing; I’m just starting my work with CUDA and MPI on Ubuntu!