Community Edition problem with simple MPI program


After compiling a simple MPI Fortran program (calculating pi) with mpif77, I get an error:

-bash-4.1$ mpirun -np 4 ./pi
request to allocate mask for invalid number; abort
: Success

Primary job terminated normally, but 1 process returned
a non-zero exit code… Per user-direction, the job has been aborted.

request to allocate mask for invalid number; abort
: Success
request to allocate mask for invalid number; abort
: Success

mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

Process name: [[31357,1],0]
Exit code: 1

This error is specific to this particular machine. The same compiled program runs on at least one other machine.

This is running on Scientific Linux 6 (an RHEL 6 clone).

Any help would be greatly appreciated.

If the program was this one:

program main
use mpi
double precision  PI25DT
parameter        (PI25DT = 3.141592653589793238462643d0)
double precision  mypi, pi, h, sum, x, f, a
integer n, myid, numprocs, i, ierr
!                                function to integrate
f(a) = 4.d0 / (1.d0 + a*a)

call MPI_INIT(ierr)
call MPI_COMM_RANK(MPI_COMM_WORLD, myid, ierr)
call MPI_COMM_SIZE(MPI_COMM_WORLD, numprocs, ierr)

do
   if (myid .eq. 0) then
      print *, 'Enter the number of intervals: (0 quits) '
      read(*,*) n
   endif
!                                broadcast n
   call MPI_BCAST(n, 1, MPI_INTEGER, 0, MPI_COMM_WORLD, ierr)
!                                check for quit signal
   if (n .le. 0) exit
!                                calculate the interval size
   h = 1.0d0/n
   sum  = 0.0d0
   do i = myid+1, n, numprocs
      x = h * (dble(i) - 0.5d0)
      sum = sum + f(x)
   enddo
   mypi = h * sum
!                                collect all the partial sums
   call MPI_REDUCE(mypi, pi, 1, MPI_DOUBLE_PRECISION, &
                   MPI_SUM, 0, MPI_COMM_WORLD, ierr)
!                                node 0 prints the answer.
   if (myid .eq. 0) then
      print *, 'pi is ', pi, ' Error is', abs(pi - PI25DT)
   endif
enddo

call MPI_FINALIZE(ierr)
end program main
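The work split in this program is cyclic (round-robin): rank myid handles i = myid+1, myid+1+numprocs, and so on, and the partial sums are combined with MPI_SUM on rank 0. The sketch below reproduces that arithmetic serially in sh/awk, with no MPI involved, so you can check that the answer does not depend on the process count beyond rounding. The helper name pi_emulated is hypothetical, and it assumes awk's numbers are IEEE doubles, as they are on typical Linux systems.

```shell
# Serial emulation (no MPI) of the pi program's cyclic work split and
# reduction. Usage: pi_emulated N NUMPROCS
# Prints the reduced value of pi and its absolute error.
pi_emulated() {
    awk -v n="$1" -v numprocs="$2" 'BEGIN {
        PI25DT = 3.141592653589793238462643
        h = 1.0 / n
        pi = 0
        for (myid = 0; myid < numprocs; myid++) {
            # Same index pattern as "do i = myid+1, n, numprocs"
            sum = 0
            for (i = myid + 1; i <= n; i += numprocs) {
                x = h * (i - 0.5)
                sum += 4.0 / (1.0 + x * x)
            }
            # Adding each rank partial result stands in for MPI_SUM
            pi += h * sum
        }
        err = pi - PI25DT
        if (err < 0) err = -err
        printf "%.15f %.6e\n", pi, err
    }'
}
```

Running `pi_emulated 10 4` should report an error of about 8.33e-04, in line with the first result of the four-process run that follows, and `pi_emulated 10 1` should agree with it to within rounding.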

I was able to compile it with mpif90. (Only use pgf77/mpif77 when
you need f77 features that are not in f90.)

mpif90 -o my_pi my_pi.f -Mfree

I then ran it with

% mpirun -np 4 my_pi

Enter the number of intervals: (0 quits)
pi is 3.142425985001098 Error is 8.3333141130470523E-004
Enter the number of intervals: (0 quits)
pi is 3.141600986923125 Error is 8.3333333318336145E-006
Enter the number of intervals: (0 quits)
pi is 3.141592736923126 Error is 8.3333333122936892E-008
Enter the number of intervals: (0 quits)
pi is 3.141592654423124 Error is 8.3333073774838340E-010
Enter the number of intervals: (0 quits)
pi is 3.141592653589903 Error is 1.1013412404281553E-013
Enter the number of intervals: (0 quits)
pi is 3.141592653589759 Error is 3.4194869158454821E-014
Enter the number of intervals: (0 quits)
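The errors in this run drop by roughly a factor of 100 from one line to the next. A short derivation accounts for that pattern. It uses the standard composite midpoint-rule error term, which is textbook numerical analysis and not something stated in the thread; the program evaluates f at the midpoint of each interval, so:

```latex
\begin{aligned}
f(x) &= \frac{4}{1+x^2}, \qquad \int_0^1 f(x)\,dx = 4\arctan 1 = \pi \\
M_h &= h \sum_{i=1}^{n} f\bigl(h\,(i - \tfrac{1}{2})\bigr), \qquad h = \frac{1}{n} \\
\pi - M_h &= \frac{h^2}{24}\bigl(f'(1) - f'(0)\bigr) + O(h^4),
  \qquad f'(x) = -\frac{8x}{(1+x^2)^2} \\
M_h - \pi &\approx \frac{h^2}{12} = \frac{1}{12\,n^2}
  \qquad \text{since } f'(1) = -2,\ f'(0) = 0
\end{aligned}
```

The post does not show the values typed at the prompt, but the first four errors (8.33e-04 through 8.33e-10) are consistent with n = 10, 100, 1000 and 10000. For the last two lines the truncation error is small enough that floating-point rounding in the sum becomes comparable, so those lines should not be expected to follow 1/(12n^2) exactly.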

The program is similar; I made some minor changes a few years ago. As I mentioned, it fails on one particular machine but runs OK on the two others I tried.

The error

request to allocate mask for invalid number; abort

comes from

Not sure what the problem is.

If you run

ldd ./pi

on each of the platforms in your machines list, you may find
that a shared-library dependency resolves differently on one platform than on another.
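A minimal POSIX sh sketch for that comparison follows. It assumes the library of interest is libnuma (the thread later narrows the problem to NUMA), that each host is reachable over passwordless ssh, and that the executable has the same path on every host. The helper names numa_deps and compare_numa_deps are hypothetical. The sketch strips the "(0x...)" load addresses that ldd prints, since those can change between runs, so only the "name => path" part is compared.

```shell
# Keep only libnuma lines from `ldd` output (read on stdin) and drop the
# "(0x...)" load address and leading whitespace, leaving "name => path".
numa_deps() {
    # grep exits 1 when nothing matches; "|| true" keeps a binary that
    # does not link libnuma from looking like a failure.
    { grep 'libnuma' || true; } |
        sed -e 's/[[:space:]]*(0x[0-9a-fA-F]*)$//' -e 's/^[[:space:]]*//'
}

# Print how BINARY resolves libnuma on each HOST listed after it.
# Usage: compare_numa_deps BINARY HOST...
compare_numa_deps() {
    binary=$1
    shift
    for host in "$@"; do
        printf '== %s ==\n' "$host"
        ssh "$host" ldd "$binary" | numa_deps
    done
}
```

Locally, `ldd ./pi | numa_deps` shows the current machine; `compare_numa_deps /path/to/pi goodhost badhost` (placeholder path and host names) prints one section per host, and a "not found" or a different path on the failing host is the kind of difference the reply is pointing at.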

If you compile with

-mp

for OpenMP (not MPI), try compiling again with

-mp=nonuma

so that libnuma is not an issue.


Thanks for this.

I recompiled with -mp and with -mp=nonuma. As expected, -mp failed while -mp=nonuma succeeded. The issue is definitely NUMA, and I have some work to do.

The question on the failing platform is
“is the same on the failing platform as on
the compiling platform”

So look at

ldd ./pi

and see if is a pointer to

If not, it could be that does not exist, but does.
Best to soft link to

If does not exist, PGI provides a dummy version.
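The check described above can be sketched in POSIX sh plus `readlink -f` (GNU coreutils, present on RHEL-family systems). The post leaves out the two file names, so this sketch assumes something like libnuma.so as the link and libnuma.so.1 as the real library; treat both names, the paths in the usage note, and the helper link_status as hypothetical.

```shell
# Report how LINK is set up relative to TARGET.
# Usage: link_status LINK TARGET
# Prints one of: missing-target, missing-link, not-a-link, ok, wrong-target
link_status() {
    link=$1
    target=$2
    if [ ! -e "$target" ]; then
        # Real library absent: the case where a dummy version is needed
        echo missing-target
    elif [ ! -e "$link" ] && [ ! -L "$link" ]; then
        # No file and no (even dangling) symlink at LINK
        echo missing-link
    elif [ ! -L "$link" ]; then
        # LINK exists but is a regular file, not a soft link
        echo not-a-link
    elif [ "$(readlink -f "$link")" = "$(readlink -f "$target")" ]; then
        echo ok
    else
        echo wrong-target
    fi
}
```

For example, `link_status /usr/lib64/libnuma.so /usr/lib64/libnuma.so.1` (assumed paths). If it prints missing-link, the soft link the reply recommends could be created as root with `ln -s libnuma.so.1 /usr/lib64/libnuma.so`; if it prints missing-target, that is the situation where PGI's dummy version comes in.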

When you compile with -mp on one platform to run on another platform,
the libnuma situation needs to be the same on both.

You could install the PGI compilers on one platform as a “network
install”, and then add the failing platform as a new PGI compiler
host (run add_network_host and the machine is added).

The network installs handle the differences by creating
a local directory of the same name on each platform. For example, "/local/username/shared_objects" would hold the correct
disposition of the library on that platform (a pointer to it, or a dummy version of it). Then every platform reconciles
correctly at runtime.