Shared memory error using mpirun

In the sample code I am trying to run using mpirun, MPI_Init gives me an "error attaching to the shared memory object as a slave; Permission denied" message.

Is this because I killed the hanging processes on the cluster nodes? Does killing them not clear the shared memory usage?

Why is it that ipcs does not show any shared memory segments?

Is there a way to release the shared memory and clean up so I can run the mpirun command?

Thanks
Aditya

Hi Aditya,

Is this because I killed the hanging processes on the cluster nodes? Does killing them not clear the shared memory usage?

Killing the processes does not clear the shared memory.

Why is it that ipcs does not show any shared memory segments?

The ipcs command should list all System V shared memory segments in use on the local host.

Is there a way to release the shared memory and clean up so I can run the mpirun command?

The ipcrm command will remove a shared memory segment, given its id.
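The check-then-remove cleanup described above can be sketched in shell. This sketch assumes the Linux `ipcs -m` column layout (key, shmid, owner, ...); verify the columns on your system before letting the `ipcrm` loop run:

```shell
# Inspect System V shared memory segments before removing anything.
ipcs -m

# Remove every segment owned by the current user.
# Assumes the Linux ipcs layout where column 2 is the shmid and
# column 3 is the owner; adjust the awk fields if yours differs.
for id in $(ipcs -m | awk -v user="$USER" '$3 == user { print $2 }'); do
    ipcrm -m "$id"
done

# Semaphores can be orphaned the same way; check those too.
ipcs -s
```

If your MPICH installation ships the `cleanipcs` script, it automates the same per-user cleanup.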

As for the initial error, we’re not sure. Which MPI version are you using and how do you have it configured? Which sample code exhibits the behavior and what is the command you use to run it? What OS are you using? Which version of the PGI compilers are you using?

  • Mat

Hi Mat,
Thanks for the reply!
The version of the PGI compilers I am using is Linux/x86-64 6.2-4 on a 64-bit Linux machine. The error occurred when I ran the mpihello code using mpirun,
i.e., mpirun -np 4 mpihello. I am trying to run this on a cluster.

This might be the reason the fault occurred:
I initially had some processes hanging on the nodes (since I was not able to access the nodes due to password settings), so I explicitly killed the hanging processes on them. As you said, this did not clear the shared memory.

So after this I have been getting the shared memory error. It seems surprising to me that I keep getting this error even though ipcs does not show any shared memory segments. Do you have any idea why this might be, and how I can clear the shared memory?

Hi Aditya,

While I don’t know specifically what’s wrong, it seems that you have a fundamental problem with your MPI installation, your cluster, or your OS. Which MPI do you have and how was it built?
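One thing worth noting is that `ipcs` only reports IPC objects on the machine it runs on, so stale segments on a compute node will not show up when `ipcs` is run on the head node. A sketch for checking each node directly (the hostnames here are placeholders; substitute your own):

```shell
# Placeholder hostnames -- substitute your cluster's node names.
NODES="node01 node02 node03 node04"

for node in $NODES; do
    echo "== $node =="
    # BatchMode fails fast instead of prompting for a password, and
    # the || keeps the loop going if a node is unreachable.
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$node" "ipcs -m; ipcs -s" \
        || echo "(unreachable)"
done
```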

Note that we do have a pre-built MPICH-1 library available for download if you want to try it: http://www.pgroup.com/support/downloads.php?release=620

  • Mat

Hi Mat,
The version I have was part of the PGI CDK software my institution bought from The Portland Group. The package includes MPICH version 1.2.7.

Actually, I did not understand why you think the installation might be wrong. Isn't this something that was bound to happen when I killed the processes? The only inconsistent part is that ipcs shows no shared memory regions. Is that why you said the installation might not be right?

Hi,
There is another thing that I noticed. Once I reboot the cluster, I can run the mpirun command just once; it accesses only the first node and does not run on subsequent attempts. I think I will reinstall the MPICH libraries and let you know the results.

Aditya

Hi Aditya,

The MPICH library that comes with the PGI CDK is configured with the ch_p4 device (TCP/IP), not ch_shmem (shared memory), so it's very confusing that you're getting shared memory errors. Hopefully reinstalling will clear up the problem; otherwise, I'm at a loss.

  • Mat