We are running the old PGI 6.2 release on our cluster.
Now I’ve installed the current version (7.0-7), which is working fine.
In the next step I added the MPI package (also downloaded from PGI)
and installed it as described in the README.
Everything seems to be fine: I can compile both serial and
parallel code. But running the parallel code with mpirun
does not work.
If I force local execution, everything seems to be normal.
ssh is also working without any problems, as is the old
MPI installation.
taiga:~ # mpirun -np 4 mpihello
hydra.bgc-jena.mpg.de: Connection refused
p0_21213: p4_error: Child process exited while making connection to remote process on hydra: 0
p0_21213: (37.031250) net_send: could not write to fd=4, errno = 32
How can I figure out the reason for this behavior?
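
One way to narrow this down, as a rough sketch only: "Connection refused" means a TCP connection was actively rejected on some port of the remote host. Assuming the bundled MPICH (ch_p4 device) starts remote processes with rsh instead of ssh, which is an assumption and not something confirmed in this thread, the refused port would be 514 (rsh) while port 22 (ssh) answers normally. The helper names `port_open` and `probe` below are made up for illustration.

```python
import socket
import sys


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "Connection refused", timeouts and name lookup failures.
        return False


def probe(host, ports=None):
    """Check the remote-shell ports on one host (defaults: ssh=22, rsh=514)."""
    ports = ports or {"ssh": 22, "rsh": 514}
    return {name: port_open(host, port) for name, port in ports.items()}


if __name__ == "__main__":
    # Hosts are taken from the command line, e.g. the nodes named in the
    # mpirun error message.
    for host in sys.argv[1:]:
        print(host, probe(host))
```

If ssh shows as open and rsh as closed on the nodes mpirun complains about, the remote shell used by this MPICH build is the first thing to check. If both are open, the cause lies elsewhere.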
I installed everything as root. We are using ssh only.
I downloaded MPICH v1 and v2 from the homepage, and both are working
WITHOUT any problems …
0 errors or warnings.
pkoch@tchita:~> mpirun -np 4 mpihello
pc002.bgc-jena.mpg.de: Connection refused
p0_3937: p4_error: Child process exited while making connection to remote process on pc002: 0
p0_3937: (37.121094) net_send: could not write to fd=5, errno = 32
pkoch@tchita:~> ll ./mpihello
-rwxr-xr-x 1 pkoch AG_DV 699120 2007-08-27 14:17 ./mpihello
ssh is working:
pkoch@tchita:~> ssh pc002 w
14:20:39 up 14 days, 3:40, 0 users, load average: 1.00, 1.03, 1.05
USER TTY LOGIN@ IDLE JCPU PCPU WHAT