pgdbg only runs on head node

Hi -

I’ve been trying to use pgdbg (6.0) on multiprocess programs (MPI), and it invariably doesn’t work. It works perfectly when I try to run a single process on the head node. Whenever I try to run it on any node other than the head node (via the -np x or -nolocal flags), pgdbg sits for about a minute, then I get something like:

PGDBG 6.0-2 x86 (Cluster, 64 CPU)
Copyright 1989-2000, The Portland Group, Inc. All Rights Reserved.
Copyright 2000-2005, STMicroelectronics, Inc. All Rights Reserved.
***Reading DWARFv2 Information.
node2: Connection refused

  • accept: client init: Bad file descriptor
    ERROR: CANNOT LOAD /home/class07/werneta/homework/week4/prob1/cgm.

I have used a machinefile to specify which nodes to use and I get the connection refused on every node I try, so I don’t think it’s an error with any one node.

Also, whenever I leave off the -nolocal flag, pgdbg runs fine until it hits the MPI_Init() function call in the source code. Then it gives me:

libnss_files.so.2 loaded by ld-linux.so.2.
node2: Connection refused

  • accept: client init: Bad file descriptor
    ERROR: New Process (PID 22148, HOST node2) ATTACH FAILED.
    ERROR: New Process (PID 22148, HOST node2) IGNORED.
    ERROR: New Process (PID 23700, HOST node3) IGNORED.
    ERROR: New Process (PID 24539, HOST node4) IGNORED.

In this case, however, the program continues to run, but only on a single node.

Thanks in advance,

Tom

Hi Tom,

How are you invoking PGDBG? It sounds like you might be trying something like “pgdbg mpirun -np 2 a.out” or “mpirun -n 2 pgdbg a.out”. You need to launch it from the mpirun script using the “-dbg” flag, e.g. “mpirun -np 2 -dbg=pgdbg a.out”.

Note that PGDBG MPI support is only available with the PGI CDK version of MPICH.

- Mat

Thanks for the quick reply. I’m running it as you described, i.e. “mpirun -np 4 -dbg=pgdbg a.out”.

I’m taking a course in parallel computing, so I’m not sure of the mpich version we’re using.

- Tom

Hi Tom,

I’m not really sure then. Are you able to run your program without the debugger?

- Mat

Mat -

Every program I’ve written runs fine without the debugger.

Tom

Tom,

How did you compile your program? What options did you use? It’s possible that you compiled with the 64-bit compiler and ran with the 32-bit PGDBG. From the information you gave us, you invoked the 32-bit PGDBG. Also, how did you set the PGI environment variable?

Hongyon

Hi Tom,

In addition to answering Hongyon’s questions above, please try setting the PGRSH environment variable. This controls the kind of communication that gets established between PGDBG and each node of your program. By default, PGDBG will use rsh to establish its connection. Some clusters have rsh disabled, so you’ll need to use ssh for your connection. To set ssh, perform the following:

On csh/tcsh:

setenv PGRSH ssh

On sh/bash:

export PGRSH=ssh
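
If it is unclear which of the two a cluster allows, a small check along these lines may help before starting the debugger. It is only a sketch: `pick_remote_shell` is a hypothetical helper name, not part of PGI or MPICH, and it simply tries each candidate command against one node and prints the first that succeeds. It assumes the remote login works without a password prompt.

```shell
# Hypothetical helper (not a PGI tool): print the first remote-shell
# command that can run a trivial command on the given node.
# Usage: pick_remote_shell NODE CMD [CMD...]
pick_remote_shell() {
    node=$1
    shift
    for cmd in "$@"; do
        # Running "true" remotely only tests that the connection works.
        # stdin is closed so the remote shell cannot wait for input.
        if "$cmd" "$node" true </dev/null >/dev/null 2>&1; then
            echo "$cmd"
            return 0
        fi
    done
    # No candidate could reach the node.
    return 1
}

# Example (node2 is the node named in the error output earlier in the thread):
#   pick_remote_shell node2 rsh ssh
```

If this prints `ssh`, setting PGRSH as shown above is the likely fix; if it prints nothing, neither command can reach the node and the problem lies elsewhere.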


-Mark

Hi -

Hongyon:
To compile a program, I use something like:

pgCC -o prob10p4 prob10p4.cpp -g -I$(HOME)/local/include/gsl -L$(HOME)/local/lib -lgslcblas -lmpich

To set the PGI environment variable, in my .bash_profile I have:

PGI=/usr/pgi
PATH=${PGI}/linux86/6.0/bin:$PATH:$HOME/bin
export PGI PATH

As far as I know we only have 32-bit processors in the cluster, so if I had accidentally run a 64-bit compiler I would have expected much worse errors.


Mark:
I think you might have hit on the problem. I tried to rsh to one of the other nodes and got a “connection refused”. So I set PGRSH to ssh as described, but when I ran the debugger I got:

bash: line 1: /usr/pgi/linux86/6.0/bin/pgserv: No such file or directory

  • accept: client init: Bad file descriptor

I looked around on the cluster for pgserv. There was a copy of it on the master node, in /usr/pgi/linux86/6.0/bin/pgserv, where pgdbg was looking for it. However, there was not a copy of it anywhere on any of the other nodes. Could this have been a mistake in the install?
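
A check like the following could confirm which nodes lack the file. It is a rough sketch, not a PGI utility: `find_missing` is a hypothetical helper name, and it assumes the chosen remote-shell command (ssh here) can log in to each node without prompting.

```shell
# Hypothetical helper (not a PGI tool): print every node on which FILE is
# missing or not executable. Returns 0 only if all nodes have it.
# Usage: find_missing RSH_CMD FILE NODE [NODE...]
find_missing() {
    rsh_cmd=$1
    file=$2
    shift 2
    missing=0
    for node in "$@"; do
        # "test -x" runs on the remote node; stdin is closed so the
        # remote shell does not swallow input meant for this loop.
        if ! "$rsh_cmd" "$node" test -x "$file" </dev/null >/dev/null 2>&1; then
            echo "$node"
            missing=1
        fi
    done
    return "$missing"
}

# Example, using the path and node names from this thread:
#   find_missing ssh /usr/pgi/linux86/6.0/bin/pgserv node2 node3 node4
```

Any node name it prints either has no executable pgserv at that path or could not be reached at all, so a printed name is worth checking by hand.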

Thanks,

Tom

Hi -

I did some more digging on the cluster and found an older installation of PGI that had pgserv in the right directory on each node. I pointed PGI to that directory and it worked! Thanks for all your help.

Tom