pgdbg on opteron cluster

I’m running PGDBG 5.2-4 AMD-64 (Workstation, 16 CPU) on a Beowulf cluster, but I can’t get pgdbg to start in graphical mode. When I invoke the debugger with

pgdbg testit a b c d

where “testit” is a very simple C program that does nothing but print out its command line arguments, I get an X window popping up stating

current locale is not supported in X11, locale is set to CX locale modifiers are not supported, using default
Exception in thread “main” java.lang.InternalError: Current locale is not supported
at sun.awt.motif.MWindowPeer.pSetTitle(Native Method)
at sun.awt.motif.MWindowPeer.init(MWindowPeer.java:97)
.
.
. blah blah blah

I’ve tried changing the locale environment variables, but nothing changes.

Thinking this might be related to the Java version that PGI uses, I downloaded the latest 1.5 JRE from Sun for Opteron and set

export PGI_JAVA=/usr/java/jdk1.5.0_01/bin/java

but that didn’t get me much further: the same X window appears, now with a different Java error, shown below.

PGDBG 5.2-4 AMD-64 (Workstation, 16 CPU)
Copyright 1989-2000, The Portland Group, Inc. All Rights Reserved.
Copyright 2000-2004, STMicroelectronics, Inc. All Rights Reserved.
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.
.
.
. blah blah blah



The debugger seems to work ok in text mode, but it would be nice to have a graphical interface for MPI debugging. The graphical debugger works just fine on my own workstation, but that’s a uniprocessor machine. Any suggestions?

Hi,

First, you are setting your display variable to your local machine, correct?

What do you get when you execute the “locale” command on the node in which you are invoking pgdbg?

Try doing an “unsetenv LANG” (or “unset LANG” if you are using bash/sh) before invoking pgdbg.
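A minimal sketch of that suggestion for bash/sh, written so the change applies to a single command and the login environment stays as it was. The helper name run_without_locale is made up for illustration and is not part of the PGI tools; the variable list simply mirrors what the “locale” command reports.

```shell
# run_without_locale is a hypothetical helper name, not a PGI command.
# It runs one command with the locale variables removed from its environment.
run_without_locale() {
    (
        # The parentheses start a subshell, so the caller's settings survive.
        for var in LANG LANGUAGE LC_ALL LC_CTYPE LC_NUMERIC LC_TIME \
                   LC_COLLATE LC_MONETARY LC_MESSAGES LC_PAPER LC_NAME \
                   LC_ADDRESS LC_TELEPHONE LC_MEASUREMENT LC_IDENTIFICATION; do
            unset "$var"
        done
        # Run the requested command with its arguments passed through unchanged.
        "$@"
    )
}
```

It would be used as “run_without_locale pgdbg testit a b c d”. Whether clearing these variables is enough depends on where the locale failure really comes from, so treat it as a quick experiment rather than a fix.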

What happens when you try invoking PGDBG with the -motif switch? This brings up the old Motif-based GUI. While this may be adequate as a workaround, it should not be considered a long-term solution, since the old GUI will no longer be available starting with the next release (6.0).

-Mark

Didn’t think to try the -motif switch. Seems to bring up a gui ok, thanks man. Now I get to modify Penguin’s mpirun to work with pgdbg. As to the other questions…

I’m logging in via SSH, so the DISPLAY should not be a problem? Other X applications run ok.

We have the garbled man pages issue on this machine, so instead of

LANG=en_US.UTF-8

I have

LANG=en_US


However, changing LANG doesn’t seem to affect anything… The results of “locale” are

bash-2.05b$ locale
LANG=en_US
LC_CTYPE="en_US"
LC_NUMERIC="en_US"
LC_TIME="en_US"
LC_COLLATE="en_US"
LC_MONETARY="en_US"
LC_MESSAGES="en_US"
LC_PAPER="en_US"
LC_NAME="en_US"
LC_ADDRESS="en_US"
LC_TELEPHONE="en_US"
LC_MEASUREMENT="en_US"
LC_IDENTIFICATION="en_US"
LC_ALL=

These change upon a new LANG setting, of course.
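One way to rule out a missing system locale, as a rough sketch: compare the name in LANG with the list printed by “locale -a”. The helper name locale_in_list is made up for illustration, and the normalization (lowercasing, dropping hyphens) is an assumption meant to let a spelling such as en_US.UTF-8 match en_US.utf8 as glibc usually lists it.

```shell
# locale_in_list is a hypothetical helper, not a standard command.
# It reads a list of locale names on stdin (normally from `locale -a`)
# and succeeds if the name given as $1 is among them.
locale_in_list() {
    # Lowercase and drop hyphens so en_US.UTF-8 and en_US.utf8 compare equal.
    wanted=$(printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | sed 's/-//g')
    tr '[:upper:]' '[:lower:]' | sed 's/-//g' | grep -Fxq -- "$wanted"
}

# Example: report on the locale currently named by LANG (C if LANG is empty).
if locale -a 2>/dev/null | locale_in_list "${LANG:-C}"; then
    echo "${LANG:-C} is installed"
else
    echo "${LANG:-C} is not in the output of locale -a"
fi
```

A locale that passes this check can still be rejected by Xlib, which consults its own locale data, so a positive result only narrows the search.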

Hi,

So, you are logging in with “ssh -X”, correct? If not, please try the different ssh log-in variants (ssh -X, ssh -Y, ssh -x, plain ssh). What version of Linux are you using (kernel and distribution)? Is there any chance you can rsh/rlogin to the node and try it? I suspect there may be a problem with Java over your ssh connection; problems running Java GUIs over ssh have been reported before (e.g., http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4374153 ;
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6184081 ;
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4511546 ).
Unfortunately, we have not been able to reproduce this here. Hopefully with more information we can get it fixed.

Thanks,

Mark

A couple of simple suggestions to help isolate the issue:

If you log in with “ssh -X -v” you will get some extra debugging information. If the Motif GUI is running, though, ssh is probably not the issue. (Also, you shouldn’t have to set the DISPLAY variable yourself: ssh sets it to point back through the tunnel, and overriding it breaks the forwarding.) I make multiple hops onto an Opteron system and am having some issues too, which I am trying to track down.
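A quick way to see what DISPLAY currently points at, as a sketch only. The helper name classify_display and its categories are made up for illustration, and the rule that a localhost display means “forwarded by ssh” is a heuristic: it matches the common sshd setup but not every configuration.

```shell
# classify_display is a hypothetical helper; its categories are a heuristic.
classify_display() {
    case "$1" in
        "")                      echo "unset" ;;
        # sshd commonly hands out displays such as localhost:10.0
        localhost:*|127.0.0.1:*) echo "forwarded" ;;
        # a bare :0 or unix:0 refers to an X server on this machine
        :*|unix:*)               echo "local" ;;
        # anything else names another host directly, bypassing the tunnel
        *)                       echo "remote-host" ;;
    esac
}

echo "DISPLAY=${DISPLAY-} looks $(classify_display "${DISPLAY-}")"
if [ -n "${SSH_CONNECTION-}" ]; then
    echo "this shell was started over ssh"
fi
```

If the result is “remote-host” inside an ssh session, DISPLAY has probably been overridden somewhere in the login scripts.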

If the Motif GUI comes up but the Java version reports an error about the display, that would be valuable to know.

The OS is Scyld Beowulf release 29cz (29cz-3_Scyld 200408261137)
and the kernel version is 2.4.25-25_Scyldsmp #1 SMP Wed Aug 18 13:31:30 EDT 2004 x86_64 x86_64 x86_64 GNU/Linux

I’m able to run other Java applications, e.g. MATLAB with the desktop enabled, and a simple example at http://java.sun.com/docs/books/tutorial/uiswing/learn/example-1dot4/HelloWorldSwing.java

The motif option does seem to work, although I’m unable to get multiple threads yet. One thing at a time, though…

How do you ssh into your cluster? Are you using “ssh -X”? I’m curious what happens when you try different methods for ssh (e.g., “ssh -X”, “ssh -x”, “ssh -Y”, “ssh”). Also are you able to rlogin/rsh into your cluster? If so, please try that so we can determine if this is an ssh issue.

Thanks,

Mark

From my linux desktop, I usually ssh in with no extra arguments, but I’ve tried -X and -Y, and there’s no difference. “-x” disables X11 forwarding, so no graphics can run in that case.

Forgot to mention that rsh isn’t available.

Do you see any error messages when you log in with “ssh -X -v” and start up pgdbg?
We’re not able to reproduce this problem here, probably because we do not have a machine with the same kernel/distribution you are using. Officially we support the distributions listed on this web page: http://www.pgroup.com/support/install.htm#release_info
However, many of our users have gotten our products to work on other distributions.

In addition to those distributions listed, the java website ( http://www.java.com/en/download/linux_manual.jsp ) states the following:

Most testing of Java RE for Linux in the English-locale has been conducted on Red Hat 7.2, with kernel patch 2.4.9-31. Most testing in non-English locales has been conducted on Red Hat 7.1. However, Java RE has undergone limited testing on these other Linux operating systems:

  • Caldera Open Linux 3.1 (kernel 2.4.2, glibc 2.2.1)
  • Turbo Linux 7.0 (kernel 2.2.18, glibc 2.1.x)
  • SuSE Linux 7.1 (kernel 2.4, glibc 2.2.14)
  • Turbo Linux for Simplified Chinese locale

I don’t know very much about your Linux distribution… I’ll contact someone else here to see if they do. Maybe there’s something in your distribution that came from a non-English locale that Java needs.

Is it possible to get a login to your cluster to investigate this more? If so, drop me an email…

By the way, not that it really matters, but do you get the same thing when you execute “pgprof” ?

Thanks,

-Mark

I don’t see anything that looks like an error message when trying pgdbg from a login session started with ssh -X -v:

bash-2.05b$ NP=2 pgdbg testit a b c d
debug1: client_input_channel_open: ctype x11 rchan 4 win 65536 max 16384
debug1: client_request_x11: request from 127.0.0.1 46616
debug1: channel 3: new [x11]
debug1: confirm x11
debug1: channel 3: FORCE input drain
debug1: client_input_channel_open: ctype x11 rchan 5 win 65536 max 16384
debug1: client_request_x11: request from 127.0.0.1 46617
debug1: channel 4: new [x11]
debug1: confirm x11
debug1: client_input_channel_open: ctype x11 rchan 6 win 65536 max 16384
debug1: client_request_x11: request from 127.0.0.1 46620
debug1: channel 5: new [x11]
debug1: confirm x11
debug1: channel 5: FORCE input drain

The NP=2 is a shortcut on MPI-aware systems such as the Scyld operating system, so I don’t have to invoke mpirun -np 2 blah blah blah. It just tells the executable to attempt 2 processes. Using mpirun results in the same error.
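The NP=2 prefix relies on ordinary shell behavior rather than anything specific to pgdbg: an assignment written in front of a command is placed in that command’s environment only. The sketch below shows just that mechanism, with “sh -c” standing in for the real program; what the MPI runtime then does with NP is as described above and is not reproduced here.

```shell
# An assignment written in front of a command applies to that command only.
NP=2 sh -c 'echo "child process sees NP=${NP-unset}"'

# The shell that launched it is unaffected, unless NP was already set there.
echo "launching shell sees NP=${NP-unset}"
```

Writing “export NP=2” instead would make the setting persist for every later command in the session, which is usually not what is wanted for a one-off debugging run.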

I’ll talk to the head sysadmin about the login account. There are some issues there, but we’ve done it before with a guy from Penguin.

Finally tracked down what CX is. “Christmas Island” of all places. Seems to be a common occurrence with Java programs, but no common thread of resolution.

“pgprof” results in similar error messages, although here we don’t even get a persistent X window staying open.

Hi John,

Two of us here looked at the problem. We feel it’s an issue with the Xlib installed on your system. You (or someone in charge of your cluster) should contact SCYLD to see if they have a replacement Xlib or any other information on the problem. I saw references on the web (http://mail.gnome.org/archives/gtk-list/2001-December/msg00131.html) that made it seem like it might just be a matter of rebuilding X on your system. Hopefully you don’t have to go that far. Hopefully SCYLD can provide you an Xlib with proper locale built into it.

For the time being you can use the -motif switch, but that switch is going away in the next release (6.0). Let me know if I can be of further assistance with the debugger.

-Mark