MM5 Segmentation fault on dual core platform

Hi! Here is my problem:

We are trying to test MM5 on different platforms. Our first configuration was Suse Linux 10.0 on an AMD X2 4800+ dual core (4 GB of memory), with the PGI 6.0 compilers. Here are our compile settings in configure.user:

FC = pgf90
FCFLAGS = -I$(LIBINCLUDE) -Mnosgimp -pc 32 -byteswapio -tp k8-64 -fastsse -Mconcur -DDEC_ALPHA -mp
CPP = /lib/cpp
CFLAGS = -O -DDEC_ALPHA -pc 32 -tp k8-64 -Mnosgimp -byteswapio -DDEC_ALPHA -Mconcur -mp
CPPFLAGS = -I$(LIBINCLUDE) -DDEC_ALPHA -m64 -mp
LDOPTIONS = -byteswapio -fastsse -Mconcur -mp -DDEC_ALPHA

CC = pgcc

After setting NCPUS=2 and unlimiting the stack size, we were able to run different kinds of simulations on both cores at the same time. There was no problem with the number of nests, domain size or simulation duration (we were able to test even with maxnes 5, mix=120 and mjx=120 for several meteorological months). The only problem was on very large domains, where we had to delete -mp from LDOPTIONS; this resolved our problems.
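For readers following along, the run-time setup described above might look roughly like this. It is a minimal sketch assuming a Bourne-style shell (sh/bash); under csh/tcsh the equivalents would be `setenv NCPUS 2` and `limit stacksize unlimited`. The executable name in the last comment is hypothetical.

```shell
# Use both cores for the parallel run (NCPUS is the variable named in the post).
export NCPUS=2

# Try to remove the soft stack limit; report the current limit if that is refused.
ulimit -s unlimited 2>/dev/null || echo "could not unlimit stack; current limit: $(ulimit -s)"

# Show the settings the model would inherit.
echo "NCPUS=$NCPUS stack=$(ulimit -s)"

# ./mm5.exe    # hypothetical executable name; start the model from this same shell
```

The limits only apply to processes started from the shell where they were set, so the model has to be launched from that same session.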

After that, we tested MM5 on the same platform, but with Suse 10.1 and PGI 6.2. With the same compiler options we got many segmentation faults, which didn’t seem to depend on the domain size or simulation length.

After that, we tested it on an AMD FX5600+ dual core (4 GB of memory) with Suse 10.2 and PGI 6.2. We had the same problems, so we waited for the PGI 7.0 compilers. After their release, we tested them on this platform, and the same problems occurred. We have found that with NCPUS=1 there is no segmentation fault, although the program then runs on only 1 core (as expected).
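One way to make the NCPUS=1 versus NCPUS=2 comparison repeatable is a small wrapper that runs the same command under each setting and prints the exit status. This is only a sketch, assuming a Bourne-style shell; `run_with_ncpus` is a helper name made up for illustration, and the executable name in the usage comments is hypothetical.

```shell
# Run a command with a given NCPUS value and report how it exited.
# Shells usually report a process killed by SIGSEGV as status 139 (128 + 11).
run_with_ncpus() {
    n=$1
    shift
    status=0
    env NCPUS="$n" "$@" || status=$?
    echo "NCPUS=$n exit status=$status"
    return "$status"
}

# Hypothetical usage, comparing the serial and the two-core run:
# run_with_ncpus 1 ./mm5.exe
# run_with_ncpus 2 ./mm5.exe
```

A status of 139 on the second run but 0 on the first would match the behaviour described in this post.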

Can you help me with this problem, please?

We have also set “MPSTKZ” to many different values and the stack size to very high values (512M, 1024M, 2048M and 3072M), but there was always the same problem.

Hi Gian_UNIVPM,

On Linux, the ‘limit’ command takes precedence over MPSTKZ. However, as of the 7.0 release we have added the proposed OpenMP 3.0 environment variable OMP_STACK_SIZE which you can try. Note that ‘unlimited’ does have an OS dependent limit which can be too small. So instead, try setting ‘limit stacksize 16384megabytes’ or some other large value.
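For sh/bash users, the two suggestions above might be sketched as follows. `ulimit -s` takes kilobytes, so 16384 megabytes becomes 16384 * 1024. The value given to OMP_STACK_SIZE and its `M` suffix are assumptions for illustration; check the compiler documentation for the accepted format. Note also that the final OpenMP 3.0 specification spells the variable OMP_STACKSIZE.

```shell
# Explicit large stack limit instead of 'unlimited' (bash counterpart of csh 'limit stacksize').
STACK_KB=$((16384 * 1024))    # 16384 megabytes expressed in kilobytes
ulimit -s "$STACK_KB" 2>/dev/null || echo "could not set stack to ${STACK_KB} KB; current limit: $(ulimit -s)"

# Per-thread stack size for the OpenMP runtime (7.0 and later, per the note above).
export OMP_STACK_SIZE=512M    # assumed value and suffix format

echo "stack=$(ulimit -s) OMP_STACK_SIZE=$OMP_STACK_SIZE"
```

If the request exceeds the hard limit, the shell refuses it and the fallback message shows the limit that is actually in effect.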

Hope this helps,
Mat

Hi mkcolg!

I’ve tried setting the stack size above the RAM size (4 GB) as you suggested, but then the program doesn’t seem to start (it does start, but it doesn’t print anything … until my patience ends and I press Ctrl-C :) ). I haven’t tried OMP_STACK_SIZE, which is in 7.0, but I’ve got the same problem in 6.2, where I think there is no way to set this variable. For now we are trying to work with 6.2. We have avoided the problem by setting the “-V6.0” flag; without it, the program stops abnormally in multiprocessor mode (but not in 1-core mode!). With “-V6.0” the same simulation runs fine and terminates normally.

Is there a bug in the new compilers, or has something changed in the way multiprocessing is handled since the 6.0 release?

Thanks a lot for your help

Hi Gian,

While I don’t consider this a bug yet, I would like to try to recreate the error here. Can you send a note to trs@pgroup.com (ask customer service to forward it to me) with your configure file as well as instructions on how I can obtain your data? If you do not have an FTP site where you can post the data, let me know and I’ll give you instructions on uploading to pgroup.com.

Thanks,
Mat

Hi Mat!

Thank you for your reply.
I will contact you at the next segmentation fault.

Thanks a lot