'save' doesn't save the variables sometimes.

I am trying to run CCSM3 (Community Climate System Model) and have found that variables in one of its subroutines are not retained between calls even though the ‘save’ statement is present.

The statement appears on its own, without a list, after the declarations of all variables in the subroutine.
I worked around the problem by adding the ‘save’ attribute directly to the declarations of the variables that caused the error messages.
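
For illustration, the two forms look roughly like this. It is only a minimal sketch with hypothetical names (save_demo, count_bare, count_attr, ncalls), not code from CCSM3; on a compiler that handles SAVE correctly, both counters should reach 3.

```fortran
module save_demo
  implicit none
contains

  ! Form 1: a bare SAVE statement after all the declarations.
  subroutine count_bare(n)
    integer, intent(out) :: n
    integer :: ncalls
    logical :: first = .true.   ! initialization already implies SAVE
    save                        ! no list: applies to every saveable local entity

    if (first) then
      ncalls = 0
      first = .false.
    end if
    ncalls = ncalls + 1
    n = ncalls
  end subroutine count_bare

  ! Form 2: the SAVE attribute on the declaration itself.
  subroutine count_attr(n)
    integer, intent(out) :: n
    integer, save :: ncalls = 0

    ncalls = ncalls + 1
    n = ncalls
  end subroutine count_attr

end module save_demo

program main
  use save_demo
  implicit none
  integer :: i, n_bare, n_attr

  ! Call each routine three times; the counters must survive between calls.
  do i = 1, 3
    call count_bare(n_bare)
    call count_attr(n_attr)
  end do
  print *, 'bare SAVE:      ', n_bare
  print *, 'SAVE attribute: ', n_attr
end program main
```

Note that the dummy argument n is not affected by the bare SAVE in either case, since dummy arguments cannot be saved.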

I found the following in the PGI Fortran reference
(http://www.pgroup.com/doc/pgf77_ref/f77ref04.htm#Heading121)

“SAVE may be used without a list, in which case all the allowable entities within the module are saved (this has the same effect as using the -Msave command-line option).”

How does the compiler decide which entities are allowable for saving and which are not?

Details:
Debian Linux with kernel 2.6.15.
An SMP machine with 4 AMD Opteron processors.
The compiler is PGI 6.1-2, 64 bit target on x86-64 Linux.

The full problem description is here:
http://bb.cgd.ucar.edu/showthread.php?p=793#post793

Hi Wladimir,

There’s a good chance this is the same compiler bug as TPR#3755, in which a few compiler-generated temporary arrays representing the subscript bounds and multipliers were not being marked as SAVE’d. This bug was fixed in the 6.1-4 release of the compiler. The bug only occurred at “-O2” or higher, so compiling the offending source file at “-O1” should also work around the problem.

Of course, this could be something different, but if you could try compiling at “-O1” or install the latest release first, I would appreciate it.

Thanks,
Mat

Hi, Mat. Thank you for the answer.

Unfortunately, I could not find a description of bug TPR #3755 on the site.

My variables are timer numbers. They are used to estimate the total time that different parts of the subroutine spend in calculations.

The timers are implemented as 3 global arrays with 200 cells each, which allows up to 200 timers. The timer_start subroutine increments the respective cell of the first array, which stores the number of calls, and puts the current system time into a second array, which stores the latest start time. The timer_stop subroutine gets the current system time again, computes the difference between the two system times (the new one and the stored one), and adds this difference to the respective cell of the third global array.
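
The scheme can be sketched as follows. This is a simplified illustration only, not the actual CCSM3 source: the module name, the array names (ncalls, t_start, t_total), the wall_time helper, and the use of system_clock are all assumptions.

```fortran
module timer_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: max_timers = 200

  ! Three global arrays, one cell per timer (hypothetical names).
  integer,  save :: ncalls(max_timers)  = 0        ! number of calls
  real(dp), save :: t_start(max_timers) = 0.0_dp   ! latest start time, seconds
  real(dp), save :: t_total(max_timers) = 0.0_dp   ! accumulated time, seconds

contains

  ! Current system time in seconds (assumes a working system clock).
  function wall_time() result(t)
    real(dp) :: t
    integer  :: count, rate
    call system_clock(count, rate)
    t = real(count, dp) / real(rate, dp)
  end function wall_time

  subroutine timer_start(n)
    integer, intent(in) :: n
    if (n < 1 .or. n > max_timers) then
      print *, 'timer_start: invalid timer number ', n
      return
    end if
    ncalls(n)  = ncalls(n) + 1
    t_start(n) = wall_time()
  end subroutine timer_start

  subroutine timer_stop(n)
    integer, intent(in) :: n
    if (n < 1 .or. n > max_timers) then
      print *, 'timer_stop: invalid timer number ', n
      return
    end if
    ! Add the time elapsed since the matching timer_start call.
    t_total(n) = t_total(n) + (wall_time() - t_start(n))
  end subroutine timer_stop

end module timer_sketch

program main
  use timer_sketch
  implicit none
  call timer_start(1)
  call timer_stop(1)
  print *, 'calls: ', ncalls(1), ' total seconds: ', t_total(1)
end program main
```

A caller that keeps its timer number in a local variable relies on that variable being saved between calls; if it is not, the number passed in on a later call falls outside 1..200 and the range checks above fire.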

The timer numbers were not saved, and timer_start and timer_stop subroutines complained about invalid timer numbers.

Here is the compilation line

mpif90  -I. -I/home/veremeev/include -I/home/veremeev/case1/lib/include -I/home/veremeev/include -I. -I/home/veremeev/case1/SourceMods/src.cpl -I/home/veremeev/ccsm3_0/models/cpl/cpl6 -I/home/veremeev/ccsm3_0/models/csm_share -I/home/veremeev/ccsm3_0/models/csm_share/shr -I/home/veremeev/ccsm3_0/models/csm_share/cpl -c -r8 -i4 -Kieee -Mrecursive -Mdalign -Mextend -Mfree  -DLINUX -DPGF90 -DNO_SHR_VMATH -DLINUX /home/veremeev/case1/SourceMods/src.cpl/flux_mod.F90

There are no “-O…” switches in the compilation line and no environment variables set, so it should default to -O1.
Adding -O1 to the compiler switches didn’t help.
Moreover, I would like to avoid changing the compilation options, as the model documentation states that this can corrupt the results.

How can I find out whether upgrading the compiler is free?

Thanks

Hi Wladimir,

Since it still fails at “-O1”, it’s unlikely to be the same issue as TPR#3755. However, you can still try upgrading to 6.1-4. Minor upgrades such as this are always free (you can download the latest release here) and major upgrades are included with an optional subscription.

I have access to CCSM3 and will try to recreate the problem here.

- Mat

Thank you again.
Please also note that this error appeared when I used MPICH-2 (the latest version, 1.0.3).
It appeared with both dead and active components.

There were no such errors when I ran the dead components with MPICH-1.

The timer library is shared among all components (ccsm3/models/csm_share/shr_timer_mod.F90), dead and active ones.
I also shortened the subroutine names somewhat.
They are actually ‘shr_timer_start’ and ‘shr_timer_stop’.

Upgrading the compiler to 6.1-4 didn’t resolve the problem; the variables are still not saved.