In preparation for an MPI implementation of our photochemical model, I converted each array whose size depends on user-defined parameters to an allocatable array. Now I am seeing a significant performance hit relative to the statically allocated F77 version. I investigated, and this is what I found. If I leave all of the original F77 code intact and change a single subroutine to use allocatable arrays for its local storage, that routine takes more than twice as long as the original. To be clear, the only difference in this comparison is how the local arrays are declared: in one version they are sized using a parameter statement, and in the other they are allocatable arrays, allocated based on a value passed in through the argument list. I also tried automatic arrays, declaring the arrays with a statement like:
where isize is passed through the argument list. I get a similar performance penalty. Is this something that others have experienced when using allocatable arrays? Are there programming practices or compiler implementations that can deal with this? Do you have any thoughts on this?
I am using pgf90 6.0-5 with a 32-bit target on x86 Linux. I also ran the comparisons with an Intel compiler. There, the allocatable arrays produced the same slowdown, but the automatic arrays performed about as well as the static arrays.
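For concreteness, the three kinds of local storage being compared look roughly like the sketch below. This is only an illustration, not the actual model code: the module name, the routine names, the array name work, and the bound maxsize are all placeholders, and the loop body is a stand-in for the real computation.

```fortran
! Illustrative sketch only: all names here are hypothetical placeholders,
! not taken from the actual photochemical model.
module local_storage_variants
  implicit none
  integer, parameter :: maxsize = 1000   ! compile-time bound, F77 style
contains

  ! Variant 1: local array sized by a PARAMETER constant.
  ! Assumes isize <= maxsize.
  function sum_static(isize) result(total)
    integer, intent(in) :: isize
    real :: total
    real :: work(maxsize)
    integer :: i
    do i = 1, isize
      work(i) = real(i)
    end do
    total = sum(work(1:isize))
  end function sum_static

  ! Variant 2: local ALLOCATABLE array, allocated on every call.
  function sum_allocatable(isize) result(total)
    integer, intent(in) :: isize
    real :: total
    real, allocatable :: work(:)
    integer :: i
    allocate(work(isize))
    do i = 1, isize
      work(i) = real(i)
    end do
    total = sum(work)
    deallocate(work)
  end function sum_allocatable

  ! Variant 3: automatic array, sized by a dummy argument at entry.
  function sum_automatic(isize) result(total)
    integer, intent(in) :: isize
    real :: total
    real :: work(isize)
    integer :: i
    do i = 1, isize
      work(i) = real(i)
    end do
    total = sum(work)
  end function sum_automatic

end module local_storage_variants
```

All three functions return the same value for a given isize, so timing each one in a loop from a small driver program should isolate the cost of the storage declaration itself.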