WRF compiler optimisation

Hi,

I noticed today on the Tips & Techniques page that there are plans to present optimisations for the compilation of the WRF Model. As this is the project I’m currently working on (with only limited experience with the PGI compilers), this of course piqued my interest.

We currently have PGI compilers (5.1) for an AMD64 (dual) Opteron running Linux, and we plan to use the new WRF version V2.0.2.

Which version of WRF will PGI be using for the work on WRF, and will the optimisations be performed for PGI compilers 5.2 or 5.x? How far off is this project from completion?

Thanks very much for any help that can be given. The fact that these compiler optimisations are provided is excellent.

Sorry for the delay on this. WRFv2 came out late in our 5.2-1 release process and exposed a bug which wasn’t fixed until release 5.2-2.

I reminded the application engineer in charge of WRF that we needed to get this done, so he sent me the configuration files which I’ve posted at

WRF: http://www.pgroup.com/resources/wrf/arch.pgi.wrf
WRFv2: http://www.pgroup.com/resources/wrf/arch.pgi.wrfv2

I still need to write up the actual FAQ and go through my checklist, but hopefully I can have something more substantial posted shortly. In the meantime, these config files should help.

Thanks,
Mat

Thanks very much Mat.

Being relatively new to this type of compilation work, this kind of help is greatly appreciated.

I’ll let you know how everything goes when I’ve compiled and run the model.

To coincide with the release of the 5.2-4 compilers, we’ve just updated the Tips and Techniques section of this web site with the WRF Version 2 Guide http://www.pgroup.com/resources/isv.htm#WRF.

Hope it helps!

- Mat

I’m attempting to compile WRFV2.0.2. I can successfully compile and run OMP versions of the ideal cases (em_quarter_ss and em_b_wave). Both these ideal cases run in multi-threaded versions. I have also compiled the real case (em_real), however when trying to run with OMP_NUM_THREADS=2, the executable crashes out with a segmentation fault. There is no problem with OMP_NUM_THREADS=1. I’ve set the MPSTKZ large (512M), and ‘ulimit -s unlimited’ with no success. Notably I can set the ideal cases to a domain size comparable to (and in fact larger than) that of my real domain (in terms of number of points) and the ideal cases still successfully run.
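For anyone following along, the run-time settings described above can be collected in a small wrapper script. This is only a sketch: the location of the executable (./wrf.exe in the current directory) is an assumption, and the values are the ones quoted in the post.

```shell
#!/bin/sh
# Sketch of the environment for a 2-thread OpenMP run of WRF.

export OMP_NUM_THREADS=2   # number of OpenMP threads to use
export MPSTKZ=512M         # stack size for threads in PGI parallel regions

# Raise the stack limit for the initial thread; this can be refused
# if the hard limit configured on the system is lower.
ulimit -s unlimited 2>/dev/null || echo "could not raise stack limit" >&2

# Assumed location of the executable; run it only if it is present.
WRF_EXE=./wrf.exe
if [ -x "$WRF_EXE" ]; then
    "$WRF_EXE"
fi
```

Setting OMP_NUM_THREADS=1 in the same script gives the single-threaded run that works, which makes the two cases easy to compare.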

I’m running an AMD dual Opteron with Fedora Core 2 (kernel 2.6.5-1). Portland Group Compilers V5.1-6.

My configure.wrf file is below. This is the OMP option provided in the configuration file posted on the PGI WRFV2 Tips & Techniques page. As the comments say, I removed the -Mipa references in FCBASEOPTS. Leaving any -Mipa options in led to compilation failures. I also added $(OMPCPP) to the POUND_DEF flags. Without this, the executable would only run on a single thread, even after setting OMP_NUM_THREADS=2, NCPUS=2.

www.theweather.com.au/people/carthur/wrf/configure.wrf

I have just checked the configure options in the latest WRFV2.0.3 tarball, and there isn’t an OpenMP option (only single-threaded or RSL).

I don’t believe it is a compiler issue, as the idealised versions compile and run ok. I’d just like to rule it out by having a second opinion on the flags I’m using.

Many thanks,
Craig Arthur

Hi Craig,

The office is closed for a few days due to the Thanksgiving holiday so I don’t have access to WRF. We’ll be back on Monday so if you don’t mind waiting I’ll see what I can determine then.

Thanks,
Mat

Hi Craig,


Unfortunately, I have not been able to recreate the error, so I don’t have a good idea of how to fix it. Is it possible for you to characterize how it is seg faulting?

I’d like you to re-build with “-g -O0 -mp” and re-run. If it still seg faults, run it again in pgdbg or gdb and determine which file and which line it seg faults at (use the ‘where’ and ‘disasm’ commands). If it does not seg fault at -O0, then continue adding higher optimization until it does, i.e. “-O2 -g -mp”, “-fast -g -mp”, “-fastsse -g -mp”.

Thanks,
Mat

Hi Mat,

I started out with the basic ‘-g -O0 -mp’ flag set, and compilation failed with a long list of errors. So I stepped back to the default set (as in the configure.wrf I posted previously) and progressively worked back to a point at which compilation was successful. The most basic set I could get down to was ‘-g -O0 -mp -byteswapio -Mfree’ (I can’t find any mention of ‘-Mfree’ in the PGF User Guide, so I’m unsure of its effect).

I ran the compiled executable in pgdbg, with pgienv omp on, and I can reach the first OMP command, which is in the subroutine SOLVE_EM.

pgdbg> step
Stopped at 0x490705, function solve_em, file solve_em.f, line 1523
 #1523:        !$OMP PARALLEL DO   &

pgdbg> step
pgserv 27022: pr_ptrace (req PTRACE_PEEKTEXT, pid 27023)
pgserv 27022: read: unable to read address 0x490728
pgserv 27022: pr_ptrace (req PTRACE_PEEKTEXT, pid 27023)
pgserv 27022: read: unable to read address 0x85f2d0
pgserv 27022: pr_ptrace (req PTRACE_PEEKTEXT, pid 27023)
pgserv 27022: read: unable to read address 0x8600e0
pgserv 27022: pr_ptrace (req PTRACE_PEEKTEXT, pid 27023)
pgserv 27022: read: unable to read address 0x85fe58
pgserv 27022: pr_ptrace (req PTRACE_PEEKTEXT, pid 27023)
pgserv 27022: read: unable to read address 0x4ca6f0
pgserv 27022: pr_ptrace (req PTRACE_PEEKTEXT, pid 27023)
pgserv 27022: read: unable to read address 0xb7a688
Stopped at 0x49072a, function solve_em, file solve_em.f, line 1526
 #1526:        DO ij = 1 , grid%num_tiles

pgdbg> step

The relevant code lines are

!$OMP PARALLEL DO   &
!$OMP PRIVATE ( ij )

   DO ij = 1 , grid%num_tiles

      CALL rk_step_prep  ( config_flags, rk_step,            &
                           u_2, v_2, w_2, t_2, ph_2, mu_2,   &
                           moist_2,                          &
                           ru, rv, rw, ww, php, alt, muu, muv,   &
                           mub, mut, phb, pb, p, al, alb,    &
                           cqu, cqv, cqw,                    &
                           msfu, msfv, msft,                 &
                           fnm, fnp, dnw, rdx, rdy,          &
                           num_3d_m,                         &
                           ids, ide, jds, jde, kds, kde,     &
                           ims, ime, jms, jme, kms, kme,     &
                           grid%i_start(ij), grid%i_end(ij), &
                           grid%j_start(ij), grid%j_end(ij), &
                           k_start, k_end                   )

   END DO
   !$OMP END PARALLEL DO

And on stepping into the DO loop, the debugger dies reporting

pgserv 27022: read: stranger PID 27023
db_set_code_brk : DiBreakpointSet fails
pgserv 27022: cont : no threads to continue

I decided it was worth running an idealised case compiled with the same configure.wrf (em_quarter_ss), as it does run on 2 CPUs. The debugger dies at the same location as in the real case, reporting the same errors. As such, I don’t think I’m actually reaching the point where em_real is seg faulting.

Hi Craig,

Sorry, I should have been more clear and said to change just the “FCOPTIM” flag and leave the “FCBASEOPTS” as is. Also, “-Mfree” and “-Mfixed” override the file extension (.F, .F90, .f, .f90) to indicate whether the file is free or fixed form.
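As a rough illustration of changing only FCOPTIM, a small helper along these lines could rewrite that one line of configure.wrf. The helper name set_fcoptim is made up for this sketch, and it assumes the file holds a single line of the form "FCOPTIM = ..."; check your own configure.wrf before relying on it.

```shell
#!/bin/sh
# set_fcoptim FILE "FLAGS" (hypothetical helper)
# Rewrites only the line that starts with FCOPTIM; every other line,
# including FCBASEOPTS, is copied through unchanged.
set_fcoptim() {
    file=$1
    flags=$2
    # Write to a temporary file and move it into place, which avoids
    # depending on the non-portable "sed -i" option.
    sed "s|^FCOPTIM[[:space:]]*=.*|FCOPTIM = $flags|" "$file" > "$file.tmp" &&
        mv "$file.tmp" "$file"
}

# Example use, stepping up through the optimization levels one at a time:
#   set_fcoptim configure.wrf "-g -O0 -mp"
#   set_fcoptim configure.wrf "-O2 -g -mp"
#   set_fcoptim configure.wrf "-fast -g -mp"
#   set_fcoptim configure.wrf "-fastsse -g -mp"
```

After each change the model has to be rebuilt from clean and re-run before moving on to the next level.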

Since 5.1-6 pre-dates Fedora Core 2 and a lot changed with the thread library, the 5.1 version of pgdbg cannot step through parallel regions. Again, I should have been more clear. Please run the application without stepping and let it run until it seg faults. Then use the “where” command to see where you are in the program and “disasm” to see what assembly instructions were being executed. Also, please run the exe outside of the debugger to ensure that it does indeed still seg fault at the lower optimization.
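If gdb is used instead of pgdbg, the "run until it faults, then look at the stack" procedure can be done non-interactively. The following is only a sketch: crash_report is a made-up helper name, and it uses gdb's "x/3i $pc" to show the next few instructions in place of pgdbg's "disasm".

```shell
#!/bin/sh
# crash_report EXE (hypothetical helper)
# Runs EXE under gdb in batch mode without stepping; if the program
# stops on a signal, prints the call stack and the instructions at
# the faulting address, then exits.
crash_report() {
    exe=$1
    if [ ! -x "$exe" ]; then
        echo "crash_report: $exe not found or not executable" >&2
        return 1
    fi
    if ! command -v gdb >/dev/null 2>&1; then
        echo "crash_report: gdb is not installed" >&2
        return 1
    fi
    # -batch: exit once the commands have run
    # -ex:    execute the given gdb command
    gdb -batch -ex run -ex where -ex 'x/3i $pc' "$exe"
}

# Example use (executable path is an assumption):
#   crash_report ./wrf.exe
```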

Since 5.1-6 does not officially support Fedora Core 2, I’d also like you to try upgrading to 5.2-4 http://www.pgroup.com/support/download_release.php. It is possible that we have an incompatibility between 5.1-6 and Fedora Core 2. Also, the debugger went through a major upgrade. Note that we upgraded your license to 5.2, but you’ll need to regenerate your license key in order for the 5.2 compilers to work beyond the 15 day evaluation.

Thanks,
Mat

Hi Mat,

I’ve gone through the steps you set out above, and the executable continues to seg fault. Below is one example of the output from the debugger when running wrf.exe compiled with “-g -O0 -mp”.

([1] New Thread)
 WRF NUMBER OF TILES FROM OMP_GET_MAX_THREADS =   2                              
 WRF NUMBER OF TILES =   2                                                       
[0] Signalled SIGSEGV at 0x5D88EB, function surface_driver, file module_surface_driver.f, line 374
5D88EB:  F3 F 11 4 8A                   movss  %xmm0,(%rdx,%rcx,4)

pgdbg [all] 0> where
surface_driver line: "module_surface_driver.f"@374 address: 0x5D88EB  
pgdbg [all] 0> disasm
5D88EB:  F3 F 11 4 8A                   movss  %xmm0,(%rdx,%rcx,4)
5D88F0:  FF 85 50 FF FF FF              incl   -176(%rbp)
5D88F6:  FF 8C 24 60 1 0 0              decl   352(%rsp)

pgdbg [all] 0> threads
0   ID   PID     STATE      SIGNAL      LOCATION
 => 0    30926   Signalled  SIGSEGV     surface_driver line: "module_surface_driver.f"@374 address: 0x5D88EB 
    1    30927   Stopped    SIGSTOP     __GI_sched_yield file: interp.c address: 0x3EE7DA4129

The catch, though, is that the seg fault does not occur in a consistent place. I have found about 5 different points where execution stops, most often in the surface_driver function.

For this reason, I’m starting to suspect the compiler is not the direct source of the issue. I’ll play around some more with the debugger to see if I can glean any more information about what’s occurring.