PGI Compiler hangs when trying to build WRF 3.7.1

I am attempting to use PGI 17.10 to build WRF 3.7.1.

The compiler hangs when attempting to compile certain .f90 files, such as:

time mpif90 -o module_ra_rrtmg_lwf.o -c -Kieee -acc -ta=nvidia,fastmath,cuda9.0,cc60 -Mcuda -fastsse -Mvect=noaltcode -Msmartalloc -Mprefetch=distance:8   -w  -Mfree -byteswapio   -I../dyn_em -I../dyn_nmm  -module /var/data0/sandbox/wrf2/WORK_DIR/WRFV3/main -I/var/data0/sandbox/wrf2/WORK_DIR/WRFV3/external/esmf_time_f90  -I/var/data0/sandbox/wrf2/WORK_DIR/WRFV3/main -I/var/data0/sandbox/wrf2/WORK_DIR/WRFV3/external/io_netcdf -I/var/data0/sandbox/wrf2/WORK_DIR/WRFV3/external/io_int -I/var/data0/sandbox/wrf2/WORK_DIR/WRFV3/frame -I/var/data0/sandbox/wrf2/WORK_DIR/WRFV3/share -I/var/data0/sandbox/wrf2/WORK_DIR/WRFV3/phys -I/var/data0/sandbox/wrf2/WORK_DIR/WRFV3/chem -I/var/data0/sandbox/wrf2/WORK_DIR/WRFV3/inc -I/var/data0/sandbox/wrf2/INSTALL_DIR/netCDF-Fortran//include  -r4 -i4  module_ra_rrtmg_lwf.f90

I’ve successfully built netCDF with PGI, and have built WRF without CUDA support.

Any help in debugging this issue would be appreciated. I am stuck, as no error messages are being produced; PGI just hangs.

Please let me know what other information would be useful in debugging this issue.

Hi Derek,

For good or bad, the file compiles fine for me.

Which configuration option are you using and are you on x86 or Power?

I’m running on x86 and used option #11 (PGI Accelerator, dmpar).

What you can try is adding the verbose (-v) flag to your compilation and posting the output. This will have the compiler driver show each of the compilation commands it invokes, which may give us a better idea of where the hang occurs.
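For example, the failing command from above with the flag added (the -I and -module paths are elided here for brevity; keep yours as they were):

time mpif90 -v -o module_ra_rrtmg_lwf.o -c -Kieee -acc -ta=nvidia,fastmath,cuda9.0,cc60 -Mcuda -fastsse -Mvect=noaltcode -Msmartalloc -Mprefetch=distance:8 -w -Mfree -byteswapio [same -I/-module flags as above] -r4 -i4 module_ra_rrtmg_lwf.f90

Seeing which sub-command is still running when the hang occurs should tell us which phase of the compilation is stuck.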

-Mat

Hi, which modules are currently accelerated (OpenACC/CUDA) in the current version of WRF 3.9.1?

I have successfully compiled the code (FCOPTIM = -Kieee -acc -ta=tesla -Mcuda -fastsse -Mvect=noaltcode -Msmartalloc -Mprefetch=distance:8 -Minfo=all -Mneginfo=all), but there is no trace of any parallelized loops in the log…
Only:

 36, Loop not vectorized/parallelized: contains call
write_outbuf:
     72, Loop not vectorized/parallelized: too deeply nested
     83, Copy in and copy out of rptr in call to ext_ncd_write_field
         Loop not fused: function call before adjacent loop
         Loop not vectorized: may not be beneficial
         Generated vector simd code for the loop
         Generated a prefetch instruction for the loop
    100, Copy in and copy out of iptr in call to ext_ncd_write_field
         Loop not fused: function call before adjacent loop
         Loop not vectorized: may not be beneficial
    119, Copy in and copy out of rptr in call to ext_gr1_write_field
         Loop not fused: function call before adjacent loop
         Loop not vectorized: may not be beneficial
         Generated vector simd code for the loop
         Generated a prefetch instruction for the loop
    136, Copy in and copy out of iptr in call to ext_gr1_write_field
         Loop not fused: function call before adjacent loop
         Loop not vectorized: may not be beneficial
    155, Copy in and copy out of rptr in call to ext_int_write_field
         Loop not fused: function call before adjacent loop
         Loop not vectorized: may not be beneficial
         Generated vector simd code for the loop
         Generated a prefetch instruction for the loop
    172, Copy in and copy out of iptr in call to ext_int_write_field
         Loop not fused: function call before adjacent loop
         Loop not vectorized: may not be beneficial
stitch_outbuf_patches:
    226, Loop not vectorized/parallelized: contains call
    233, Loop not vectorized/parallelized: contains call
    254, Memory set idiom, loop replaced by call to __c_mset4
    255, Memory zero idiom, loop replaced by call to __c_mzero4
    256, Loop unrolled 3 times (completely unrolled)
         Loop not fused: different loop trip count
         Loop unrolled 8 times
    262, Loop unrolled 3 times (completely unrolled)
    270, Loop not vectorized/parallelized: potential early exits
    380, Loop not vectorized/parallelized: too deeply nested
    384, Loop unrolled 3 times (completely unrolled)
    399, Loop not vectorized/parallelized: too deeply nested
    401, Loop unrolled 3 times (completely unrolled)
    403, Conflict or overlap between rbuffer and outpatch_table%patchlist%rptr
         Loop not fused: function call before adjacent loop
         Loop not vectorized: data dependency
    413, Loop not vectorized/parallelized: too deeply nested
    415, Loop unrolled 3 times (completely unrolled)
    417, Conflict or overlap between ibuffer and outpatch_table%patchlist%iptr
         Loop not fused: function call before adjacent loop
         Loop not vectorized: data dependency
merge_patches:
    438, Loop not vectorized: data dependency
         Loop unrolled 2 times
store_patch_in_outbuf:
    475, Loop not vectorized/parallelized: potential early exits
    512, Loop unrolled 3 times (completely unrolled)
    513, Loop unrolled 3 times (completely unrolled)
    519, Loop not fused: complex flow graph
    521, Generated vector simd code for the loop
         Generated a prefetch instruction for the loop
store_patch_in_outbuf_pnc:
    576, Loop not vectorized/parallelized: potential early exits
    609, Loop unrolled 3 times (completely unrolled)
    610, Loop unrolled 3 times (completely unrolled)
    623, Loop not fused: function call before adjacent loop
    627, Loop unrolled 3 times (completely unrolled)
    628, Loop unrolled 3 times (completely unrolled)
    629, Loop unrolled 3 times (completely unrolled)
    646, Loop unrolled 3 times (completely unrolled)
    647, Loop unrolled 3 times (completely unrolled)
    648, Loop unrolled 3 times (completely unrolled)
    681, Loop not fused: complex flow graph
    683, Generated vector simd code for the loop
         Generated a prefetch instruction for the loop

Hi Gippox,

I’m aware of OpenACC being used in two modules: module_mp_wsm3.f90 and module_mp_wsm5.f90.
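Note that with -acc, only loops annotated with OpenACC directives are offloaded. The -Minfo listing you posted comes from an I/O routine that contains no directives, so it shows only host-side vectorization messages. As a generic illustration (a minimal sketch, not the actual WRF source), a directive-annotated loop looks like this, and -Minfo=accel reports an accelerator kernel generated for it:

subroutine saxpy(n, a, x, y)
  implicit none
  integer, intent(in) :: n
  real, intent(in) :: a, x(n)
  real, intent(inout) :: y(n)
  integer :: i
  ! The directive below asks the compiler to generate a GPU kernel
  ! for this loop; without it, -acc produces no device code.
!$acc parallel loop
  do i = 1, n
     y(i) = a*x(i) + y(i)
  end do
end subroutine saxpy

Compiling the accelerated module files with -Minfo=accel instead of -Minfo=all restricts the listing to accelerator messages, which makes any generated kernels much easier to spot.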

I believe there’s work going on to port larger portions of WRF to the GPU, but I don’t know the current status.

There was a talk yesterday at GTC2018. These talks are typically published within the next month, so you might want to check back and see what the developers had to say:

https://2018gputechconf.smarteventscloud.com/connect/sessionDetail.ww?SESSION_ID=152734

-Mat

Thank you Mat.

Gippox