WRF on Windows

Hello group,
I just read the article about setting up WRF on Windows using the PGI Workstation.
http://www.pgroup.com/lit/articles/insider/v1n3a4.htm
Does anybody already have experience with the speed-up? Has anyone run any benchmarks? Is there work going on to make it work on Linux?

Thanks

Mario

Actually, WRF is usually run on Linux, I think, and the latest Insider has an article by Michael Wolfe and Craig Toepfer detailing the use of accelerators on WRF. It contains some (amazing) results and a lot of advice on restructuring code for the pragmas that I will definitely be applying to my current efforts. (I do think there might be a small error when they inline some BLAS calls: the square root got lost somewhere.)

In my GPU euphoria I didn’t notice that the article was “just” about setting WRF up on Windows, without making use of the accelerators. Sorry! So it’s basically the WSM5 module that has been restructured for parallel processing with GPUs.

Hi Mario,

There are two articles in the PGInsider concerning WRF: Michael and Craig’s article on porting the WRF WSM5 kernel to the PGI Accelerator Model, and my article on porting WRF to Windows. While the two articles focus on different aspects, once 10.0 is released we’ll support NVIDIA GPUs on Windows and both patches could be combined.

As for performance, I did not do a comparison between Windows and Linux, mostly because my Windows cluster is just a small test cluster that is not set up for performance testing. That said, one of the main reasons I worked on this project was that one of our customers is evaluating a Windows cluster and wanted to compare Windows to Linux using WRF. No promises, but I’ll ask if they will share their results.

- Mat

Hi Matt,

I asked Craig about the BLAS inlining issue you mentioned but he says he didn’t perform any. Can you give more details?

Thanks,
Mat

Well, it was BLAS-type, quoth the article:

> …and two BLAS-type subroutine calls that we manually inlined:
>
> ```text
> do k = kts, kte
>    call vsrec( tvec1(its), den(its,k), ite-its+1 )
>    do i = its, ite
>       tvec1(i) = tvec1(i) * den0
>    enddo
>    call vssqrt( denfac(its,k), tvec1(its), ite-its+1 )
> enddo
> ```
>
> Routine vsrec computes the reciprocal of each vector element, and vssqrt computes the square root, so this is equivalent to:
>
> ```text
> do k = kts, kte
>    do i = its, ite
>       denfac(i,k) = den0 / den(i,k)
>    enddo
> enddo
> ```

It just looked to my eye that the sqrt call never made it into the code snippet in the article:

denfac(i,k) = sqrt(den0 / den(i,k))
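
For anyone who wants to double-check the equivalence, here is a minimal stand-alone sketch. Note that vsrec_ref and vssqrt_ref are hypothetical stand-ins written just for this check (they are not the library routines), and the loop bounds, den0, and the den values are made up. It compares the two-step vector form against the single loop with the sqrt kept and stops with an error if they differ by more than single-precision rounding.

```fortran
program denfac_check
  implicit none
  ! Hypothetical tile bounds and reference density, chosen only for this check
  integer, parameter :: its = 1, ite = 4, kts = 1, kte = 3
  real, parameter :: den0 = 1.28
  real :: den(its:ite, kts:kte), tvec1(its:ite)
  real :: denfac_vec(its:ite, kts:kte), denfac_loop(its:ite, kts:kte)
  real :: maxdiff
  integer :: i, k

  ! Made-up positive densities
  do k = kts, kte
     do i = its, ite
        den(i,k) = 0.5 + 0.1 * real(i + k)
     enddo
  enddo

  ! Original form: reciprocal, scale by den0, then square root
  do k = kts, kte
     call vsrec_ref( tvec1(its), den(its,k), ite-its+1 )
     do i = its, ite
        tvec1(i) = tvec1(i) * den0
     enddo
     call vssqrt_ref( denfac_vec(its,k), tvec1(its), ite-its+1 )
  enddo

  ! Inlined form, with the sqrt kept
  do k = kts, kte
     do i = its, ite
        denfac_loop(i,k) = sqrt(den0 / den(i,k))
     enddo
  enddo

  ! The two forms should agree up to rounding
  maxdiff = maxval(abs(denfac_vec - denfac_loop))
  print *, 'max abs difference:', maxdiff
  if (maxdiff > 1.0e-5) error stop 'two-step and inlined forms disagree'

contains

  ! Stand-in for vsrec: y(j) = 1 / x(j) for j = 1..n
  subroutine vsrec_ref(y, x, n)
    integer, intent(in) :: n
    real, intent(out) :: y(n)
    real, intent(in) :: x(n)
    integer :: j
    do j = 1, n
       y(j) = 1.0 / x(j)
    enddo
  end subroutine vsrec_ref

  ! Stand-in for vssqrt: y(j) = sqrt(x(j)) for j = 1..n
  subroutine vssqrt_ref(y, x, n)
    integer, intent(in) :: n
    real, intent(out) :: y(n)
    real, intent(in) :: x(n)
    integer :: j
    do j = 1, n
       y(j) = sqrt(x(j))
    enddo
  end subroutine vssqrt_ref

end program denfac_check
```

Dropping the sqrt from the second loop makes the check fail, which is the discrepancy in the article's snippet.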

Good eye! It’s just a typo. The actual code does have the sqrt. We’ll get the article updated.

Thanks,
Mat