Hello group,
I just read the article about setting up WRF on Windows using the PGI Workstation. http://www.pgroup.com/lit/articles/insider/v1n3a4.htm
Does anybody already have experience with the speed-up? Has anyone run any benchmarks? Is there work under way to make it work on Linux?
Actually, WRF is usually run on Linux, I think, and the latest Insider has an article by Michael Wolfe and Craig Toepfer detailing the use of accelerators on WRF. It contains some (amazing) results and a lot of advice on restructuring code for the pragmas that I will definitely be applying to my current efforts. (I do think there might be a small error when they inline some BLAS calls: the square root got lost somewhere.)
In my GPU euphoria I didn’t notice that the article was “just” about setting WRF up on Windows, without making use of the accelerators. Sorry! So it’s basically the WSM5 module that has been restructured for parallel processing with GPUs.
There are two articles in the PGInsider concerning WRF: Michael and Craig’s article on porting the WRF WSM5 kernel to the PGI Accelerator Model, and my article on porting WRF to Windows. While the two articles focus on different aspects, once 10.0 is released, we’ll support NVIDIA GPUs on Windows and both patches could be combined.
As for performance, I did not do a comparison between Windows and Linux, mostly because my Windows cluster is just a small test cluster that is not set up for performance testing. That said, one of the main reasons I worked on this project was that one of our customers is evaluating a Windows cluster and wanted to compare Windows to Linux using WRF. No promises, but I’ll ask if they will share their results.
…and two BLAS-type subroutine calls that we manually inlined:
    do k = kts, kte
      call vsrec( tvec1(its), den(its,k), ite-its+1 )
      do i = its, ite
        tvec1(i) = tvec1(i) * den0
      enddo
      call vssqrt( denfac(its,k), tvec1(its), ite-its+1 )
    enddo
>
> Routine vsrec computes the reciprocal of each vector element, and vssqrt computes the square root, so this is equivalent to:
>
> ```text
> do k = kts, kte
>   do i = its, ite
>     denfac(i,k) = den0 / den(i,k)
>   enddo
> enddo
> ```
It just looked to my eye as though the sqrt call never made it into the code snippet in the article, quoted above.
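
For comparison, here is a minimal sketch of how the inlined loop might look with the square root retained, going only by the description of vsrec (reciprocal) and vssqrt (square root) quoted above. The program name, the array bounds, the value of den0 and the fill values for den are hypothetical and chosen purely for illustration; this is not the code from the article or from WRF itself.

```fortran
program denfac_check
  implicit none
  ! Hypothetical small bounds and constant, for illustration only.
  integer, parameter :: its = 1, ite = 4, kts = 1, kte = 3
  real, parameter :: den0 = 1.28
  real :: den(its:ite, kts:kte), denfac(its:ite, kts:kte)
  integer :: i, k

  ! Fill den with arbitrary positive values.
  do k = kts, kte
    do i = its, ite
      den(i,k) = 1.0 + 0.1 * real(i + k)
    enddo
  enddo

  ! Inlined form of: reciprocal (vsrec), scale by den0, square root (vssqrt).
  do k = kts, kte
    do i = its, ite
      denfac(i,k) = sqrt(den0 / den(i,k))
    enddo
  enddo

  ! Squaring denfac should recover den0/den to within rounding error,
  ! so the printed maximum difference should be very small.
  print *, maxval(abs(denfac**2 - den0 / den))
end program denfac_check
```

If the square root is dropped, as in the quoted snippet, denfac ends up holding den0/den rather than its square root, so the difference printed by the last line would no longer be close to zero.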