I have a problem running the latest WRF v3.3 code compiled with the trial versions of the PGI 11.5 and 11.6 compilers on a machine with a Tesla C1060 card. I’m trying to run the tutorial Jan2000 case (and other real-data cases) with no luck. Everything was done step by step following the WRF guide (netCDF was compiled with PGI too). When I run the host-optimized wrf.exe binary (with the ACC_DEVICE environment variable empty), it finishes with “SUCCESS COMPLETE WRF”. The problem arises when running the accelerated version. It hangs right after this output:
Timing for processing lateral boundary for domain 1: 0.39840 elapsed seconds.
WRF TILE 1 IS 1 IE 20 JS 1 JE 20
WRF NUMBER OF TILES = 1
At this point I can see a PGI message (triggered by setting ACC_NOTIFY to 1) saying that an acc kernel has been entered (wsm32D function, line 211). After that, the wrf.exe process consumes 100% of CPU and GPU time (as reported by the nvidia-smi utility) and nothing more happens. Running wrf.exe under strace shows a continuous sequence of ioctl() calls. After some debugging of the NVIDIA kernel module, it turned out that they were related to rm_ioctl().
It’s worth noting that the PGI Fortran and C examples work fine.
All of the above has been tested on two GNU/Linux distros, Debian 6.0.1a and Fedora 13 (both x86_64), with many NVIDIA kernel driver versions from 190.53 upward.
Has anyone encountered similar problems, or do you have any idea how to deal with this? If needed, I can post more info, command output, etc.; just drop me a note.