PGI Accelerated and WRF 3.3 - weird lockup

Hi all,

I have a problem with the latest WRF v3.3 code compiled with trial versions of the PGI 11.5 and 11.6 compilers, targeting a Tesla C1060 card. I’m trying to run the tutorial Jan2000 case (and other real-data cases) with no luck. Everything was done step by step following the WRF guide (netCDF was compiled with PGI too). When I run the host-optimized wrf.exe binary (with the ACC_DEVICE environment variable empty), it finishes with “SUCCESS COMPLETE WRF”. The problem arises when running the accelerated binary. It hangs right after this output:


Timing for processing lateral boundary for domain 1: 0.39840 elapsed seconds.
WRF TILE 1 IS 1 IE 20 JS 1 JE 20
WRF NUMBER OF TILES = 1

Here I can see a PGI message (triggered by setting ACC_NOTIFY to 1) saying that an acc kernel has been entered (wsm32D function, line 211). After that, the wrf.exe process consumes 100% of CPU and GPU time (as reported by the nvidia-smi utility) and nothing more happens. Running wrf.exe under strace shows a continuous sequence of ioctl() calls. After a bit of debugging in the NVIDIA kernel module, it turned out that they are related to rm_ioctl().
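
The diagnostic steps above can be reproduced roughly as follows. This is only a sketch: the binary path, the trace file name and the 300-second limit are placeholders, not values taken from the runs described here.

```shell
#!/bin/sh
# Sketch of the diagnostic run; WRF_BIN, the trace file name and the
# timeout value are assumptions, adjust them to your setup.
WRF_BIN="${WRF_BIN:-./wrf.exe}"

# Ask the PGI accelerator runtime to report each kernel launch.
export ACC_NOTIFY=1

if [ -x "$WRF_BIN" ]; then
    # Record only ioctl() calls (following child processes) into a file,
    # and give up after 300 s so a hung run does not block the shell.
    timeout 300 strace -f -e trace=ioctl -o wrf_ioctl.trace "$WRF_BIN"
    echo "exit status: $? (124 means the timeout fired)"
else
    echo "skipping run: $WRF_BIN not found or not executable"
fi
```

While this runs, nvidia-smi in a second terminal shows whether the GPU stays busy.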

It’s worth mentioning that the PGI Fortran and C examples work fine.

All of the above has been tested on two GNU/Linux distros, Debian 6.0.1a and Fedora 13 (both x86_64), with many NVIDIA kernel driver versions from 190.53 upward.

Have you ever encountered a similar problem, or do you have any idea how to deal with it? If needed, I can post more info, command output, etc.; just drop me a note.

Best regards

Hi lgrzegorek,

You’re encountering a known bug in the accelerator compiler logged as TPR #17932. We do apologize for this error and are currently working on a fix. Expect the fix to be available in the 11.7 release.

Best Regards,
Mat

Thank you Mat. Good to hear that.

Best regards