Problems with FORTRAN Accelerator and subroutines

I tried to have a kernel push data back to the host while using the mirror/reflected directives (by putting an “update” directive inside the kernel), but I now understand that this shouldn’t be possible.
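
For anyone finding this thread later, here is a minimal sketch of the pattern as I now understand it: the “update” directives sit in host code, before and after the call, and never inside the compute region. The module, the subroutine and the array names are made up for illustration and are not taken from the SWE1D code; I am also assuming the PGI Accelerator model spellings (mirror, reflected, region, update host/device).

```fortran
! Hypothetical names throughout; this is not the SWE1D code.
module swe_sketch
  implicit none
  real, allocatable :: h(:)
  ! Assumption: mirror keeps a device copy of h for as long as h is allocated.
!$acc mirror(h)
contains
  subroutine advance(a)
    real, intent(inout) :: a(:)
    integer :: i
    ! Assumption: reflected tells the compiler the caller already has a
    ! device copy of a, so no transfer happens at the region boundaries.
!$acc reflected(a)
!$acc region
    do i = 1, size(a)
      a(i) = 2.0 * a(i)
    end do
!$acc end region
  end subroutine advance
end module swe_sketch

program demo
  use swe_sketch
  implicit none
  integer, parameter :: n = 1000
  allocate(h(n))
  h = 1.0
  ! Host to device, issued from host code.
!$acc update device(h)
  call advance(h)
  ! Device to host, again from host code and outside any region.
!$acc update host(h)
  print *, h(1), h(n)
  deallocate(h)
end program demo
```

Putting the subroutine in a module gives the caller the explicit interface that, as far as I can tell, the reflected directive needs. Without accelerator support enabled, the directives are treated as plain comments and the program runs on the host only.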


Thanks again,

Nicola.

Hi again! After getting some good results with the CUDA FORTRAN version of the SWE1D code (and taking some holidays), I went back to the PGI Accelerator version in order to compare the performance of the two.

I got the PGI Accelerator version working, but only for small arrays: if I try to use 10000* or more elements (I should be using 100000-200000 elements for my applications), I get “cuMemAlloc error 2” (i.e. “Out of Memory”).

*Actually, I tested the code a little more and discovered that the size at which it runs out of memory is quite random: sometimes it fails at 1000 elements, and other times at 20000 elements.

Any suggestions? As usual, I’ve uploaded the latest version of the code to MediaFire:

http://www.mediafire.com/?4dos3otya1dhvy2

In particular, the parameter NC inside the “Input.log” file determines the size of the arrays used by the code.


Thanks in advance, Nicola.