I tried to have a kernel push data back to the host using the mirror/reflected directives (with an “update” clause inside the kernel), but now I understand that this isn't supposed to be possible.
Thanks again,
Nicola.
Hi again! After getting some good results with the CUDA Fortran version of the SWE1D code (and some holidays), I went back to the PGI Accelerator version to compare the performance of the two.
I got the PGI Accelerator version working, but only for small arrays: if I use sizes of 10000* elements or more (my applications need sizes of 100000-200000), I get “cuMemAlloc error 2” (aka “Out of Memory”).
*Actually, after testing the code a bit more, I found that the memory limit is quite random: sometimes the error already appears at 1000 elements, other times only at 20000 elements.
Any suggestions? As usual, I've uploaded the latest version of the code to MediaFire:
http://www.mediafire.com/?4dos3otya1dhvy2
In particular, the parameter NC in the “Input.log” file sets the size of the arrays used by the code.
Thanks in advance, Nicola.