OpenACC: How to CACHE into GPU shared memory?

Hi,

My code is a 26-point isotropic stencil, and I want to prefetch some values into GPU shared memory, typically i-1:i+4 and j-1:j+4. I am not able to do this, and I get a warning like:

PGF90-W-0155-Compiler failed to translate accelerator region (see -Minfo messages): multiple indices in shared memory dimension (kernel.f90: 416)

or

PGF90-W-0155-Compiler failed to translate accelerator region (see -Minfo messages): unknown shared array size (kernel.f90: 617)

when I omit a dimension from the cache directive.
My code structure is as follows:

  !$ACC KERNELS                      &
  !$ACC PRESENT(p0,q0,phi,eta,roc2)
  !$ACC LOOP INDEPENDENT
  do k=k0,k1
   !$ACC LOOP INDEPENDENT
   do j=j0,j1
    !$ACC LOOP INDEPENDENT
    do i=i0,i1
     !$ACC CACHE(p0(...),q0(...))

Perhaps it is a bad idea to have the cache directive execute so many times. I would like some idea of how it could be used efficiently in my case.

Thank you very much,
Sayan

UPDATE:

This works when I specify the cache directive as:

!$ACC CACHE(p0(i-1:i+4,j-1:j+4,k-1:k+1), q0(i-1:i+4,j-1:j+4,k-1:k+1))

compilation info:

        423, Cached references to size [(x+5)x6x(y+2)] block of 'q0'
             Cached references to size [(x+5)x6x(y+2)] block of 'p0'

Now I get this warning instead:

PGF90-W-0155-Compiler failed to translate accelerator region (see -Minfo messages): illegal opcode (kernel.f90: 416)

The code structure is the same as above. The problem is that the code is terribly slow, so I would need to change the loop mapping. Any ideas are welcome.
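
For anyone who wants to try this, here is a minimal self-contained sketch of the loop structure and cache directive described above. It is not the original kernel: the grid size n, the halo width h, the program name and the six-term sum are hypothetical placeholders, and only p0 (which is read-only here) is cached. The cache directive is placed at the top of the innermost loop body, which is where the OpenACC specification allows it. Whether a given compiler version accepts it is untested here.

```fortran
program cache_sketch
  implicit none
  ! Hypothetical grid size and halo width (not taken from the original code).
  integer, parameter :: n = 16, h = 4
  real, allocatable :: p0(:,:,:), q0(:,:,:)
  integer :: i, j, k

  ! Allocate with a halo so that i-1:i+4, j-1:j+4, k-1:k+1 stay in bounds.
  allocate(p0(1-h:n+h, 1-h:n+h, 1-h:n+h))
  allocate(q0(1-h:n+h, 1-h:n+h, 1-h:n+h))
  p0 = 1.0
  q0 = 0.0

  !$acc kernels copyin(p0) copy(q0)
  !$acc loop independent
  do k = 1, n
    !$acc loop independent
    do j = 1, n
      !$acc loop independent
      do i = 1, n
        ! The cache directive goes at the top of the loop body, before any
        ! executable statement. Bounds are the loop index plus/minus a constant.
        !$acc cache(p0(i-1:i+4, j-1:j+4, k-1:k+1))
        ! Placeholder stencil: six reads that all fall inside the cached block.
        q0(i,j,k) = p0(i-1,j,k) + p0(i+4,j,k) &
                  + p0(i,j-1,k) + p0(i,j+4,k) &
                  + p0(i,j,k-1) + p0(i,j,k+1)
      end do
    end do
  end do
  !$acc end kernels

  print *, 'q0(1,1,1) =', q0(1,1,1)
  deallocate(p0, q0)
end program cache_sketch
```

Without OpenACC enabled the directives are plain comments, so the same file also builds and runs serially, which gives a quick reference result: every interior q0 value should be 6.0, since it is the sum of six p0 entries that are all 1.0. A small case like this may also be a convenient starting point for a reproducing example.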

Hi Sayan,

PGF90-W-0155-Compiler failed to translate accelerator region (see -Minfo messages): illegal opcode (kernel.f90: 416)

This is actually a compiler error, but we haven’t had any other reports of it yet. If you can send us a reproducing example, that would be great; otherwise, I’ll see if I can reproduce it myself. With the cache directive being so new, there are unfortunately bound to be problems.

But the problem is that the code is terribly slow,

The error above is most likely preventing the kernel from being generated, so it could be the cause.

- Mat