OpenACC loop with "larger steps"

Hi all,

I have the following “simple loop” in my code (note: ii+=2):
//************************************************************
#pragma acc parallel loop pcopy(v[0:v_size]) pcopyin(dia[0:dia_size], u[0:v_size])
for(int ii = 0; ii < v_size; ii+=2)
{
    const S * __restrict pdia = dia + (2*ii);
    S t0 = pdia[0];
    S t1 = pdia[1];
    S t2 = pdia[2];
    S t3 = pdia[3];

    v[ii + 0] = omega * (t0*u[ii] + t1*u[ii+1]);
    v[ii + 1] = omega * (t2*u[ii] + t3*u[ii+1]);
}
//************************************************************

which fails with the following error:
call to cuStreamSynchronize returned error 700: Illegal address during kernel execution
call to cuMemFreeHost returned error 700: Illegal address during kernel execution

I compiled the code for the host (without -acc) and checked it with valgrind for memory leaks: none were found.

To get it working, I have to replace the code with the following (note: ++ii):
//************************************************************
#pragma acc parallel loop pcopy(v[0:v_size]) pcopyin(dia[0:dia_size], u[0:v_size])
for(int ii = 0; ii < (int)v_size/2; ++ii)
{
    const S * __restrict pdia = dia + (4*ii);
    S t0 = pdia[0];
    S t1 = pdia[1];
    S t2 = pdia[2];
    S t3 = pdia[3];

    v[2*ii + 0] = omega * (t0*u[2*ii] + t1*u[2*ii+1]);
    v[2*ii + 1] = omega * (t2*u[2*ii] + t3*u[2*ii+1]);
}
//************************************************************
With this version, my code works.
Since other parts of my code need loops with “larger steps” than ++ii (and those work), I wonder why this error occurs.
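
One thing that may be worth checking, although nothing in this thread confirms it is the cause: the two loop forms touch the same indices only when v_size is even. The following host-only sketch computes the highest index each form reads or writes. The function names are hypothetical and exist only for this illustration; they are not part of the original code.

```c
#include <assert.h>

/* Highest index of u and v touched by the stride-2 form (ii += 2),
   or -1 if the loop body never runs. */
int max_uv_index_stride2(int v_size)
{
    assert(v_size >= 0);
    int max_index = -1;
    for (int ii = 0; ii < v_size; ii += 2)
        max_index = ii + 1;        /* from v[ii + 1] and u[ii + 1] */
    return max_index;
}

/* Highest index of u and v touched by the halved form (++ii up to v_size/2),
   or -1 if the loop body never runs. */
int max_uv_index_halved(int v_size)
{
    assert(v_size >= 0);
    int max_index = -1;
    for (int ii = 0; ii < v_size / 2; ++ii)
        max_index = 2 * ii + 1;    /* from v[2*ii + 1] and u[2*ii + 1] */
    return max_index;
}

/* Highest index of dia touched by the stride-2 form,
   or -1 if the loop body never runs. */
int max_dia_index_stride2(int v_size)
{
    assert(v_size >= 0);
    int max_index = -1;
    for (int ii = 0; ii < v_size; ii += 2)
        max_index = 2 * ii + 3;    /* from pdia[3] with pdia = dia + 2*ii */
    return max_index;
}
```

For an even v_size both forms stop at index v_size - 1, and dia is read up to index 2*v_size - 1. For an odd v_size the stride-2 form reaches index v_size, which is one element past the ranges v[0:v_size] and u[0:v_size] named in the data clauses, while the halved form skips the last element instead. Whether this applies here depends on the actual values of v_size and dia_size, which are not given in the post.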

Has anyone had a similar problem, or does anyone know why this error occurs?

Thanks in advance!
Best,
Stefan

Hi Stefan,

I think I’ll need a reproducing example. I tried recreating the issue, but it worked fine for me. Unfortunately, we’re having trouble with the User Forum: it’s not allowing us to post code. Until our webmaster fixes it, please send the example to PGI Customer Service (trs@pgroup.com) and ask them to forward it to me.

Thanks,
Mat