Hi all,
I have the following “simple loop” in my code (note: ii+=2):
//************************************************************
#pragma acc parallel loop pcopy(v[0:v_size]) pcopyin(dia[0:dia_size], u[0:v_size])
for(int ii = 0; ii < v_size; ii+=2)
{
    const S* __restrict pdia = dia + (2*ii);
    S t0 = pdia[0];
    S t1 = pdia[1];
    S t2 = pdia[2];
    S t3 = pdia[3];
    v[ii + 0] = omega * (t0*u[ii] + t1*u[ii+1]);
    v[ii + 1] = omega * (t2*u[ii] + t3*u[ii+1]);
}
//************************************************************
which fails with the following error:
call to cuStreamSynchronize returned error 700: Illegal address during kernel execution
call to cuMemFreeHost returned error 700: Illegal address during kernel execution
I compiled the code for the host (without -acc) and checked it with valgrind for memory leaks: none were reported.
To get it running, I had to replace the loop with the following (note: ++ii):
//************************************************************
#pragma acc parallel loop pcopy(v[0:v_size]) pcopyin(dia[0:dia_size], u[0:v_size])
for(int ii = 0; ii < (int)v_size/2; ++ii)
{
    const S* __restrict pdia = dia + (4*ii);
    S t0 = pdia[0];
    S t1 = pdia[1];
    S t2 = pdia[2];
    S t3 = pdia[3];
    v[2*ii + 0] = omega * (t0*u[2*ii] + t1*u[2*ii+1]);
    v[2*ii + 1] = omega * (t2*u[2*ii] + t3*u[2*ii+1]);
}
//************************************************************
With this version, my code works.
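For reference, here is a minimal host-only sketch (no OpenACC) that checks that the two loop forms write the same values when v_size is even. Everything beyond the loop bodies is an assumption for illustration and is not taken from the original code: S is taken to be double, dia is taken to hold 2*v_size entries (the smallest size the indexing above allows), and the helper names apply_stride2, apply_unit, and same_result as well as the fill values are hypothetical.

```cpp
#include <cassert>
#include <vector>

using S = double;  // assumption: the post does not say what S is

// Stride-2 form (ii += 2), as in the first loop, run on the host.
void apply_stride2(std::vector<S>& v, const std::vector<S>& dia,
                   const std::vector<S>& u, S omega) {
    const int v_size = static_cast<int>(v.size());
    for (int ii = 0; ii < v_size; ii += 2) {
        const S* pdia = dia.data() + 2 * ii;  // one 2x2 block per pair of rows
        v[ii + 0] = omega * (pdia[0] * u[ii] + pdia[1] * u[ii + 1]);
        v[ii + 1] = omega * (pdia[2] * u[ii] + pdia[3] * u[ii + 1]);
    }
}

// Unit-stride form (++ii), as in the second loop, run on the host.
void apply_unit(std::vector<S>& v, const std::vector<S>& dia,
                const std::vector<S>& u, S omega) {
    const int v_size = static_cast<int>(v.size());
    for (int ii = 0; ii < v_size / 2; ++ii) {
        const S* pdia = dia.data() + 4 * ii;
        v[2 * ii + 0] = omega * (pdia[0] * u[2 * ii] + pdia[1] * u[2 * ii + 1]);
        v[2 * ii + 1] = omega * (pdia[2] * u[2 * ii] + pdia[3] * u[2 * ii + 1]);
    }
}

// Fills small test arrays and compares the output of both forms.
bool same_result(int v_size) {
    // For odd v_size the stride-2 form would read u[v_size] and write
    // v[v_size], which are out of bounds, so only even sizes are compared.
    assert(v_size % 2 == 0);
    std::vector<S> dia(2 * v_size), u(v_size), a(v_size, 0.0), b(v_size, 0.0);
    for (int i = 0; i < 2 * v_size; ++i) dia[i] = i % 7 + 1;  // arbitrary values
    for (int i = 0; i < v_size; ++i) u[i] = i % 5 - 2;        // arbitrary values
    apply_stride2(a, dia, u, 0.5);
    apply_unit(b, dia, u, 0.5);
    return a == b;
}
```

Calling same_result(8) exercises both forms on the same data. This only compares host results; it says nothing about how the compiler maps either loop to the GPU.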
Other parts of my code need loops with “larger steps” than ++ii, and those are working, so I wonder why this error occurs here.
Has anyone had a similar problem and knows why it occurs?
Thanks in advance!
Best,
Stefan