call to EventSynchronize returned error 999: Unknown

Hi,
I am getting the following error while running the code:
call to EventSynchronize returned error 999: Unknown

Please could you let me know what causes this error and how to solve it.

int n = 100000;
int m = 1000;

#pragma acc kernels loop copyin(a[0:n],b[0:n]) copyout(r[0:n]) private(r[0:n])
for( int i = 0; i < 100; ++i )
{
    for( int j = 0; j < m; ++j )
    {
        int k = (i*m+j);
        r[k] = a[k] + b[k];
    }
}

Hi Joyt,

My guess is that the privatization of “r” is the problem. By privatizing “r”, every thread gets its own copy, and given the size of “r” (n = 100,000 elements), that can easily exceed the memory available on the device.

That said, a complete example would let us confirm the cause.
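A minimal sketch of the same loop with the private clause removed, using a hypothetical helper vec_add. It assumes a, b and r are float arrays (the snippet doesn't show their declarations). Under that assumption each private copy of “r” is 100,000 × 4 = 400,000 bytes, so even 10,000 threads would need about 4 GB. Private copies are also discarded at the end of the region, so copyout(r) would not return the sums anyway. Since every iteration writes a distinct element r[k], the shared array needs no privatization.

```c
#include <assert.h>

/* Hypothetical helper: r[k] = a[k] + b[k] for k in [0, rows*m).
 * Element type float is an assumption; the original types are not shown.
 * r is NOT privatized: each (i, j) writes a different r[k], so the
 * shared array is safe and copyout(r) brings the results back. */
void vec_add(int rows, int m, const float *restrict a,
             const float *restrict b, float *restrict r)
{
    assert(rows > 0 && m > 0);
    const int n = rows * m;

    #pragma acc kernels loop copyin(a[0:n], b[0:n]) copyout(r[0:n])
    for (int i = 0; i < rows; ++i) {
        #pragma acc loop
        for (int j = 0; j < m; ++j) {
            int k = i * m + j;   /* flat index, unique per (i, j) */
            r[k] = a[k] + b[k];
        }
    }
}
```

For the sizes in the question this would be called as vec_add(100, 1000, a, b, r).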

- Mat

I am also getting this error when running the Jacobi relaxation example. The complete code is below; please let me know what the problem is.

#include <stdio.h>
#include <string.h>   /* memset */
#include <time.h>
#include <math.h>

#define NN 4096
#define NM 4096

double A[NN][NM];
double Anew[NN][NM];

int main(int argc, char** argv)
{
    const int n = NN;
    const int m = NM;
    const int iter_max = 1000;

    const double tol = 1.0e-6;
    double error = 1.0;

    memset(A, 0, n * m * sizeof(double));
    memset(Anew, 0, n * m * sizeof(double));

    for (int j = 0; j < n; j++)
    {
        A[j][0] = 1.0;
        Anew[j][0] = 1.0;
    }

    printf("Jacobi relaxation Calculation: %d x %d mesh\n", n, m);

    double stime = time(NULL);
    int iter = 0;

    #pragma acc data copy(A) create(Anew)
    while ( error > tol && iter < iter_max )
    {
        error = 0.0;

        #pragma acc kernels loop gang(32) vector(16)
        for( int j = 1; j < n-1; j++)
        {
            #pragma acc loop gang(16) vector(32)
            for( int i = 1; i < m-1; i++ )
            {
                Anew[j][i] = 0.25 * ( A[j][i+1] + A[j][i-1]
                                    + A[j-1][i] + A[j+1][i]);
                error = fmax( error, fabs(Anew[j][i] - A[j][i]));
            }
        }

        #pragma acc kernels
        for( int j = 1; j < n-1; j++)
        {
            #pragma acc loop gang(16) vector(32)
            for( int i = 1; i < m-1; i++ )
            {
                A[j][i] = Anew[j][i];
            }
        }

        if(iter % 100 == 0) printf("%5d, %0.6f\n", iter, error);

        iter++;
    }

    double etime = time(NULL);
    double runtime = etime - stime;

    /* time() counts whole seconds, so runtime is already in seconds */
    printf(" total: %f s\n", runtime);
}

Hi Ayaz,

You’re hitting a known issue (TPR#18947) that occurs when the schedule used by the auto-generated parallel reduction loop (here, the max reduction on “error”) doesn’t match the one set by the user. The workaround is to let the compiler set the schedule (i.e. remove the “gang” and “vector” clauses) or to change the vector sizes so that their product is 256 instead of 512 (the posted code uses vector(16) × vector(32) = 512).
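A minimal sketch of the first workaround applied to the first kernels loop of the posted Jacobi code, written as a hypothetical helper jacobi_sweep over flat row-major arrays so it can take runtime sizes. Two assumptions go beyond the original: the max reduction on error is declared explicitly with reduction(max:error) instead of being left for the compiler to detect, and the absolute value and maximum are written out by hand rather than with fabs/fmax so the sketch needs no math library. A comment marks where the second workaround (vector lengths whose product is 256) would apply instead.

```c
#include <assert.h>

/* Hypothetical helper: one Jacobi sweep over the interior of an n x m grid
 * stored row-major in flat arrays. Returns the largest absolute change. */
double jacobi_sweep(int n, int m, const double *A, double *Anew)
{
    assert(n >= 3 && m >= 3);
    double error = 0.0;

    /* Workaround 1: no gang()/vector() clauses, so the compiler picks the
     * schedule. Workaround 2 would keep explicit clauses but make the vector
     * lengths multiply to 256, e.g. vector(8) outer and vector(32) inner.
     * The explicit reduction(max:error) is an addition, not original code. */
    #pragma acc kernels copyin(A[0:n*m]) copy(Anew[0:n*m])
    {
        #pragma acc loop independent reduction(max:error)
        for (int j = 1; j < n - 1; j++) {
            #pragma acc loop independent reduction(max:error)
            for (int i = 1; i < m - 1; i++) {
                Anew[j*m + i] = 0.25 * (A[j*m + i + 1] + A[j*m + i - 1]
                                      + A[(j-1)*m + i] + A[(j+1)*m + i]);
                double d = Anew[j*m + i] - A[j*m + i];
                if (d < 0.0) d = -d;        /* absolute change */
                if (d > error) error = d;   /* max reduction   */
            }
        }
    }
    return error;
}
```

Inside the posted while loop this could stand in for the first kernels loop as error = jacobi_sweep(n, m, &A[0][0], &Anew[0][0]); assuming OpenACC 2.5 or later, the copyin/copy clauses then reuse the data already present from the enclosing data region rather than transferring it again.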

- Mat

Thanks. It's OK now.