Data Clauses (output is zero when I use OpenACC)

Hello,
I want to reduce the runtime of my code by using OpenACC, but unfortunately when I use OpenACC the output becomes zero.
I would appreciate your help and guidance in solving this problem.
I look forward to hearing from you soon.
Thank you kindly,
Sajad

#include <stdio.h>
#include <math.h>
#include <stdlib.h>
#include <assert.h>
#include <openacc.h>
#include <time.h>
#include <string.h>
#include <malloc.h>

#define NX 201
#define NY 101
#define NZ 201
int main(void)
{
    int i, j, k, l, m;
    static double tr, w;
    static double dt = 9.5e-9, t;
    static double cu[NZ];
    static double AA[NX][NY][NZ], CC[NX][NY][NZ], BB[NX][NY][NZ];
    static double A[NX][NY][NZ], B[NX][NY][NZ], C[NX][NY][NZ];
    FILE *file;
    file = fopen("BB-and-A.csv", "w");
    t = 0.;
#pragma acc data copyin(tr, w, dt, t), copy(B, A, C, AA, CC, BB, cu)
    {
        for (l = 1; l < 65; l++) {
#pragma acc kernels loop private(i, j, k)
            for (i = 1; i < NX - 1; i++) {
                for (j = 0; j < NY - 1; j++) {
                    for (k = 1; k < NZ - 1; k++) {
                        A[i][j][k] = A[i][j][k]
                            + 1. * (B[i][j][k] - AA[i][j][k - 1]);
                    }
                }
            }
#pragma acc kernels loop private(i, j, k)
            for (i = 1; i < NX - 1; i++) { /* BB */
                for (j = 1; j < NY - 1; j++) {
                    for (k = 0; k < NZ - 1; k++) {
                        B[i][j][k] = B[i][j][k]
                            + 1. * (BB[i][j][k] - A[i - 1][j][k]);
                    }
                }
            }
#pragma acc kernels
            for (m = 1; m < NZ - 1; m++) {
                tr = t - (double)(m) * 5 / 1.5e8;
                if (tr <= 0.)
                    cu[m] = 0.;
                else {
                    w = (tr / 0.25e-6) * (tr / 0.25e-6);
                    cu[m] = 1666 * w / (w + 1.) * exp(-tr / 2.5e-6);
                    cu[m] = 2 * cu[m];
                }
                A[10][60][m] = -cu[m];
            }
#pragma acc update self(B)
            fprintf(file, "%e, %e \n", t * 1e6, -B[22][60][10]);
            t = t + dt;
        }
    }
    fclose(file);
}

The problem here is the “copyin( tr, w,dt, t)”, and in particular the “t” variable. By putting these scalars in a data clause, you need to manage the synchronization between the host and device copies yourself. Hence, when you update the variable on the host (i.e. “t = t + dt;”), you then need to update the device copy with the new value. Without that update, the device copy of “t” stays at 0, so “tr” is never positive, “cu” stays 0, and the output is zero.
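If you did want to keep “t” in a data clause, the synchronization described above could look roughly like the sketch below. The function run_with_update, the array out, and the size N are hypothetical names made up for this illustration, and the loop body is a stand-in rather than your actual computation. Without OpenACC enabled the pragmas are ignored and the code runs serially with the same result.

```c
#define N 8

/* Hypothetical helper: for each time step, writes out[m] = t * (m + 1)
   on the device, then advances t on the host. Returns the final t. */
double run_with_update(double out[N], double dt, int nsteps)
{
    double t = 0.0;
    /* t is deliberately placed in a data clause here, so a device copy
       of it exists for the lifetime of the data region. */
    #pragma acc data copyin(t) copyout(out[0:N])
    {
        for (int l = 0; l < nsteps; l++) {
            #pragma acc kernels loop
            for (int m = 0; m < N; m++) {
                out[m] = t * (m + 1);   /* reads the device copy of t */
            }
            t = t + dt;                 /* changes only the host copy */
            /* Push the new host value of t to the device copy. */
            #pragma acc update device(t)
        }
    }
    return t;
}
```

Leaving out the update device line in this sketch would reproduce the symptom from the question: the device would keep seeing t = 0 on every step.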

Also, there’s a potential race condition on “tr”, since the device code will now use the shared device variable instead of a private copy.

Though, the easiest thing to do is to simply not put these scalars in a data clause. By default, OpenACC privatizes scalars, so there’s no need to manage them yourself. In t’s case, its value will be passed as an argument to the CUDA kernel.

To fix your code, change:

#pragma acc  data  copyin( tr, w,dt, t),copy(B ,A , C,AA , CC,BB,cu )

to:

#pragma acc  data  copy(B ,A , C,AA , CC,BB,cu )

Note that there’s no need to put the loop indices in a private clause since they are implicitly private.
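As a rough illustration of the advice above, here is a reduced sketch of the source-term loop with only the array in the data clause. The function name run_source and its parameters are hypothetical, and the exp() decay factor from the original is left out so the sketch has no math library dependency. It assumes the compiler passes the host value of “t” to each kernel launch, as described above.

```c
#define NZ 201

/* Hypothetical reduced version of the time loop: only the array cu is
   in a data clause; the scalars are left to the compiler. */
double run_source(double cu[NZ], double dt, int nsteps)
{
    double t = 0.;
    #pragma acc data copy(cu[0:NZ])
    {
        for (int l = 0; l < nsteps; l++) {
            /* No private clause: the loop index m is implicitly private. */
            #pragma acc kernels loop
            for (int m = 1; m < NZ - 1; m++) {
                /* Declared inside the loop body, so each iteration has
                   its own tr and w and there is nothing to race on. */
                double tr = t - (double)m * 5 / 1.5e8;
                if (tr <= 0.) {
                    cu[m] = 0.;
                } else {
                    double w = (tr / 0.25e-6) * (tr / 0.25e-6);
                    /* exp(-tr / 2.5e-6) factor omitted in this sketch */
                    cu[m] = 2 * 1666 * w / (w + 1.);
                }
            }
            t = t + dt;   /* host-only update; no update directive needed */
        }
    }
    return t;
}
```

Declaring “tr” and “w” inside the loop body is one way to make the privatization explicit; leaving them declared outside and simply out of the data clause, as in the fix above, is the smaller change to your code.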

Hope this helps,
Mat

How kind you are to help me.