Hi! I would like to make sure that a certain part of my code has been parallelized correctly.
Here is the original code:
for(i=0; i<bands; i++)
{
    mean=0;
    for(j=0; j<N; j++)
        mean+=(image[(i*N)+j]);
    mean/=N;
    meanSpect[i]=mean;
    for(j=0; j<N; j++)
        image[(i*N)+j]=image[(i*N)+j]-mean;
}
This is my OpenACC parallel code:
#pragma acc parallel loop
for(i=0; i<bands; i++)
{
    mean=0;
    #pragma acc loop reduction(+:mean)
    for(j=0; j<N; j++)
        mean+=(image[(i*N)+j]);
    mean/=N;
    meanSpect[i]=mean;
    #pragma acc loop
    for(j=0; j<N; j++)
        image[(i*N)+j]=image[(i*N)+j]-mean;
}
As far as I understand, everything is parallelizable because there are no data dependencies other than the “mean” variable, and that is the part I am unsure about. I believe the outer for loop will be parallelized across a gang, with one vector lane (thread) per iteration (that is, per value of i), right?
So each vector / thread will need its own mean variable, right? How do I tell the compiler to do that? Is the code OK as it is, or do I have to do something else?
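For reference, here is a sketch of the more explicit variant I am considering, in case it helps to compare. It rests on several assumptions of mine: that image and meanSpect are float arrays of bands*N and bands elements, that the function name center_bands and its signature (which are hypothetical, not from my real code) are acceptable stand-ins, and that declaring mean inside the body of the outer loop is enough to give each iteration its own copy. I have also guessed at the gang/vector clauses and the data clauses, so please correct me if they are wrong.

```c
/* Hypothetical helper: subtracts each band's mean from its N pixels and
   stores that mean in meanSpect[i]. Name and signature are illustrative. */
void center_bands(float *image, float *meanSpect, int bands, int N)
{
    /* Assumed sizes: image has bands*N floats, meanSpect has bands floats. */
    #pragma acc parallel loop gang copy(image[0:bands*N]) copyout(meanSpect[0:bands])
    for (int i = 0; i < bands; i++) {
        /* Declared inside the loop body, so (as I understand it) each
           outer iteration works on its own copy of mean. */
        float mean = 0.0f;

        /* Sum the N pixels of band i across the vector lanes. */
        #pragma acc loop vector reduction(+:mean)
        for (int j = 0; j < N; j++)
            mean += image[i * N + j];

        mean /= N;
        meanSpect[i] = mean;

        /* Subtract the band mean from every pixel of band i. */
        #pragma acc loop vector
        for (int j = 0; j < N; j++)
            image[i * N + j] = image[i * N + j] - mean;
    }
}
```

If I keep mean declared outside the loop as in my code above, I assume the alternative would be to add private(mean) to the outer loop directive, but I am not sure whether that is required. Without OpenACC enabled the pragmas are ignored and this should behave like the original serial loop.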
Thank you very much!