Accelerator Programming Model on NVIDIA GPUs Part 4

In the fourth article in the whitepapers section of your resource page, you discuss data regions and the updatein and updateout compiler directives. You also state that a data region can contain one or more accelerator regions and even another data region.

The example given is only a fragment of code, and only in Fortran, not C. Please direct me to, or give me, a complete C code example showing the use of one or more data regions, with before-and-after versions and the advantages, and also show how a data region can contain another data region.

Also show by example the use of updatein and updateout in data regions.

Any help appreciated. Thanks in advance.

THX 1138

Below is a small example written in C that shows the use of a data region which contains multiple compute regions. It also shows the use of the update directive for transferring data both to the device and from the device.

#include <stdio.h>

int main(void)
{

float a[100], b[100], c[100], d[100];
float s;
int i, j;
int n = 100;
int m = 100;

for (i = 0; i < n; i++) {
   a[i] = (float) i;   
   b[i] = (float) (n - i);
   c[i] = 0.0;
   d[i] = 0.0;
}

#pragma acc data region copyin(a,b) copy(c,d)
{

for (j = 0; j < m; j++) {

#pragma acc region
   {
      for (i = 0; i < n; i++) {
         c[i] += a[i] / b[i];
      }
   }

#pragma acc update host(c)

   s = 0.0;
   for (i = 0; i < n; i++) {
      s += c[i];
   }
   printf("Sum of C array:  %f\n",s);
   if (s > 500.0) {
      printf("Updating Device copy of C......\n");
      for (i = 0; i < n; i++) {
         c[i] = (float) j;
      }
   #pragma acc update device(c)
   }
   
#pragma acc region
   {
      for (i = 0; i < n; i++) {
         d[i] += c[i] - (float) j;
      }
   }

}  /*  j loop  */
}  /*  End of data region */

s = 0.0;
for (i = 0; i < n; i++) {
   s += d[i];
}

printf("Sum of D: %f\n",s);

}
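The example above uses a single data region. As a rough sketch of the nesting the question asked about, the function below places one data region inside another. It assumes the inner data region accepts the same clauses as the outer one; the function name nested_data_regions and the arithmetic are made up for illustration and are not from the whitepaper.

```c
#include <stdio.h>

#define N 100

/* Hypothetical helper, not from the whitepaper: returns the sum of c[]. */
float nested_data_regions(void)
{
    float a[N], c[N];
    float s = 0.0f;
    int i;

    for (i = 0; i < N; i++) {
        a[i] = (float) i;
        c[i] = 0.0f;
    }

    /* Outer data region: a is copied to the device once. */
    #pragma acc data region copyin(a)
    {
        /* Inner data region: c is copied in here and copied back at its end. */
        #pragma acc data region copy(c)
        {
            /* Both compute regions reuse the device copies of a and c. */
            #pragma acc region
            {
                for (i = 0; i < N; i++) {
                    c[i] = 2.0f * a[i];
                }
            }

            #pragma acc region
            {
                for (i = 0; i < N; i++) {
                    c[i] += 1.0f;
                }
            }
        }  /* End of inner data region: c is back on the host */
    }  /* End of outer data region */

    for (i = 0; i < N; i++) {
        s += c[i];
    }
    return s;
}
```

A compiler without accelerator support ignores the pragmas and runs the loops on the host, so the result can be checked either way.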

If I wanted to see the performance without the data region I could take out the lines:

#pragma acc data region copyin(a,b) copy(c,d)

#pragma acc update host(c)

#pragma acc update device(c)

Now, with these three lines removed (and replaced with blank lines), I can see the program’s performance without any data regions by recompiling the modified code and running it. If I recompile with the

-ta=nvidia,time

command line option, I should then see the performance of the program without data regions.

Any help appreciated. Thanks in advance.

THX 1138

Yes, that is correct. Compiling with -ta=nvidia,time will cause the program upon exit to print out timing information regarding the GPU.
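For comparison, here is a sketch of what the loop looks like once the data region and update directives are gone. It assumes the data clauses are written on each compute region instead (the compiler can also work out the transfers itself if they are omitted). The function name run_without_data_region is hypothetical, and the host-side check on the sum of c is left out to keep the sketch short.

```c
#include <stdio.h>

#define N 100

/* File-scope arrays so their sizes are known at each directive. */
static float a[N], b[N], c[N], d[N];

/* Hypothetical helper, not from the thread: runs m passes, returns the sum of d[]. */
float run_without_data_region(int m)
{
    float s = 0.0f;
    int i, j;

    for (i = 0; i < N; i++) {
        a[i] = (float) i;
        b[i] = (float) (N - i);
        c[i] = 0.0f;
        d[i] = 0.0f;
    }

    for (j = 0; j < m; j++) {
        /* a, b and c are transferred on every pass of the j loop. */
        #pragma acc region copyin(a, b) copy(c)
        {
            for (i = 0; i < N; i++) {
                c[i] += a[i] / b[i];
            }
        }

        /* c is already back on the host here, so no update directive is needed. */

        /* c is sent to the device again, and d makes a round trip. */
        #pragma acc region copyin(c) copy(d)
        {
            for (i = 0; i < N; i++) {
                d[i] += c[i] - (float) j;
            }
        }
    }

    for (i = 0; i < N; i++) {
        s += d[i];
    }
    return s;
}
```

Each compute region now moves its arrays on every iteration, which is the traffic the data region version avoids; the timing output from -ta=nvidia,time should make that difference visible.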

Can you now work an example of the reflected clause into the existing code? Also, I guess we in the C language world cannot use mirror. Is there a workaround?

Thanks in advance.

THX 1138