Hi,
One more question: I want to use nested data regions (e.g. to measure data transfer times). However, this does not really work, because my first (outer) data region is ignored. The compiler gives no message saying so, but I can see it because the corresponding array is not available in my acc region.
I made a short test case:
#include <stdio.h>
#include <stdlib.h>
int main (int argc, char *argv[]) {
    int n = atoi(argv[1]);
    int tmp = 0;
    int *a = (int*) malloc(sizeof(int)*n);
    int *b = (int*) malloc(sizeof(int)*n);
    for (int i=0; i<n; i++) {
        a[i] = 1;
        b[i] = 2;
    }
    #pragma acc data region copyin(a[0:n-1])
    { /* outer data region: array a */
        #pragma acc data region copyin(b[0:n-1])
        { /* inner data region: array b */
            #pragma acc region
            for (int i=0; i<n; i++) {
                tmp += a[i];
            }
            for (int i=0; i< n; i++) {
                tmp += b[i];
            }
        }
    }
    printf("tmp: %d\n",tmp);
    return 0;
}
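For comparison, the value the test case should print can be checked with a host-only sketch of the same two loops. This is not part of the original file: the helper name host_reference_sum is made up for illustration, and it assumes n > 0. Since a[i] = 1 and b[i] = 2, the result should be 3*n.

```c
#include <stdlib.h>

/* Hypothetical helper (not in the original test case): repeats the
   initialization and the two summation loops on the host, without any
   accelerator directives. Assumes n > 0; returns -1 if allocation fails. */
int host_reference_sum(int n) {
    int tmp = 0;
    int *a = (int*) malloc(sizeof(int) * n);
    int *b = (int*) malloc(sizeof(int) * n);
    if (a == NULL || b == NULL) {
        free(a);
        free(b);
        return -1;
    }
    for (int i = 0; i < n; i++) {
        a[i] = 1;
        b[i] = 2;
    }
    for (int i = 0; i < n; i++) {
        tmp += a[i];   /* contributes n */
    }
    for (int i = 0; i < n; i++) {
        tmp += b[i];   /* contributes 2*n */
    }
    free(a);
    free(b);
    return tmp;
}
```

Calling host_reference_sum with the same n that is passed to the test program gives the number to compare the printed tmp against.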
Here, the program runs, BUT as you can see from the compiler feedback below, array “a” is only copied in at line 19 (the acc region) and not already at the first data region (line 15). In my real code, the problem is that the missing array is NOT automatically detected at the acc region, and thus my compilation fails.
Is it a compiler bug? I use version 11.10.
$ pgcc -ta=nvidia,cc20 -Minfo=accel datareg.c
main:
17, Generating copyin(b[:n-1])
19, Generating copyin(a[0:n-1])
Generating compute capability 2.0 binary
20, Loop is parallelizable
Accelerator kernel generated
20, #pragma acc for parallel, vector(256) /* blockIdx.x threadIdx.x */
CC 2.0 : 8 registers; 1032 shared, 68 constant, 0 local memory bytes; 100% occupancy
21, Sum reduction generated for tmp
Regards, Sandra