Loop carried dependence

I fail to understand where the “Complex loop carried dependence” is in the code below. If I leave only one of the arrays, everything is fine, but not if both are present.

#include <stdlib.h>
#define COUNT 16384

int main(int argc, char **argv){
    float *d;
    float *d2;

    d = (float *)malloc( COUNT * sizeof(float));
    d2 = (float *)malloc( COUNT * sizeof(float));

#pragma acc region for
    for(int i = 0; i < COUNT; i++) {
        float sum = 0.0;
        float sum2 = 0.0;
        d[i] = sum;     /* each iteration writes only element i of d */
        d2[i] = sum2;   /* ... and only element i of d2 */
    }

    free(d);
    free(d2);
    return 0;
}

pgcc -ta=nvidia -Minfo=accel -o t.exe t.c
main:
11, No parallel kernels found, accelerator region ignored
12, Complex loop carried dependence of d prevents parallelization
Loop carried dependence of d2 prevents parallelization
Loop carried backward dependence of d2 prevents vectorization

Hi ink,

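You need to compile with “-Msafeptr” to assert that your pointers don’t overlap. The compiler can't prove that d and d2 point to separate memory. If they did overlap, a store through d2[i] could change an element of d that a different iteration also writes, so the iterations would not be independent. With only one array, each iteration writes its own element, which is why that version parallelizes.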

% pgcc -ta=nvidia -c test.c -Minfo=accel -V9.0-3
main:
     11, No parallel kernels found, accelerator region ignored
     12, Complex loop carried dependence of d prevents parallelization
         Loop carried dependence of d2 prevents parallelization
         Loop carried backward dependence of d2 prevents vectorization

% pgcc -ta=nvidia,time -o test.out test.c -Minfo=accel -Msafeptr -V9.0-3
main:
     11, Generating copyout(d[0:16383])
         Generating copyout(d2[0:16383])
     12, Loop is parallelizable
         Accelerator kernel generated
         12, #pragma for parallel, vector(256)
% test.out
Accelerator Kernel Timing data
test.c
  main
    11: region entered 1 time
        time(us): total=3895725 init=3895154 region=571
                  kernels=26 data=545
        w/o init: total=571 max=571 min=571 avg=571
        12: kernel launched 1 times
            grid: [64]  block: [256]
            time(us): total=26 max=26 min=26 avg=26

Hope this helps,
Mat

Hi Mat,
Thanks for your help, it worked indeed.
All I need now is support for derived types and unsigned int.
Thanks again.