One more question

Does the PGI compiler have any solution for non-independent iterations? I mean loops where the result of a previous iteration is the basis for the next one.

I’m assuming you mean something like:

DO i=2,N
    ARR(i) = ARR(i-1) * constant
END DO

This loop is not parallel, since each iteration needs the result of the one before it, so it cannot be accelerated. You could run the code sequentially on the GPU, but I would only recommend this if other parts of the code were parallel and the cost of copying the data back out outweighed the cost of running sequentially.

  • Mat
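
As a side note on the loop above: that particular recurrence happens to have a closed form (each element is the first element times a power of the constant), and in that form the iterations no longer depend on each other. The sketch below is only an illustration in C with 0-based indexing; the function names `fill_sequential`, `fill_closed_form`, and `int_pow` are hypothetical, and it says nothing about whether the rewrite would pay off on an accelerator. A general dependent loop has no such shortcut.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: base^exp by repeated squaring (avoids linking libm). */
static double int_pow(double base, size_t exp) {
    double result = 1.0;
    while (exp > 0) {
        if (exp & 1u) result *= base;
        base *= base;
        exp >>= 1;
    }
    return result;
}

/* Sequential form: element i needs element i-1, so iterations are ordered. */
static void fill_sequential(double *arr, size_t n, double constant) {
    assert(arr != NULL);
    for (size_t i = 1; i < n; ++i)
        arr[i] = arr[i - 1] * constant;
}

/* Closed form: arr[i] = arr[0] * constant^i.
   Each iteration reads only arr[0], so iterations are independent. */
static void fill_closed_form(double *arr, size_t n, double constant) {
    assert(arr != NULL);
    if (n == 0) return;
    const double first = arr[0];
    for (size_t i = 1; i < n; ++i)
        arr[i] = first * int_pow(constant, i);
}
```

The two versions can differ in the last bits for general floating-point inputs, because the multiplications happen in a different order.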

I have a similar question about a loop.

g=1;
for(j=0;j<iter;j++){
    for(i=2*(g-1); i<SIZE;i=i+(2*g)){
        a[i]=a[i-g]*const;
    }
    g*=2;
}

Why does the compiler show “Accelerator restriction: invalid loop.”?
This loop works fine sequentially, and it can be parallelized in CUDA.
Am I missing something?
Thanks.

Hi Vincent5552

Why does the compiler show “Accelerator restriction: invalid loop.”?

Which loop are you trying to accelerate? Do you have a full example?

it can be parallelized in CUDA.

I’m not a CUDA expert, but I’m not sure how this could be done. Thread N depends on thread N-g updating its element of “a” before it can read that value.

Is the CUDA code really doing something along the lines of the following?

g=1;
for(j=0;j<iter;j++){
    for(i=2*(g-1); i<SIZE;i=i+(2*g)){
        b[i]=a[i-g]*const;
    }
    for (i=0; i<SIZE;++i) {
        a[i]=b[i];
    }
    g*=2;
}
  • Mat
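
One way to look at the dependence question is to check, for a single pass of the inner loop, whether any element is both written (`a[i]`) and read (`a[i-g]`). The sketch below does that for the index pattern as posted; the function name `pass_has_overlap` is hypothetical, and this only tests index overlap. It does not model what any compiler actually analyzes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Returns true if, in one pass of the inner loop with stride g, some
   element of the array is both written (a[i]) and read (a[i-g]).
   Index pattern copied from the posted loop: i = 2*(g-1), step 2*g. */
static bool pass_has_overlap(int g, int size) {
    assert(g >= 1 && size >= 1);
    bool *written = calloc((size_t)size, sizeof *written);
    if (written == NULL) abort();

    /* Mark every index the pass writes. */
    for (int i = 2 * (g - 1); i < size; i += 2 * g)
        written[i] = true;

    /* See whether any index the pass reads was also written. */
    bool overlap = false;
    for (int i = 2 * (g - 1); i < size; i += 2 * g) {
        int src = i - g;
        /* The posted loop reads a[-1] when g == 1 and i == 0; skip it here. */
        if (src >= 0 && written[src])
            overlap = true;
    }

    free(written);
    return overlap;
}
```

For the pattern as posted, this returns false for g = 1, 2, 4, 8, which suggests the dependence is between successive passes of the outer `j` loop rather than between iterations within one inner pass. Note also that the first pass (g = 1, i = 0) reads `a[-1]` in the posted code, so the starting index may need checking against the intended algorithm.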