Erroneous behavior with loops inside sequential routines

Hi,

I am running the following OpenACC C code with the latest PGI compiler.

########################################
#include <stdio.h>

/* routine directive on the prototype, so the call in sweep() already sees it */
#pragma acc routine seq
void integrate_n_rk4_phasedifferences(const unsigned networkSize, unsigned iter, double *output, double *test);
void sweep(int max);

int main(int argc, char *argv[]) {
    sweep(1000);
    return 0;
}


void sweep(int max){
    int networkSize = 4, l, m;
    double test[25*10];
    double output[9];   /* indices 1..8 are written, so 9 elements are needed */

    #pragma acc parallel loop independent collapse(1) \
        private(output) \
        copy(test[0:25*10])
    for (l=0;l<max;l++){
        integrate_n_rk4_phasedifferences(networkSize, l, output, test);
        if(max-l-1<25){
            for(m=1;m<9;m++){
                test[(max-l-1)*10+m]=output[m];
            }
            test[(max-l-1)*10+9]=l;
            test[(max-l-1)*10+0]=l;
        }
    }

    //debug kernel
    printf("\ndebug kernel begins\n");
    for(l=0;l<25;l++){
        for(m=0;m<10;m++)printf("%lf ",test[l*10+m]);
        printf("\n");
    }
    printf("debug kernel ends\n");

}

#pragma acc routine seq
void integrate_n_rk4_phasedifferences(const unsigned networkSize, unsigned iter, double *output, double *test) {

    #pragma acc data deviceptr(output)
    {
        unsigned i, point_dimension = networkSize*2;
        #pragma acc loop seq private(point_dimension)
        for(i=1;i<point_dimension+1;i++){
        //for(i=1;i<9;i++){
            output[i]=i+(double)iter/1000000;
        }
        output[8]=point_dimension;
    }
}

###########################################

As you can see in the for loop of the function ‘integrate_n_rk4_phasedifferences’, the results are correct when the constant ‘9’ is used in the loop condition, whereas using the expression ‘point_dimension+1’ for the same condition gives inconsistent and incorrect results.

This is the correct output with the constant ‘9’ (first 3 lines):
999.000000 1.000999 2.000999 3.000999 4.000999 5.000999 6.000999 7.000999 8.000000 999.000000
998.000000 1.000998 2.000998 3.000998 4.000998 5.000998 6.000998 7.000998 8.000000 998.000000
997.000000 1.000997 2.000997 3.000997 4.000997 5.000997 6.000997 7.000997 8.000000 997.000000

Incorrect output with the expression ‘point_dimension + 1’, which evaluates to the same value ‘9’:
999.000000 1.000996 2.000996 3.000996 4.000996 5.000996 6.000996 7.000996 8.000000 999.000000
998.000000 1.000993 2.000993 3.000993 4.000993 5.000993 6.000993 7.000993 8.000000 998.000000
997.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 8.000000 997.000000

What is strange is that even in this case, when I print the value of ‘point_dimension’ through output[8], it has the correct value ‘8’.
It seems that a for loop inside a sequential routine misbehaves when its loop condition uses a variable instead of a constant.
Any help is greatly appreciated.

Thanks,
Krishna.

Hi Krishna,

For good or bad, the code seems to get correct answers for me. I’ve tried it on a variety of GPUs, platforms, and compiler versions.

Can you please post what OS and accelerator you are using as well as the compilation flags?

Thanks,
Mat

This is my output from a Tesla K40 on 64-bit Linux using PGI 15.5.

% pgcc test.c -acc -Minfo=accel -fast -o acc.out -V15.5
sweep:
     15, Generating copy(test[:])
         CUDA shared memory used for output
         Accelerator kernel generated
         18, #pragma acc loop gang /* blockIdx.x */
         21, #pragma acc loop vector(128) /* threadIdx.x */
     15, Generating Tesla code
     21, Loop is parallelizable
integrate_n_rk4_phasedifferences:
     40, Generating acc routine seq
         Generating Tesla code
     46, Loop is parallelizable
% acc.out

debug kernel begins
999.000000 1.000999 2.000999 3.000999 4.000999 5.000999 6.000999 7.000999 8.000000 999.000000
998.000000 1.000998 2.000998 3.000998 4.000998 5.000998 6.000998 7.000998 8.000000 998.000000
997.000000 1.000997 2.000997 3.000997 4.000997 5.000997 6.000997 7.000997 8.000000 997.000000
996.000000 1.000996 2.000996 3.000996 4.000996 5.000996 6.000996 7.000996 8.000000 996.000000
995.000000 1.000995 2.000995 3.000995 4.000995 5.000995 6.000995 7.000995 8.000000 995.000000
994.000000 1.000994 2.000994 3.000994 4.000994 5.000994 6.000994 7.000994 8.000000 994.000000
993.000000 1.000993 2.000993 3.000993 4.000993 5.000993 6.000993 7.000993 8.000000 993.000000
992.000000 1.000992 2.000992 3.000992 4.000992 5.000992 6.000992 7.000992 8.000000 992.000000
991.000000 1.000991 2.000991 3.000991 4.000991 5.000991 6.000991 7.000991 8.000000 991.000000
990.000000 1.000990 2.000990 3.000990 4.000990 5.000990 6.000990 7.000990 8.000000 990.000000
989.000000 1.000989 2.000989 3.000989 4.000989 5.000989 6.000989 7.000989 8.000000 989.000000
988.000000 1.000988 2.000988 3.000988 4.000988 5.000988 6.000988 7.000988 8.000000 988.000000
987.000000 1.000987 2.000987 3.000987 4.000987 5.000987 6.000987 7.000987 8.000000 987.000000
986.000000 1.000986 2.000986 3.000986 4.000986 5.000986 6.000986 7.000986 8.000000 986.000000
985.000000 1.000985 2.000985 3.000985 4.000985 5.000985 6.000985 7.000985 8.000000 985.000000
984.000000 1.000984 2.000984 3.000984 4.000984 5.000984 6.000984 7.000984 8.000000 984.000000
983.000000 1.000983 2.000983 3.000983 4.000983 5.000983 6.000983 7.000983 8.000000 983.000000
982.000000 1.000982 2.000982 3.000982 4.000982 5.000982 6.000982 7.000982 8.000000 982.000000
981.000000 1.000981 2.000981 3.000981 4.000981 5.000981 6.000981 7.000981 8.000000 981.000000
980.000000 1.000980 2.000980 3.000980 4.000980 5.000980 6.000980 7.000980 8.000000 980.000000
979.000000 1.000979 2.000979 3.000979 4.000979 5.000979 6.000979 7.000979 8.000000 979.000000
978.000000 1.000978 2.000978 3.000978 4.000978 5.000978 6.000978 7.000978 8.000000 978.000000
977.000000 1.000977 2.000977 3.000977 4.000977 5.000977 6.000977 7.000977 8.000000 977.000000
976.000000 1.000976 2.000976 3.000976 4.000976 5.000976 6.000976 7.000976 8.000000 976.000000
975.000000 1.000975 2.000975 3.000975 4.000975 5.000975 6.000975 7.000975 8.000000 975.000000
debug kernel ends

Hi Mat,
Thanks for the quick reply.
My system details are:
OS : ubuntu 14.04 LTS
GPU : Tesla K40c

My compiler version seems to be old (output of pgcc --version):
pgcc 14.10-0 64-bit target on x86-64 Linux -tp sandybridge

With “for(i=1;i<point_dimension+1;i++){” and the following compiler flags:
pgcc test1.c -acc
the output is:

debug kernel begins
999.000000 1.000992 2.000992 3.000992 4.000992 5.000992 6.000992 7.000992  8.000000 999.000000 
998.000000 1.000996 2.000996 3.000996 4.000996 5.000996 6.000996 7.000996 8.000000 998.000000 
997.000000 1.000995 2.000995 3.000995 4.000995 5.000995 6.000995 7.000995 8.000000 997.000000 
996.000000 1.000989 2.000989 3.000989 4.000989 5.000989 6.000989 7.000989 8.000000 996.000000 
995.000000 1.000998 2.000998 3.000998 4.000998 5.000998 6.000998 7.000998 8.000000 995.000000 
....

which is incorrect.

Even with these flags, the results are incorrect:
pgcc test1.c -acc -Minfo=accel -fast -o acc.out -V14.10

999.000000 1.000999 2.000999 3.000999 4.000999 5.000999 6.000999 7.000999 8.000000 999.000000 
998.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 8.000000 998.000000 
997.000000 1.000992 2.000992 3.000992 4.000992 5.000992 6.000992 7.000992 8.000000 997.000000 
996.000000 1.000982 2.000982 3.000982 4.000982 5.000982 6.000982 7.000982 8.000000 996.000000 
995.000000 1.000989 2.000989 3.000989 4.000989 5.000989 6.000989 7.000989 8.000000 995.000000 
994.000000 1.000992 2.000992 3.000992 4.000992 5.000992 6.000992 7.000992 8.000000 994.000000 
993.000000 1.000995 2.000995 3.000995 4.000995 5.000995 6.000995 7.000995 8.000000 993.000000

It seems that the for loop inside the seq routine is not run sequentially, but is being intermixed with other invocations of the routine.

I am downloading version 15.5 of pgcc and will report the results shortly.

It looks like an issue with pgcc 14.10.

pgcc test1.c -acc -Minfo=accel -V15.5 gives the right results while using -V14.10 does not.

Thank you so much for the help.

Best,
Krishna.