Hi,
I have a really small kernel:
int shstate[NN2];
int swaphelp;
ulong index;
index = 6227020799;
for (char d = 1; d <= NN2-1; d++) shstate[d-1] = d;
for (char n = NN2-1; n > 0; n--) {
    swaphelp = shstate[n-1];
    shstate[n-1] = shstate[index % n];
    shstate[index % n] = swaphelp;
    index = index / n;
}
for (int j = 0; j < NN2; j++) debugstate[j] = shstate[j];

return;
Compiling this with -deviceemu gives me the right result,
but running it on the GTX 280 does not.
If I put index into shared memory, I get the right result on the GPU as well.
Looking at the -ptx output, I noticed that the index variable is placed in local memory unless I explicitly put it into shared memory.
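For comparison with the GPU output, the same loop can be run on the host. This is only a minimal sketch: NN2 = 14 is an assumption (its value is not shown above), unrank_host is a made-up name, and index is declared as unsigned long long because 6227020799 needs more than 32 bits.

```cpp
// Assumption: NN2 is not given in the post; 14 is used here only for illustration.
const int NN2 = 14;

// Hypothetical host-side copy of the kernel loop; fills out[0..NN2-1].
// index is unsigned long long so that it is 64 bits wide on every platform.
void unrank_host(unsigned long long index, int *out)
{
    // The kernel never writes the last element; it is zeroed here
    // so that the host result is deterministic.
    int shstate[NN2] = {0};

    for (int d = 1; d <= NN2 - 1; d++) shstate[d - 1] = d;

    for (int n = NN2 - 1; n > 0; n--) {
        int swaphelp = shstate[n - 1];
        shstate[n - 1] = shstate[index % n];
        shstate[index % n] = swaphelp;
        index = index / n;
    }

    for (int j = 0; j < NN2; j++) out[j] = shstate[j];
}
```

Under the NN2 = 14 assumption, 6227020799 equals 13! - 1, so index % n is n - 1 at every step and each swap is a no-op. Calling unrank_host(6227020799ULL, debugstate) should therefore leave 1 to 13 in the first 13 entries.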
I can provide further information if needed. IMHO it's a bug in the compiler.
Thanks for any further advice,
Damian