Running on device gives different results than in device emulation mode

Hi,

I have a really small kernel:
int shstate[NN2];
int swaphelp;
ulong index;
index = 6227020799;
for (char d = 1; d <= NN2-1; d++) shstate[d-1] = d;
for (char n = NN2-1; n > 0; n--) {
    swaphelp = shstate[n-1];
    shstate[n-1] = shstate[index % n];
    shstate[index % n] = swaphelp;
    index = index / n;
}
for (int j = 0; j < NN2; j++) debugstate[j] = shstate[j];

return;

Compiling this with -deviceemu gives me the right result, but running it on the 280GTX does not.
If I put index into shared memory, I get the right result on the GPU as well.
With -ptx I have noticed that the index variable is placed in local memory unless I explicitly put it into shared memory.

I can provide further information if needed. IMHO it's a bug in the compiler.

Thanks for any further advice,
Damian

You might have to turn on the compiler options for the 280GTX's compute capability 1.3 features, because earlier cards only did 32-bit types. Maybe try the nvcc -arch flag (for example -arch=sm_13)? Good luck.