Different results, why?

Why does using shared memory give different output?
I'm on CUDA 2.3 and 3.0.

http://pastebin.org/170346

Thank you in advance.

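__syncthreads(); // wait until every thread has finished reading the previous tile

sdata[threadIdx.x] = v[i*blockDim.x+threadIdx.x];

__syncthreads(); // wait until the whole tile is loaded before anyone reads it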
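The original kernel is only on the pastebin link, so here is a minimal hypothetical sketch of the same pattern, not the poster's code. It assumes a kernel (named `sumAbsDiff` here, made up for illustration) where each block walks `v` in tiles of `blockDim.x` elements staged through `sdata`. Both barriers are needed. The second one makes sure the whole tile is written before any thread reads it. The first one stops a fast thread from overwriting `sdata` with tile `i+1` while slower threads in the same block are still reading tile `i`; leaving it out gives a race whose results can change from run to run. `BLOCK`, `run_check`, and the test data are assumptions, and `n` is assumed to be a multiple of `BLOCK`.

```cuda
#include <cstdio>
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

#define BLOCK 128  // assumed block size; must equal blockDim.x at launch

// Hypothetical kernel: out[j] = sum over all k of |x[j] - v[k]|.
// Assumes n is a multiple of BLOCK and the grid has exactly n threads.
__global__ void sumAbsDiff(const float* x, const float* v, float* out, int n)
{
    __shared__ float sdata[BLOCK];
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    float xj = x[j];
    float acc = 0.0f;
    int tiles = n / (int)blockDim.x;

    for (int i = 0; i < tiles; ++i) {
        // Barrier 1: nobody may overwrite sdata while others still read the previous tile.
        __syncthreads();
        sdata[threadIdx.x] = v[i * blockDim.x + threadIdx.x];
        // Barrier 2: the whole tile must be in shared memory before it is read.
        __syncthreads();
        for (int k = 0; k < (int)blockDim.x; ++k)
            acc += fabsf(xj - sdata[k]);
    }
    out[j] = acc;
}

// Runs the kernel and compares against a CPU reference; returns the number of mismatches.
int run_check()
{
    const int n = 1024;
    std::vector<float> hx(n), hv(n), hout(n, 0.0f);
    for (int k = 0; k < n; ++k) {
        hx[k] = (float)(k % 7);  // small integers keep every sum exact in float
        hv[k] = (float)(k % 5);
    }

    float *dx, *dv, *dout;
    cudaMalloc(&dx, n * sizeof(float));
    cudaMalloc(&dv, n * sizeof(float));
    cudaMalloc(&dout, n * sizeof(float));
    cudaMemcpy(dx, hx.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dv, hv.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    sumAbsDiff<<<n / BLOCK, BLOCK>>>(dx, dv, dout, n);
    cudaMemcpy(hout.data(), dout, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dx);
    cudaFree(dv);
    cudaFree(dout);

    int bad = 0;
    for (int j = 0; j < n; ++j) {
        float ref = 0.0f;
        for (int k = 0; k < n; ++k)
            ref += fabsf(hx[j] - hv[k]);
        if (fabsf(ref - hout[j]) > 1e-3f * ref + 1e-3f)
            ++bad;
    }
    return bad;
}

int main()
{
    int bad = run_check();
    if (bad == 0)
        std::printf("ok\n");
    else
        std::printf("%d mismatches\n", bad);
    return bad == 0 ? 0 : 1;
}
```

Deleting the first `__syncthreads()` in the loop reproduces the kind of bug described in this thread: the code may still pass on some runs or cards, which is why it is so hard to spot.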

So simple… :verymad:

Yeah, I’ve forgotten an extra __syncthreads() in a for loop before. Definitely drives you crazy. :)

Oh yes, the whole week…