Different results, why?

Why does using shared memory give different output?
I am on CUDA 2.3 and 3.0.

Pastebin.com - #1 paste tool since 2002!

Thank you in advance.

__syncthreads();

sdata[threadIdx.x] = v[i*blockDim.x+threadIdx.x];

__syncthreads();
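
To show why both barriers matter, here is a minimal sketch of the pattern, not the original kernel from the paste. The kernel name tiledSum, the helper runTiledSum, the tile size TILE and the test data are all made up for illustration; only the three lines above come from this thread. I assume a single block and a v whose length is a multiple of blockDim.x.

```cuda
#include <vector>
#include <cuda_runtime.h>

#define TILE 128  // hypothetical block size, must match the launch configuration

// Every thread accumulates the sum of all elements of v, one tile at a time.
__global__ void tiledSum(const int *v, int numTiles, int *out)
{
    __shared__ int sdata[TILE];
    int acc = 0;

    for (int i = 0; i < numTiles; ++i) {
        // Barrier 1: no thread overwrites sdata while another thread
        // is still reading the previous tile.
        __syncthreads();

        sdata[threadIdx.x] = v[i * blockDim.x + threadIdx.x];

        // Barrier 2: the whole tile is loaded before any thread reads it.
        __syncthreads();

        for (int j = 0; j < blockDim.x; ++j)
            acc += sdata[j];
    }
    out[threadIdx.x] = acc;
}

// Hypothetical host-side check: returns true if every thread got the full sum.
bool runTiledSum()
{
    const int numTiles = 8;
    const int n = numTiles * TILE;

    std::vector<int> h_v(n), h_out(TILE, 0);
    int expected = 0;
    for (int k = 0; k < n; ++k) {
        h_v[k] = k % 7;          // arbitrary test data
        expected += h_v[k];
    }

    int *d_v = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_v, n * sizeof(int)) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, TILE * sizeof(int)) != cudaSuccess) {
        cudaFree(d_v);
        return false;
    }

    cudaMemcpy(d_v, h_v.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    tiledSum<<<1, TILE>>>(d_v, numTiles, d_out);
    // The blocking copy also waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(h_out.data(), d_out, TILE * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_v);
    cudaFree(d_out);
    if (err != cudaSuccess) return false;

    for (int t = 0; t < TILE; ++t)
        if (h_out[t] != expected) return false;
    return true;
}
```

With both barriers in place, runTiledSum() should return true when called from main. If you drop the first __syncthreads(), a fast warp can start loading tile i+1 into sdata while a slower warp is still summing tile i, which is a race and can give results that change from run to run.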

So simple…

Yeah, I’ve forgotten an extra __syncthreads() in a for loop before. Definitely drives you crazy. :)

Oh yes, the whole week…