I am trying to accumulate data from every thread of every block.

a[blockIdx.x] += threadIdx.x;

Let's assume numblocks = 4 and numthreads = 4.

Then I expect a[0] = 0+1+2+3, a[1] = 0+1+2+3, …
but the result is totally wrong.

What causes this problem?

Can anybody help me?
Thanks in advance ^^;

As far as I can see, all threads in a block modify the same variable at the same time. The += is a read-modify-write, so the threads overwrite each other's updates and the result is undefined. You should either use interlocked (atomic) operations such as atomicAdd, or do it in a different way (see the ‘reduce’ sample).
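
Here is a minimal sketch of both suggestions, in case it helps. It is not taken from the SDK sample. The names blockSumAtomic, blockSumReduce and runBlockSum are made up for illustration, and the reduction version assumes the number of threads per block is a power of two. atomicAdd on int in global memory needs a device that supports it (compute capability 1.1 or later).

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Option 1: every thread adds its index to its block's slot.
// atomicAdd performs the read-modify-write as one indivisible step.
__global__ void blockSumAtomic(int *a)
{
    atomicAdd(&a[blockIdx.x], (int)threadIdx.x);
}

// Option 2: tree reduction in shared memory, so only thread 0 of each
// block writes to a[]. ASSUMPTION: blockDim.x is a power of two.
__global__ void blockSumReduce(int *a)
{
    extern __shared__ int partial[];
    partial[threadIdx.x] = (int)threadIdx.x;
    __syncthreads();

    for (unsigned int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (threadIdx.x < s)
            partial[threadIdx.x] += partial[threadIdx.x + s];
        __syncthreads();   // all partial sums of this step must be visible
    }

    if (threadIdx.x == 0)
        a[blockIdx.x] += partial[0];   // one writer per slot, no race
}

// Hypothetical host helper: zeroes a[], launches one of the kernels and
// copies the per-block sums back. Returns an empty vector on a CUDA error.
std::vector<int> runBlockSum(int numBlocks, int numThreads, bool useReduce)
{
    std::vector<int> host(numBlocks, 0);
    size_t bytes = numBlocks * sizeof(int);
    int *dev = nullptr;

    if (cudaMalloc(&dev, bytes) != cudaSuccess) return {};
    cudaMemset(dev, 0, bytes);

    if (useReduce)
        blockSumReduce<<<numBlocks, numThreads, numThreads * sizeof(int)>>>(dev);
    else
        blockSumAtomic<<<numBlocks, numThreads>>>(dev);

    // cudaMemcpy waits for the kernel to finish before copying.
    cudaError_t err = cudaMemcpy(host.data(), dev, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dev);
    if (err != cudaSuccess) return {};
    return host;
}

int main()
{
    std::vector<int> a = runBlockSum(4, 4, false);
    for (size_t i = 0; i < a.size(); ++i)
        printf("a[%zu] = %d\n", i, a[i]);   // 0+1+2+3 = 6 for every block
    return 0;
}
```

With numblocks = 4 and numthreads = 4, both kernels should give 6 in every element. The atomic version is the smallest change to the original line. The reduction version avoids having many threads contend for the same address, which tends to matter more as the block size grows.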