Where to put __syncthreads()


for(n=2; n<1024; n=n*2){





    printf(" n=%i tr=%i shared[%i]=%i|shared[%i]=%i  \n",
           n, threadIdx.x,
           threadIdx.x*2,   shared[threadIdx.x*2],
           threadIdx.x*2+1, shared[threadIdx.x*2+1]);






    printf(" n=%i tr=%i shared[%i]=%i|shared[%i]=%i  \n",
           n, threadIdx.x,
           threadIdx.x+n/2-1, shared[threadIdx.x+n/2-1],
           threadIdx.x+n-1,   shared[threadIdx.x+n-1]);






I run it in emulation mode and get:

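The indentation is a bit confusing, but as far as I can see you want to put it inside the for loop, since you are manipulating shared memory there: each pass should only start once every thread has finished the previous pass's writes.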
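To illustrate that placement, here is a minimal sketch. It is not the original poster's kernel: the names `sumPairs` and `runSumPairs`, the size `N`, and the summing step are assumptions made up for the example, and error checking is left out. The point is only that the barrier sits inside the loop, after each pass's shared-memory writes, and outside any `if`, so that every thread in the block reaches it.

```cuda
#include <cuda_runtime.h>

#define N 1024  // assumed size: 512 threads, two elements per thread

// Hypothetical kernel: in-place tree sum over shared memory.
__global__ void sumPairs(const int *in, int *out)
{
    __shared__ int shared[N];
    unsigned int t = threadIdx.x;

    // Each thread loads two elements.
    shared[2 * t]     = in[2 * t];
    shared[2 * t + 1] = in[2 * t + 1];
    __syncthreads();              // all loads are visible before the first pass

    for (unsigned int n = 2; n <= N; n *= 2) {
        unsigned int i = t * n;   // start of this thread's segment of width n
        if (i < N) {
            shared[i] += shared[i + n / 2];
        }
        // Inside the loop, but outside the if: every thread must reach the
        // barrier, and the next pass reads what other threads just wrote.
        __syncthreads();
    }

    if (t == 0) {
        *out = shared[0];
    }
}

// Hypothetical host helper: sums exactly N ints with one block of N/2 threads.
int runSumPairs(const int *h_in)
{
    int *d_in, *d_out, h_out = 0;
    cudaMalloc((void **)&d_in, N * sizeof(int));
    cudaMalloc((void **)&d_out, sizeof(int));
    cudaMemcpy(d_in, h_in, N * sizeof(int), cudaMemcpyHostToDevice);
    sumPairs<<<1, N / 2>>>(d_in, d_out);
    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}
```

Putting `__syncthreads()` inside the `if` would be a bug here, because threads that skip the branch would never reach the barrier.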

I need to find the maximum element of an array.
I am trying to do it the way “bitonic merge” does.

2 4 5 8 1 2 6 5 4 8 2 2 3

It's simple, but I don't understand it yet…

Each block has 512 threads, which covers 1024 numbers…
so the whole array is 1024 × the number of blocks,
and afterwards I do the same operation across the blocks' results, just as with the threads…

If you want to find the maximum, you just need to adjust the reduction sample.
Change +'s into fmaxf()'s and you’re done.
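A minimal sketch of what that could look like, assuming float data and the 512-threads-per-block layout mentioned above. This is not the SDK reduction sample itself: the names `blockMax` and `deviceMax` are made up for the example, error checking is omitted, and `count` is assumed to be greater than zero. Each block reduces 1024 numbers to one maximum, and the host repeats the kernel on the per-block results until one value is left.

```cuda
#include <cuda_runtime.h>
#include <cfloat>

#define THREADS 512  // threads per block; each thread loads two values

// Hypothetical kernel: block b writes the max of its 1024-element slice to out[b].
__global__ void blockMax(const float *in, float *out, unsigned int count)
{
    __shared__ float shared[THREADS];
    unsigned int t = threadIdx.x;
    unsigned int i = blockIdx.x * (2 * THREADS) + t;

    // Slots past the end of the array are padded with -FLT_MAX,
    // which never wins a max against real data.
    float a = (i < count) ? in[i] : -FLT_MAX;
    float b = (i + THREADS < count) ? in[i + THREADS] : -FLT_MAX;
    shared[t] = fmaxf(a, b);      // fmaxf() where a sum reduction has '+'
    __syncthreads();

    for (unsigned int s = THREADS / 2; s > 0; s >>= 1) {
        if (t < s) {
            shared[t] = fmaxf(shared[t], shared[t + s]);
        }
        __syncthreads();          // inside the loop, outside the if
    }

    if (t == 0) {
        out[blockIdx.x] = shared[0];
    }
}

// Hypothetical host helper: returns the max of h_in[0..count-1], count > 0.
float deviceMax(const float *h_in, unsigned int count)
{
    unsigned int blocks = (count + 2 * THREADS - 1) / (2 * THREADS);
    float *d_a, *d_b, result = 0.0f;
    cudaMalloc((void **)&d_a, count * sizeof(float));
    cudaMalloc((void **)&d_b, blocks * sizeof(float));
    cudaMemcpy(d_a, h_in, count * sizeof(float), cudaMemcpyHostToDevice);

    for (;;) {
        blockMax<<<blocks, THREADS>>>(d_a, d_b, count);
        if (blocks == 1) break;   // d_b[0] now holds the overall max
        // Feed the per-block results back in as the next, smaller input.
        float *tmp = d_a; d_a = d_b; d_b = tmp;
        count = blocks;
        blocks = (count + 2 * THREADS - 1) / (2 * THREADS);
    }

    cudaMemcpy(&result, d_b, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_a);
    cudaFree(d_b);
    return result;
}
```

For integer data, `max()` would take the place of `fmaxf()`, with the smallest representable integer as padding.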

I did it the way you said :)