GSRush
1
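// swap() is assumed to be a __device__ helper defined elsewhere
for (n = 2; n <= 1024; n = n * 2) {           // n = segment size; the pass with n = 1024 merges the last two halves
    if (threadIdx.x < (1024 / n)) {            // one thread per n-element segment
        if (n == 2) {
            // first pass: compare the pair (2t, 2t+1) and move the larger value to 2t+1
            if (shared[threadIdx.x*2] > shared[threadIdx.x*2+1]) {
                printf(" n=%i tr=%i shared[%i]=%i|shared[%i]=%i \n", n, threadIdx.x, threadIdx.x*2, shared[threadIdx.x*2], threadIdx.x*2+1, shared[threadIdx.x*2+1]);
                swap(shared[threadIdx.x*2], shared[threadIdx.x*2+1]);   // swap the same pair that was compared
            }
        }
        else {
            // later passes: the maxima of the two halves of segment t sit at n*t + n/2 - 1 and n*t + n - 1
            if (shared[threadIdx.x*n+n/2-1] > shared[threadIdx.x*n+n-1]) {
                printf(" n=%i tr=%i shared[%i]=%i|shared[%i]=%i \n", n, threadIdx.x, threadIdx.x*n+n/2-1, shared[threadIdx.x*n+n/2-1], threadIdx.x*n+n-1, shared[threadIdx.x*n+n-1]);
                swap(shared[threadIdx.x*n+n/2-1], shared[threadIdx.x*n+n-1]);
            }
        }
    }
    __syncthreads();   // inside the loop: every pass must finish before the next one reads shared memory
}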
I started it in emulation mode and got:
DenisR
2
The indentation is a bit confusing, but as far as I can see you want to put the __syncthreads() inside the for loop, since you are manipulating shared memory: each pass reads values that other threads wrote in the previous pass.
GSRush
3
I need to find the maximum element of an array.
I'm trying to do it as in a “bitonic merge”:
2 4 5 8 1 2 6 5 4 8 2 2 3
 4   8   2   6   8   2
   8       6       8
           8
It looks simple, but I don't understand it yet…
One block has 512 threads, i.e. 1024 numbers (two per thread).
Then, for 1024 × (number of blocks) numbers,
I would do the same operation across the blocks as across the threads…
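A minimal sketch of the scheme described above, assuming int data, 512 threads per block and 1024 values per block. blockMaxKernel, gpuMax, THREADS and ELEMS are hypothetical names, and padding the tail with INT_MIN is an assumption. Each pass compares the last slots of the two halves of a segment and keeps the larger value in the right one, like the tree drawn above; the "same operation across blocks" step is done by relaunching the same kernel on the per-block results until one value is left.

```cuda
#include <cassert>
#include <climits>
#include <cuda_runtime.h>

#define THREADS 512            // threads per block (assumed)
#define ELEMS   (2 * THREADS)  // 1024 values per block, two per thread

// Each block reduces up to 1024 values to their maximum. After the pass with
// segment size n, the maximum of every n-element segment is in its last slot.
__global__ void blockMaxKernel(const int* in, int* out, int count)
{
    __shared__ int s[ELEMS];
    int t    = threadIdx.x;
    int base = blockIdx.x * ELEMS;

    // Load two values per thread; pad past the end with INT_MIN so padding never wins.
    s[t]           = (base + t < count)           ? in[base + t]           : INT_MIN;
    s[t + THREADS] = (base + t + THREADS < count) ? in[base + t + THREADS] : INT_MIN;
    __syncthreads();

    for (int n = 2; n <= ELEMS; n *= 2) {
        if (t < ELEMS / n) {
            int lo = t * n + n / 2 - 1;        // last slot of the left half
            int hi = t * n + n - 1;            // last slot of the right half
            if (s[lo] > s[hi]) s[hi] = s[lo];  // keep the larger value on the right
        }
        __syncthreads();  // outside the if, inside the loop: all threads hit it every pass
    }

    if (t == 0) out[blockIdx.x] = s[ELEMS - 1];
}

// Host wrapper (hypothetical): one launch yields one maximum per block; the
// per-block maxima are fed back into the same kernel until one block remains.
int gpuMax(const int* host, int count)
{
    assert(count > 0);
    int blocks = (count + ELEMS - 1) / ELEMS;
    int *d_in, *d_out;
    cudaMalloc(&d_in, count * sizeof(int));
    cudaMalloc(&d_out, blocks * sizeof(int));
    cudaMemcpy(d_in, host, count * sizeof(int), cudaMemcpyHostToDevice);

    for (;;) {
        blocks = (count + ELEMS - 1) / ELEMS;
        blockMaxKernel<<<blocks, THREADS>>>(d_in, d_out, count);
        if (blocks == 1) break;
        int* tmp = d_in; d_in = d_out; d_out = tmp;  // per-block maxima become the next input
        count = blocks;
    }

    int result;
    cudaMemcpy(&result, d_out, sizeof(int), cudaMemcpyDeviceToHost);  // waits for the kernel
    cudaFree(d_in);
    cudaFree(d_out);
    return result;
}
```

Usage: `int m = gpuMax(data, count);` on a host array. Error checking of the CUDA calls is left out to keep the sketch short.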
DenisR
4
If you want to find the maximum, you just need to adapt the reduction sample from the CUDA SDK.
Change the +'s into fmaxf() calls and you're done.
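A minimal sketch of that change, assuming float data and 256 threads per block. It is modeled loosely on the sequential-addressing style of kernel in the reduction sample, not copied from it; maxReduce, gpuMaxf and BLOCK are hypothetical names. Besides turning += into fmaxf(), the value used for out-of-range elements has to change too: a sum reduction typically pads with 0, which would wrongly win for all-negative input, so this sketch pads with -FLT_MAX. For int data, CUDA's integer max() would take the place of fmaxf().

```cuda
#include <cassert>
#include <cfloat>
#include <cuda_runtime.h>

#define BLOCK 256  // threads per block; must be a power of two (assumed)

__global__ void maxReduce(const float* in, float* out, int n)
{
    __shared__ float sdata[BLOCK];
    int tid = threadIdx.x;
    int i   = blockIdx.x * blockDim.x * 2 + tid;  // each block covers 2 * BLOCK inputs

    // The sum version would start from 0 and add; here start from -FLT_MAX and take fmaxf.
    float v = -FLT_MAX;
    if (i < n)              v = in[i];
    if (i + BLOCK < n)      v = fmaxf(v, in[i + BLOCK]);
    sdata[tid] = v;
    __syncthreads();

    // Sequential addressing: halve the number of active threads each step.
    for (int s = BLOCK / 2; s > 0; s >>= 1) {
        if (tid < s) sdata[tid] = fmaxf(sdata[tid], sdata[tid + s]);  // was: sdata[tid] += sdata[tid + s]
        __syncthreads();
    }
    if (tid == 0) out[blockIdx.x] = sdata[0];
}

// Host wrapper (hypothetical): relaunch on the per-block results until one value is left.
float gpuMaxf(const float* host, int n)
{
    assert(n > 0);
    const int perBlock = 2 * BLOCK;
    int blocks = (n + perBlock - 1) / perBlock;
    float *d_in, *d_out;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, blocks * sizeof(float));
    cudaMemcpy(d_in, host, n * sizeof(float), cudaMemcpyHostToDevice);

    for (;;) {
        blocks = (n + perBlock - 1) / perBlock;
        maxReduce<<<blocks, BLOCK>>>(d_in, d_out, n);
        if (blocks == 1) break;
        float* tmp = d_in; d_in = d_out; d_out = tmp;  // per-block maxima become the next input
        n = blocks;
    }

    float result;
    cudaMemcpy(&result, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return result;
}
```

Compared with the interleaved tree in the question, sequential addressing keeps the active threads contiguous (tid < s), which is the layout the reduction sample uses for its faster kernels.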
GSRush
5
I did it the way you said :)