A strange problem with __syncthreads()

When I add __syncthreads() to my code it runs faster, and when I remove it, it runs slower. Why?

I have no idea.

Did you check it in the CUDA Visual Profiler?

What is the difference in cpuTime reported by the CUDA Visual Profiler between the version that uses __syncthreads() and the version that does not?

:huh:
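
If the profiler is not at hand, one way to compare the two variants is to time them with CUDA events. The following is only a minimal sketch, not the original poster's code: the kernel `shiftKernel`, the helper `timeKernel`, the 256-thread block size, and the array size are all made up for illustration. Kernel launches are asynchronous, so the sketch waits on the stop event before reading the elapsed time; timing on the host without such a wait can give misleading numbers.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread stores one value in shared memory and
// then reads the value stored by its neighbour in the same block.
// Without the barrier this read is a data race, so the UseSync == false
// variant is only meaningful for timing, not for its results.
template <bool UseSync>
__global__ void shiftKernel(const float* in, float* out, int n)
{
    __shared__ float tile[256];               // assumes blockDim.x == 256
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    tile[threadIdx.x] = (i < n) ? in[i] : 0.0f;
    if (UseSync) __syncthreads();             // compile-time flag, so every thread takes the same path
    int j = (threadIdx.x + 1) % blockDim.x;
    if (i < n) out[i] = tile[j];
}

// Returns the average time of one kernel launch in milliseconds.
template <bool UseSync>
float timeKernel(const float* dIn, float* dOut, int n, int reps)
{
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Warm-up launch so one-time start-up costs are not measured.
    shiftKernel<UseSync><<<blocks, threads>>>(dIn, dOut, n);
    cudaDeviceSynchronize();

    cudaEventRecord(start);
    for (int r = 0; r < reps; ++r)
        shiftKernel<UseSync><<<blocks, threads>>>(dIn, dOut, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);               // wait until all launches have finished

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms / reps;
}

int main()
{
    const int n = 1 << 20;                    // arbitrary problem size
    float *dIn = nullptr, *dOut = nullptr;
    cudaMalloc(&dIn, n * sizeof(float));
    cudaMalloc(&dOut, n * sizeof(float));
    cudaMemset(dIn, 0, n * sizeof(float));

    printf("with    __syncthreads(): %f ms\n", timeKernel<true>(dIn, dOut, n, 100));
    printf("without __syncthreads(): %f ms\n", timeKernel<false>(dIn, dOut, n, 100));

    cudaFree(dIn);
    cudaFree(dOut);
    return 0;
}
```

Comparing the two printed averages over many repetitions should show whether the difference is real or just measurement noise. Error checking on the CUDA calls is left out to keep the sketch short.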