When I add __syncthreads() to my code, it runs faster, and when I remove it, it runs slower. Why?
I have no idea.
Did you check it in the CUDA Visual Profiler?
What is the difference in the cpuTime reported by the CUDA Visual Profiler with and without __syncthreads()?