My program solves the same problem with several different algorithms. The memory accesses are the same among them, but the amount of computation differs between the algorithms.
The result I got is that the algorithm with less computation comes out slower than the other one when measured against peak compute throughput.
Can somebody tell me how to find the bottleneck?
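
For context, the comparison I am making can be sketched roughly as below. This is only a minimal roofline-style estimate, assuming the run time is set by whichever of compute or memory traffic takes longer. The function names and all the numbers (peak FLOP/s, bandwidth, FLOP counts, bytes moved) are hypothetical placeholders, not my real measurements.

```python
def roofline_estimate(flops, bytes_moved, peak_flops, peak_bandwidth):
    """Estimate run time and the limiting resource under a simple roofline model.

    Assumption: compute and memory transfers overlap perfectly, so the
    run time is the larger of the two individual times.
    """
    compute_time = flops / peak_flops            # seconds if only compute mattered
    memory_time = bytes_moved / peak_bandwidth   # seconds if only memory mattered
    bound = "memory" if memory_time > compute_time else "compute"
    return max(compute_time, memory_time), bound


def fraction_of_peak(flops, run_time, peak_flops):
    """Achieved FLOP/s divided by peak FLOP/s."""
    return flops / run_time / peak_flops


# Hypothetical machine: 1 TFLOP/s peak compute, 100 GB/s memory bandwidth.
PEAK_FLOPS = 1e12
PEAK_BANDWIDTH = 1e11

# Hypothetical algorithms: same memory traffic, different amounts of computation.
BYTES_MOVED = 1e9
FLOPS_LIGHT = 2e9   # algorithm with less computation
FLOPS_HEAVY = 8e9   # algorithm with more computation

time_light, bound_light = roofline_estimate(FLOPS_LIGHT, BYTES_MOVED, PEAK_FLOPS, PEAK_BANDWIDTH)
time_heavy, bound_heavy = roofline_estimate(FLOPS_HEAVY, BYTES_MOVED, PEAK_FLOPS, PEAK_BANDWIDTH)

frac_light = fraction_of_peak(FLOPS_LIGHT, time_light, PEAK_FLOPS)
frac_heavy = fraction_of_peak(FLOPS_HEAVY, time_heavy, PEAK_FLOPS)

print(bound_light, time_light, frac_light)
print(bound_heavy, time_heavy, frac_heavy)
```

With these made-up numbers both algorithms come out memory-bound with the same estimated run time, so the one with less computation reaches a smaller fraction of peak compute. I do not know whether this is what is happening in my real program, which is why I am asking how to confirm where the bottleneck is.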