Detailed explanation of how blocks and threads cooperate

Dear all,

Could anyone point me to a tutorial that explains in detail, with examples, how threads and blocks cooperate?
For example, I timed an odd-even sort with different launch configurations:

N=1024
blocks=1
threads=1
sorting time=0.038 ms

N=1024
blocks=1
threads=2
sorting time=0.038 ms

N=1024
blocks=1
threads=512
sorting time=0.038 ms
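
To make the setup concrete, a single-block odd-even transposition sort might look roughly like the sketch below. It is an illustrative assumption, not the code that produced the timings above: the names oddEvenSort and runSort, the reversed input, and the cudaEvent-based timing are all hypothetical choices. Each thread takes a strided share of the compare-exchange pairs in a phase, and __syncthreads() separates the phases.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: odd-even transposition sort within ONE block.
// Each phase touches disjoint pairs, so threads do not race inside a phase.
__global__ void oddEvenSort(int *data, int n)
{
    for (int phase = 0; phase < n; ++phase) {
        // Even phases compare (0,1),(2,3),...; odd phases compare (1,2),(3,4),...
        for (int i = 2 * threadIdx.x + (phase & 1); i + 1 < n; i += 2 * blockDim.x) {
            if (data[i] > data[i + 1]) {
                int tmp = data[i];
                data[i] = data[i + 1];
                data[i + 1] = tmp;
            }
        }
        __syncthreads();  // every thread finishes this phase before the next starts
    }
}

// Hypothetical helper: sorts n reversed ints with 1 block of `threads` threads.
// Writes the kernel time in milliseconds to *ms and returns true if the result is sorted.
bool runSort(int n, int threads, float *ms)
{
    int *h = new int[n];
    for (int i = 0; i < n; ++i) h[i] = n - i;  // reversed input (assumed test data)

    int *d = nullptr;
    cudaMalloc(&d, n * sizeof(int));
    cudaMemcpy(d, h, n * sizeof(int), cudaMemcpyHostToDevice);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    oddEvenSort<<<1, threads>>>(d, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);  // wait until the kernel has actually finished
    cudaEventElapsedTime(ms, start, stop);

    cudaMemcpy(h, d, n * sizeof(int), cudaMemcpyDeviceToHost);
    bool sorted = true;
    for (int i = 0; i + 1 < n; ++i)
        if (h[i] > h[i + 1]) sorted = false;

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d);
    delete[] h;
    return sorted;
}

int main()
{
    const int n = 1024;
    const int configs[] = {1, 2, 512};  // thread counts from the measurements above
    for (int threads : configs) {
        float ms = 0.0f;
        bool ok = runSort(n, threads, &ms);
        printf("N=%d blocks=1 threads=%d sorted=%s time=%.3f ms\n",
               n, threads, ok ? "yes" : "no", ms);
    }
    return 0;
}
```

In this sketch the timing brackets the kernel with events and synchronizes on the stop event, so the reported value covers the kernel's execution rather than only the launch call.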

Why doesn't the sorting time decrease as the number of threads increases?

An example of the kind of explanation I am looking for: http://stackoverflow.com/questions/2392250/understanding-cuda-grid-dimensions-block-dimensions-and-threads-organization-s?answertab=active#tab-top