Transpose example performance problem

I tried the transpose example from the SDK with a 2048x2048 matrix, and the running time was 1.6 ms. With a 2048x2064 matrix it was 0.534 ms, about 3 times faster. Why is it faster with a bigger matrix?

OK, I found the cause: partition camping. I get the same results as in the performance analysis…