GTX 560 is much slower than GTX 260! matrixMul sample test

I tested the matrixMul sample program on both a GTX 260 and a GTX 560.
The result is strange: the GTX 560 runs at only about 1/3 the speed of the GTX 260!

Why?

matrixMul isn’t really a benchmark. The matrices it multiplies are so small that the runtime is dominated by kernel launch overhead, which is apparently larger on your system with the GTX 560. Are you running Windows?
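One way to check whether launch overhead is the culprit is to time a kernel that does no work at all, so that everything measured is overhead. The sketch below is not part of the SDK sample. `emptyKernel` and `averageLaunchOverheadUs` are hypothetical names. It times many back-to-back launches with a host clock and finishes with a single `cudaDeviceSynchronize()`, so the result reflects what each launch costs from the CPU side.

```cuda
#include <cuda_runtime.h>
#include <chrono>

// Does nothing, so its measured time is essentially pure launch overhead.
__global__ void emptyKernel() {}

// Hypothetical helper: returns the average time per launch, in microseconds,
// of `launches` back-to-back empty-kernel launches, or -1 on a CUDA error.
double averageLaunchOverheadUs(int launches)
{
    // Warm-up: the first launch also pays for context and module setup.
    emptyKernel<<<1, 1>>>();
    if (cudaDeviceSynchronize() != cudaSuccess) return -1.0;

    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < launches; ++i)
        emptyKernel<<<1, 1>>>();
    cudaDeviceSynchronize();  // wait until every queued launch has finished
    auto t1 = std::chrono::steady_clock::now();

    if (cudaGetLastError() != cudaSuccess) return -1.0;
    return std::chrono::duration<double, std::micro>(t1 - t0).count() / launches;
}
```

Running this on both machines and comparing the numbers indicates whether the per-launch cost, rather than the GPU's arithmetic speed, explains the 3x gap.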

You can modify the code to multiply larger matrices, such as 1000x1000 or 5000x5000.
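Below is a minimal sketch of that kind of test. It is not the SDK's matrixMul code. It uses a generic shared-memory tiled kernel (`matMulTiled`), a tile width of 16, and a helper `timeMatMulGflops`, all of which are hypothetical choices. Bounds checks let `n` be any size, including 1000. Both inputs are matrices of ones, so every element of the result should equal `n`. A warm-up launch comes first, then a single timed launch measured with CUDA events. The result is reported as GFLOP/s, using 2·n³ floating-point operations per multiply.

```cuda
#include <cuda_runtime.h>
#include <vector>

#define TILE 16  // assumed tile width for this sketch

// C = A * B for square n x n row-major matrices using shared-memory tiles.
__global__ void matMulTiled(const float* A, const float* B, float* C, int n)
{
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];
    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float sum = 0.0f;
    for (int t = 0; t < (n + TILE - 1) / TILE; ++t) {
        int aCol = t * TILE + threadIdx.x;
        int bRow = t * TILE + threadIdx.y;
        // Out-of-range elements are loaded as 0 so partial tiles are safe.
        As[threadIdx.y][threadIdx.x] = (row < n && aCol < n) ? A[row * n + aCol] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] = (bRow < n && col < n) ? B[bRow * n + col] : 0.0f;
        __syncthreads();
        for (int k = 0; k < TILE; ++k)
            sum += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();
    }
    if (row < n && col < n) C[row * n + col] = sum;
}

// Hypothetical helper: multiplies two n x n matrices of ones, stores C[0][0]
// in *c00 (it should equal n), and returns the kernel throughput in GFLOP/s.
double timeMatMulGflops(int n, float* c00)
{
    size_t bytes = size_t(n) * n * sizeof(float);
    std::vector<float> ones(size_t(n) * n, 1.0f);
    float *dA, *dB, *dC;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, ones.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, ones.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);
    dim3 grid((n + TILE - 1) / TILE, (n + TILE - 1) / TILE);
    matMulTiled<<<grid, block>>>(dA, dB, dC, n);  // warm-up launch
    cudaDeviceSynchronize();

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    matMulTiled<<<grid, block>>>(dA, dB, dC, n);  // timed launch
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaMemcpy(c00, dC, sizeof(float), cudaMemcpyDeviceToHost);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return 2.0 * n * n * double(n) / (ms * 1e-3) / 1e9;
}
```

For example, call `timeMatMulGflops(1000, &c00)` and then `timeMatMulGflops(5000, &c00)` on each card. At these sizes the kernel itself dominates the runtime, so the comparison reflects the GPUs rather than the launch path.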