Timer result of matrix multiplication

Hi all,
I've installed CUDA (toolkit and SDK) and Visual Studio 2005, and I want to test the examples.
I opened the matrix multiplication project in Visual Studio, compiled it, and ran it without any problem.
But I found that the time for a matrix multiplication is not the same from run to run, even for the same matrix. For example, with a 3*3 matrix I get several different results (0.11 ms, 0.10 ms …).
I want to know if anyone has the same problem and what the solution is.

A 3*3 matrix is so small that only one thread block is used.

Try a larger dimension, for example 1024*1024 or more.
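To put rough numbers on that, here is a small host-side sketch of how many thread blocks a launch would use. It assumes one thread per output element and a 16*16 thread block; `TILE`, `blocks_per_side` and `total_blocks` are names made up for this illustration, not taken from the SDK sample.

```cuda
#include <cstdio>

// Assumed tile width: one 16*16 thread block per tile of the output matrix.
constexpr int TILE = 16;

// Number of blocks needed along one side of an n*n matrix (rounded up).
constexpr int blocks_per_side(int n) { return (n + TILE - 1) / TILE; }

// Total number of blocks in the 2D grid for an n*n matrix.
constexpr int total_blocks(int n) { return blocks_per_side(n) * blocks_per_side(n); }

int main() {
    const int sizes[] = {3, 512, 1024};
    for (int n : sizes) {
        std::printf("%4d*%-4d matrix: %d block(s)\n", n, n, total_blocks(n));
    }
    return 0;
}
```

Under those assumptions a 3*3 matrix needs a single block, so only one SM has any work, while 1024*1024 needs 4096 blocks, which is enough to spread across every SM.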

Thanks for the reply.
With a large matrix I still don't get the same timer result across runs. For example, with a 512*512 matrix the difference is only about 1.4 ms (15829.870111 ms for the first run and 15828.432129 ms for the second).
As you can see, the difference is small, but I just want to know whether this variation in the timer result for the same code will cause a problem when I implement my own program, and whether I risk obtaining erroneous results.

Sorry, I obtained this result (time) when I used 1 block. If I use 16*16 the time is shorter, but it is still not the same from run to run!

If you use all the resources (all SMs), then you can measure your kernel in an average sense, and the variation should be less than 1%.

However, if you under-utilize the GPU, then the time variation is large.
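One way to measure "in an average sense" is to time several launches with CUDA events and look at the mean and the spread. The sketch below is only an illustration under some assumptions: `matmul_naive` is a stand-in kernel (not the SDK sample's kernel), the matrix size, the 16*16 block and the run count are arbitrary, error checking is left out, and `cudaDeviceSynchronize` assumes a reasonably recent toolkit.

```cuda
#include <algorithm>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Naive n*n matrix multiply, one thread per output element (illustrative stand-in).
__global__ void matmul_naive(const float* a, const float* b, float* c, int n) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n && col < n) {
        float sum = 0.0f;
        for (int k = 0; k < n; ++k) sum += a[row * n + k] * b[k * n + col];
        c[row * n + col] = sum;
    }
}

// Arithmetic mean of the samples (assumes at least one sample).
float mean_of(const std::vector<float>& ms) {
    float sum = 0.0f;
    for (float t : ms) sum += t;
    return sum / ms.size();
}

// (max - min) relative to the mean, in percent.
float relative_spread_percent(const std::vector<float>& ms) {
    float lo = *std::min_element(ms.begin(), ms.end());
    float hi = *std::max_element(ms.begin(), ms.end());
    return 100.0f * (hi - lo) / mean_of(ms);
}

int main() {
    const int n = 1024, runs = 20;  // arbitrary choices for this sketch
    const size_t bytes = size_t(n) * n * sizeof(float);
    std::vector<float> host(size_t(n) * n, 1.0f);

    float *a, *b, *c;
    cudaMalloc(&a, bytes);
    cudaMalloc(&b, bytes);
    cudaMalloc(&c, bytes);
    cudaMemcpy(a, host.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(b, host.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(16, 16);
    dim3 grid((n + block.x - 1) / block.x, (n + block.y - 1) / block.y);

    // Warm-up launch so one-time setup costs are not counted in the timings.
    matmul_naive<<<grid, block>>>(a, b, c, n);
    cudaDeviceSynchronize();

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    std::vector<float> ms(runs);
    for (int i = 0; i < runs; ++i) {
        cudaEventRecord(start);
        matmul_naive<<<grid, block>>>(a, b, c, n);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);                 // wait until the kernel has finished
        cudaEventElapsedTime(&ms[i], start, stop);  // elapsed time in milliseconds
    }

    std::printf("mean %.3f ms, spread %.3f %% over %d runs\n",
                mean_of(ms), relative_spread_percent(ms), runs);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(a);
    cudaFree(b);
    cudaFree(c);
    return 0;
}
```

For comparison, the two 512*512 timings quoted earlier in the thread differ by about 0.009% of their mean, well inside the 1% figure.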