Is the CUDA timer trustworthy?

I use a lot of CUDA timers to measure how long the kernels and memcpys in my code take, but the results are very confusing. For the same piece of code, the time reported by the CUDA timer varies a great deal from run to run.

I use the following code to create, start, stop and destroy a CUDA timer:

/////////////////////////////////////////////////////////////////
unsigned int timer = 0;
CUT_SAFE_CALL(cutCreateTimer(&timer));       // allocate a cutil host timer
CUT_SAFE_CALL(cutStartTimer(timer));

    // CUDA doing its job.....

CUT_SAFE_CALL(cutStopTimer(timer));
printf("%f ms ", cutGetTimerValue(timer));   // elapsed time in milliseconds
CUT_SAFE_CALL(cutDeleteTimer(timer));

/////////////////////////////////////////////////////////////////

Now I run the following loop (pseudocode):

/////////////////////////////////////////////
for (i = 0; i < 10000; i++)
{
    start timer1
    copy data from host to CUDA
    stop and read timer1

    start timer2
    run CUDA kernel1
    stop and read timer2

    start timer3
    run CUDA kernel3
    stop and read timer3
}
/////////////////////////////////////////////

The results are very weird. The sum of the three timer readings is basically constant, but the individual readings can be very different from one iteration to the next. For example:

t1 + t2 + t3 = 0.2 + 100 + 20 ~= 120 + 0.2 + 0.3 ~= 0.2 + 120 + 0.2 ~= … ~= 120 ms

I have no idea what is going on. I assumed that a kernel launch is a blocking call, and that cudaMemcpy is too. How can I measure time accurately in CUDA?

Thanks,
Yong

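OK, I solved the problem by following Microsoft KB article 896256 (http://support.microsoft.com//kb/896256 ) and by calling cudaThreadSynchronize(). Kernel launches return to the host before the kernel has finished, so without the synchronization a timer can be stopped too early and the remaining GPU time shows up in whichever later call happens to block.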
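
For anyone who hits the same thing, here is a minimal sketch of timing a single kernel with CUDA events instead of a host timer. It is not the code from my application: the kernel scale() and the helper time_scale_ms() are made-up names for illustration, and error checking is left out to keep it short. The point is only that the host waits on the stop event before reading the elapsed time, which plays the same role as calling cudaThreadSynchronize() before stopping a host timer (newer toolkits deprecate that call in favor of cudaDeviceSynchronize()).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in kernel; the real kernels are not shown in this thread.
__global__ void scale(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical helper: returns the GPU time in milliseconds for one launch of scale().
float time_scale_ms(float *d_data, int n)
{
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);                       // enqueue start marker on stream 0
    scale<<<(n + 255) / 256, 256>>>(d_data, n, 2.0f); // launch returns immediately
    cudaEventRecord(stop, 0);                        // enqueue stop marker after the kernel
    cudaEventSynchronize(stop);                      // block the host until stop has been reached

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);          // time between the two markers, in ms

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main()
{
    const int n = 1 << 20;
    float *d_data = NULL;
    cudaMalloc((void **)&d_data, n * sizeof(float));
    cudaMemset(d_data, 0, n * sizeof(float));

    printf("kernel: %f ms\n", time_scale_ms(d_data, n));

    cudaFree(d_data);
    return 0;
}
```

Measured this way, each reading should cover only the work recorded between its own pair of events, so the time should no longer wander between t1, t2 and t3.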