Why is clock64 so unreliable?

I am trying to print the time my app takes when using GPU acceleration,
so I added this code:

__global__ void ndCudaBeginFrame(ndCudaSceneInfo& info)
{
	long long coreTicks = clock64();
	info.m_timeSlice = coreTicks;
	printf("t0 = %lld    ", coreTicks);
}

__global__ void ndCudaEndFrame(ndCudaSceneInfo& info, int frameCount)
{
	long long coreTicks = clock64();

	info.m_frameCount = frameCount;
	printf("t1 = %lld   diff= %lld\n", coreTicks, coreTicks - info.m_timeSlice);

	info.m_timeSlice = coreTicks - info.m_timeSlice;
}

Each kernel is launched with 1 block and 1 thread.

But to my surprise, this is the printout I get:

t0 = 7172174314773 t1 = 7172203109538 diff= 7172203109538
t0 = 7172225478900 t1 = 7172229255455 diff= 3776555
t0 = 7172250624097 t1 = 7172243861841 diff= -6762256
t0 = 7172275867917 t1 = 7172269105730 diff= -6762187
t0 = 7172302289465 t1 = 7172295526428 diff= -6763037
t0 = 7172327896493 t1 = 7172331884130 diff= 3987637
t0 = 7172353841908 t1 = 7172347073734 diff= -6768174
t0 = 7172379242630 t1 = 7172372480320 diff= -6762310
t0 = 7172404525227 t1 = 7172408521560 diff= 3996333
t0 = 7172430869316 t1 = 7172424117998 diff= -6751318

To me, that does not make sense in any context.
Thanks.
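
For reference, here is a minimal sketch of a check that might narrow this down. As far as the CUDA programming guide describes it, clock64() reads a per-multiprocessor counter, so two kernels that happen to be scheduled on different multiprocessors would be reading different counters, and subtracting them may not mean anything. I am not certain that is what happens here. The sketch prints the multiprocessor id (the PTX %smid register) next to each tick value, and also times the frame from the host with CUDA events for comparison. SceneInfo, GetSmId, BeginFrame and EndFrame are hypothetical stand-ins, not the real ndCuda code, and passing the struct as a pointer to device memory is an assumption about how info is allocated.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for ndCudaSceneInfo; only the field used here.
struct SceneInfo
{
	long long m_timeSlice;
};

// Return the id of the multiprocessor running this thread (PTX %smid).
__device__ unsigned int GetSmId()
{
	unsigned int smid;
	asm volatile("mov.u32 %0, %%smid;" : "=r"(smid));
	return smid;
}

__global__ void BeginFrame(SceneInfo* info)
{
	long long coreTicks = clock64();
	info->m_timeSlice = coreTicks;
	printf("t0 = %lld (sm %u)   ", coreTicks, GetSmId());
}

__global__ void EndFrame(SceneInfo* info)
{
	long long coreTicks = clock64();
	// The difference is only meaningful if both kernels ran on the same sm.
	printf("t1 = %lld (sm %u)   diff = %lld\n",
		coreTicks, GetSmId(), coreTicks - info->m_timeSlice);
}

int main()
{
	// Assumption: the scene info lives in device memory.
	SceneInfo* info = nullptr;
	if (cudaMalloc(&info, sizeof(SceneInfo)) != cudaSuccess)
	{
		printf("cudaMalloc failed\n");
		return 1;
	}
	cudaMemset(info, 0, sizeof(SceneInfo));

	cudaEvent_t start, stop;
	cudaEventCreate(&start);
	cudaEventCreate(&stop);

	for (int frame = 0; frame < 10; ++frame)
	{
		cudaEventRecord(start);
		BeginFrame<<<1, 1>>>(info);
		// The real per-frame kernels would be launched here.
		EndFrame<<<1, 1>>>(info);
		cudaEventRecord(stop);

		// Wait for the frame to finish, then read the host-side timing.
		cudaEventSynchronize(stop);
		float ms = 0.0f;
		cudaEventElapsedTime(&ms, start, stop);
		printf("frame %d: %.3f ms between events\n", frame, ms);
	}

	cudaEventDestroy(start);
	cudaEventDestroy(stop);
	cudaFree(info);
	return 0;
}
```

If the sm values differ on the lines with a negative diff, the two readings came from different counters. In that case the event timing, which is measured in milliseconds rather than clock ticks, is probably the safer number to report per frame.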