Kernel execution time in clock cycles

Hi,

I want to know how to obtain the execution time of a kernel, for example a matrix multiplication, as a number of clock cycles.
I use this code to get the result in ms, but I want it as a number of clock cycles:
// create and start timer
unsigned int timer = 0;
cutilCheckError(cutCreateTimer(&timer));
cutilCheckError(cutStartTimer(timer));

// ... kernel launch goes here ...

// stop and destroy timer
cutilCheckError(cutStopTimer(timer));

printf("Processing time: %f (ms) \n", cutGetTimerValue(timer));
cutilCheckError(cutDeleteTimer(timer));

Does anyone have an idea for this problem? Please help.
I tested the clock example from the SDK, but that application measures the number of clock cycles for each block, whereas I want to measure the number of clock cycles for the whole grid execution.
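
One idea I had, as a rough sketch only: since 1 kHz is one cycle per millisecond, the ms value from the timer could be multiplied by the GPU clock rate in kHz. The helper name ms_to_cycles is made up for illustration. I am assuming the clock rate would come from the clockRate field of cudaDeviceProp (which is reported in kHz) and that the clock stays constant while the kernel runs. The result would only be an estimate, and it would also include launch overhead.

```c
#include <stdint.h>

/* Hypothetical helper: estimate a cycle count from an elapsed time.
 * elapsed_ms : measured time in milliseconds (e.g. from the timer above)
 * clock_khz  : clock frequency in kHz (assumed to come from
 *              cudaDeviceProp.clockRate)
 * 1 kHz equals 1 cycle per millisecond, so cycles = ms * kHz.
 * The result is rounded to the nearest whole cycle. */
static uint64_t ms_to_cycles(double elapsed_ms, int clock_khz)
{
    return (uint64_t)(elapsed_ms * (double)clock_khz + 0.5);
}
```

For example, a kernel that took 2.5 ms on a device clocked at 1296000 kHz would come out at about 3240000 cycles with this estimate.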
Thanks