How to measure time inside a kernel function on the device?

In CUDA, we can use clock() to measure the running time of a code fragment inside a kernel function. I want to implement a function similar to clock() in OpenCL that can measure time on the device. Can anybody give me some advice? Thanks!

You should probably take a look at OpenCL’s built-in profiling capabilities; see, e.g., the documentation for the clGetEventProfilingInfo() function.

I have used the clGetEventProfilingInfo() function and an event object to measure the running time of a whole kernel function. However, what if I just want to measure a few lines of code inside the kernel function, like the following:

__kernel void clock (…)
{
    unsigned int t1 = p1;
    unsigned int t2 = p2;
    unsigned int start_time = 0, stop_time = 0;

    for (int i = 0; i < its; i++)
    {
        start_time = clock(); // clock() is a CUDA built-in function; what is the OpenCL equivalent?
        repeat64(t1+=t2;t2+=t1;)
        stop_time = clock();  // CUDA built-in function
    }

    out[0] = t1+t2;

    ......
}