How to time the SDK examples?

I tried to time the oclVectorAdd example. I use clGetEventProfilingInfo, the GPU timer, to record the time spent on kernel execution. The time is calculated in milliseconds, but the output is weird.
Code and output are below:

    cl_ulong start, end;
    cl_event event_ker_x;
    ciErr1 = clEnqueueNDRangeKernel(cqCommandQueue, ckKernel, 1, NULL, &szGlobalWorkSize, &szLocalWorkSize, 0, NULL, &event_ker_x);
    shrLog("clEnqueueNDRangeKernel (VectorAdd)...\n");
    if (ciErr1 != CL_SUCCESS)
    {
        shrLog("Error in clEnqueueNDRangeKernel, Line %u in file %s !!!\n\n", __LINE__, __FILE__);
        Cleanup(argc, argv, EXIT_FAILURE);
    }
    clGetEventProfilingInfo(event_ker_x, CL_PROFILING_COMMAND_START, sizeof(cl_ulong), &start, NULL);
    clGetEventProfilingInfo(event_ker_x, CL_PROFILING_COMMAND_END, sizeof(cl_ulong), &end, NULL);
    float ker_x_time = (end - start) * 1.0e-6f;
    shrLog("kernel execution time is : %f\n", ker_x_time);

clEnqueueWriteBuffer (SrcA and SrcB)...
clEnqueueNDRangeKernel (VectorAdd)...
kernel execution time is : 18446744027136.000000
clEnqueueReadBuffer (Dst)...

Maybe I need to add clWaitForEvents() before retrieving the profiling info, right?
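
For what it's worth, the printed value is almost exactly 2^64 nanoseconds converted to milliseconds, which suggests that (end - start) wrapped around because end was smaller than start. That is what I would expect if the timestamps are read before the kernel has finished, or if the queue was not created with CL_QUEUE_PROFILING_ENABLE; in both cases clGetEventProfilingInfo should return CL_PROFILING_INFO_NOT_AVAILABLE, and my code never checks its return value. Below is a small sketch of the arithmetic only, without any OpenCL calls. uint64_t stands in for cl_ulong, and unchecked_ms and elapsed_ms are hypothetical helper names, not part of the SDK:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same arithmetic as in my snippet: unsigned 64-bit subtraction,
 * then scaling from nanoseconds to milliseconds in float.
 * If end_ns < start_ns the subtraction wraps around modulo 2^64. */
static float unchecked_ms(uint64_t start_ns, uint64_t end_ns)
{
    return (end_ns - start_ns) * 1.0e-6f;
}

/* Hypothetical guarded variant: returns 0 and stores the elapsed time
 * in milliseconds, or returns -1 if the timestamps are inconsistent
 * (for example because they were read before the event completed). */
static int elapsed_ms(uint64_t start_ns, uint64_t end_ns, double *out_ms)
{
    assert(out_ms != NULL);
    if (end_ns < start_ns)
        return -1;
    *out_ms = (double)(end_ns - start_ns) * 1.0e-6;
    return 0;
}
```

With made-up timestamps such as start = 5000000 and end = 0, unchecked_ms gives a value on the order of 1.8e13 ms, the same magnitude as in the log above, while elapsed_ms reports an error instead. In the real code I assume the fix is to call clWaitForEvents(1, &event_ker_x) before the two clGetEventProfilingInfo calls and to check what they return.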