Profile time error with clGetEventProfilingInfo?

I wonder whether you can help me with this.

Below is what I did. I would appreciate your help figuring out what I missed:

(1) First, when creating the command queue, enable profiling:
CommandQueue = clCreateCommandQueue(cxGPUContext, cdDevice, CL_QUEUE_PROFILING_ENABLE, &ciErr1);

(2) cl_event event4profiling;
ciErr2 = clEnqueueNDRangeKernel(CommandQueue, Kernel, dim, NULL, &szGlobalWorkSize[1], &szLocalWorkSize[0], 0, NULL, &event4profiling);

(3) ciErr1 = clWaitForEvents(1, &event4profiling);

(4) cl_ulong startTime, endTime;
ciErr = clGetEventProfilingInfo(event4profiling, CL_PROFILING_COMMAND_START, sizeof(cl_ulong), &startTime, NULL);
ciErr |= clGetEventProfilingInfo(event4profiling, CL_PROFILING_COMMAND_END, sizeof(cl_ulong), &endTime, NULL);

(5) Then compute the kernel execution time as endTime - startTime (in nanoseconds).

This is what I did, and I noticed that most of the time startTime = 0 and endTime = 0.
Sometimes both startTime and endTime are seemingly random numbers, for example with startTime > endTime,
so that execution_time = (endTime - startTime) < 0?
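
A side note on step (5): cl_ulong is an unsigned 64-bit type, so endTime - startTime cannot actually come out negative. When endTime < startTime, the subtraction wraps around to a very large positive value. Below is a minimal sketch of a guarded difference. The helper name elapsed_ns is hypothetical (it is not part of OpenCL), and uint64_t is used as a stand-in for cl_ulong so the snippet compiles without the OpenCL headers:

```c
#include <stdint.h>

/* Hypothetical helper, not an OpenCL API.
 * uint64_t stands in for cl_ulong (an unsigned 64-bit integer).
 * Returns 0 and stores the difference in *out_ns when end_ns >= start_ns;
 * returns -1 and leaves *out_ns untouched when the timestamps are out of
 * order, instead of letting the unsigned subtraction wrap around. */
int elapsed_ns(uint64_t start_ns, uint64_t end_ns, uint64_t *out_ns)
{
    if (end_ns < start_ns) {
        return -1; /* timestamps are not usable as an interval */
    }
    *out_ns = end_ns - start_ns;
    return 0;
}
```

In the code from step (4), this could be called as elapsed_ns(startTime, endTime, &ns), and a -1 result treated as a sign that the profiling values are not valid rather than as a real duration.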

What am I missing here?