I wonder whether you can help me with this.
Below is what I did. I would appreciate help figuring out what I missed:
(1) First, when creating the command queue, enable profiling:
CommandQueue = clCreateCommandQueue(cxGPUContext, cdDevice, CL_QUEUE_PROFILING_ENABLE, &ciErr1);
(2) cl_event event4profiling;
ciErr2 = clEnqueueNDRangeKernel(CommandQueue, Kernel, dim, NULL, &szGlobalWorkSize, &szLocalWorkSize, 0, NULL, &event4profiling);
(3) ciErr1 = clWaitForEvents(1, &event4profiling);
(4) cl_ulong startTime, endTime;
ciErr = clGetEventProfilingInfo(event4profiling, CL_PROFILING_COMMAND_START, sizeof(cl_ulong), &startTime, NULL);
ciErr |= clGetEventProfilingInfo(event4profiling, CL_PROFILING_COMMAND_END, sizeof(cl_ulong), &endTime, NULL);
(5) Then compute the kernel execution time as endTime - startTime (in nanoseconds).
This is what I did, and I noticed that sometimes (in fact most of the time) startTime = 0 and endTime = 0.
At other times both startTime and endTime are seemingly random numbers, for example with startTime > endTime,
so that execution_time = (endTime - startTime) < 0.
What am I missing here?
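One thing that may be worth ruling out for the symptoms above: cl_ulong is an unsigned 64-bit type, so endTime - startTime can never actually be negative. It wraps around to a huge value instead, and printing a cl_ulong with a 32-bit format such as %d can also show misleading numbers. Below is a minimal sketch of a sanity check on the two timestamps. The names timestamp_ns, elapsed_ns and print_elapsed are hypothetical (they are not part of OpenCL), and uint64_t is used as a stand-in for cl_ulong so the snippet compiles without the OpenCL headers.

```c
#include <stdint.h>
#include <stdio.h>

/* Stand-in for cl_ulong (assumption: used here so no OpenCL headers are needed). */
typedef uint64_t timestamp_ns;

/* Hypothetical helper, not an OpenCL call.
 * Returns 1 and stores end - start in *elapsed when the pair looks plausible,
 * returns 0 when both values are zero or when end precedes start. */
static int elapsed_ns(timestamp_ns start, timestamp_ns end, timestamp_ns *elapsed)
{
    if (start == 0 && end == 0)
        return 0;               /* timestamps were probably never written */
    if (end < start)
        return 0;               /* unsigned subtraction would wrap around */
    *elapsed = end - start;
    return 1;
}

/* Hypothetical helper: prints the result using a 64-bit format specifier. */
static void print_elapsed(timestamp_ns start, timestamp_ns end)
{
    timestamp_ns ns;
    if (elapsed_ns(start, end, &ns))
        printf("kernel time: %llu ns\n", (unsigned long long)ns);
    else
        printf("suspicious timestamps: start=%llu end=%llu\n",
               (unsigned long long)start, (unsigned long long)end);
}
```

In the steps above this would be called as print_elapsed(startTime, endTime) after step (4). It may also help to check the return value of each clGetEventProfilingInfo call separately instead of OR-ing them together, since a failed call may leave startTime and endTime uninitialized.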