Timing in OpenCL vs. CUDA
How to verify that I'm measuring the same thing in OpenCL and CUDA?

sinclair · December 6, 2010, 5:59pm

Hi everyone,

I’ve been writing some microbenchmarks to get used to OpenCL (I’ve used CUDA before, but I’m new to OpenCL). However, I haven’t been able to figure out how to measure timing (of a kernel, for example) on an apples-to-apples basis with CUDA, and I couldn’t find any previous posts about this, so I’m asking here in the hope that someone can clarify it for me. The example I’ll use is an empty kernel (i.e., one that returns immediately).

My question is: how do I measure the “same thing” in OpenCL and CUDA? For example, if I write this empty kernel in CUDA, it would be (ignoring error checking, etc., which I do in my actual code):

// CUDA version
__global__ void empty()
{
  return;
}

// measures the kernel execution time
void measure_empty()
{
  cudaEvent_t start, stop;
  float elapsedTime = 0.0f;
  cudaError_t cudaErr;

  ... // initialization

  dim3 Db = dim3(512, 1, 1);   // block: threads per block
  dim3 Dg = dim3(16, 16, 16);  // grid: blocks per dimension

  // start timing
  cudaErr = cudaEventRecord(start, 0);
  empty<<<Dg, Db>>>();
  cudaThreadSynchronize();

  // stop timing: stop - start = elapsedTime
  cudaErr = cudaEventRecord(stop, 0);
  cudaErr = cudaEventSynchronize(stop);
  cudaErr = cudaEventElapsedTime(&elapsedTime, start, stop);  // elapsedTime is in milliseconds
}

To write the same (or what I believe to be the same) code in OpenCL, I did the following (again, I’m omitting the online kernel build, command-queue creation, etc., which I do but am not showing to save space):

// device code (OpenCL C, compiled at runtime)
__kernel void empty()
{
  return;
}

// host code
// global variables
cl_command_queue commandQueue;  // must be created with CL_QUEUE_PROFILING_ENABLE for event profiling
cl_kernel empty;

void measure_empty()
{
  cl_int clErr = CL_SUCCESS;
  cl_event GPUExecution;
  size_t localWorkSize[3] = {512, 1, 1};
  size_t globalWorkSize[3] = {8192, 16, 16};

  // call kernel, flush so it executes without waiting for more commands to be enqueued
  clErr = clEnqueueNDRangeKernel(commandQueue, empty, 3, NULL, globalWorkSize, localWorkSize, 0, NULL, &GPUExecution);
  clErr = clFlush(commandQueue);

  // synchronize on return
  clErr = clFinish(commandQueue);

  // get timing (profiling counters are cl_ulong values in nanoseconds)
  cl_ulong start, end;
  clErr = clGetEventProfilingInfo(GPUExecution, CL_PROFILING_COMMAND_END, sizeof(cl_ulong), &end, NULL);
  clErr = clGetEventProfilingInfo(GPUExecution, CL_PROFILING_COMMAND_START, sizeof(cl_ulong), &start, NULL);
  double elapsedTime = (double)1.0e-9 * (end - start);  // nanoseconds -> seconds

  clReleaseEvent(GPUExecution);
}
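The OpenCL sizes follow from the CUDA launch configuration: CUDA’s grid (Dg) counts blocks, whereas OpenCL’s globalWorkSize counts work-items, so each global dimension is grid × block (16 × 512 = 8192 in x), and localWorkSize equals the CUDA block size. A small plain-C sketch of that mapping (the `dims3` type and `cuda_to_opencl_ndrange` helper are hypothetical and have no CUDA or OpenCL dependency):

```c
#include <stddef.h>

/* Hypothetical 3-component size, standing in for CUDA's dim3. */
typedef struct {
    size_t x, y, z;
} dims3;

/* Hypothetical helper: convert a CUDA launch configuration into OpenCL
 * NDRange sizes. CUDA's grid counts blocks per dimension, OpenCL's global
 * size counts work-items, so global = grid * block; local = block. */
void cuda_to_opencl_ndrange(dims3 grid, dims3 block,
                            size_t global[3], size_t local[3])
{
    global[0] = grid.x * block.x;
    global[1] = grid.y * block.y;
    global[2] = grid.z * block.z;

    local[0] = block.x;
    local[1] = block.y;
    local[2] = block.z;
}
```

Calling it with grid {16, 16, 16} and block {512, 1, 1} yields global {8192, 16, 16} and local {512, 1, 1}, the values used in measure_empty().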

I’ve run both, and I can’t seem to get matching timings (by “matching” I mean numbers that make sense relative to each other). Here’s what I’m getting:

CUDA: Kernel Processing Time: 0.094912 ms

OpenCL: Kernel processing time: 0.00010 s

Any ideas about what I should be doing differently? One thought I had was to also use clGetEventProfilingInfo to get how long the command sits in the queue before it starts, but I’m not sure that’s the right way to go. Any help would be greatly appreciated!
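On the idea of also querying how long the command waited: clGetEventProfilingInfo can return four timestamps per command (CL_PROFILING_COMMAND_QUEUED, CL_PROFILING_COMMAND_SUBMIT, CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END, all in nanoseconds), and the differences between consecutive ones split the command’s lifetime into phases. Only START to END is the kernel’s execution itself, so that is the interval to compare against a kernel-only measurement. A plain-C sketch of the split (the struct and function names are hypothetical; uint64_t stands in for cl_ulong so it compiles without OpenCL headers):

```c
#include <stdint.h>

/* Hypothetical container for the four profiling timestamps of one command,
 * in nanoseconds. Assumes queued <= submit <= start <= end. */
typedef struct {
    uint64_t queued;  /* CL_PROFILING_COMMAND_QUEUED: enqueued by the host   */
    uint64_t submit;  /* CL_PROFILING_COMMAND_SUBMIT: submitted to device    */
    uint64_t start;   /* CL_PROFILING_COMMAND_START: execution begins        */
    uint64_t end;     /* CL_PROFILING_COMMAND_END: execution finished        */
} profiling_times;

/* Durations of each phase, in nanoseconds. */
typedef struct {
    uint64_t queue_wait_ns;  /* queued -> submit */
    uint64_t launch_ns;      /* submit -> start  */
    uint64_t exec_ns;        /* start  -> end (kernel execution only) */
    uint64_t total_ns;       /* queued -> end    */
} profiling_breakdown;

profiling_breakdown profiling_breakdown_from(profiling_times t)
{
    profiling_breakdown b;
    b.queue_wait_ns = t.submit - t.queued;
    b.launch_ns     = t.start - t.submit;
    b.exec_ns       = t.end - t.start;
    b.total_ns      = t.end - t.queued;
    return b;
}
```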

Thanks,

Matt

sinclair · December 6, 2010, 6:15pm

It appears my problem may just be unit conversion. Doh!
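On the units: cudaEventElapsedTime() reports milliseconds as a float, while the OpenCL profiling counters are cl_ulong nanosecond values, so multiplying (end - start) by 1.0e-9 gives seconds. A minimal plain-C sketch that converts an OpenCL START/END pair straight to milliseconds so both APIs report in the same unit (the helper name `opencl_elapsed_ms` is hypothetical, and uint64_t stands in for cl_ulong):

```c
#include <stdint.h>

/* Hypothetical helper: OpenCL START/END profiling values (nanoseconds)
 * converted to milliseconds, the unit cudaEventElapsedTime() uses. */
double opencl_elapsed_ms(uint64_t start_ns, uint64_t end_ns)
{
    return (double)(end_ns - start_ns) * 1.0e-6;  /* ns -> ms */
}
```

With the figures above, 0.00010 s is 0.10 ms, which is consistent with CUDA’s 0.094912 ms at the precision the OpenCL value was printed.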

My overall question still stands, though: is this the “correct” way to measure logically equivalent code?
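Matching units is only part of it: the two measurements bracket slightly different intervals. The OpenCL START/END pair covers only the kernel’s execution on the device. The CUDA events measure from start to stop on the GPU timeline, and because cudaThreadSynchronize() runs before the stop event is recorded, stop is only enqueued after the host has waited for the kernel, so the interval can also include that host round trip. The usual pattern is cudaEventRecord(start), the launch, cudaEventRecord(stop), then cudaEventSynchronize(stop). For a kernel this short, single runs are also noisy, so a common approach is to time many launches, skip the first one(s), and compare summary statistics. A sketch of that last step in plain C, assuming per-run times have already been collected in milliseconds from either API (the `timing_summary` type and `summarize_runs` function are hypothetical):

```c
#include <stddef.h>

/* Hypothetical summary of repeated timings, all in milliseconds. */
typedef struct {
    double min_ms;
    double mean_ms;
    size_t count;  /* number of runs included (after warm-up) */
} timing_summary;

/* Skips the first `warmup` runs, which can include one-time initialization
 * costs, then reports the minimum and mean of the remaining runs. */
timing_summary summarize_runs(const double *ms, size_t n, size_t warmup)
{
    timing_summary s = {0.0, 0.0, 0};
    for (size_t i = warmup; i < n; ++i) {
        if (s.count == 0 || ms[i] < s.min_ms)
            s.min_ms = ms[i];
        s.mean_ms += ms[i];
        s.count++;
    }
    if (s.count > 0)
        s.mean_ms /= (double)s.count;
    return s;
}
```

Comparing the minimum is often more robust to occasional outliers (e.g. host-side scheduling delays) than a single measurement.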

Thanks,
Matt
