Timing comparison: OpenCL vs. CUDA

I have implemented CUDA events and OpenCL events to measure CPU-to-GPU copy, GPU-to-CPU copy, and kernel execution times. What bugs me the most is that my OpenCL implementation shows better results than my CUDA implementation.

For example (using the event-based timing from the CUDA and OpenCL documentation):
OpenCL:

cl_ulong start, end;
// Queue must have CL_QUEUE_PROFILING_ENABLE set; the event must be complete before querying.
clWaitForEvents(1, &event);
clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_START, sizeof(cl_ulong), &start, NULL);
clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_END, sizeof(cl_ulong), &end, NULL);
float executionTimeInMilliseconds = (end - start) * 1.0e-6f;  // profiling values are in nanoseconds

CUDA:
cudaEvent_t start, stop;
float time;
cudaEventCreate(&start);
cudaEventCreate(&stop);

cudaEventRecord(start, 0);
kernel<<<grid, threads>>>(d_odata, d_idata, size_x, size_y, NUM_REPS);
cudaEventRecord(stop, 0);
cudaEventSynchronize(stop);        // wait until the stop event has actually been reached

cudaEventElapsedTime(&time, start, stop);   // elapsed time in milliseconds
cudaEventDestroy(start);
cudaEventDestroy(stop);
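
(The CPU-GPU and GPU-CPU times below can be measured the same way, by bracketing the copy call with events. A minimal sketch; h_idata and bytes are placeholder names, not taken from the code above:)

// Timing a host-to-device copy with the same event mechanism.
cudaEvent_t copyStart, copyStop;
float copyTime;
cudaEventCreate(&copyStart);
cudaEventCreate(&copyStop);

cudaEventRecord(copyStart, 0);
cudaMemcpy(d_idata, h_idata, bytes, cudaMemcpyHostToDevice);  // h_idata and bytes assumed allocated
cudaEventRecord(copyStop, 0);
cudaEventSynchronize(copyStop);

cudaEventElapsedTime(&copyTime, copyStart, copyStop);  // milliseconds
cudaEventDestroy(copyStart);
cudaEventDestroy(copyStop);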

For 2048 elements:
CUDA: CPU-GPU 0.0165979 ms, GPU-CPU 0.091427 ms, kernel 0.007098 ms
OpenCL: CPU-GPU 0.007276 ms, GPU-CPU 0.006684 ms, kernel 0.011754 ms

I tried with larger element counts such as 114440, 2097152, etc., and OpenCL still shows better performance.
Literature and articles all say that CUDA offers better performance, so I suspect I am doing something wrong. What should I check?
I have already checked synchronization, calculated average values, and changed the kernel execution settings.

Are you using pinned memory in your CUDA implementation? That generally gives better copy performance (in some cases twice as fast).
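
For example, a minimal sketch of switching the host buffer to pinned (page-locked) memory; the buffer name and size here are placeholders, not taken from your code:

// Pinned host memory typically speeds up host<->device transfers and is
// required for cudaMemcpyAsync to overlap copies with kernel execution.
float *h_idata;                        // placeholder name
size_t bytes = 2048 * sizeof(float);   // placeholder size
cudaMallocHost((void**)&h_idata, bytes);   // pinned allocation instead of malloc()

// ... fill h_idata, then copy and time exactly as before:
cudaMemcpy(d_idata, h_idata, bytes, cudaMemcpyHostToDevice);

cudaFreeHost(h_idata);                     // matching free for pinned memory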