Performance comparison of CUDA and OpenCL

Ashkro · June 3, 2016, 3:12pm

Hi. I’m seeing something odd when I compare kernel execution times in CUDA and OpenCL. Based on what I’ve read, CUDA should be a little faster than OpenCL. However, a simple add kernel (adding two 1D arrays) runs faster on OpenCL. The code:

CUDA - Kernel
__global__ void addKernel(int *c, const int *a, const int *b, int size)
{
    // One thread per element; the guard covers any threads past the end of the arrays.
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < size) {
        c[i] = a[i] + b[i];
    }
}
CUDA - Execution
addKernel<<<62500, 1024>>>(dev_c, dev_a, dev_b, size);

OpenCL - Kernel
__kernel void Add(__global int *a, __global int *b, __global int *c, int size) {
    int i = get_global_id(0);
    if (i < size) {
        c[i] = a[i] + b[i];
    }
}
OpenCL - Execution
size_t vectorSize = 64000000;
size_t localWorkSize = 1024;
error = clEnqueueNDRangeKernel(commandQueue, kernel, 1, NULL, &vectorSize, &localWorkSize, 0, NULL, &event);
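
The two launches above describe the same amount of work in different terms: CUDA takes a block count and a block size, while OpenCL takes the total number of work-items and the work-group size. Here is a minimal host-side C++ sketch of that arithmetic. The helper names cudaGridSize and openclGlobalSize are made up for illustration and are not part of either API.

```cpp
#include <cstddef>

// Hypothetical helper: number of blocks needed to cover n elements with
// blockSize threads per block (ceiling division). This is the first value
// inside <<<...>>> in the CUDA launch.
constexpr std::size_t cudaGridSize(std::size_t n, std::size_t blockSize) {
    return (n + blockSize - 1) / blockSize;
}

// Hypothetical helper: total work-item count for OpenCL. Under OpenCL 1.2 the
// global size must be a multiple of the local size, so round n up.
constexpr std::size_t openclGlobalSize(std::size_t n, std::size_t localSize) {
    return cudaGridSize(n, localSize) * localSize;
}

// Compile-time check against the numbers used in this post.
static_assert(cudaGridSize(64000000, 1024) == 62500,
              "62500 blocks of 1024 threads cover 64000000 elements");
static_assert(openclGlobalSize(64000000, 1024) == 64000000,
              "64000000 is already a multiple of 1024");
```

For sizes that are not a multiple of the block size, both helpers round up, which is why the kernels keep the `if (i < size)` guard.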

Array size in both cases is 64000000. Time is measured with the built-in timing functions of CUDA and OpenCL.
Execution times are: CUDA - 8-9 ms, OpenCL - ~5 ms
GPU is a GTX 970, with CUDA 7.5 and OpenCL 1.2.
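
One way to sanity-check times like these is to convert them to effective memory bandwidth, since an element-wise add is usually limited by memory traffic rather than arithmetic. Below is a rough C++ sketch. The function names are hypothetical, and it assumes each element costs two reads and one write with no cache reuse.

```cpp
#include <cstddef>

// Hypothetical helper: bytes moved by c[i] = a[i] + b[i] over n elements,
// counting two reads and one write per element (assumption: no cache reuse).
constexpr double bytesMoved(std::size_t n, std::size_t elemSize) {
    return 3.0 * static_cast<double>(n) * static_cast<double>(elemSize);
}

// Hypothetical helper: effective bandwidth in GB/s (1 GB = 1e9 bytes) for a
// kernel that moved `bytes` bytes in `ms` milliseconds.
constexpr double effectiveGBps(double bytes, double ms) {
    return bytes / (ms * 1e-3) / 1e9;
}
```

With the numbers from this post (64000000 ints, assumed to be 4 bytes each), this gives roughly 154 GB/s for the 5 ms run and roughly 90 GB/s for an 8.5 ms run.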

Other kernels I’ve tested also run slower on CUDA. This is the simplest one.
Am I doing something wrong, or does the problem lie elsewhere? Does anyone know why I get such results?

Robert_Crovella · June 3, 2016, 4:01pm

Your addKernel invocation doesn’t even match your definition, so I’m pretty sure this is not the code you are running. If you want to provide complete code examples for both cases I will take a look.

Are you compiling a debug project or in debug mode (with -G)? That will slow things down and it’s not how you should make perf comparisons or analysis.

Ashkro · June 3, 2016, 4:46pm

Sorry, I had added an argument to check something while writing this post. It’s correct now.

Ran it in Release and now execution times are very close.
Thanks a lot!
