OpenCL Vs CUDA performance

mseenu · October 22, 2009, 8:00pm

Hi,
I am comparing the performance of the MatrixMul SDK example provided in the CUDA and OpenCL SDKs. The OpenCL version is 5-6X slower after normalizing for the matrix sizes. Here’s my configuration: ION, Linux 32-bit, driver 190.29, CUDA toolkit and SDK 2.3, GPUComputing SDK 2.3a.

Is this expected, or am I doing something wrong?
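
For readers who want to reproduce the comparison, here is a minimal sketch of one way to normalize for matrix size: convert each run to GFLOP/s, using the usual count of 2*m*n*k floating-point operations for a dense matrix multiply. The function names and the sizes and timings below are hypothetical placeholders, not measurements from this thread, and this may not be the exact normalization used above.

```python
def matmul_gflops(m, n, k, seconds):
    """Throughput in GFLOP/s for C (m x n) = A (m x k) * B (k x n)."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    # One multiply and one add per term of each inner product.
    flops = 2.0 * m * n * k
    return flops / seconds / 1e9


def slowdown(reference_gflops, other_gflops):
    """How many times slower `other` is than `reference`."""
    return reference_gflops / other_gflops


# Placeholder inputs (assumptions for illustration only): the two samples
# are allowed to use different matrix sizes, which is why we compare
# throughput instead of raw elapsed time.
cuda_rate = matmul_gflops(m=160, n=160, k=240, seconds=0.004)
opencl_rate = matmul_gflops(m=320, n=320, k=480, seconds=0.150)
print(f"CUDA:   {cuda_rate:.2f} GFLOP/s")
print(f"OpenCL: {opencl_rate:.2f} GFLOP/s")
print(f"OpenCL is {slowdown(cuda_rate, opencl_rate):.1f}x slower")
```

Comparing throughput rather than wall-clock time keeps the ratio meaningful when the two SDK samples default to different matrix dimensions.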

thesmileman · October 25, 2009, 9:56pm

I was at Nvidia’s GPU Developer’s conference, and in several of the OpenCL classes the presenter was asked about the performance difference between CUDA and OpenCL. As you and most everyone can tell, the demos are much slower in OpenCL. The presenter, who I believe was on the development team, was very clear that any performance difference was simply because the team hadn’t had as much time to optimize the code for OpenCL, since they have been working on it for less time. He also added that internal development versions of OpenCL were identical in performance to the latest CUDA implementations. After people asked more questions about the performance, he looked clearly frustrated and repeated the same thing: performance is literally identical between the two.

When you consider that the OpenCL spec was written specifically with CUDA in mind, and that the two specifications are very similar, it seems to me there is no reason one would run faster than the other once both are fully optimized. It also sounds like they already have an internal version that is just as fast as the one for CUDA.

madiyaan · November 8, 2009, 8:08pm

From some simple tests that I ran, I think the issue comes about because global/local ids/sizes are stored in per-thread global memory instead of registers.

I suppose for CUDA these ids and sizes are programmed into registers by the hardware unit that assigns blocks to SMs. Just my speculation.
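
For anyone unfamiliar with the ids and sizes mentioned here: in OpenCL, with a zero global offset, `get_global_id(0)` equals `get_group_id(0) * get_local_size(0) + get_local_id(0)`, and the CUDA counterpart is `blockIdx.x * blockDim.x + threadIdx.x`. The sketch below only emulates that index arithmetic on the host in Python, with hypothetical helper names. It says nothing about where either runtime actually stores these values, so it neither confirms nor refutes the speculation above.

```python
def global_id(group_id, local_size, local_id):
    """1-D global index of a work-item (OpenCL) or thread (CUDA).

    Mirrors blockIdx.x * blockDim.x + threadIdx.x, assuming a zero
    global offset on the OpenCL side.
    """
    return group_id * local_size + local_id


def all_global_ids(num_groups, local_size):
    """Enumerate every global id of a 1-D launch, group by group."""
    return [
        global_id(group, local_size, local)
        for group in range(num_groups)
        for local in range(local_size)
    ]


# A small hypothetical launch: 3 work-groups of 4 work-items each.
print(all_global_ids(num_groups=3, local_size=4))
```

Every work-item evaluates this expression at least once in a typical kernel, which is why the cost of reading the ids and sizes could plausibly matter in a short kernel.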
