OpenCL performs better than CUDA

Pushkar · February 28, 2011, 1:02pm

I have implemented the MOPSO algorithm in CUDA and am now implementing it in OpenCL… I am getting better run times when I execute my program with OpenCL. I don't understand why this happens, as I am running the code on the same GPU (Quadro FX 3700)…

philipjfry · February 28, 2011, 1:52pm

How similar are your implementations? Are they really as similar as possible? Have you checked the compute profiler output?

We ran multiple experiments comparing C for CUDA (C4CUDA) and OpenCL performance and hardly noticed any difference in most cases. This changes if you make use of a feature specific to one of them, like the more generic texture concept in C for CUDA. Quite strangely, C4CUDA seems to require one more register than OpenCL. In some cases, this can lead to better utilization (an additional block/workgroup on a multiprocessor/compute unit) and a significant advantage for OpenCL.
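
To illustrate how a single register can cost a whole block, here is a rough sketch of the arithmetic. The default limits (8192 registers, 768 threads and 8 blocks per multiprocessor) are assumptions for a compute capability 1.1 part such as the Quadro FX 3700, and `resident_blocks` is a made-up helper, not an NVIDIA API. It ignores register allocation granularity and shared memory, so treat the numbers as an estimate and check the occupancy figures in the profiler for the real values.

```python
def resident_blocks(regs_per_thread, threads_per_block,
                    regs_per_sm=8192,        # assumed: compute capability 1.1
                    max_threads_per_sm=768,  # assumed: compute capability 1.1
                    max_blocks_per_sm=8):    # assumed: compute capability 1.1
    """Estimate how many blocks/workgroups fit on one multiprocessor."""
    # Limit imposed by the register file (allocation granularity ignored).
    by_registers = regs_per_sm // (regs_per_thread * threads_per_block)
    # Limit imposed by the maximum number of resident threads.
    by_threads = max_threads_per_sm // threads_per_block
    # The tightest limit wins.
    return min(by_registers, by_threads, max_blocks_per_sm)


# Hypothetical kernel with 256 threads per block, at 16 vs. 17 registers.
for regs in (16, 17):
    print(regs, "registers/thread ->", resident_blocks(regs, 256), "block(s)")
```

With these assumed limits, going from 16 to 17 registers per thread drops the estimate from 2 resident blocks to 1, which is the kind of utilization gap described above.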

Pushkar · February 28, 2011, 4:44pm

Thank you… that might help… Yes, the implementations are the same. What is the compute profiler output? How do I check it? What can be inferred from it?

philipjfry · February 28, 2011, 6:43pm

The compute profiler is part of the CUDA SDK and can be used to get insight into how OpenCL and C4CUDA code performs. (I THINK since SDK 3.1; up to 3.0 there were separate programs for profiling C4CUDA and OpenCL, called CUDA Visual Profiler and OpenCL Visual Profiler.)

If you are using Linux (or MacOS, where it hardly works), you can find it at /usr/local/cuda/computeprof/bin/computeprof. Unfortunately, you will have to add /usr/local/cuda/computeprof/bin manually to LD_LIBRARY_PATH: it will most likely crash if it picks up your distribution's Qt libraries instead of the bundled ones. On Windows, you should find it under All Programs / NVIDIA Corporation / CUDA Toolkit.
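
A minimal sketch of the Linux setup, written as a small Python launcher. The install path is the one mentioned above and may differ on your machine; `computeprof_env` and `launch_computeprof` are hypothetical helper names, not part of the toolkit.

```python
import os
import subprocess

# Path taken from the post above; adjust if your toolkit lives elsewhere.
COMPUTEPROF_DIR = "/usr/local/cuda/computeprof/bin"


def computeprof_env(base_env=None, lib_dir=COMPUTEPROF_DIR):
    """Return a copy of the environment with lib_dir first in LD_LIBRARY_PATH."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_LIBRARY_PATH", "")
    # Prepend so the bundled Qt libraries are found before the distribution's.
    env["LD_LIBRARY_PATH"] = lib_dir + (os.pathsep + existing if existing else "")
    return env


def launch_computeprof():
    """Start the profiler with the adjusted library path."""
    binary = os.path.join(COMPUTEPROF_DIR, "computeprof")
    subprocess.run([binary], env=computeprof_env(), check=True)
```

Calling `launch_computeprof()` starts the profiler. The variable is only changed for that one process, so the rest of your session keeps using the distribution's libraries.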

Pushkar · March 1, 2011, 12:34am

Where can I find documentation on the “extra register”? I want to read more about it…
