CUDA and OpenCL basic example to compare performance


I’ve already posted this in the “OpenCL” forum but haven’t received any feedback. I’m looking for an example implemented in both CUDA and OpenCL, so that I can make a basic performance comparison between the two. Does anybody have such an example? For instance, an implementation of the N-body problem or something similar. The example should just output its execution time, so that I can compare performance across different input problem sizes.


I wrote one such example, which is posted on my blog. There are also a few papers that compare the performance of OpenCL and CUDA; see the references on the web page.

Ken
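For readers who want a starting point, here is a minimal sketch of what the CUDA half of such a comparison could look like. It is not the blog example mentioned above. It assumes a naive all-pairs N-body acceleration kernel with one thread per body; the names `accel_kernel` and `time_forces_ms`, the block size of 256, and the softening value are illustrative choices, not taken from any particular benchmark. Only the kernel is timed, using CUDA events, after one untimed warm-up launch.

```cuda
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// All-pairs gravitational acceleration: one thread per body, O(n^2) total work.
// pos[i].w holds the mass of body i. eps2 is a softening term (assumed value)
// that keeps the i == j term finite.
__global__ void accel_kernel(const float4* pos, float3* acc, int n, float eps2)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float4 pi = pos[i];
    float3 a = make_float3(0.0f, 0.0f, 0.0f);
    for (int j = 0; j < n; ++j) {
        float4 pj = pos[j];
        float dx = pj.x - pi.x;
        float dy = pj.y - pi.y;
        float dz = pj.z - pi.z;
        float r2 = dx * dx + dy * dy + dz * dz + eps2;
        float inv_r = rsqrtf(r2);
        float s = pj.w * inv_r * inv_r * inv_r;   // m_j / r^3
        a.x += dx * s;
        a.y += dy * s;
        a.z += dz * s;
    }
    acc[i] = a;
}

// Hypothetical helper: returns the kernel time in milliseconds for n bodies,
// or -1.0f if a CUDA call fails (for example, when no device is present).
float time_forces_ms(int n)
{
    // Fixed seed so every run uses the same input for a given n.
    std::vector<float4> h_pos(n);
    srand(42);
    for (int i = 0; i < n; ++i) {
        h_pos[i] = make_float4(rand() / (float)RAND_MAX,
                               rand() / (float)RAND_MAX,
                               rand() / (float)RAND_MAX,
                               1.0f);
    }

    float4* d_pos = nullptr;
    float3* d_acc = nullptr;
    if (cudaMalloc(&d_pos, n * sizeof(float4)) != cudaSuccess) return -1.0f;
    if (cudaMalloc(&d_acc, n * sizeof(float3)) != cudaSuccess) {
        cudaFree(d_pos);
        return -1.0f;
    }
    cudaMemcpy(d_pos, h_pos.data(), n * sizeof(float4), cudaMemcpyHostToDevice);

    const int block = 256;                        // assumed block size
    const int grid = (n + block - 1) / block;
    const float eps2 = 1e-9f;

    // Warm-up launch, not timed, so one-time startup cost is excluded.
    accel_kernel<<<grid, block>>>(d_pos, d_acc, n, eps2);
    cudaDeviceSynchronize();

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    accel_kernel<<<grid, block>>>(d_pos, d_acc, n, eps2);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    bool ok = (cudaGetLastError() == cudaSuccess);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_pos);
    cudaFree(d_acc);
    return ok ? ms : -1.0f;
}
```

Calling `time_forces_ms(n)` in a loop over, say, n = 1024, 2048, 4096, ... and printing each result gives the time-versus-problem-size output asked for. For a fair comparison, an OpenCL version would run the same kernel logic with the same work-group size and time it the analogous way, for example with `clGetEventProfilingInfo` on a command queue created with `CL_QUEUE_PROFILING_ENABLE`. Whether host-to-device transfers are included in the timed region should also be the same on both sides.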