CUDA and OpenCL basic example to compare performance


I am looking for an example implemented in both CUDA and OpenCL so that I can make a basic performance comparison between the two. Does anybody have such an example? For instance, an implementation of the N-body problem or something similar. The example should simply output its execution time, so that I can compare performance across different input problem sizes.