Unusual performance gap between OpenCL and CUDA

Hi there,
I asked this question on Stack Overflow about four days ago but haven't received a suitable answer. To save time, I'll refer you to the Stack Overflow question at:
http://stackoverflow.com/questions/14437875/unordinary-performance-gap-between-opencl-and-cuda

Any help or ideas would be appreciated!

Thanks,