CUDA double precision performance issue

I have found that my double-precision CUDA application is inferior to an identical double-precision CPU application. Does anyone have an explanation for why this happens?
Is this probably a bug in my code, or is it a general “disadvantage” of CUDA? My device’s compute capability is 1.3 (the GTX 285 supports double precision).

You are going to have to give more information to get a reasonable answer. What exactly is the defect you are observing?

With such a vague statement to go on, my guess is that your CPU application is taking advantage of the x87 FPU's 80-bit extended double precision. See this thread.

Yes, maybe I should have been more precise. Anyway, your answer and the link answered my question. Thanks.