Computation on CUDA slower than on CPU

I am new to CUDA programming. I am running a project that performs an expensive computation on either the GPU or the CPU, depending on user input. The project's authors claim it is faster on the GPU, of course, but they ran it on an NVIDIA Fermi C2050, while I am running on a GeForce 820 with only 96 cores, alongside an Intel i5.
In my tests the computation is faster on the CPU cores. Is this possible, or did I install CUDA and the NVIDIA drivers incorrectly?

I am running Ubuntu 14.04, CUDA 7.0, and NVIDIA driver 346. The screen was also flickering, so I installed a Compiz refresh package to fix it. Is this Compiz package the reason for the slow performance, or is my GPU so weak that the CPU cores are simply better?

In the CUDA samples folder I couldn’t find any project that compares GPU speed with CPU speed. Can you point me to one, so I can tell whether the problem is my hardware or their project?

Haswell E performance (peak)
3.0GHz * 8-cores * 8 DP-vector ops * 2 from FMA3 = 384 GFLOPS

GeForce 820M performance (peak)
Single-precision floating-point performance: 297.6 GFLOPS

Your low-end NVIDIA chip gets owned by a high-end CPU, and that is in the hardware specs already.

Don’t expect too much from that chip. Also, you typically don’t get peak GFLOPS on the GPU, and the actual performance depends heavily on the type of application you throw at it.
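
For anyone who wants to redo the arithmetic above, here is a minimal Python sketch of the peak-throughput formula used for the Haswell-E figure. The function and parameter names are made up for illustration; the numbers are the ones quoted in this post. Note that this is a theoretical peak, not something a real application will reach.

```python
def cpu_peak_gflops(clock_ghz, cores, vector_ops_per_cycle, fma_factor=2):
    """Theoretical peak = clock (GHz) * cores * vector ops per cycle * FMA factor.

    All parameter names are illustrative; fma_factor=2 reflects that one
    fused multiply-add counts as two floating-point operations.
    """
    return clock_ghz * cores * vector_ops_per_cycle * fma_factor


# Haswell-E numbers from the post: 3.0 GHz, 8 cores, 8 DP vector ops per cycle.
haswell_e_dp = cpu_peak_gflops(3.0, 8, 8)

# Quoted peak figure for the GeForce 820M (GFLOPS).
geforce_820m_quoted = 297.6

print(f"Haswell-E peak: {haswell_e_dp:.1f} GFLOPS")
print(f"CPU/GPU peak ratio: {haswell_e_dp / geforce_820m_quoted:.2f}")
```

The ratio only compares spec-sheet peaks; it says nothing about how a particular program behaves on either device.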


While I agree with the gist of cbuchner1’s remarks, I would point out that:

(1) No i5 consumer CPU has 8 cores
(2) When running AVX2, Haswell CPUs lower the clocks to stay within the power envelope
(3) Getting anywhere close to peak floating-point throughput from AVX2 and multiple CPU cores is no trivial task
(4) Many applications are not limited by FLOPS, but by memory throughput

That said, an i7 4790K retailing at about $340 can deliver around 220 GFLOPS double-precision peak throughput, which exceeds the peak DP throughput of all consumer GPUs except some of the Titans. For single-precision computation, GPU performance relative to CPUs is more favorable by about an order of magnitude at similar price points.
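
Point (4) above can be illustrated with a rough roofline-style estimate: attainable throughput is the smaller of the peak FLOPS and memory bandwidth times the kernel's FLOPs per byte. The sketch below is only that, a sketch. The two bandwidth values are assumed placeholders chosen for illustration, not measurements or specifications of the hardware discussed in this thread, and the function name is made up.

```python
def attainable_gflops(peak_gflops, bandwidth_gb_s, flops_per_byte):
    """Roofline-style bound: limited by compute or by memory traffic, whichever is lower."""
    return min(peak_gflops, bandwidth_gb_s * flops_per_byte)


# Single-precision vector add c[i] = a[i] + b[i]:
# 1 FLOP per element, 3 floats (12 bytes) moved per element.
VECTOR_ADD_FLOPS_PER_BYTE = 1 / 12

# ASSUMED, illustrative bandwidths in GB/s (not taken from the thread).
ASSUMED_GPU_BANDWIDTH_GB_S = 14.4
ASSUMED_CPU_BANDWIDTH_GB_S = 25.6

# Peak figures quoted in the thread (GPU single precision, CPU double precision).
gpu_bound = attainable_gflops(298.0, ASSUMED_GPU_BANDWIDTH_GB_S, VECTOR_ADD_FLOPS_PER_BYTE)
cpu_bound = attainable_gflops(220.0, ASSUMED_CPU_BANDWIDTH_GB_S, VECTOR_ADD_FLOPS_PER_BYTE)

print(f"GPU bound for vector add: {gpu_bound:.2f} GFLOPS")
print(f"CPU bound for vector add: {cpu_bound:.2f} GFLOPS")
```

With these assumed numbers the bandwidth term, not the FLOPS term, is the limit on both devices, which is why peak GFLOPS alone is a poor predictor for memory-bound code.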

I cannot find information on a GeForce 820; are we talking about the GeForce 820M by any chance? That would be a very low-end compute capability 2.1 part. The hot clock is 1550 MHz, so 96 CUDA cores crank out 298 GFLOPS single precision, 25 GFLOPS double precision [if my math is right].
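
To check the math, here is a small Python sketch of the calculation. It assumes each CUDA core retires one FMA (2 FLOPs) per hot-clock cycle and that double-precision throughput on a compute capability 2.1 part is 1/12 of single precision. Both are assumptions stated here to reproduce the figures above, and the names are made up for illustration.

```python
def gpu_peak_gflops(cuda_cores, hot_clock_ghz, flops_per_core_per_cycle=2):
    """Theoretical peak = cores * clock (GHz) * FLOPs per core per cycle (FMA = 2)."""
    return cuda_cores * hot_clock_ghz * flops_per_core_per_cycle


# ASSUMPTION: DP rate is 1/12 of the SP rate on compute capability 2.1.
ASSUMED_DP_TO_SP_RATIO = 1 / 12

sp_gflops = gpu_peak_gflops(96, 1.55)          # 96 cores at 1550 MHz
dp_gflops = sp_gflops * ASSUMED_DP_TO_SP_RATIO

print(f"Single precision: {sp_gflops:.1f} GFLOPS")
print(f"Double precision: {dp_gflops:.1f} GFLOPS")
```

Under those assumptions the results round to the 298 and 25 GFLOPS quoted above.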

Thanks a lot for your replies.
My GPU is a GeForce 820M; it is built into my laptop (NVIDIA Optimus, alongside an Intel graphics chip).

It’s a Dell Inspiron with an i5.

So I shouldn’t bother with the Compiz refresh-rate update. My Ubuntu install crashed after a lot of testing, and I am now reinstalling the NVIDIA drivers and CUDA. I will measure my GFLOPS using glxgears and report the result.

Thanks again.