Comparison of execution time on CPU and GPU: is the CPU faster than the GPU?

Hello,

In my application I would like to compare implementations of the same code on the CPU and on the GPU.

The CPU is an Intel® Core™ 2 Quad Q6600 @ 2.4 GHz.

The GPU is a GeForce GTS 250 with 128 CUDA cores.

When I execute the code, I use the gettimeofday() function to measure the elapsed time,

but the results show that execution on the CPU is faster than on the GPU!

Even if I increase the size of the data set, the CPU time is usually lower than the GPU time,

but I expected the opposite: the execution time on the GPU should be less than on the CPU.

My code is just a subtraction between two arrays.

Code on the GPU:

__global__ void incrementArrayOnDevice(float *a, float *c, float *res, int N)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < N) res[idx] = a[idx] - c[idx];
}

Code on the CPU:

for (int i = 0; i < 600; i++)
    res[i] = a[i] - b[i];
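
For completeness, the host code around the kernel looks roughly like this (a minimal sketch; I only posted the kernel, so the allocation sizes, initialization, and copy calls here are assumptions):

#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <cuda_runtime.h>

__global__ void incrementArrayOnDevice(float *a, float *c, float *res, int N)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < N) res[idx] = a[idx] - c[idx];
}

int main(void)
{
    const int N = 600;                 // same size as the CPU loop
    size_t bytes = N * sizeof(float);

    float *h_a   = (float *)malloc(bytes);
    float *h_c   = (float *)malloc(bytes);
    float *h_res = (float *)malloc(bytes);
    for (int i = 0; i < N; i++) { h_a[i] = (float)i; h_c[i] = 1.0f; }

    float *d_a, *d_c, *d_res;
    cudaMalloc((void **)&d_a, bytes);
    cudaMalloc((void **)&d_c, bytes);
    cudaMalloc((void **)&d_res, bytes);

    struct timeval t0, t1;
    gettimeofday(&t0, NULL);

    // The two PCI Express copies below are included in the measurement.
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_c, h_c, bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (N + threads - 1) / threads;
    incrementArrayOnDevice<<<blocks, threads>>>(d_a, d_c, d_res, N);

    // The device-to-host copy blocks until the kernel has finished,
    // so the second gettimeofday() reads a valid end time.
    cudaMemcpy(h_res, d_res, bytes, cudaMemcpyDeviceToHost);
    gettimeofday(&t1, NULL);

    double ms = (t1.tv_sec - t0.tv_sec) * 1000.0 +
                (t1.tv_usec - t0.tv_usec) / 1000.0;
    printf("GPU path (copies + kernel): %f ms\n", ms);

    cudaFree(d_a); cudaFree(d_c); cudaFree(d_res);
    free(h_a); free(h_c); free(h_res);
    return 0;
}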

So I don't know what the problem is. Does anybody have an idea about this result?

Am I not using all the resources of the GeForce card?

Come back when you have a problem that works on millions of elements. ;) Six hundred? Of course the CPU is faster.

Note that PCI express bandwidth is often slower than the speed of DRAM access of the CPU. Meaning that when you’ve got to copy the data to the GPU and back, you’re definitely going to be slower when the GPU has little work to do.
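
(A rough back-of-the-envelope, with numbers that are only ballpark estimates: 600 floats is 2400 bytes, so the transfer itself is negligible, but each cudaMemcpy and kernel launch carries on the order of ten microseconds of fixed overhead, while a 2.4 GHz CPU finishes 600 subtractions from cache in well under a microsecond.)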

Give the GPU some real work, something FLOP heavy. Like an FFT, or some other heavy processing. Per data element that is read from memory you’ll have to do significantly more than one FLOP. Ideally with some transcendental functions, because that is where the GPU shines. Then you will see some speedup.
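
For example, a deliberately FLOP-heavy kernel could look something like this (a hypothetical sketch just to illustrate the idea; the name heavyKernel and the iteration count are invented):

// Hypothetical kernel: hundreds of FLOPs per element loaded, so the
// arithmetic, not the PCIe transfer, dominates the runtime.
__global__ void heavyKernel(const float *in, float *out, int N)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < N) {
        float x = in[idx];
        for (int i = 0; i < 100; i++)   // transcendental-heavy loop
            x = sinf(x) + expf(-x);
        out[idx] = x;
    }
}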

Christian


As Chris said, the problem size is too small for you to benefit.

For a quick example, try the ‘nbody’ app from the SDK:

CPU

$ ./nbody -benchmark -n=30720 -cpu

30720 bodies, total time for 10 iterations: 32679.305 ms

= 0.289 billion interactions per second

= 5.776 single-precision GFLOP/s at 20 flops per interaction

GPU

$ ./nbody -benchmark

30720 bodies, total time for 10 iterations: 407.776 ms

= 23.143 billion interactions per second

= 462.861 single-precision GFLOP/s at 20 flops per interaction

Run on: Compute 1.3 CUDA device: [Tesla C1060]


Thank you zeus13i for this example.

I understand now that when the GPU has more work to do, it is more powerful than the CPU.
