Program without CUDA is faster

jabjab23 · December 15, 2008, 9:40am

I made a program in C that calls a function to do some work on the GPU.
It works. Now I’m trying to test whether it is really faster with the CPU and GPU working on it together.
So I made a second program that does the same thing using only the CPU. (I mean it has no CUDA code.)
I put a timer in both programs… I used clock() and subtracted the initial time from the final time.

THE PROGRAM:
The program has 2 arrays.
It multiplies each pair of consecutive elements in the first array, as shown below, and stores the product in the other array…

from —> a[0,1,2,3,4,5,6,7]
result ----> b[0,0,6,6,20,20,42,42]

I tried different array sizes, up to 700…
I’m wondering why the program without CUDA is faster…
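
For reference, the CPU-only version described above could look roughly like the sketch below. The original code was not posted, so the function names `pair_products` and `time_pair_products`, the `int` element type, and the even-length requirement are all assumptions. The timing helper repeats the call many times because a single pass over a few hundred elements is likely to finish faster than clock() can resolve.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper (not the poster's code): for every pair
 * (a[i], a[i+1]) with even i, store the product in b[i] and b[i+1].
 * Assumes n is even. */
void pair_products(const int *a, int *b, size_t n)
{
    assert(n % 2 == 0);
    for (size_t i = 0; i < n; i += 2) {
        int p = a[i] * a[i + 1];
        b[i] = p;
        b[i + 1] = p;
    }
}

/* Hypothetical timing helper: runs pair_products reps times and
 * returns the average CPU time per call in seconds, using clock(). */
double time_pair_products(const int *a, int *b, size_t n, int reps)
{
    assert(reps > 0);
    clock_t start = clock();
    for (int r = 0; r < reps; ++r)
        pair_products(a, b, n);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC / reps;
}
```

Calling `pair_products` on `{0,1,2,3,4,5,6,7}` fills the output with `{0,0,6,6,20,20,42,42}`, matching the example above. An optimizing compiler may remove repeated identical calls, so the timing figure is only a rough guide.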

Keldor314 · December 15, 2008, 9:54am

Sounds like you’re spending more time copying the arrays over and back than actually computing. It probably takes the CPU just as much time to send the array to the GPU as it would to just do the computation.

Quoc_Vinh · December 15, 2008, 9:56am

CUDA is just a tool.

If you don’t optimize the CUDA code, and/or your functions are very small, I don’t think CUDA will be faster than C code.

_Big_Mac · December 15, 2008, 10:30am

Could you show us your kernel code and your kernel call in host code?

Also, 700 elements is not nearly enough to saturate the GPU.

Agustin_Rubio · December 15, 2008, 3:56pm

It also depends on which GPU you are using…

tatou1234 · December 15, 2008, 4:46pm

The biggest problem with CUDA is copying memory from host to device and vice versa, so you need to run with bigger arrays. It is also important to optimize, as Quoc Vinh said. Where is your example code?

Gregory_Diamos · December 19, 2008, 8:57pm

There is no way this is going to be faster on CUDA regardless of what you do. This computation is memory bound: you have to do 2 memory loads, one memory store, and one multiply instruction for each element in the array. The time it takes to do the memory loads and store is going to be ~100-1000x the time it takes to do the multiply.

Using CUDA, you not only have to load the data from memory (making the best case time equal to that of the CPU assuming the GPU is infinitely fast), you also have to send it over PCIe, which is ~10-100x slower than operations from main memory.

You need to be doing more computation per data element than just a multiply.
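
The argument above can be put into a back-of-the-envelope model. This is only a sketch: the function names are made up for illustration, the linear "fixed latency plus bytes over bandwidth" cost model is an assumption, and any latency or bandwidth figures you plug in are guesses to be replaced with measurements from your own machine.

```c
#include <assert.h>
#include <stddef.h>

/* Memory traffic per element, using the count from the post:
 * 2 loads + 1 store, each of elem_size bytes, per multiply. */
size_t bytes_per_element(size_t elem_size)
{
    return 3 * elem_size;
}

/* Assumed linear cost model for one transfer:
 * a fixed latency plus the payload divided by the bandwidth. */
double transfer_seconds(size_t bytes, double latency_s, double bytes_per_s)
{
    assert(bytes_per_s > 0.0);
    return latency_s + (double)bytes / bytes_per_s;
}
```

With 4-byte elements this gives 12 bytes of traffic for every single multiply. For a 700-element array (2800 bytes), an assumed 10 microsecond fixed overhead and an assumed 4 GB/s link give `transfer_seconds(2800, 1e-5, 4e9)`, which is almost entirely the fixed overhead. Under those assumptions the array is far too small for the copy to pay off.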
