# What CUDA GPU can give 10000 times the performance of a CPU (1 core, 3 GHz)?

I have never used a GPU for general-purpose computing. Does anybody know which CUDA GPU can give 10000 times the performance of an 'Intel® Core™ i5-3330 CPU @ 3.00GHz × 4' processor? Assume a 'sum' operation over a huge list of numbers (not float), computed on a single 3 GHz core, and then converted to run in parallel on a CUDA GPU. Which GPU, with how many cores, can give 10000 times, 1000 times, or 100 times the performance? I am thinking of running my calculations on a GPU to increase performance with cost-effective hardware. I want to know whether it is worth the time to convert my programs to CUDA to gain performance.

None, unless you deliberately slow down your CPU code.

At the application level, comparing high-end GPUs to high-end CPUs, typical speedups from a GPU solution are between 2x and 10x, with an average of about 5x. This assumes that the CPU code uses SIMD vectorization and multi-threading and is built with full optimizations, and that the use case has enough inherent parallelism to fully utilize the GPU.

If you compare a GPU solution to scalar CPU code running on a single core, that could represent a factor of 50x or so on top of the above speedup. Typically, this is what has happened when you see papers reporting a 200x speedup from using a GPU.
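As a rough sketch of that arithmetic: the reported speedup is the fair speedup multiplied by the penalty built into a weak baseline. The function name is hypothetical, and the numbers are just the ballpark figures from this post, not measurements.

```python
# Ballpark figures quoted in the post above; not measurements.
FAIR_GPU_SPEEDUP = 5          # GPU vs. SIMD + multi-threaded, fully optimized CPU code
SCALAR_ONE_CORE_PENALTY = 50  # extra factor when the CPU baseline is scalar, single core


def apparent_speedup(fair_speedup, baseline_penalty):
    """Speedup that gets reported when the CPU baseline is unoptimized."""
    return fair_speedup * baseline_penalty


# Same order of magnitude as the "200x" papers, and nowhere near 10000x.
print(apparent_speedup(FAIR_GPU_SPEEDUP, SCALAR_ONE_CORE_PENALTY))
```

Even taking the top of the 2x to 10x range, the same calculation gives about 500x.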

If you are summing a truly huge list of numbers comprising many GB of data, you are looking at a memory-bound operation, and a roofline analysis should show that the projected GPU speedup is roughly the ratio of the respective achievable memory throughput rates, where GPUs typically have a 5x to 6x advantage over CPUs.
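A back-of-the-envelope version of that roofline estimate might look like the sketch below. The data size and both bandwidth figures are placeholder assumptions chosen for illustration, not specifications of any particular CPU or GPU; substitute measured throughput for your own hardware.

```python
GB = 1e9

# All three values are illustrative assumptions, not measured numbers.
DATA_BYTES = 8 * GB    # e.g. one billion 64-bit integers
CPU_MEM_BW = 25 * GB   # assumed achievable CPU memory throughput, bytes/s
GPU_MEM_BW = 150 * GB  # assumed achievable GPU memory throughput, bytes/s


def stream_time_seconds(num_bytes, bandwidth_bytes_per_s):
    """Lower bound on the time needed to read num_bytes once from memory."""
    return num_bytes / bandwidth_bytes_per_s


def memory_bound_speedup(cpu_bw, gpu_bw):
    """For a memory-bound kernel the speedup is just the bandwidth ratio."""
    return gpu_bw / cpu_bw


cpu_t = stream_time_seconds(DATA_BYTES, CPU_MEM_BW)
gpu_t = stream_time_seconds(DATA_BYTES, GPU_MEM_BW)
print(f"CPU: {cpu_t:.3f} s, GPU: {gpu_t:.3f} s, "
      f"speedup: {memory_bound_speedup(CPU_MEM_BW, GPU_MEM_BW):.1f}x")
```

Note that the number of GPU cores does not appear anywhere in this estimate: for a plain sum, the adds are essentially free compared to the cost of fetching the data.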

The example operation you gave is limited by memory throughput, both on the CPU and on the GPU. So the speedup can be expressed as the ratio of the peak memory throughput of the two platforms.

Also, if you first have to send all those numbers from the CPU's own memory (host memory) to the GPU memory (device memory), then all your potential speedup is lost.
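To see why, here is a small sketch that adds the host-to-device copy to the GPU side of the estimate. All bandwidth values, including the host-to-device link throughput, are assumed placeholder figures for illustration only, and the copy is assumed not to overlap with the summation.

```python
GB = 1e9

# Illustrative assumptions, not measurements of any specific system.
DATA_BYTES = 8 * GB    # e.g. one billion 64-bit integers
CPU_MEM_BW = 25 * GB   # assumed CPU memory throughput, bytes/s
GPU_MEM_BW = 150 * GB  # assumed GPU memory throughput, bytes/s
LINK_BW = 12 * GB      # assumed host-to-device transfer throughput, bytes/s


def cpu_sum_time(num_bytes):
    """CPU reads the data once from host memory."""
    return num_bytes / CPU_MEM_BW


def gpu_sum_time_with_copy(num_bytes):
    """Copy the data to the device first, then read it once on the GPU."""
    return num_bytes / LINK_BW + num_bytes / GPU_MEM_BW


def end_to_end_speedup(num_bytes):
    return cpu_sum_time(num_bytes) / gpu_sum_time_with_copy(num_bytes)


# A value below 1.0 means the GPU path is slower than just summing on the CPU.
print(f"end-to-end speedup: {end_to_end_speedup(DATA_BYTES):.2f}x")
```

Under these assumptions the copy alone takes longer than the CPU needs to finish the whole sum, so the GPU only pays off if the data already lives in device memory or is reused for much more work there.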

Christian