Speed Up Calculations

Hello Everyone,

I have a basic question regarding speedup calculation.

I have a serial application designed to run on a quad-core CPU.
The time taken by this serial application to execute on the quad-core CPU is t1.

Then, I parallelize this application using CUDA and run it on 512 GPU cores.
The time taken by this application to execute using 512 GPU cores is t2.

Now, I want to calculate the speedup of this CUDA parallelization.

My confusion is about which of the following options is correct:

a) We compare the timings for 1 core of CPU Vs. 1 core of GPU.

b) We compare the timings for four cores of CPU vs. 512 cores of GPU. (In this case, the speedup would be t1/t2.)

I think most people will report the second one.
In most cases, wall-clock time is what matters, given sufficient computational resources.

Unfortunately quite often the result on 512 GPU cores is compared to the time on one CPU core, while I think your variant b) is the only fair one.

BTW: If your CPU version of the code can make use of 4 cores, it probably is not a serial application. ;-)
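
To make the calculation concrete, here is a minimal sketch in Python. The function name and the timings are made up for illustration; the only thing taken from the thread is the convention that speedup is the baseline wall-clock time divided by the accelerated wall-clock time, so a value above 1 means the GPU version is faster.

```python
def speedup(t_baseline: float, t_accelerated: float) -> float:
    """Return baseline wall-clock time divided by accelerated wall-clock time."""
    if t_baseline <= 0 or t_accelerated <= 0:
        raise ValueError("timings must be positive")
    return t_baseline / t_accelerated


# Hypothetical wall-clock timings in seconds, chosen only for illustration.
t1 = 12.0  # CPU version, using whatever cores it can actually use
t2 = 1.5   # CUDA version on the GPU

print(f"speedup = {speedup(t1, t2):.1f}x")
```

Whichever baseline you pick, state it next to the number (how many CPU cores the baseline used, and whether host-to-device transfers are included in t2), since the same code can yield very different speedups depending on that choice.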

Thanks for your reply !

Suppose I have a parallel OpenMP application running on four cores that completes in time t1,
and a serial C application running on the same quad-core CPU, but using only one core, that completes in time t2.

Should the speedup be t1/t2? If not, why not?

Allow me to introduce you to Amdahl's Law (http://en.wikipedia.org/wiki/Amdahl's_Law). That, and the fact that OpenMP isn't particularly efficient at parallelizing an application, are among the reasons why not.

Those are the software reasons. Hardware reasons could include memory bandwidth, or Intel Turbo Boost.

I’d say the speedup would be t2/t1.
Ken_g6 is citing some reasons why t2/t1 might not be 4 on a quad core, although I only partly agree on the OpenMP one: It very much depends on the specific case. For my scientific applications (which are not memory bound) I actually get very close to 4, like 3.8 or 3.9.
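
As a rough illustration of the Amdahl's Law point, here is a small Python sketch. It assumes the textbook form of the law, S(n) = 1 / ((1 - p) + p / n), where p is the fraction of the run time that can be parallelized and n is the number of cores. The function names are hypothetical, and the 3.8 figure is simply the measured speedup quoted above.

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Ideal speedup on n cores when a fraction p of the work is parallel."""
    return 1.0 / ((1.0 - p) + p / n)


def parallel_fraction(s: float, n: int) -> float:
    """Invert Amdahl's Law: the p implied by a measured speedup s on n > 1 cores."""
    return (1.0 - 1.0 / s) / (1.0 - 1.0 / n)


def efficiency(s: float, n: int) -> float:
    """Parallel efficiency: measured speedup divided by the core count."""
    return s / n


cores = 4
measured = 3.8  # the speedup quoted above (serial time / OpenMP time)

print(f"efficiency        = {efficiency(measured, cores):.2f}")
print(f"implied parallel p = {parallel_fraction(measured, cores):.3f}")
print(f"speedup at p=0.90  = {amdahl_speedup(0.90, cores):.2f}")
```

This model ignores the hardware effects mentioned earlier (memory bandwidth, Turbo Boost), so treat the implied p as a loose estimate rather than a measurement.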