I have a basic question regarding speedup calculation.
I have a serial application designed to run on a quad-core CPU.
The time taken by this serial application to execute on the quad-core CPU is t1.
Then, I parallelize this application using CUDA and run it on 512 GPU cores.
The time taken by this application to execute using 512 GPU cores is t2.
Now, I want to calculate the speedup of this CUDA parallelization.
My confusion is which of the following comparisons is the correct one:
a) We compare the timings for one CPU core vs. one GPU core.
b) We compare the timings for four CPU cores vs. 512 GPU cores. (In this case, the speedup would be t1/t2.)
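To make the arithmetic concrete, here is a minimal sketch of the ratio I mean for option b (the timing values are made up purely for illustration):

```c
#include <stdio.h>

int main(void)
{
    /* Hypothetical wall-clock times in seconds, purely for illustration */
    double t1 = 8.0;   /* serial application on the quad-core CPU */
    double t2 = 0.5;   /* CUDA version on the 512-core GPU        */

    /* Speedup = serial time / parallel time */
    printf("speedup = t1 / t2 = %.1fx\n", t1 / t2);
    return 0;
}
```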
Now suppose I have a parallel application in OpenMP running on four cores, completing in time t1, and a serial application in C running on the quad-core CPU but using only one of the four cores, completing in time t2.
Would the speedup t2/t1 then be 4?
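Something like the following sketch is what I have in mind, using omp_get_wtime() for the timings (the loop body is just a placeholder for the real work):

```c
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <omp.h>

int main(void)
{
    const long n = 20000000;              /* arbitrary problem size */
    double *a = malloc(n * sizeof *a);
    if (a == NULL) return 1;

    /* Serial run: uses only one of the four cores */
    double start = omp_get_wtime();
    for (long i = 0; i < n; i++)
        a[i] = sqrt((double)i) * sin((double)i);
    double t2 = omp_get_wtime() - start;  /* serial time, as defined above */

    /* OpenMP run: the same loop spread across the four cores */
    start = omp_get_wtime();
    #pragma omp parallel for
    for (long i = 0; i < n; i++)
        a[i] = sqrt((double)i) * sin((double)i);
    double t1 = omp_get_wtime() - start;  /* parallel time, as defined above */

    printf("t2 (serial)   = %.3f s\n", t2);
    printf("t1 (parallel) = %.3f s\n", t1);
    printf("speedup t2/t1 = %.2f\n", t2 / t1);

    free(a);
    return 0;
}
```

(Built with something like gcc -O2 -fopenmp speedup.c -lm.)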
Allow me to introduce you to Amdahl's Law (http://en.wikipedia.org/wiki/Amdahl's_law). That, and the fact that OpenMP isn't particularly efficient at parallelizing an application, are two reasons why not.
Those are the software reasons. Hardware reasons could include memory bandwidth, or Intel Turbo Boost.
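To put a number on the Amdahl's Law point: if a fraction p of the serial runtime can be parallelized across n cores, the best possible speedup is 1 / ((1 - p) + p/n). A small sketch with a made-up parallel fraction:

```c
#include <stdio.h>

/* Amdahl's Law: upper bound on speedup when a fraction p of the
   serial runtime is parallelized across n cores. */
static double amdahl(double p, int n)
{
    return 1.0 / ((1.0 - p) + p / n);
}

int main(void)
{
    /* 0.95 is a made-up parallel fraction, just for illustration */
    printf("p = 0.95,   4 cores: %5.2fx\n", amdahl(0.95, 4));    /* ~3.48x  */
    printf("p = 0.95, 512 cores: %5.2fx\n", amdahl(0.95, 512));  /* ~19.28x */
    printf("p = 1.00,   4 cores: %5.2fx\n", amdahl(1.00, 4));    /*  4.00x  */
    return 0;
}
```

Even a small serial fraction caps the speedup well below the core count.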
I’d say the speedup would be t2/t1.
Ken_g6 is citing some reasons why t2/t1 might not be 4 on a quad core, although I only partly agree on the OpenMP one: It very much depends on the specific case. For my scientific applications (which are not memory bound) I actually get very close to 4, like 3.8 or 3.9.
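A quick sanity check on that figure, rearranging Amdahl's Law to back out the parallel fraction implied by a measured speedup (a rough estimate that ignores the hardware effects mentioned above):

```c
#include <stdio.h>

int main(void)
{
    /* Parallel fraction implied by a measured speedup S on n cores,
       from rearranging Amdahl's Law: p = (1 - 1/S) / (1 - 1/n). */
    double S = 3.8, n = 4.0;
    double p = (1.0 - 1.0 / S) / (1.0 - 1.0 / n);
    printf("implied parallel fraction: %.3f\n", p);  /* ~0.982 */
    return 0;
}
```

So a speedup of 3.8 on four cores corresponds to roughly 98% of the runtime being parallel.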