Amdahl's law and GPUs: how can CUDA break Amdahl's law?

I would like to know how CUDA and massively parallel GPUs can break Amdahl’s law.

“For example, if 90% of the program can be parallelized, the theoretical maximum speed-up using parallel computing would be 10x no matter how many processors are used.”

Thanks in advance if you know the answer to this question.


Ehm, you can’t break that law. It’s mathematically solid.
You answered your own question: the higher the percentage of work that can be parallelized on the GPU, the higher the theoretical limit for the speedup.
So if you put 99% of the work on the GPU, you have a theoretical speedup limit of 100x.
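A quick sanity check of those two limits, sketched in Python (the function name is mine, not from any library):

```python
def amdahl_limit(parallel_fraction):
    """Maximum speedup as the core count goes to infinity (Amdahl's law):
    only the serial fraction (1 - p) remains, so the limit is 1 / (1 - p)."""
    return 1.0 / (1.0 - parallel_fraction)

print(amdahl_limit(0.90))  # ~10x
print(amdahl_limit(0.99))  # ~100x
```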


Note that the percentage of parallelism may be dependent on the data size.

Care to elaborate? How does this affect Amdahl’s law? Can this be calculated?



That law is a mathematical theorem, so it is unbreakable.

However, the input of the law, i.e. the percentage of parallelism, is manageable. Fortunately, for most GPU-friendly algorithms, this figure usually increases as the data scale grows. Using many GPU cores to solve a small-scale problem may not help you much, but they are well suited to large-scale problems (using the same program).

In addition, when more cores are integrated, the parallelism of the hardware may also increase (e.g. by hiding memory access latency).
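As a toy illustration of how the parallel fraction depends on the data size (the fixed serial setup cost and per-element cost here are made-up numbers, not measurements):

```python
def parallel_fraction(n, serial_cost=100.0, cost_per_element=1.0):
    """Toy timing model: a fixed serial setup plus work that scales with n.
    The parallelizable fraction approaches 1 as n grows."""
    parallel_work = n * cost_per_element
    return parallel_work / (serial_cost + parallel_work)

for n in (100, 10_000, 1_000_000):
    print(n, parallel_fraction(n))  # fraction rises toward 1.0 as n grows
```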

:thumbup: Thanks for the answer Nico!

Hmm, I have seen applications featured in the CUDA applications section which claim 200x speed-up. Don’t they base this on this law? And there is the claim that a Tesla supercomputer is 250x faster than a standard PC. Wouldn’t the max be somewhere around 19x at 95% parallelism for 960 cores?



:geek: Interesting!

Are there any general-purpose algorithms for calculating these graphs? Such as Dijkstra’s algorithm.


True, it would be somewhere around 19.6x at 95% parallelism for 960 cores, which leads you to the conclusion that your estimate of the percentage on the GPU is wrong and should be at least about 99.6% for a 200x speedup.
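Both numbers can be checked directly with Amdahl's formula for a finite core count (a sketch; the function name is mine):

```python
def amdahl_speedup(p, n_cores):
    """Amdahl's law: speedup = 1 / ((1 - p) + p / N)."""
    return 1.0 / ((1.0 - p) + p / n_cores)

print(amdahl_speedup(0.95, 960))   # ~19.6x
print(amdahl_speedup(0.996, 960))  # ~198x, close to the claimed 200x
```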


Okay, I found Gustafson’s law for data parallelism regarding Amdahl’s law after reading cvnguyen’s reply.

Speedup = (s + p) / (s + p/N) = 1 / (s + p/N)    (since s + p = 1)

With only 1% of the code serial, the max is about 91x (for 960 cores). Is it possible to have close to 0% serial with CUDA? John L. Gustafson achieved close to this: …aw/Amdahls.html
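For comparison, a sketch of both laws side by side (assuming s + p = 1; Gustafson's scaled speedup is s + p·N):

```python
def amdahl(p, n):
    """Fixed-size problem (Amdahl): the serial part dominates as n grows."""
    return 1.0 / ((1.0 - p) + p / n)

def gustafson(p, n):
    """Scaled problem (Gustafson): the parallel work grows with n."""
    return (1.0 - p) + p * n

print(amdahl(0.99, 960))     # ~91x
print(gustafson(0.99, 960))  # ~950x
```

The gap between the two shows why growing the data set with the core count (as the earlier replies suggest) gives much better scaling than fixing the problem size.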

What is the best that can be expected with CUDA? I have read other posts claiming around 80x for CUDA.


If you consider Monte Carlo, then with 1000 cores you will have roughly 1000x :thumbup: , especially if the time to compute a path is “big” (then the time to manage the threads is negligible).

It is not against the theorem :"> , it is only that the percentage for some algorithms can be very close to 1.
For 10 million paths, 99.9999% can be parallelized. :rolleyes:
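Plugging that fraction into Amdahl's formula shows why Monte Carlo scales so well (a sketch; 1000 cores and 99.9999% are the figures from the reply above):

```python
def amdahl_speedup(p, n_cores):
    """Amdahl's law: speedup = 1 / ((1 - p) + p / N)."""
    return 1.0 / ((1.0 - p) + p / n_cores)

p = 0.999999  # the 99.9999% figure for 10 million independent paths
print(amdahl_speedup(p, 1000))  # ~999x, i.e. roughly 1000x
```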

Be vigilant! Those speed-up figures (usually hundreds of X) claimed by CUDA programmers are comparisons between a GPU-based implementation and a corresponding CPU-based implementation. Some people even cheated by running programs on state-of-the-art GPU cards (or even on multi-device systems) and comparing with obsolete CPUs. Others compared well-optimized GPU programs against hastily designed CPU programs.

The speed-up ratio meant in Amdahl’s law (as well as in other similar laws) is the comparison between running the SAME PROGRAM on a single core/node and running it on a multi-core/multi-node system, all of which must have the same architecture.

Yes, it’s usually called “Gustafson’s law.” It relates specifically to how time scales with parallel processing while increasing the data size. Examples of this are image processing, Monte Carlo, etc.