Amdahl's law and GPUs: how can CUDA break Amdahl's law?

I would like to know how CUDA and massively parallel GPUs can break Amdahl’s law.

“For example, if 90% of the program can be parallelized, the theoretical maximum speed-up using parallel computing would be 10x no matter how many processors are used.”

Thanks in advance if you know the answer to this question.


Ehm, you can’t break that law. It’s mathematically solid.
You answered your own question: the higher the percentage of work that can be parallelized on the GPU, the higher the theoretical limit for the speedup.
So if you put 99% of the work on the GPU, you have a theoretical speedup limit of 100x.
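A quick sanity check of those two limits, sketched in Python (the function name is mine, not from any library):

```python
def amdahl_limit(parallel_fraction):
    """Maximum speedup as the core count goes to infinity (Amdahl's law):
    only the serial fraction (1 - p) remains, so the limit is 1 / (1 - p)."""
    return 1.0 / (1.0 - parallel_fraction)

print(amdahl_limit(0.90))  # ~10x
print(amdahl_limit(0.99))  # ~100x
```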


Note that the percentage of parallelism may be dependent on the data size.

Care to elaborate? How does this affect Amdahl’s law? Can this be calculated?



That law is a mathematical theorem, so it is unbreakable.

However, the input of the law, i.e. the percentage of parallelism, is manageable. Fortunately, for most GPU-friendly algorithms, this figure usually increases as the data scale grows. Using many GPU cores to solve a small-scale problem may not help you much, but they are well suited to large-scale problems (using the same program).

In addition, when more cores are integrated, the parallelism of the hardware may also increase (e.g. by hiding memory access latency).
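As a toy illustration of how the parallel fraction depends on the data size (the fixed serial setup cost and per-element cost here are made-up numbers, not measurements):

```python
def parallel_fraction(n, serial_cost=100.0, cost_per_element=1.0):
    """Toy timing model: a fixed serial setup plus work that scales with n.
    The parallelizable fraction approaches 1 as n grows."""
    parallel_work = n * cost_per_element
    return parallel_work / (serial_cost + parallel_work)

for n in (100, 10_000, 1_000_000):
    print(n, parallel_fraction(n))  # fraction rises toward 1.0 as n grows
```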

:thumbup: Thanks for the answer Nico!

Hmm, I have seen applications featured in the CUDA applications section which claim 200x speed-up. Don’t they base this on this law? And there is the claim that a Tesla supercomputer is 250x faster than a standard PC. Wouldn’t the max be somewhere around 19x at 95% parallelism for 960 cores?



:geek: Interesting!

Are there any general-purpose algorithms for calculating these graphs? Such as Dijkstra’s algorithm.


True, it would be somewhere around 19.6x at 95% parallelism for 960 cores, which leads you to the conclusion that your estimate of the percentage on the GPU is wrong and should be at least about 99.6% for a 200x speedup.
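Both numbers can be checked directly with Amdahl's formula for a finite core count (a sketch; the function name is mine):

```python
def amdahl_speedup(p, n_cores):
    """Amdahl's law: speedup = 1 / ((1 - p) + p / N)."""
    return 1.0 / ((1.0 - p) + p / n_cores)

print(amdahl_speedup(0.95, 960))   # ~19.6x
print(amdahl_speedup(0.996, 960))  # ~198x, close to the claimed 200x
```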


Okay, I found Gustafson’s law for data parallelism regarding Amdahl’s law after reading cvnguyen’s reply.

Speedup = (s + p) / (s + p/N) = 1 / (s + p/N)    (since s + p = 1)

With only 1% of the code serial, the max is about 91x (for 960 cores). Is it possible to have close to 0% serial with CUDA? John L. Gustafson achieved close to this: …aw/Amdahls.html
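For comparison, a sketch of both laws side by side (assuming s + p = 1; Gustafson's scaled speedup is s + p·N):

```python
def amdahl(p, n):
    """Fixed-size problem (Amdahl): the serial part dominates as n grows."""
    return 1.0 / ((1.0 - p) + p / n)

def gustafson(p, n):
    """Scaled problem (Gustafson): the parallel work grows with n."""
    return (1.0 - p) + p * n

print(amdahl(0.99, 960))     # ~91x
print(gustafson(0.99, 960))  # ~950x
```

The gap between the two shows why growing the data set with the core count (as the earlier replies suggest) gives much better scaling than fixing the problem size.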

What is the best that can be expected with CUDA? I have read other posts claiming around 80x for CUDA.


If you consider Monte Carlo, then with 1000 cores you will have roughly 1000x :thumbup: , especially if the time to compute a path is “big” (then the time to manage the threads is negligible).

It is not against the theorem :"> , it is only that the percentage for some algorithms can be very close to 1.
For 10 million paths, 99.9999% can be parallelized. :rolleyes:
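Plugging that fraction into Amdahl's formula shows why Monte Carlo scales so well (a sketch; 1000 cores and 99.9999% are the figures from the reply above):

```python
def amdahl_speedup(p, n_cores):
    """Amdahl's law: speedup = 1 / ((1 - p) + p / N)."""
    return 1.0 / ((1.0 - p) + p / n_cores)

p = 0.999999  # the 99.9999% figure for 10 million independent paths
print(amdahl_speedup(p, 1000))  # ~999x, i.e. roughly 1000x
```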

Be vigilant! Those speed-up figures (usually hundreds of X) claimed by CUDA programmers are comparisons between a GPU-based implementation and a corresponding CPU-based implementation. Some people even cheated by running programs on state-of-the-art GPU cards (or even on multi-device systems) and comparing with obsolete CPUs. Others compared well-optimized GPU programs against hastily designed CPU programs.

The speed-up ratio meant in Amdahl’s law (as well as in other similar laws) is the comparison between running the SAME PROGRAM on a single core/node and running it on a multi-core/multi-node system, all of which must have the same architecture.

Yes, it’s usually called “Gustafson’s law.” It relates specifically to how time scales with parallel processing while increasing the data size. Examples of this are image processing, Monte Carlo, etc.