Amdahl's Law for GPU

Does Amdahl's law apply to GPUs too?

I understand what you mean! I already know "how much faster my algorithm runs on CUDA"; my problem is how to verify this speed-up against Amdahl's law! I want to say in my report: "See, the speed-up from the GPU is equal to (or less or more than) what Amdahl's law predicts."

Thanks for your help :)

Is your issue just with P not being constant?

Actually, if you really want to talk about Amdahl's law, you can frame it as "CUDA overheads" vs. "maximum performance," and use it to explain performance at different problem sizes.
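
For reference, in the usual notation, with f the serial (non-accelerated) fraction of the runtime and P the speed-up of the parallel part, Amdahl's law reads

S = 1 / (f + (1 - f) / P)

so the "CUDA overheads" end up in f, and as P grows the overall speed-up S is capped by 1 / f.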

Though keep in mind that many of the overheads aren't constant (some are); they also change with problem size. So if you wanted to be really fancy, you'd substitute "f" with an equation in the problem size (which could be straightforward to derive for a simple algorithm like the CUBLAS matrix multiply). Then you'd have everything aligned.
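
For instance (a rough sketch with made-up cost terms, not measured numbers), for problem size n you could write

f(n) = t_over(n) / (t_over(n) + t_par(n))

where t_over(n) collects the serial overheads (kernel launch, host-device copies) and t_par(n) is the parallelizable work. For a matrix multiply the copies grow like n^2 while the compute grows like n^3, so f(n) shrinks as n grows.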

No, my problem is: how large is P on the GPU for a given size of input data?

And about your previous post: I believe (and I double-checked it) that even if f = 0, Amdahl's law is still valid (it's not obviated); in that case the speed-up is the number of parallel processors, which makes sense. Therefore, if we run this kind of FULLY parallel program on a GPU, how much speed-up should we expect compared with running it on one CPU?
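
(To spell out my reasoning: plugging f = 0 into S = 1 / (f + (1 - f) / P) gives S = 1 / (1 / P) = P.)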

Good idea!

But there's something I don't know how to explain: f is around 0.1 for all problem sizes, so Amdahl's law predicts a maximum speed-up of 10x. But when I calculate the speed-up by dividing the CPU computation time by the GPU computation time, I get speed-ups ranging from 5x for the smaller problem sizes to 200x for the larger ones! :(

First you say you understand me, then you show you obviously don't. The speed-up is the "number of parallel processors" only for HOMOGENEOUS CLUSTERS and has nothing to do with CUDA, GPUs, or the number of shader multiprocessors. When f = 0, Amdahl's law is obviated: what meaning is left in an equation that just says S == P? Your confusion comes, I guess, from trying to maintain that the units of measure for S are different from the units of measure for P, so that it somehow still says something interesting. But that is nonsensical; the units of measure are the same (if S = P, the units must be equal). You're wrong to think that on one side the unit is "speed-up" while on the other it can be "number of processors."

OK, well, that is indeed a problem. You must be miscalculating f.

In my case, f (as a proportion) doesn't change much across problem sizes; I checked it. Why do you think f is being miscalculated? I'm using QueryPerformanceCounter to measure the time spent on the CPU and the time spent on the GPU. Do you mean it is not possible to have a program in which f is this small?
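
For reference, here is a minimal sketch of my measurement pattern (simplified - my_kernel, the problem size, and the setup are placeholders, not my actual code):

```cpp
#include <windows.h>
#include <cuda_runtime.h>
#include <cstdio>

// Stand-in for the real computation.
__global__ void my_kernel(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main() {
    const int n = 1 << 20;   // placeholder problem size
    float* d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));

    LARGE_INTEGER freq, t0, t1, t2;
    QueryPerformanceFrequency(&freq);

    QueryPerformanceCounter(&t0);
    // ... the serial CPU portion runs here ...
    QueryPerformanceCounter(&t1);

    my_kernel<<<(n + 255) / 256, 256>>>(d_data, n);
    cudaDeviceSynchronize();  // kernel launches are asynchronous; without this
                              // only the launch overhead gets timed
    QueryPerformanceCounter(&t2);

    double cpu_s = double(t1.QuadPart - t0.QuadPart) / freq.QuadPart;
    double gpu_s = double(t2.QuadPart - t1.QuadPart) / freq.QuadPart;
    printf("f = %.4f\n", cpu_s / (cpu_s + gpu_s));  // CPU share of total time

    cudaFree(d_data);
    return 0;
}
```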

:)

No, f must be smaller; Amdahl's law should still hold in your case. If 'f' is your CPU portion and it is 10% of the total, then speeding up the GPU portion can never give you 200x overall performance. So f must be something like 0.1%.
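
To spell that out: as P grows, S = 1 / (f + (1 - f) / P) approaches 1 / f, so with f = 0.1 the overall speed-up can never exceed 10x, no matter how fast the GPU part is. An observed 200x speed-up therefore requires f <= 1 / 200 = 0.5%.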

Hi, Amdahl's law is really self-evident if you just ask yourself: "how do I define the speed-up of a program, and how much is it if I optimize just a part of it?"

Say you have a program with two parts, A and B, that each take one second, and you optimize part B so that it doesn't take any time at all - 0 seconds! How long does the program take after that? A + B = 1s. The runtime drops from 2s to 1s, i.e. to 50% of the original, and 1 / 50% = 2, so a 2x speedup. It's as simple as that.

Then of course a completely separate question is how much faster something gets done on the GPU than on the CPU. For example, if part A takes 1 second, and part B takes 10 seconds on the CPU but 1 second on the GPU, then you can say that the GPU is 10x faster on that problem - but the total speedup you gain by using the GPU in your program is:

(10 + 1)s → (1+1)s => 11 / 2 = 5.5x.
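
In general, if part A takes t_A, part B takes t_B on the CPU, and the GPU runs part B s times faster, the overall speedup is

(t_A + t_B) / (t_A + t_B / s)

which is just Amdahl's law again with f = t_A / (t_A + t_B) and P = s. Plugging in t_A = 1s, t_B = 10s, s = 10 gives 11 / 2 = 5.5x, as above.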

Amdahl's law gives us the trivial rule of optimization: optimize the parts of the code that take the most time.