Theoretical maximum speedup factor

Hey developers, I have a question about the theoretical maximum speedup factor. I am only concerned with the linear case, for example vector addition. If I have a device with 14 multiprocessors, can I expect a theoretical maximum speedup factor of 14?
Thanks a lot:)

If you want a theoretical speedup, you need to compare the FLOPS (floating-point operations per second) of the two devices. To do that, multiply the number of floating-point operations each device can execute per clock cycle by its clock rate. A comparison between CPU and GPU is given in the introductory chapter of the CUDA programming guide.
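As a rough sketch of that calculation: the core counts, operations per cycle, and clock rates below are made-up placeholders, not the specs of any real device, and the function name `peak_flops` is only for illustration. Plug in the numbers from your own datasheets.

```python
def peak_flops(num_cores, flops_per_core_per_cycle, clock_hz):
    """Theoretical peak: cores x floating-point ops per core per cycle x clock rate."""
    return num_cores * flops_per_core_per_cycle * clock_hz


# Hypothetical GPU: 14 multiprocessors with 8 cores each (assumed),
# 2 floating-point ops per core per cycle (assumed), 1.5 GHz clock (assumed).
gpu_peak = peak_flops(num_cores=14 * 8, flops_per_core_per_cycle=2, clock_hz=1.5e9)

# Hypothetical CPU: 4 cores, 8 floating-point ops per core per cycle, 3.0 GHz (all assumed).
cpu_peak = peak_flops(num_cores=4, flops_per_core_per_cycle=8, clock_hz=3.0e9)

# Ratio of the two peaks: an upper bound for compute-bound code only.
compute_bound_speedup = gpu_peak / cpu_peak
print(f"GPU peak: {gpu_peak / 1e9:.0f} GFLOPS")
print(f"CPU peak: {cpu_peak / 1e9:.0f} GFLOPS")
print(f"Compute-bound speedup limit: {compute_bound_speedup:.2f}x")
```

Note that the multiprocessor count is only one factor in the product, so it does not equal the speedup by itself.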

What is your reference point for the “theoretical maximum speedup”? One GPU compared to another GPU? A GPU compared to a CPU? A GPU compared to another accelerator?

The addition of long vectors is a memory-bound task. In that case the speedup in the limit will be bounded by the ratio of the memory bandwidths of the two devices being compared, and the achievable bandwidth depends on various factors (e.g. whether access is contiguous or strided).
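A back-of-the-envelope sketch of that bound, assuming single-precision vectors and contiguous access: the bandwidth figures are hypothetical placeholders, not measurements, and the function name `vector_add_time_s` is only for illustration.

```python
def vector_add_time_s(n_elements, bandwidth_bytes_per_s, bytes_per_element=4):
    """Lower bound on the time for c[i] = a[i] + b[i] when memory traffic dominates.

    Each element needs two reads (a[i], b[i]) and one write (c[i]).
    """
    bytes_moved = 3 * bytes_per_element * n_elements
    return bytes_moved / bandwidth_bytes_per_s


n = 100_000_000               # vector length (assumed)
gpu_bandwidth = 100e9         # bytes/s, hypothetical device memory bandwidth
cpu_bandwidth = 25e9          # bytes/s, hypothetical host memory bandwidth

t_gpu = vector_add_time_s(n, gpu_bandwidth)
t_cpu = vector_add_time_s(n, cpu_bandwidth)

# The traffic is identical on both sides, so the limit is just the bandwidth ratio.
memory_bound_speedup = t_cpu / t_gpu
print(f"GPU lower bound: {t_gpu * 1e3:.1f} ms")
print(f"CPU lower bound: {t_cpu * 1e3:.1f} ms")
print(f"Memory-bound speedup limit: {memory_bound_speedup:.2f}x")
```

With only one addition per 12 bytes moved, the arithmetic units sit mostly idle, which is why the bandwidth ratio rather than the FLOPS ratio sets the limit here.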