Is it possible to get an application speedup on a GPU that exceeds the number of cores it has? For example, suppose we have a GPU with 128 cores: is it possible to accelerate an application by a factor of more than 128?

If yes, how? I think that if S is the speedup and N is the total number of cores in the GPU, then only the following relation can hold:

S < N, or at most S = N (that is, S <= N)
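
To make the question concrete, here is a minimal sketch of the relation I have in mind. It assumes speedup is defined as single-core run time divided by run time on the GPU. The function names and the timings are hypothetical, chosen only for illustration; they are not measurements.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Speedup S = T_serial / T_parallel (assumed definition)."""
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("run times must be positive")
    return t_serial / t_parallel


def within_linear_bound(s: float, n_cores: int) -> bool:
    """True when the speedup does not exceed the core count, i.e. S <= N."""
    return s <= n_cores


# Hypothetical timings: 64 s on one core, 1 s on a 128-core GPU.
s = speedup(t_serial=64.0, t_parallel=1.0)
print(s, within_linear_bound(s, n_cores=128))
```

The question is whether `within_linear_bound` can ever return False for a real application on a 128-core GPU.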

