That’s not possible to predict without knowing what the bottleneck of the algorithm is.
Clock bound? Memory bandwidth bound? Memory size bound? Parallel enough to use more cores? Specific functionality required (float/double)?
Your application may also be limited by specific instructions, like integer shifts, integer multiplies, atomics, or double-precision arithmetic. PCIe throughput is another potential bottleneck. You might want to start exploring your code with the help of the profiler.
I think my real question is: "What makes a GPU superior in terms of speed nowadays, generally speaking?" (sorry, I'm new to this topic)
For CPUs, generally speaking, it is the clock speed.
i.e. if I run an algorithm on CPU A [1.6 GHz] and on CPU B [3.2 GHz], I would expect a theoretical speedup of 2x for B over A.
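To make that assumption explicit, I'm thinking of the idealized case where runtime scales inversely with clock frequency (ignoring memory, cache, and IPC differences):

$$\text{speedup} = \frac{f_B}{f_A} = \frac{3.2\ \text{GHz}}{1.6\ \text{GHz}} = 2$$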
Let's say my algorithm is small enough to run on each board without exceeding any of the hardware capacities, and I'm using float operations only (I think the GTX 680 doesn't handle double precision as well as the C2070 does).
So which factor should I look at in terms of speed?
[1] CUDA cores
[2] Core Speed
[3] GFLOPs
[4] Memory Speed
One more question! How do you compare GPU performance vs CPU performance... is there a common metric that links the two somehow? GFLOPs maybe? I know it's not as simple as having just one parameter to measure speedup, like when parallelism was done with CPUs only.
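For concreteness, here is the kind of rough back-of-the-envelope comparison I mean, using the simple model peak GFLOPS ≈ cores × clock × FLOPs per core per cycle. The numbers below are illustrative placeholders, not authoritative spec-sheet values, and real applications rarely get near these peaks (memory bandwidth, occupancy, and instruction mix usually dominate):

```c
/* Idealized peak single-precision throughput comparison (illustrative only). */
#include <stdio.h>

static double peak_gflops(int cores, double clock_ghz, int flops_per_cycle)
{
    /* cores * clock (GHz) * FLOPs issued per core per cycle */
    return cores * clock_ghz * flops_per_cycle;
}

int main(void)
{
    /* Hypothetical figures: a GTX 680-class GPU (1536 CUDA cores, ~1 GHz,
     * FMA counted as 2 FLOPs/cycle) vs. a quad-core 3.2 GHz CPU
     * (assumed 8 single-precision FLOPs/cycle per core; varies with SIMD width). */
    double gpu = peak_gflops(1536, 1.0, 2);
    double cpu = peak_gflops(4, 3.2, 8);

    printf("GPU peak ~ %.0f GFLOPS\n", gpu);
    printf("CPU peak ~ %.0f GFLOPS\n", cpu);
    printf("Theoretical ratio ~ %.1fx\n", gpu / cpu);
    return 0;
}
```

The ratio this prints is only an upper bound on what to expect; whether a real kernel gets anywhere close depends on the bottlenecks mentioned above (memory bandwidth, parallelism, instruction mix, PCIe transfers).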