Hardware comparison

Hello Guys!

Let’s say I ran the exact same algorithm on two different boards: Tesla C2070 and GeForce GTX 680.

Theoretically, on which board will the algorithm run faster? Which of the following parameters matters most in answering that?

Compute Capability: [C2070 = 2.0][GTX 680 = 3.0]
Processor cores: [C2070 = 448][GTX 680 = 1536]
Processor core clock: [C2070 = 1.15 GHz][GTX 680 = 1.0 GHz]
Memory clock: [C2070 = 1.50 GHz][GTX 680 = 6.0 GHz]
Memory size: [C2070 = 6 GB][GTX 680 = 2 GB]
Memory bandwidth: [C2070 = 144 GB/sec][GTX 680 = 192.26 GB/sec]

Thanks in advance!


That’s not possible to predict without knowing what the bottleneck of the algorithm is.
Clock bound? Memory bandwidth bound? Memory size bound? Parallel enough to use more cores? Specific functionality required (float/double)?

Maybe have a look through this: http://docs.nvidia.com/cuda/cuda-c-programming-guide/
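To make the bottleneck point concrete, here is a rough roofline-style sketch in Python that uses only the figures quoted in the question. It assumes each core can retire one single-precision fused multiply-add (2 FLOPs) per clock, and the function names (peak_gflops, attainable_gflops) are made up for illustration. Treat the results as upper bounds, not predictions.

```python
# Rough roofline-style estimate using the figures quoted in this thread.
# Assumption: each core retires one single-precision fused multiply-add
# (2 FLOPs) per clock cycle. Real kernels rarely reach these peaks.

BOARDS = {
    "Tesla C2070": {"cores": 448, "clock_ghz": 1.15, "bandwidth_gbs": 144.0},
    "GeForce GTX 680": {"cores": 1536, "clock_ghz": 1.0, "bandwidth_gbs": 192.26},
}


def peak_gflops(cores, clock_ghz, flops_per_cycle=2):
    """Theoretical peak arithmetic throughput in GFLOP/s."""
    return cores * clock_ghz * flops_per_cycle


def attainable_gflops(board, flops_per_byte):
    """Upper bound on throughput for a kernel with the given arithmetic
    intensity (FLOPs performed per byte moved to or from device memory)."""
    compute_limit = peak_gflops(board["cores"], board["clock_ghz"])
    memory_limit = board["bandwidth_gbs"] * flops_per_byte
    return min(compute_limit, memory_limit)


if __name__ == "__main__":
    # Low intensity: memory bound. High intensity: compute bound.
    for intensity in (0.25, 4.0, 64.0):
        for name, board in BOARDS.items():
            bound = attainable_gflops(board, intensity)
            print(f"{intensity:6.2f} FLOP/byte  {name:16s} {bound:8.1f} GFLOP/s")
```

Under these assumptions a memory-bound kernel could gain at most about 1.3x on the GTX 680 (the bandwidth ratio), while a purely compute-bound one could gain up to about 3x (the peak FLOP ratio). Which case applies depends entirely on the algorithm.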

Your application may also be limited by specific instructions, like integer shifts, integer multiplies, atomics, or double-precision arithmetic. PCIe throughput is another potential bottleneck. You might want to start exploring your code with the help of the profiler.
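As a back-of-the-envelope check on the PCIe point, the sketch below compares copy time with kernel time. The 6 GB/s effective host-to-device bandwidth is a hypothetical placeholder, not a measured value for either board, and the function names are made up for illustration; substitute numbers from the profiler or a bandwidth test.

```python
# Back-of-the-envelope check: does PCIe copy time dominate kernel time?
# Assumption: 6 GB/s effective PCIe bandwidth (hypothetical placeholder).


def transfer_seconds(num_bytes, pcie_gbs=6.0):
    """Time to move num_bytes across PCIe in one direction."""
    return num_bytes / (pcie_gbs * 1e9)


def kernel_seconds(num_bytes, flops_per_byte, gflops):
    """Time for a kernel doing flops_per_byte FLOPs on each byte of input,
    running at a sustained rate of gflops GFLOP/s."""
    return num_bytes * flops_per_byte / (gflops * 1e9)


def pcie_share(num_bytes, flops_per_byte, gflops, pcie_gbs=6.0):
    """Fraction of total time spent copying, assuming the input is copied
    to the device and a result of equal size is copied back."""
    t_copy = 2 * transfer_seconds(num_bytes, pcie_gbs)
    t_kernel = kernel_seconds(num_bytes, flops_per_byte, gflops)
    return t_copy / (t_copy + t_kernel)


if __name__ == "__main__":
    # 1 GB of data, 10 FLOPs per byte, kernel sustaining 1000 GFLOP/s.
    print(f"PCIe share of runtime: {pcie_share(1e9, 10, 1000):.0%}")
```

If the copy share comes out close to 100%, a faster GPU changes little, because the time is spent on the bus rather than on the board.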

Hey Detlef! Thanks for taking the time!

I think my real question is: “Generally speaking, what makes one GPU faster than another nowadays?” (Sorry, I’m new to this topic.)

For CPUs, generally speaking, it is the clock speed.

E.g., if I run an algorithm on CPU A [1.6 GHz] and on CPU B [3.2 GHz], I would expect a theoretical speedup of 2x for B over A.

Let’s say my algorithm is small enough to run on each board without exceeding any of the hardware limits, and I’m using float operations only (I think the GTX 680 doesn’t handle double precision the way the C2070 does).

So which factor should I look at in terms of speed?
[1] Cuda Cores
[2] Core Speed
[3] GFLOPs
[4] Memory Speed

One more question! How do you compare GPU performance with CPU performance? Is there a unit that links the two somehow? GFLOPs maybe? I know it’s not as simple as having a single parameter to measure speedup, as it was when parallelism was done with CPUs only.

Thanks in Advance!