This is a really general question, and perhaps it does not fit in this forum, but I didn't know where else to ask for this kind of information.
I was working on a GPU implementation of an algorithm. Things work, the algorithm's calculation time was reduced, and everything is fine! Nevertheless, now I have to write a report about it, and I tried to find a recent comparison between CPUs and GPUs.
I found a chart that compares CPU development with GPU development from 2003 to 2006, but that is two years old, and immense development has taken place in the meanwhile. I mean, isn't there anything out there that compares the new Tesla 10-series or the 200-series with the new Intel processors?
I had a look at recent articles published on nvidia.com and searched for it on Google, but I couldn't find anything; most articles just focus on the teraflop limit being passed, but don't put it in comparison.
So perhaps some of you have such a chart or know where to find it!
Perhaps you should also note that this comparison is good for marketing, but it's totally irrelevant otherwise. Instead of writing a report where you compare those, perhaps you should just discuss how pointless it is :)
@spg - thanks, that is exactly what I was looking for; it didn't occur to me to search in the programming guide, even though I had been working with that document for weeks. :)
I know that this graph doesn't show the real computing difference for programs, and since it is made by NVIDIA, it is pure marketing. However, this is about the introduction to the issue, and for that it is a good start. The aim of the project is not just speed but also quality, implementation effort, and whether GPUs are already ready for general-purpose programming… so the report discusses whether all this marketing hype can hold what it promises…
Of course. But in any published paper, even in scientific journals, the first and last pages must be marketing. (a) Most people read the intro and skip to the conclusions, and (b) if they don't see marketing there, they ignore your paper as pointless. It's just the way the culture works, and those few sensible people out there (like us) can't change the entire culture.
Anyway… the update to the GFLOP/s graph in the new 2.0 guide is nice. And I like how they added a memory bandwidth graph, too (since that is the only number I care about). It is funny, though: their memory bandwidth graph only goes up to the G80 Ultra… I guess they didn't want to show the bandwidth drop in G92 :) Just another example of culture and marketing. But they could have included the bandwidth for G200.
At least for the GPUs, you can get any of these numbers for yourself to make a prettier plot just by browsing through the Specifications tables at www.nvidia.com. I’m not sure where to find such nicely organized data for CPUs, though. It always seems like a PITA to find theoretical GFLOP/s numbers for CPUs.
Isn't theoretical FLOP/s for CPUs just (16 bytes / sizeof(the datatype you care about, float or double)) × (number of cores) × (frequency)? (The first factor accounts for SSE.)
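For what it's worth, a minimal sketch of that back-of-the-envelope estimate (the core count and clock below are made-up example values, not any specific CPU):

```cuda
// Rough peak-FLOP/s estimate from the formula above:
// (SSE register width / element size) x cores x clock.
// Core count and clock are hypothetical example values.
#include <cstdio>

int main()
{
    const double sse_width_bytes = 16.0;  // 128-bit SSE registers
    const double cores           = 4.0;   // example: quad-core CPU
    const double clock_ghz       = 3.0;   // example clock speed

    double sp_gflops = (sse_width_bytes / sizeof(float))  * cores * clock_ghz;
    double dp_gflops = (sse_width_bytes / sizeof(double)) * cores * clock_ghz;

    printf("Single precision peak: %.1f GFLOP/s\n", sp_gflops);  // 48.0
    printf("Double precision peak: %.1f GFLOP/s\n", dp_gflops);  // 24.0
    return 0;
}
```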
You need not advertise theoretical performance that has absolutely no meaning, i.e. don't feed the troll with your publications :). This reminds me of some performance figures on the Cell, where people could not really agree on a theoretical bus bandwidth, while it was pretty clear that there existed a limit that was typically hit by realistic applications. Sustained GFLOP/s may thus make a little more sense, if taken with great care…
Just as processors are often compared using BLAS or similar kernels, perhaps it is more interesting to reference the best implementation of such kernels (typically GEMM) to give orders of magnitude… such figures are pretty common in the literature. Once again, they have very little meaning, but I personally think they are less bad than "NVIDIA promised me I would get 10 TFLOPS".
Hopefully (or not), I'm convinced we will sooner or later have some LINPACK, or one of those SPEC* test suites… has anyone heard of such a thing yet?
I definitely agree that finding a proper, more or less reliable, theoretical performance analysis is a real pain, even on a CPU, where we know almost everything about the underlying implementation, and even worse on GPUs, where we know much less.
I suppose NVIDIA's figures also result from the assumption that all ALUs are used on every cycle, with no memory stalls and so on?
Anyway, I'm just being picky, but since everyone here seems to treat those numbers with caution, there is no point in insisting on it anymore ;)
Trust me, I tried. It was impossible to convince the other authors of the paper (who were not programming/hardware experts) not to include it. It was insisted that we needed something in the abstract/introduction that the layperson could understand as an explanation of why the heck we were even considering going through all this effort. And someone who has heard anything at all about HPC has certainly heard about the Top 500 and the race for more GFLOP/s, even if they have no real understanding of what it means.
Yeah. There is a benchmark floating around the forums somewhere that gets very close to this peak (for compute 1.x hardware), so the hardware is actually capable of achieving it. Of course, doing so requires thousands of MAD operations one after the other in each thread. I haven't seen this benchmark updated for the G200 chips; it would need to be modified to do a MAD and a MUL every tick.
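For reference, a minimal sketch of the kind of kernel such a peak benchmark relies on (this is not the actual forum benchmark; the kernel name and iteration count are illustrative):

```cuda
// Illustrative peak-MAD kernel: a long chain of dependent multiply-adds,
// so each instruction is a MAD (2 flops) and the loop is compute bound.
__global__ void mad_chain(float *out, float a, float b)
{
    float x = threadIdx.x * 0.001f;
    #pragma unroll 64
    for (int i = 0; i < 4096; ++i)
        x = x * a + b;                  // one MAD per iteration = 2 flops
    // Write the result so the compiler cannot optimize the chain away.
    out[blockIdx.x * blockDim.x + threadIdx.x] = x;
}
```

GFLOP/s would then be roughly (total threads × 4096 × 2) divided by the elapsed time; as noted above, on G200 one would also have to interleave an extra MUL to approach the dual-issue peak.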
There is something to be said for the theoretical memory bandwidth on these GPUs, though. With coalesced accesses in CUDA, it is relatively easy to attain 80% of the theoretical peak across a wide variety of algorithms.
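To make that concrete, here is a minimal sketch of a coalesced copy kernel one could time to estimate achievable bandwidth (names and the timing formula are illustrative, not a specific benchmark):

```cuda
// Coalesced copy: thread i reads and writes element i, so each warp's
// accesses fall into contiguous, aligned memory segments.
__global__ void copy_coalesced(float *dst, const float *src, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        dst[i] = src[i];
}

// Host side: effective bandwidth in GB/s is roughly
//   2 * n * sizeof(float) / elapsed_seconds / 1e9
// (the factor of 2 counts one read plus one write per element).
```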
I think you become critical of published figures quite fast when you work in this field. Of course I searched for publications that did similar implementations and I found quite a few, but you really have to read between the lines to see how the gain factors are achieved. One common trick is to compare a double-precision CPU implementation with a single-precision GPU one; in my implementation this can slow the CPU version down by up to 30%.
Slide 5 of the Siggraph CUDA session (http://developer.nvidia.com/object/siggraph-2008-CUDA.html) has the GT200 (GTX 280) bandwidth included. The FSB curve stops in 2007 since the FSB has stayed at 1600 MHz since then (and QuickPath wasn't available at the time).
An elaboration on the bandwidth figure: both the FSB and GPU bandwidths are theoretical peaks (bus width × clock).
Also, the FSB bandwidth is read-only bandwidth. Sometimes you'll see quotes for the FSB that add the read and write bandwidths together, coming up with a larger number, but the vast majority of apps are read-bandwidth limited. Also, FSB write bandwidth is slower than read bandwidth (I want to say 75%, but I'm not positive about that), at least prior to the latest version.
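As a worked example of the bus-width-times-clock figure (assuming GTX 280-class numbers: a 512-bit interface and roughly 1107 MHz GDDR3 with two transfers per clock):

```cuda
// Theoretical peak bandwidth = bus width x memory clock x transfers per clock.
#include <cstdio>

int main()
{
    double bus_bytes = 512.0 / 8.0;  // 512-bit memory interface, in bytes
    double mem_clock = 1107e6;       // memory clock in Hz (GTX 280-class)
    double per_clock = 2.0;          // double data rate: 2 transfers per clock

    double peak_gb_s = bus_bytes * mem_clock * per_clock / 1e9;
    printf("Theoretical peak: %.1f GB/s\n", peak_gb_s);  // ~141.7 GB/s
    return 0;
}
```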