Does anyone know where to find more recent data like that in Fig. 1.1 of the Programming Guide (I need a graph like that)? I know how to calculate the theoretical peak GFLOP/s of a GPU, but I don't know how to calculate it for CPUs.
For a CPU, the convention is usually FLOP/s = number of cores * core frequency * FLOPs per core per cycle. You will need to consult the specifications and documentation for whatever CPU you are interested in to get the necessary data. For the Intel Conroes I work with quite a bit, we use FLOP/s = 2 * 3.0G * 4 = 24 GFLOP/s per CPU.
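The convention above can be sketched as a small calculation; the function name and the E6850 figures plugged in are just for illustration:

```python
# Theoretical peak FLOP/s, per the convention:
# cores * frequency * FLOPs per core per cycle.
def peak_flops(cores, freq_hz, flops_per_core_per_cycle):
    return cores * freq_hz * flops_per_core_per_cycle

# Intel Conroe (e.g. E6850): 2 cores * 3.0 GHz * 4 FLOPs/cycle
print(peak_flops(2, 3.0e9, 4) / 1e9)  # 24.0 (GFLOP/s)
```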
A Core2 will do one 4-element-wide MADD per cycle, which counts as 8 FLOPs, for each core in the CPU.
An Atom will do only half of that. Multiply by the number of cores and the clock frequency in GHz.
And peak performance is just that: a theoretical ceiling, not a real-world estimate. Most of the time you'll instead be limited by bandwidth, latencies, and/or permutations.
IPC would also include whatever else can be done in parallel, like loads, stores, and the evaluation of loops and branches. For the Core2 this is about 4, where the MADD again counts as two instructions because it really is one MUL followed by one ADD, or if you insist: 6 (peak). The Atom will do 2 instructions per clock.
For the LINPACK benchmark using Intel MKL, I get about 21 double-precision GFLOP/s from a theoretical 24 GFLOP/s using both cores of an E6850 with DDR2-800 CL5 RAM. For a Q9550 I get around 40 GFLOP/s out of a theoretical 45.3 GFLOP/s with the same DDR2-800 CL5 memory (the Core2 FSB doesn't scale so well on quad core).
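To put those LINPACK numbers in perspective, here is a quick sketch of the fraction of theoretical peak achieved, using the measured and theoretical figures quoted above:

```python
# Efficiency = measured GFLOP/s / theoretical peak GFLOP/s.
def efficiency(measured_gflops, peak_gflops):
    return measured_gflops / peak_gflops

print(f"E6850: {efficiency(21, 24):.1%}")    # 87.5%
print(f"Q9550: {efficiency(40, 45.3):.1%}")  # 88.3%
```

So both parts land in the high-80s percent of peak, which is about as good as dense linear algebra gets on that generation of hardware.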
It has been ages since I did any serious LINPACK runs on our iron, but as I remember it, I only got about 5% lower performance using a LINPACK build with GotoBLAS compiled with icc/ifc on a single E6850. So if they are “cheating” they aren’t getting much out of doing so.