GFLOPs of intel xeon e5 2609

MaiSaid · June 21, 2015, 1:01pm

I am working on an Intel Xeon E5-2609 machine with 8 CPUs. How can I find out the GFLOPS of my CPUs? I want to compare it with the GFLOPS of a Tesla C2070.

Thanks a lot

njuffa · June 21, 2015, 2:14pm

The Xeon E5-2609 is a Sandy Bridge-class CPU which can execute one AVX multiply plus one AVX add per cycle. An AVX SIMD operation comprises four double-precision or eight single-precision lanes. Therefore, when running AVX code, the theoretical maximum floating-point throughput is 4 [cores] * 2.4e9 [Hz] * (4+4) [floating-point ops per cycle] = 76.8 double-precision GFLOPS. Single-precision performance is twice that, namely 153.6 GFLOPS.

Since you have an eight-processor machine, the combined theoretical throughput of all eight CPUs is 614.4 double-precision GFLOPS, or 1.228 single-precision TFLOPS. By comparison, the theoretical throughput of the C2070 is 515 double-precision GFLOPS, or 1.03 single-precision TFLOPS, which is about 84% of the combined CPUs in both precisions.
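The arithmetic above can be reproduced with a few lines of Python. This is only a sketch of the peak-throughput formula used in this post; the function name peak_gflops and its parameters are made up for illustration, and the 8-socket, 4-core, 2.4 GHz figures are the ones assumed in the thread.

```python
def peak_gflops(sockets, cores_per_socket, clock_hz, simd_lanes,
                ops_per_lane_per_cycle=2):
    """Theoretical peak in GFLOPS.

    ops_per_lane_per_cycle=2 models one multiply plus one add
    issued per cycle on every SIMD lane.
    """
    flops = (sockets * cores_per_socket * clock_hz
             * simd_lanes * ops_per_lane_per_cycle)
    return flops / 1e9

# AVX width: four double-precision or eight single-precision lanes.
DP_LANES, SP_LANES = 4, 8

one_cpu_dp = peak_gflops(1, 4, 2.4e9, DP_LANES)    # one E5-2609
one_cpu_sp = peak_gflops(1, 4, 2.4e9, SP_LANES)
all_cpus_dp = peak_gflops(8, 4, 2.4e9, DP_LANES)   # eight CPUs, as assumed in the thread
all_cpus_sp = peak_gflops(8, 4, 2.4e9, SP_LANES)

C2070_DP_GFLOPS = 515.0  # figure quoted in the thread
c2070_share = round(100 * C2070_DP_GFLOPS / all_cpus_dp)

print(f"one CPU:    {one_cpu_dp:.1f} DP / {one_cpu_sp:.1f} SP GFLOPS")
print(f"eight CPUs: {all_cpus_dp:.1f} DP / {all_cpus_sp:.1f} SP GFLOPS")
print(f"C2070 is about {c2070_share}% of the combined CPUs (DP)")
```

These are upper bounds only; real code reaches them only if every core issues a full-width multiply and add on every cycle.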

CudaaduC · June 21, 2015, 5:49pm

8 Xeon E5-2609 at $300 per CPU = $2,400 for 1,228 single-precision GFLOPS

A single $550 EVGA GTX 980 clocks in at about 5,400 single-precision GFLOPS.

cost per GFLOP for CPU set = $1.95

cost per GFLOP for GPU = $0.102
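As a quick check of the cost comparison, here is a small Python sketch that divides price by throughput for both options. It uses only the prices and GFLOPS figures quoted in this post; the helper name dollars_per_gflops is hypothetical. It also prints the inverse ratio (GFLOPS per dollar), since the two are easy to mix up.

```python
def dollars_per_gflops(total_price_usd, gflops):
    """Cost efficiency: dollars spent per GFLOPS of theoretical peak."""
    return total_price_usd / gflops

# Figures as quoted in the thread (single precision, theoretical peak).
cpu_price_usd = 8 * 300.0   # eight CPUs at $300 each
cpu_sp_gflops = 1228.0
gpu_price_usd = 550.0       # one GTX 980
gpu_sp_gflops = 5400.0

cpu_cost = dollars_per_gflops(cpu_price_usd, cpu_sp_gflops)
gpu_cost = dollars_per_gflops(gpu_price_usd, gpu_sp_gflops)

print(f"CPU set: ${cpu_cost:.2f} per GFLOPS "
      f"({cpu_sp_gflops / cpu_price_usd:.2f} GFLOPS per dollar)")
print(f"GPU:     ${gpu_cost:.3f} per GFLOPS "
      f"({gpu_sp_gflops / gpu_price_usd:.2f} GFLOPS per dollar)")
print(f"CPU set costs about {cpu_cost / gpu_cost:.0f}x more per GFLOPS")
```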

Also, the memory bandwidth of a current GPU (the Tesla C2070 is at least 4 years old) exceeds that of a CPU by a factor of between 5 and 15, so that should be considered in any valid comparison.

MaiSaid · June 21, 2015, 6:36pm

Does this calculation of CPU GFLOPS apply to a single-threaded program? Or does AVX have to be enabled somehow?

njuffa · June 21, 2015, 9:12pm

Since my computation includes the contributions from all CPUs and all cores within each CPU it quite obviously does not pertain to single-thread execution. Since I assumed usage of all AVX lanes, it obviously also does not pertain to scalar execution. Utilizing the full floating-point performance of your system requires multi-threaded, SIMDized computation: 32 threads, each using 4-way or 8-way SIMD computation.

Note that threading and SIMD parallelization are two forms of parallelism (namely thread parallelism and data parallelism) that are orthogonal to each other: one can write an application that is multi-threaded, but where each thread only performs scalar computation using a single AVX lane. Likewise, one can write a single-threaded application that uses SIMDized code using all AVX lanes.
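To put rough numbers on the four combinations of threading and SIMD, the same peak formula can be evaluated with the thread count and the number of lanes used as independent inputs. This Python sketch is a simplified model, not a measurement: it assumes one thread per core, double precision, and that scalar code still issues one multiply plus one add per cycle in a single lane. The name dp_peak_gflops is made up for illustration.

```python
CLOCK_HZ = 2.4e9            # E5-2609 clock used in the thread
OPS_PER_LANE_PER_CYCLE = 2  # one multiply plus one add
TOTAL_CORES = 8 * 4         # eight CPUs with four cores each
DP_AVX_LANES = 4            # double-precision lanes per AVX operation

def dp_peak_gflops(threads, lanes_used):
    """Peak double-precision GFLOPS for a given thread count and SIMD width.

    Assumes one thread per core; threads and lanes_used scale the
    result independently, which is what "orthogonal" means here.
    """
    return threads * CLOCK_HZ * lanes_used * OPS_PER_LANE_PER_CYCLE / 1e9

cases = {
    "1 thread, scalar":      dp_peak_gflops(1, 1),
    "1 thread, 4-way AVX":   dp_peak_gflops(1, DP_AVX_LANES),
    "32 threads, scalar":    dp_peak_gflops(TOTAL_CORES, 1),
    "32 threads, 4-way AVX": dp_peak_gflops(TOTAL_CORES, DP_AVX_LANES),
}

for name, gflops in cases.items():
    print(f"{name:24s} {gflops:7.1f} GFLOPS")
```

Under these assumptions a single-threaded scalar program can reach at most 1/128 of the machine's double-precision peak, since it gives up both the factor of 32 from threading and the factor of 4 from SIMD.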
