GPU, CPU and Xeon Phi Benchmark/Performance Comparison

samarawickrama · January 15, 2015, 2:01am

Hi,

Is there a good performance comparison of GPUs and the Intel Xeon Phi coprocessor available?
Regarding SGEMM, can a GPU achieve a significant speedup compared to the Phi/CPU?

Thank you.

CudaaduC · January 15, 2015, 2:32am

Google is one way to find an answer…

[url]http://blog.xcelerit.com/intel-xeon-phi-vs-nvidia-tesla-gpu/[/url]

"The Tesla GPU is about twice faster than the Xeon Phi, and between 1.2x and 1.9x faster than the CPU. "

which was for 64 bit Monte Carlo.

And I believe that Jimmy P at some point ran his own benchmark tests comparing the K20 with the current Phi model, and said that the K20 was a clear winner.

Overall it would depend on the task, as I am sure a GTX 780 Ti or a GTX 980 would kill a Phi for image processing and brute-force exhaustive search (particularly 32-bit).

njuffa · January 15, 2015, 4:17am

Google is your friend. A quick search returned the following two relevant links among the first page of results:

Tesla K40 xGEMM performance:
[url]http://developer.download.nvidia.com/compute/cuda/6_5/rel/docs/CUDA_6.5_Performance_Report.pdf[/url]

Intel Xeon and Xeon Phi xGEMM performance:
[url]http://www.intel.com/content/www/us/en/benchmarks/server/xeon-phi/xeon-phi-sgemm-dgemm.html[/url]

samarawickrama · January 15, 2015, 6:36am

Well, I am interested in the output of the research community … For example, I would like to show you
the following paper I just found …

http://sbel.wisc.edu/Courses/ME964/Literature/LeeDebunkGPU2010.pdf
“Debunking the 100X GPU vs. CPU Myth: An Evaluation of Throughput Computing on CPU and GPU”

Even though many internet references state that GPUs are extremely fast, such claims need to be
analysed carefully … :)

Regarding the Intel-Phi I found the paper:

“HPC Programming on Intel Many-Integrated-Core Hardware with MAGMA Port to Xeon-Phi”

According to that paper, the speedup over the CPU is very low …!

CudaaduC · January 15, 2015, 7:16am

Then by all means do your own research and make up your own mind.

It is going to depend on your specific tasks and your ability to successfully implement your application on that hardware/software combination.

njuffa · January 15, 2015, 7:16am

You specifically asked about SGEMM performance, so I pointed you to relevant information. You will find a wide variety of speed-ups reported in the literature. The methodology used by a good number of published papers on GPU to CPU performance comparison leaves something to be desired, and some may be trying too hard to make performance gains appear to be as large as possible. This is a valid point raised by the “debunking” paper, which however hardly represents an impartial analysis, as will hopefully be self-evident from the authors’ affiliation.

You would want to be skeptical of papers that report GPU speedups of 100x over CPU code, but for many non-trivial real-world scenarios speed-ups of 5x-10x over well-optimized CPU code are certainly possible and have been documented. It all depends on the specific use case. As one example, you may want to take a look at published performance of the AMBER molecular dynamics package: [url]http://ambermd.org/gpus/benchmarks.htm[/url]

Most published papers I have read that compare the performance of the Xeon Phi to K20/K40 class GPUs show a performance advantage for the latter. Again, this will depend on the use case. Depending on your personal use case, you may need to perform your own evaluation if you cannot find a reasonably close scenario evaluated in the literature.
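
For anyone running such an evaluation themselves, here is a minimal sketch (not from any of the linked reports) of how xGEMM throughput is conventionally derived from a measured run time. It counts a multiply of an m x k matrix by a k x n matrix as 2·m·n·k floating-point operations. The function names are made up for illustration, and the timings in the usage lines are placeholders, not measurements.

```python
def gemm_flops(m: int, n: int, k: int) -> int:
    """Conventional operation count for C = A * B, with A (m x k) and B (k x n):
    one multiply and one add per inner-product term."""
    return 2 * m * n * k


def gflops(m: int, n: int, k: int, seconds: float) -> float:
    """Throughput in GFLOP/s for one GEMM call that took `seconds`."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return gemm_flops(m, n, k) / seconds / 1e9


def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds


# Usage with placeholder timings (hypothetical, NOT measured on any device).
size = 4096
cpu_seconds = 0.50
gpu_seconds = 0.10
print(f"CPU: {gflops(size, size, size, cpu_seconds):.1f} GFLOP/s")
print(f"GPU: {gflops(size, size, size, gpu_seconds):.1f} GFLOP/s")
print(f"speedup: {speedup(cpu_seconds, gpu_seconds):.1f}x")
```

When comparing devices this way, it matters whether the timed region includes host-to-device transfers and whether the CPU baseline uses a tuned BLAS. Those choices account for much of the spread in published speed-ups.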

little_jimmy · January 15, 2015, 8:14am

there is of course also the performance economy angle to contend with; one i fear the phi would likely lose

the phi seems rather expensive; the last time i checked, you can buy around 4 gpu (titan/ 780ti) workhorses for the price of 1 (entry level) phi

hence, in fairness, considering economy, the phi really needs to beat 4 gpus, and not one

personally, interpreting that really equates to: “end of discussion” (i honestly do not see how a phi would be able to beat 4 gpus)
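
the arithmetic behind that argument can be sketched as follows. This is only an illustration: the prices are placeholders chosen to match the "4 gpus for 1 phi" ratio above, the function names are made up, and it assumes several cheaper cards scale perfectly, which real multi-device setups rarely do.

```python
def perf_per_dollar(throughput: float, price: float) -> float:
    """Throughput (any consistent unit, e.g. GFLOP/s) per unit of currency."""
    if price <= 0:
        raise ValueError("price must be positive")
    return throughput / price


def breakeven_speed_ratio(price_a: float, price_b: float) -> float:
    """How many times faster device A must be than ONE device B to match
    B's performance per dollar, assuming ideal scaling across multiple B's."""
    if price_a <= 0 or price_b <= 0:
        raise ValueError("prices must be positive")
    return price_a / price_b


# Placeholder prices (hypothetical): one coprocessor costs as much as 4 GPUs.
phi_price = 4000.0
gpu_price = 1000.0
ratio = breakeven_speed_ratio(phi_price, gpu_price)
print(f"the phi must be {ratio:.1f}x faster than one gpu to break even")
```

the same function works in the other direction if the price ratio flips, so the conclusion depends entirely on the prices plugged in.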

samarawickrama · January 15, 2015, 12:22pm

Well, it’s the other way around! The Intel Xeon Phi is very cheap … see the following link …
http://www.colfax-intl.com/nd/xeonphi/31s1p-promo.aspx

However, according to the above posts, the Phi is not as fast as the K40. But at this price
we could have a cluster of 10 Phis or more … I don’t know what will happen then … :)

little_jimmy · January 15, 2015, 1:18pm

i was thinking of the Phi 3100, Xeon Phi 5110P and the Xeon Phi 7120

i do not know what a 31S1P is; however, for that price, i am confident that it is either:
a) a pc board with an intel logo sticker on
b) a ‘celeron’ phi

because that is all you will get for that price

you probably need to cluster 10 31S1P to get near a 3100; but you would still be worse off, as you would massively increase host side overhead

[by the way, you do not happen to know why /tmp/cuda-dbg/9734/session1/cudbgprocess is pushing 28G into virtual memory, do you?]

little_jimmy · January 15, 2015, 2:26pm

up until now, i could hardly perceive a value proposition, when comparing intel phis with their respective gpu equivalent, when equally noting the price differences

but now the 31S1P seems to be an outlier

i see it is passively cooled and thus a server co-processor; that is something to keep in mind of course

but it has the same power rating as a 3120 or 5110, so it is ‘expected’ to do as much work; and its other values like DP flop and memory bandwidth also compare to that of the 3120/ 5110
at the same time, the 31S1P is priced at about a 1/10 of the 3120’s price

hence, this can only mean one of 2 things, because, honestly, the price seems ‘off’:
a) something is giving somewhere
b) intel seriously wish to regain lost market share in hpc with such a price

i am not sure whether the 750ti could be seen as maintaining phi/ gpu equivalence

what am i missing?

cbuchner1 · January 15, 2015, 3:22pm

I think Intel are selling off overstock before they introduce the next generation of fancy HPC hardware.

little_jimmy · January 15, 2015, 3:40pm

“I think Intel are selling off overstock before they introduce the next generation of fancy HPC hardware.”

perhaps. then again, i get the impression that the 31S1P is newer than the 3120 or 5110

i now see references like :

intel’s “fire sale / crazy Eddie sale”

“that Intel’s been running an insane special developer promotion on the Xeon Phi 31S1P Coprocessor”

“Right now you can save 90% off the regular price for an Intel® Xeon Phi™ Coprocessor 31S1P with our promotional price”

if intel would start a price war through lock-in via ‘samples’ - 90% off certain nvidia lines/ christmas in january…?

alexgg · January 16, 2015, 3:09am

When I looked into Phi I read articles saying that programming for it is as hard as writing CUDA code, i.e. it’s a lot more than just slapping OpenMP pragmas on your code.

Also note that the 31S1P uses PCIe 2.0.
