GPU, CPU and Xeon Phi Benchmark/Performance Comparison

Hi,

Is there a good performance comparison of GPUs and the Intel Xeon Phi coprocessor available?
Regarding SGEMM, can a GPU achieve a significant speedup compared to the Phi/CPU?

Thank you.

Google is one way to find an answer…

http://blog.xcelerit.com/intel-xeon-phi-vs-nvidia-tesla-gpu/

"The Tesla GPU is about twice faster than the Xeon Phi, and between 1.2x and 1.9x faster than the CPU. "

which was for 64 bit Monte Carlo.

And I believe that Jimmy P at some point ran his own benchmark tests comparing the K20 with the current Phi model, and said that the K20 was a clear winner.

Overall it would depend on the task, as I am sure a GTX 780 Ti or a GTX 980 would kill a Phi for image processing and brute-force exhaustive search (particularly 32-bit).

Google is your friend. A quick search returned the following two relevant links among the first page of results:

Tesla K40 xGEMM performance:
http://developer.download.nvidia.com/compute/cuda/6_5/rel/docs/CUDA_6.5_Performance_Report.pdf

Intel Xeon and Xeon Phi xGEMM performance:
http://www.intel.com/content/www/us/en/benchmarks/server/xeon-phi/xeon-phi-sgemm-dgemm.html

Well, I am interested in the output of the research community … For example, I would like to show you
the following paper I just found …

http://sbel.wisc.edu/Courses/ME964/Literature/LeeDebunkGPU2010.pdf
“Debunking the 100X GPU vs. CPU Myth: An Evaluation of Throughput Computing on CPU and GPU”

Even though many internet references state that GPUs are extremely fast, those claims need to be
analysed carefully … :)

Regarding the Intel-Phi I found the paper:

“HPC Programming on Intel Many-Integrated-Core Hardware with MAGMA Port to Xeon-Phi”
http://www.netlib.org/utk/people/JackDongarra/PAPERS/multimic.pdf

According to that, the speedup over the CPU is very low …!

Then by all means do your own research and make up your own mind.

It is going to depend on your specific tasks and your ability to successfully implement your application on that hardware/software combination.

You specifically asked about SGEMM performance, so I pointed you to relevant information. You will find a wide variety of speed-ups reported in the literature. The methodology used by a good number of published papers on GPU to CPU performance comparison leaves much to be desired, and some may be trying too hard to make performance gains appear as large as possible. This is a valid point raised by the “debunking” paper, which however hardly represents an impartial analysis, as will hopefully be self-evident from the authors’ affiliation.

You would want to be skeptical of papers that report GPU speedups of 100x over CPU code, but for many non-trivial real-world scenarios speed-ups of 5x-10x over well-optimized CPU code are certainly possible and have been documented. It all depends on the specific use case. As one example, you may want to take a look at published performance of the AMBER molecular dynamics package: http://ambermd.org/gpus/benchmarks.htm

Most published papers I have read that compare the performance of the Xeon Phi to K20/K40 class GPUs show a performance advantage for the latter. Again, this will depend on the use case. Depending on your personal use case, you may need to perform your own evaluation if you cannot find a reasonably close scenario evaluated in the literature.
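If you do end up running your own SGEMM evaluation, it helps to convert raw timings into GFLOP/s before comparing devices, so that results for different matrix sizes stay comparable. Below is a minimal sketch in Python, assuming the usual 2*m*n*k operation count for a dense matrix multiply. The function names and the example timings are made up for illustration and are not measurements of any real hardware.

```python
def gemm_flops(m: int, n: int, k: int) -> int:
    """Operation count for C = A @ B with A (m x k) and B (k x n).

    Each of the m*n output elements needs k multiplies and k adds,
    which gives the conventional 2*m*n*k figure.
    """
    return 2 * m * n * k


def gflops(m: int, n: int, k: int, seconds: float) -> float:
    """Achieved GFLOP/s for one GEMM call that took `seconds` to run."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return gemm_flops(m, n, k) / seconds / 1e9


def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds


if __name__ == "__main__":
    # Hypothetical timings for illustration only, NOT measured values.
    m = n = k = 4096
    cpu_seconds, gpu_seconds = 0.50, 0.08
    print("CPU GFLOP/s:", gflops(m, n, k, cpu_seconds))
    print("GPU GFLOP/s:", gflops(m, n, k, gpu_seconds))
    print("speedup:", speedup(cpu_seconds, gpu_seconds))
```

Whatever you time, make sure both sides run well-optimized code (a tuned BLAS on the CPU, for example), otherwise the speedup figure says more about the baseline than about the hardware.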

there is of course also the performance economy angle to contend with; one i fear the phi would likely lose

the phi seems rather expensive; the last time i checked, you can buy around 4 gpu (titan/ 780ti) workhorses for the price of 1 (entry level) phi

hence, in fairness, considering economy, the phi really needs to beat 4 gpus, and not one

personally, interpreting that really equates to: “end of discussion” (i honestly do not see how a phi would be able to beat 4 gpus)
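the arithmetic behind that can be sketched in a few lines of Python. this is only a rough model: it assumes throughput scales linearly with the number of cards (real multi-gpu scaling is usually worse), the prices below are placeholders rather than quotes, and the function names are my own.

```python
def perf_per_dollar(throughput: float, price: float) -> float:
    """Throughput (any consistent unit, e.g. GFLOP/s) per unit of currency."""
    return throughput / price


def break_even_speedup(price_a: float, price_b: float) -> float:
    """Factor by which device A must outperform ONE device B
    to match B's throughput per dollar.

    Assumes ideal linear scaling when buying several B cards
    for the price of one A card.
    """
    return price_a / price_b


if __name__ == "__main__":
    # Placeholder prices for illustration only, not real quotes.
    phi_price, gpu_price = 4000.0, 1000.0
    print("phi must be this many times faster than one gpu:",
          break_even_speedup(phi_price, gpu_price))
```

with a 4:1 price ratio the break-even factor is 4, which is the "needs to beat 4 gpus" point above.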

Well, it’s the other way around! The Intel Xeon Phi is very cheap … see the following link …
http://www.colfax-intl.com/nd/xeonphi/31s1p-promo.aspx

However, according to the above posts, the Phi is not as fast as the K40. But at this price
we can have a cluster of 10 Phis or more … I don’t know what will happen then … :)

i was thinking of the Phi 3100, Xeon Phi 5110P and the Xeon Phi 7120

i do not know what a 31S1P is; however, for that price, i am confident that it is either:
a) a pc board with an intel logo sticker on
b) a ‘celeron’ phi

because that is all you will get for that price

you probably need to cluster 10 31S1P to get near a 3100; but you would still be worse off, as you would massively increase host side overhead

[by the way, you do not happen to know why /tmp/cuda-dbg/9734/session1/cudbgprocess is pushing 28G into virtual memory, do you?]

up until now, i could hardly perceive a value proposition when comparing intel phis with their respective gpu equivalents, once the price differences are taken into account

but now the 31S1P seems to be an outlier

i see it is passively cooled and thus a server co-processor; that is something to keep in mind of course

but it has the same power rating as a 3120 or 5110, so it is ‘expected’ to do as much work; and its other values like DP flop and memory bandwidth also compare to that of the 3120/ 5110
at the same time, the 31S1P is priced at about 1/10 of the 3120’s price

hence, this can only mean one of 2 things, because, honestly, the price seems ‘off’:
a) something is giving somewhere
b) intel seriously wish to regain lost market share in hpc with such a price

i am not sure whether the 750ti could be seen as maintaining phi/ gpu equivalence

what am i missing?

I think Intel are selling off overstock before they introduce the next generation of fancy HPC hardware.

“I think Intel are selling off overstock before they introduce the next generation of fancy HPC hardware.”

perhaps. then again, i get the impression that the 31S1P is newer than the 3120 or 5110

i now see references like :

intel’s “fire sale / crazy Eddie sale”

“that Intel’s been running an insane special developer promotion on the Xeon Phi 31S1P Coprocessor”

“Right now you can save 90% off the regular price for an Intel® Xeon Phi™ Coprocessor 31S1P with our promotional price”

if intel would start a price war through lock-in via ‘samples’ - 90% off certain nvidia lines/ christmas in january…?

When I looked into Phi I read articles saying that programming for it is as hard as writing CUDA code, i.e. it’s a lot more than just slapping OpenMP pragmas on your code.

Also note that the 31S1P uses PCIe 2.0.
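To get a feel for what the PCIe generation means for SGEMM offload, here is a rough Python sketch that estimates the host-to-device transfer time from the theoretical link rates (5 GT/s with 8b/10b encoding for PCIe 2.0, 8 GT/s with 128b/130b encoding for PCIe 3.0). The x16 lane count is an assumption on my part, the helper names are hypothetical, and real-world throughput is noticeably below these theoretical peaks.

```python
# Theoretical per-lane, per-direction bandwidth in bytes/second:
# transfer rate (bits/s) * encoding efficiency / 8 bits per byte.
PER_LANE_BYTES_PER_S = {
    "2.0": 5e9 * (8 / 10) / 8,     # 500 MB/s per lane
    "3.0": 8e9 * (128 / 130) / 8,  # roughly 985 MB/s per lane
}


def sgemm_bytes(m: int, n: int, k: int) -> int:
    """Bytes moved for one offloaded SGEMM: upload A (m x k) and B (k x n),
    download C (m x n), 4 bytes per single-precision element."""
    return 4 * (m * k + k * n + m * n)


def transfer_seconds(num_bytes: float, gen: str, lanes: int = 16) -> float:
    """Lower bound on transfer time at theoretical peak bandwidth.

    `lanes=16` is an assumed link width, not a confirmed spec.
    """
    return num_bytes / (PER_LANE_BYTES_PER_S[gen] * lanes)


if __name__ == "__main__":
    n = 8192
    size = sgemm_bytes(n, n, n)
    for gen in ("2.0", "3.0"):
        print(f"PCIe {gen} x16: {transfer_seconds(size, gen):.3f} s")
```

For large square matrices the compute grows as n^3 while the data moved grows as n^2, so the link matters most for small problems or when data is shuttled back and forth frequently.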