Tesla C2050-based Supercomputer ranked #2 in the world!

[url=“http://www.top500.org/list/2010/06/100”]http://www.top500.org/list/2010/06/100[/url]

The Chinese are surely rich and wise to use C2050s in the petaflops game.

Peak 2.98 PFLOPS, i.e. ~6,000 Tesla C2050s???
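For a rough sanity check on that guess, here is a back-of-the-envelope estimate; the ~515 GFLOPS double-precision peak per C2050 is my assumption from the card's spec sheet, not a figure taken from the TOP500 entry.

```python
# Back-of-the-envelope: how many C2050s would be needed to supply the whole
# 2.98 PFLOPS peak, assuming ~515 GFLOPS double-precision peak per card.
rpeak_total = 2.98e15        # system Rpeak in FLOP/s
c2050_dp_peak = 515e9        # assumed DP peak of one Tesla C2050

cards = rpeak_total / c2050_dp_peak
print(f"~{cards:,.0f} cards if the GPUs supplied every FLOP")   # roughly 5,800
```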

No, there are some Intel CPUs in there too.

I know, but most of the PFLOPS should still come from the Teslas.

Looks like the Chinese like the idea of GPGPU very much. Their 2nd- and 3rd-ranked supercomputers are based on the AMD 4870 and Tesla respectively. Does no other country in the top 100 use GPGPU???

BTW, is LINPACK a fair assessment tool for GPGPUs??? I suppose you need a LINPACK that is tailored for GPUs to be fair???

Trust me - this ranking is as fair as it gets.

And that’s one Hell of an accomplishment for the first time out!..

Roadrunner is the next closest thing (slid from #1 to #3), using a high-end variant of the Cell processor for 90% of the FLOPS.

[url=“http://www.hpcwire.com/home/specialfeaturetopitem/TOP500-Sluggish-But-Chinese-Supers-May-Portend-Big-Changes-Ahead-95271619.html”]http://www.hpcwire.com/home/specialfeature...d-95271619.html[/url]

The Teslas contributed 2.32 PFLOPS out of the 2.98 PFLOPS peak, so that's more than 4,600 C2050s.

But how can you explain that CPU-based systems usually run LINPACK at 70-80% of peak, while GPU-based systems usually achieve less than 50% of peak?

I imagine a lot of it has to do with PCIe bandwidth. But I don’t know how you explain the Mole-8.5 system at only 18.2%:

[url=“http://top500.org/system/10561”]http://top500.org/system/10561[/url]

I’ve never seen a system with such a low Rmax compared to Rpeak.
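Just to put the numbers side by side, here is a tiny sketch of the Rmax/Rpeak ratio; the Nebulae figures are the published TOP500 numbers (also quoted in the article further down this thread), and the Mole-8.5 ratio is the 18.2% mentioned above.

```python
# HPL efficiency is simply Rmax / Rpeak.
rmax, rpeak = 1.271e15, 2.98e15          # Nebulae's published TOP500 figures
print(f"Nebulae: {rmax / rpeak:.1%} of peak")   # ~42.7%
# Typical CPU-only systems land around 70-80%; Mole-8.5 is quoted above at
# only 18.2%, which is unusually low even for a GPU-accelerated machine.
```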

hooray :)

Hey Mr Murray, can you explain to us why we can only reach such a low percentage of our peak??? Thanks a lot!

I just noticed that the #2 system is located in Shenzhen.

Shenzhen is also the headquarters of BGI, which bought 128 Illumina HiSeq 2000 machines to do DNA sequencing. Supposedly those machines allow them to sequence 1,000+ genomes per year at 30x coverage. I am wondering if that's what these Teslas will be used for.

Here’s a follow-up article on CNet.

[url=“http://news.cnet.com/8301-13924_3-20006450-64.html?part=rss&subj=news&tag=2547-1_3-0-20”]http://news.cnet.com/8301-13924_3-20006450...g=2547-1_3-0-20[/url]

edit -

If I’m reading this correctly, it would seem the PCIe link is indeed the reason for the disparity in their eyes as well…
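Here is a rough, purely illustrative sketch of why the PCIe link matters: it compares the time to compute an n×n DGEMM on the card against the time to move the operands across the bus. The ~350 GFLOPS sustained DGEMM rate and ~5.5 GB/s effective PCIe 2.0 x16 bandwidth are ballpark assumptions on my part, not measurements from Nebulae.

```python
# Compare GPU compute time vs. PCIe transfer time for an n x n DGEMM.
def pcie_vs_dgemm(n, gpu_dgemm_flops=350e9, pcie_bw=5.5e9):
    flops = 2.0 * n**3                 # floating-point operations in an n x n x n DGEMM
    bytes_moved = 3 * n * n * 8        # A and B in, C out, 8 bytes per double
    return flops / gpu_dgemm_flops, bytes_moved / pcie_bw

for n in (1000, 2000, 8000):
    compute_s, transfer_s = pcie_vs_dgemm(n)
    print(f"n={n}: compute {compute_s*1e3:6.1f} ms, transfer {transfer_s*1e3:6.1f} ms")
```

For small blocks the transfer takes about as long as the compute, and HPL spends its time on blocked updates, so unless those transfers are overlapped with computation a good chunk of the raw peak evaporates on the bus.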

Would anyone who knows care to take this opportunity to offer their perspective on the state of the art for using HPL to measure the performance of GPU clusters?

The general notion at ISC’10 (where this has been announced) was:

  • The Nebulae system is one hell of a power-efficient machine, purely because of the accelerators (aka the Fermis) in it. People complained a lot that the 2-4 MW number is sort of unofficial compared to the quoted numbers for, e.g., Jaguar at Oak Ridge.

  • The PFLOP/s-counting folks complained a lot about the extreme (by TOP500 standards) deviation between Rmax and Rpeak on the four GPU-accelerated machines. But this is just another indication of why Linpack/HPL is not necessarily a good metric for measuring performance. The Nebulae architects most probably didn’t design this baby to max out in HPL :)

Apart from that, it was really fun to note how ISC’10 was centered around accelerators. Intel rebranded their Larrabee as MIC in a keynote (aka they quit the graphics market and joined the HPC world with the-design-formerly-known-as-LRB). Mellanox announced RDMA from CUDA page-locked memory into their IB fabric, as I reported in another thread. Half of the exhibition booths had GPUs in them. Folks at NERSC, at vendors, from the big-budget labs and from all over Europe and the Middle and Far East kept on asking: what kind of application-level performance can I get from using GPUs, now that their power efficiency is clearly established?

I tried my best not to amplify common GPGPU criticism (comparing against a single-core CPU reference, comparing against unoptimised CPU code) in my talk about a recent collaboration on doing seismic wave propagation on 192 GPUs (the largest cluster we could get our hands on). In case someone is interested, the paper DOI is [url=“http://dx.doi.org/10.1007/s00450-010-0109-1”]http://dx.doi.org/10.1007/s00450-010-0109-1[/url] and the slides are available on my homepage: [url=“http://www.mathematik.tu-dortmund.de/~goeddeke/pubs/talks/talk_isc2010.pdf”]http://www.mathematik.tu-dortmund.de/~goed...alk_isc2010.pdf[/url]

So LINPACK takes the link between nodes into account as well? If you use 12X InfiniBand QDR at 96 Gbps, will you get a higher LINPACK score than with 10 Gbps Ethernet?

Actually, the interviewee lives about thirty minutes from me and I was trying to contact him through a friend at UT, but no luck thus far. Bad timing this weekend.

What little I’ve gathered was through the friend, obviously, and not the good Doctor. Point taken - I’m not that qualified and I’ll bow out…

LINPACK certainly uses the network, but it is well known that good performance doesn’t depend on having a great network. Systems these days have so much memory that the ratio of communication to computation is very low, so it’s basically just a FLOPS contest. That’s why you see so many clusters make it into the Top 500 using plain gigabit Ethernet. The running joke is that you could probably do the communication with floppies, or USB flash drives these days, and still get a decent score. A better benchmark of overall system performance is HPCC:

http://icl.cs.utk.edu/hpcc/

But those benchmarks aren’t well defined for heterogeneous computing, since it isn’t specified whether they refer to performance only within GPU memory or whether you need to include the PCIe data-transfer costs.
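To put a number on the communication-to-computation point above: HPL's flop count grows as N³ while the data only grows as N², so at the problem sizes that fill today's memories the interconnect hardly matters. A purely illustrative sketch, not tied to any specific machine:

```python
# Flops-per-byte of the HPL matrix as the problem size N grows.
for n in (10_000, 100_000, 1_000_000):
    flops = (2.0 / 3.0) * n**3       # leading term of the LU factorisation
    matrix_bytes = n * n * 8         # the N x N double-precision matrix
    print(f"N={n:>9,}: {flops / matrix_bytes:>9,.0f} flops per byte of matrix")
```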

As I expected, this computer will be used for gene sequencing (actually genome assembly, I think).

[url=“http://www.mnn.com/green-tech/computers/stories/china-boasts-worlds-second-fastest-supercomputer”]http://www.mnn.com/green-tech/computers/st...t-supercomputer[/url]

China boasts world’s second-fastest supercomputer

Mother Nature Network

2010-06-08

China’s ambitions to become a major global power in the world of supercomputing were given a boost when one of its machines was ranked second-fastest in a survey.

The Nebulae machine at the National Supercomputing Centre in the southern city of Shenzhen can perform at 1.271 petaflops, according to the Top 500 survey, which ranks supercomputers.

A petaflop is equivalent to 1,000 trillion calculations per second.

The United States still dominates the list, holding top spot with its Jaguar supercomputer at a government facility in Tennessee, and more than half of the systems on the list, released at a supercomputing conference in Germany.

But China has a total of 24 systems on the list, and two in the top ten, with the Tianhe-1 supercomputer in Tianjin ranking number seven.

And the Nebulae, built by Dawning Information Industry Co., Ltd., has a theoretical peak speed of 2.98 petaflops, which would make it the fastest in the world.

The machine’s uses include scientific computing and gene sequencing, according to Chinese state media.

Calls to the company for further comment went unanswered.

The supercomputers on the Top 500 list are rated based on speed of performance in a benchmark test. Submissions are voluntary, so it does not include all machines.

The survey is produced twice a year.