Questions about constructing a supercomputer based on Tesla S1070


I am interested in how to construct a supercomputer using tens or hundreds of Tesla S1070s, as Tokyo Tech did last year. Does the supercomputer have to be built as a cluster, since it’s recommended that there be a single CPU core per GPU? How many GPUs is it appropriate to place in a node? What software is used for cluster management, Condor Roll or MPI? I would very much appreciate it if someone could say more about the detailed hardware and software configuration.

Thanks in advance.


I can’t tell you from experience, only from hearsay on the forums…but I think you would want to build a rack where the ‘spaces’ alternate between 1U rack servers (e.g. a Dell PowerEdge) running whatever HPC software you want (Windows Server HPC, Linux with MPI, etc.) and Tesla S1070s. The Teslas are just going to act as headless GPUs for whatever server they are connected to, so you’d just need to write some multi-GPU compatible code for your computations and then split it up with MPI (or whatever).
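To make the "split it up with MPI" part a bit more concrete: the usual first step is deciding which GPU each process drives. Here is a minimal sketch in plain Python, assuming one process per GPU and ranks laid out in contiguous blocks across nodes. The function name `assign_gpu` and the block layout are my own assumptions for illustration, not something taken from Tsubame or any particular MPI setup. In a real job the rank would come from the MPI library, and the local index would be handed to the CUDA runtime to select the device.

```python
def assign_gpu(rank, gpus_per_node):
    """Map a global process rank to (node index, local GPU index).

    Assumes one process per GPU and ranks placed on nodes in
    contiguous blocks (ranks 0..N-1 on node 0, N..2N-1 on node 1, ...).
    """
    if gpus_per_node <= 0:
        raise ValueError("gpus_per_node must be positive")
    if rank < 0:
        raise ValueError("rank must be non-negative")
    # Quotient picks the node, remainder picks the GPU within that node.
    node, local_gpu = divmod(rank, gpus_per_node)
    return node, local_gpu


if __name__ == "__main__":
    # Example layout: 2 servers, each driving the 4 GPUs of one S1070.
    for rank in range(8):
        node, gpu = assign_gpu(rank, 4)
        print(f"rank {rank} -> node {node}, GPU {gpu}")
```

If your launcher places ranks round-robin across nodes instead of in blocks, the mapping would be different, so check how your MPI installation distributes processes before relying on this.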

And yes, you should have at least one CPU core per GPU (so in my example above, you’d want to get a 1U rack server with a quad-core Xeon, or perhaps even dual quad-core Xeons if your code is also very CPU-intensive, or you have other services running on the box or whatever).
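As a quick sanity check of the core-per-GPU rule when comparing server options, something like the following works. It is only a sketch: `spare_cores` is a made-up helper name, and the one-core-per-GPU default is simply the rule of thumb mentioned above.

```python
def spare_cores(cpu_cores, gpus, cores_per_gpu=1):
    """Return the CPU cores left after reserving cores_per_gpu for each GPU.

    A negative result means the node has too few cores for its GPUs.
    """
    if cpu_cores < 0 or gpus < 0 or cores_per_gpu < 0:
        raise ValueError("counts must be non-negative")
    return cpu_cores - gpus * cores_per_gpu


if __name__ == "__main__":
    # One quad-core Xeon driving the 4 GPUs of an S1070: nothing to spare.
    print(spare_cores(4, 4))
    # Dual quad-core Xeons: 4 cores left for CPU-heavy code or other services.
    print(spare_cores(8, 4))
```

A result of zero means the rule is only just met, which is why the dual quad-core option is worth considering if anything else has to run on the box.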

Perhaps someone else with experience can give you a little more detailed information.

“Tsubame” was built with Sun Fire X4600 compute nodes, which are an 8-way dual-core Opteron NUMA design, and originally had a pair of ClearSpeed floating point accelerators sitting in PCI-X slots, with InfiniBand interconnects between compute nodes. From what I understand, the ClearSpeed boards have been ditched in favour of S1070s, so they have something like 16 CPU cores and 4 GPUs per Tesla-equipped node. Not quite sure how they did the PCIe connections, though, because I was under the impression that the Sun Fire X4600 was a PCI-X bus machine. They are running a Sun-supplied software stack based on Grid Engine for scheduling, with Voltaire MPI over InfiniBand verbs.

The S1070 should be connected to the host with a PCI-E Gen2 cable and a PCI-E Gen2 host interface card, since the Sun Fire X4600 has 6 PCI-E slots and 2 PCI-X slots.

Thanks to you and profquail for the answers.

I have looked through some material and roughly figured out what I wanted to know.

If you’re thinking of building your own CUDA supercomputer system, there is some advice here:

I am focusing on an S1070-based cluster ;)

The only thing ClearSpeed shows is that there is still a lot of room for lowering power consumption (1000 cores consuming 50 W would be great!), but if they don’t get into the gaming industry, nobody will know who ClearSpeed is (like me 5 minutes ago).