
Cray Unveils XC30 Supercomputer

timothy posted about a year and a half ago | from the show-us-the-sheets dept.


Nerval's Lobster writes "Cray has unveiled an XC30 supercomputer capable of high-performance computing workloads of more than 100 petaflops. Originally code-named 'Cascade,' the system relies on Intel Xeon processors and Aries interconnect chipset technology, paired with Cray's integrated software environment. Cray touts the XC30's ability to utilize a wide variety of processor types; future versions of the platform will apparently feature Intel Xeon Phi coprocessors and Nvidia Tesla GPUs based on the Kepler GPU computing architecture. Cray leveraged its work with DARPA's High Productivity Computing Systems program to design and build the XC30. Cray's XC30 isn't the only supercomputer aiming for the 100-petaflop crown. China's Guangzhou Supercomputing Center recently announced the development of Tianhe-2, a supercomputer theoretically capable of 100 petaflops, but that system isn't due to launch until 2015. Cray also faces significant competition among supercomputer makers: it built only 5.4 percent of the systems on the Top500 list, compared with IBM's 42.6 percent and Hewlett-Packard's 27.6 percent."

67 comments

Does it have a bench-seat? (3, Insightful)

Jeremiah Cornelius (137) | about a year and a half ago | (#41924097)

It's no Cray, unless it also doubles as stylish atrium furniture.

Re:Does it have a bench-seat? (2)

srussia (884021) | about a year and a half ago | (#41924219)

It's no Cray, unless it also doubles as stylish atrium furniture.

...and space heater!

Re:Does it have a bench-seat? (2)

Jeremiah Cornelius (137) | about a year and a half ago | (#41924437)

Re:MORE CRAY PR0N! (0)

Anonymous Coward | about a year and a half ago | (#41924819)

More, I need more!

Re:MORE CRAY PR0N! (3, Interesting)

psergiu (67614) | about a year and a half ago | (#41925007)

How about Cray T90 - looking like something out of David Lynch's Dune:
http://www.craywiki.com/images/f/fb/T916.jpg [craywiki.com]

now THAT was a computer any CEO was proud to show to visitors.

Not a row of boring cabinets.

Re:MORE CRAY PR0N! (1)

Anonymous Coward | about a year and a half ago | (#41925279)

My favourite still has to be the Thinking Machines CM2:

http://www.mission-base.com/tamiko/cm/cm2-hds.gif

Re:MORE CRAY PR0N! (0)

Anonymous Coward | about a year and a half ago | (#41929341)

http://www.craywiki.com/images/f/fb/T916.jpg [craywiki.com]

now THAT was a computer any CEO was proud to show to visitors.

Reminds me of the Thinking Machines motto: "We are building a machine that will be proud of us."

Re:MORE CRAY PR0N! (1)

bedouin (248624) | about a year and a half ago | (#41935869)

Sorta looks like something that would summon the Cenobites if opened for repairs.

Re:Does it have a bench-seat? (1)

Anonymous Coward | about a year and a half ago | (#41925047)

It's not a Cray, full stop. Like Atari, the Cray name was bought by a completely unrelated company after the original company went bankrupt.

Re:Does it have a bench-seat? (2, Informative)

Anonymous Coward | about a year and a half ago | (#41925157)

It is, however, the same group of very clever engineers.

Re:Does it have a bench-seat? (1)

jsfetzik (40515) | about a year and a half ago | (#41931803)

Yup. Cray's advantage has always been the engineering behind the interconnects and the highly optimized compilers. The people behind those are still around.

Re:Does it have a bench-seat? (1)

Celarent Darii (1561999) | about a year and a half ago | (#41940277)

By the way, for those who don't remember Cray computer and their engineers, I very much recommend this video: http://www.youtube.com/watch?v=J9kobkqAicU [youtube.com] Seymour Cray was a brilliant man and he attracted many brilliant engineers to work on his machines. The video gives some history from some actual workers at Cray and the first users of the Cray 1.

100-petaflops? Amazing! (1, Funny)

Anonymous Coward | about a year and a half ago | (#41924325)

That's almost enough to run Vista

Re:100-petaflops? Amazing! (1)

Anonymous Coward | about a year and a half ago | (#41924935)

So then not enough left to play Crysis 2?

Re:100-petaflops? Amazing! (1)

Anonymous Coward | about a year and a half ago | (#41925611)

But a beowulf cluster of those ALMOST could run it!

"Unveiled" my arse (1)

fatphil (181876) | about a year and a half ago | (#41924421)

They've released the output of a raytracer, and little more by the looks of it.

Things that don't exist are not "capable" of anything. (Well, unless you're of a religious persuasion...)

Re:"Unveiled" my arse (2)

suso (153703) | about a year and a half ago | (#41924933)

I'll be revealing my supercomputer that has finally broken the exaflop barrier in about an hour. (opens Blender)

Will it play... (0)

Anonymous Coward | about a year and a half ago | (#41924447)

Will it play Crysis? Will it blend?

details, details (1)

whistl (234824) | about a year and a half ago | (#41924623)

While the article says they 'unveiled' it, it doesn't give any information about the hardware at all. I'm guessing it hasn't actually been built yet. Too bad. The Top 500 Supercomputers list is due to be updated this month.

Re:details, details (3, Informative)

whistl (234824) | about a year and a half ago | (#41924781)

The Cray website (http://www.cray.com/Products/XC/XC.aspx) has more details: 3072 cores (66 Tflops) per cabinet initially, and the picture makes it look like they have 16 cabinets, for 49152 cores total. Amazing.
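
A quick back-of-the-envelope check of those figures (a hypothetical sketch in C; the 3072-core, 66-Tflops and 16-cabinet numbers are the ones quoted above, the rest is plain arithmetic):

#include <stdio.h>

int main(void)
{
    const int cores_per_cabinet = 3072;       /* per the Cray XC30 product page cited above */
    const double tflops_per_cabinet = 66.0;
    const int cabinets = 16;                  /* what the press photo appears to show */

    printf("total cores:  %d\n", cores_per_cabinet * cabinets);             /* 49152 */
    printf("total Tflops: %.0f\n", tflops_per_cabinet * cabinets);          /* 1056, i.e. roughly 1 Pflops */
    printf("Gflops/core:  %.1f\n", 1000.0 * tflops_per_cabinet / cores_per_cabinet); /* about 21.5 */
    return 0;
}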

Re:details, details (1)

Desler (1608317) | about a year and a half ago | (#41925027)

They'll need more than 16 if this is a 100 petaflop computer. So either you are looking at the wrong machine or there's a typo somewhere.

Re:details, details (1)

corvair2k1 (658439) | about a year and a half ago | (#41925093)

The statement is that the XC30 can _make it_ to 100 PF. Nobody will build a 100 PF machine (i.e., 1600 cabinets, 8x more than Jaguar) with this product line; there will be upgrades before then. 32k sq ft of machine room space and cooling is too expensive.

Re:details, details (0)

Anonymous Coward | about a year and a half ago | (#41927759)

Wanna bet? I was previously cleared to visit sites where that would be a good thing, and doable. You're limiting your thinking to a two-dimensional space.

accelerators (0)

Anonymous Coward | about a year and a half ago | (#41932397)

66 Tflops per cabinet is what the machine will ship with now. Cray has already announced that Cascade will support both Nvidia GPUs and Intel's Knights Corner coprocessors, which should bump that up to 200-400 Tflops per cabinet sometime next year. It would still take a lot of cabinets to hit 100 petaflops, but it begins to look plausible when you add accelerators.
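
The cabinet counts implied by those per-cabinet rates, as a small sketch (the 66, 200 and 400 Tflops figures come from the comments above; everything else is division). It also checks the "roughly 1600 cabinets" estimate earlier in the thread:

#include <stdio.h>
#include <math.h>

int main(void)
{
    const double target_tflops = 100.0 * 1000.0;                /* the 100-petaflop target, in Tflops */
    const double per_cabinet_tflops[] = { 66.0, 200.0, 400.0 }; /* CPU-only vs. accelerated estimates */

    for (int i = 0; i < 3; i++) {
        double cabinets = ceil(target_tflops / per_cabinet_tflops[i]);
        printf("%4.0f Tflops/cabinet -> %5.0f cabinets\n", per_cabinet_tflops[i], cabinets);
    }
    return 0;   /* prints roughly 1516, 500 and 250 cabinets */
}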

Re:details, details (1)

godrik (1287354) | about a year and a half ago | (#41925017)

Actually, the Supercomputing conference is next week, so the ranking will probably be available on Monday!

Damn (2, Funny)

Anonymous Coward | about a year and a half ago | (#41924631)

That shit cray.

On your desktop in 11 years (5, Insightful)

michaelmalak (91262) | about a year and a half ago | (#41924635)

In November 2001, the fastest supercomputer was 12 TFlops [top500.org]. You can achieve that today for less than $5,000 on your desktop by ganging together four GPGPU cards (such as the 3 TFlops Radeon 7970, for less than $500 each). Go back to 1999 and the top machine was only 3 TFlops, and to match that today you wouldn't even need a special motherboard.

So just wait 11 years for the prices to come down.
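
The arithmetic behind that claim, as a tiny sketch (the per-card Tflops and price are the figures quoted in the comment, not vendor-verified numbers):

#include <stdio.h>

int main(void)
{
    const double tflops_per_card = 3.0;   /* single-precision peak quoted for a Radeon 7970 */
    const double price_per_card = 500.0;
    const int cards = 4;

    printf("aggregate peak: %.0f Tflops\n", tflops_per_card * cards);   /* 12 Tflops */
    printf("card cost:      $%.0f\n", price_per_card * cards);          /* $2000 */
    /* ...which leaves roughly $3000 of the $5000 budget for the host machine. */
    return 0;
}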

Re:On your desktop in 11 years (1)

Anonymous Coward | about a year and a half ago | (#41924913)

Supercomputers measure double precision FLOPS while the GPGPU vendors cheat and report single precision. And that doesn't take into account the ugly "kernel" programming needed for GPGPU, or memory synchronization.

Re:On your desktop in 11 years (4, Interesting)

michaelmalak (91262) | about a year and a half ago | (#41925025)

Supercomputers measure double precision FLOPS while the GPGPU vendors cheat and report single precision.

Ah, OK, the Radeon is then 1 TFlop [rpi.edu] for double precision (which is new to the Radeon). So four Radeon 7970s beat the top 1999 supercomputer.

Re:On your desktop in 11 years (1)

bws111 (1216812) | about a year and a half ago | (#41925105)

Except that 1999 supercomputer was capable of doing real work. You have 4 fast GPUs sitting in a box, doing nothing. What is feeding them work, coordinating their inputs/outputs, etc? That is where all the hard work is.

Re:On your desktop in 11 years (0)

Anonymous Coward | about a year and a half ago | (#41925163)

This.

Re:On your desktop in 11 years (1)

michaelmalak (91262) | about a year and a half ago | (#41925165)

What is feeding them work, coordinating their inputs/outputs, etc? That is where all the hard work is.

OpenCL uses C99. It's tricky, maybe even "hard", but far from impossible.
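
For anyone curious what that kernel-style programming looks like, here is a minimal OpenCL vector-add sketch in C (purely illustrative: the kernel, sizes and device selection are invented for the example, and error checking is omitted; link with -lOpenCL):

#include <stdio.h>
#include <CL/cl.h>

/* OpenCL kernels are written in a C99-based dialect and compiled at runtime. */
static const char *src =
    "__kernel void vadd(__global const float *a,\n"
    "                   __global const float *b,\n"
    "                   __global float *c) {\n"
    "    size_t i = get_global_id(0);\n"
    "    c[i] = a[i] + b[i];\n"
    "}\n";

int main(void)
{
    enum { N = 1024 };
    float a[N], b[N], c[N];
    for (int i = 0; i < N; i++) { a[i] = (float)i; b[i] = 2.0f * i; }

    /* Pick the first GPU on the first platform. */
    cl_platform_id plat; cl_device_id dev;
    clGetPlatformIDs(1, &plat, NULL);
    clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, 1, &dev, NULL);

    cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, NULL);
    cl_command_queue q = clCreateCommandQueue(ctx, dev, 0, NULL);

    /* Copy the input arrays to the device, allocate the output buffer. */
    cl_mem da = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, sizeof a, a, NULL);
    cl_mem db = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, sizeof b, b, NULL);
    cl_mem dc = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, sizeof c, NULL, NULL);

    /* Build the kernel source and run one work-item per array element. */
    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
    clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
    cl_kernel k = clCreateKernel(prog, "vadd", NULL);
    clSetKernelArg(k, 0, sizeof da, &da);
    clSetKernelArg(k, 1, sizeof db, &db);
    clSetKernelArg(k, 2, sizeof dc, &dc);
    size_t global = N;
    clEnqueueNDRangeKernel(q, k, 1, NULL, &global, NULL, 0, NULL, NULL);

    /* Blocking read: wait for the kernel and copy the result back. */
    clEnqueueReadBuffer(q, dc, CL_TRUE, 0, sizeof c, c, 0, NULL, NULL);
    printf("c[10] = %f (expect 30.0)\n", c[10]);
    return 0;
}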

Re:On your desktop in 11 years (1)

bws111 (1216812) | about a year and a half ago | (#41925307)

What I meant was, once you add in all the overhead of scheduling work, passing messages etc, you will find that you are running at a much slower speed than the raw speeds of the GPUs would have you believe. A GPU waiting for work, or memory access, or IO, or whatever is running at 0 FLOPS, regardless of how fast the processor is capable of running. If you can't keep those 4 GPUs running full speed doing actual work at all times, you have nothing near a 3 TFLOPS machine.

Re:On your desktop in 11 years (1)

michaelmalak (91262) | about a year and a half ago | (#41925365)

once you add in all the overhead of scheduling work, passing messages etc, you will find that you are running at a much slower speed than the raw speeds of the GPUs would have you believe

Would you happen to know how that compares to real supercomputers?

I don't have any first-hand experience with supercomputers -- only what I've heard and read, which is that they also struggle against Amdahl's law.

Re:On your desktop in 11 years (1, Informative)

Anonymous Coward | about a year and a half ago | (#41926055)

Well, with supercomputers the benchmark in the Top500 is LINPACK, which spits out the double precision flop rate. The theoretical peak is GHz * cores * floating point ops per cycle, which gives GFlops. A regular CPU-only supercomputer should never be below 80% of the theoretical peak; if it is, something is wrong. A well tuned CPU cluster can get over 95% of theoretical peak, a well tuned GPU cluster around 60%.
Staying with a small scale (12 TFlops), a real cluster would need around 30 cards these days. That's just scaling up NUDT's 3 kW cluster from the student cluster competition at ISC this year, which got 2.6 TFlops with 9 cards. Also in that race was a pure CPU cluster that got 2.4 TFlops.

However, the LINPACK score does not actually mean your application will run well. Maybe it's an application that has too much communication, or the wrong algorithms to run efficiently on a GPU. Maybe your application is really well suited for GPUs; that is usually the case for anything with large matrices and little communication, just like LINPACK. Because they have such a tight focus on rendering images, i.e. doing many 4D vector ops, GPUs are really bad at anything that is not vector parallel.
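
The peak-versus-measured relationship described above, as a tiny sketch (the clock, core count and flops-per-cycle values are invented illustrative numbers, not any particular machine):

#include <stdio.h>

int main(void)
{
    /* Theoretical peak = clock (GHz) * cores * floating-point ops per cycle. */
    const double ghz = 2.6;
    const int cores = 16;
    const double flops_per_cycle = 8.0;   /* e.g. a 4-wide AVX add plus a 4-wide AVX multiply */
    double peak_gflops = ghz * cores * flops_per_cycle;

    /* Efficiency = measured LINPACK rate / theoretical peak. */
    double measured_gflops = 0.90 * peak_gflops;   /* a well-tuned CPU cluster, per the comment */
    printf("peak %.0f Gflops, measured %.0f Gflops, efficiency %.0f%%\n",
           peak_gflops, measured_gflops, 100.0 * measured_gflops / peak_gflops);
    return 0;
}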

Re:On your desktop in 11 years (1)

Meeni (1815694) | about a year and a half ago | (#41929667)

Correct in general, but extensive research over the last 5 years has led to many production codes today. GPU accelerators can indeed live up to (most of) their promises, and typically reach 55 to 70% of peak in real deployments (Tianhe is a good example, ~55% efficient). Top-notch designs can extract as much as 85% of peak in LINPACK, as obtained by Sequoia, unveiled last year. We'll see how Titan fares; it's the new GPU supercomputer giant that will be announced this year to replace Jaguar, and it's based on the Cray XK6 with Nvidia-accelerated nodes. Cray machines are usually over 85% efficient; let's see if they can repeat that tour de force with GPUs.

Re:On your desktop in 11 years (0)

Anonymous Coward | about a year and a half ago | (#41925209)

Yeah, if only that box had some other processor specializing in scalar operations and connected to the vector processors via high bandwidth low latency link...

Re:On your desktop in 11 years (1)

michaelmalak (91262) | about a year and a half ago | (#41925303)

Yeah, if only that box had some other processor specializing in scalar operations and connected to the vector processors via high bandwidth low latency link.

An i5 has four cores and is connected to the Radeon via PCIe 3.0 x8.

Re:On your desktop in 11 years (0)

Anonymous Coward | about a year and a half ago | (#41925645)

Woosh.
Though to find supercomputers made of a whole bunch of big dumb vector units slaved to frontend processors, you have to go back further than '99.

Re:On your desktop in 11 years (1)

timeOday (582209) | about a year and a half ago | (#41929419)

The linpack yield of current generation GPU clusters is about 50% [mcorewire.com]. So while your point is valid, "doing nothing" is a rather large exaggeration. For that matter, 50% is the yield on a cluster, so the yield on a single-bus machine is almost certainly higher.


From the following, it sounds like 1 teraflop -- not theoretical, but on LINPACK -- is available on a desktop [gfxspeak.com], now or very soon:

Intel has been working hard on its many-integrated core (MIC), which it describes as a 50+ core capable of one teraflops real-world performance. Intel revealed its strategy and product branding at the International Supercomputing Conference held last June in Hamburg. It showed the Xeon Phi as an AIB that fits into one PCIe slot. The system had two Xeon E5 processors and a Knights Corner co-processor running the Linpack benchmark and hitting the magic one teraflops number.

August 31, 2012

That would be the fastest supercomputer in the world until early 1997 [top500.org].

Re:On your desktop in 11 years (1)

bws111 (1216812) | about a year and a half ago | (#41931387)

I stand by what I said, although maybe I worded it poorly. I did not mean that the config he proposed was incapable of doing work. I meant that the only way to achieve the speeds he is talking about is by doing no work (in other words, not benchmarking, just going by what the box says).

Re:On your desktop in 11 years (1)

etash (1907284) | about a year and a half ago | (#41924927)

Wrong. Top500 measures double precision performance, not single.

Re:On your desktop in 11 years (1)

fuzzyfuzzyfungus (1223518) | about a year and a half ago | (#41925065)

The ugly trick is interconnect performance, unless you aren't planning to scale up very much at all or have the (atypical) good fortune to be attacking nothing but hugely parallel problems.

It's been a while since the supercomputer crowd found rolling their own esoteric CPUs to be worth it (with POWER the possible exception); but if all the silicon you want to devote to the problem won't fit on a single motherboard, you quickly enter the realm of the rather specialized.

At the very least, you are probably looking at networking as costly as, or more costly than, a 10GbE setup. If you want a single system image, well, call your vendor and get your checkbook warmed up...

Re:On your desktop in 11 years (0)

michaelmalak (91262) | about a year and a half ago | (#41925135)

At the very least, you are probably looking at networking as costly as, or more costly than, a 10GbE setup

There is no networking involved in a four-Radeon setup, just a special rackmount motherboard that has a dozen PCIe slots (because each Radeon is triple-width physically).

It's more like 13 years (1)

gentryx (759438) | about a year and a half ago | (#41929389)

The Top500 reports actual performance as measured with LINPACK; hardware vendors report the theoretical performance of their chips, which in the case of GPUs is often quite a bit more than you'd be able to squeeze out with LINPACK.

For comparison: Tsubame 2.0 consists of 1400 nodes with approx. 4200 NVIDIA Tesla C2075, which should yield -- according to your estimate -- 2.1 PFLOPS (4200 * 0.5 TFLOPS [nvidia.com]), yet it is listed at 1.2 PFLOPS [top500.org]. So just add two years to your estimate and you should be fine...
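
That works out to roughly the 55-60% GPU efficiency mentioned elsewhere in the thread; a quick check (all numbers taken from the comment above):

#include <stdio.h>

int main(void)
{
    const double gpus = 4200.0;           /* approx. Tesla C2075 count quoted for Tsubame 2.0 */
    const double tflops_per_gpu = 0.5;    /* vendor double-precision peak quoted above */
    const double measured_pflops = 1.2;   /* Top500-listed LINPACK result quoted above */

    double peak_pflops = gpus * tflops_per_gpu / 1000.0;
    printf("theoretical peak: %.1f Pflops\n", peak_pflops);                        /* 2.1 */
    printf("LINPACK yield:    %.0f%%\n", 100.0 * measured_pflops / peak_pflops);   /* about 57% */
    return 0;
}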

XC30 (4, Funny)

Cid Highwind (9258) | about a year and a half ago | (#41925033)

"Originally named 'Cascade'" ... and now named for a midsize Volvo.

It might not be the fastest supercomputer in the world, but at least it'll be safe.

Re:XC30 (1)

Anonymous Coward | about a year and a half ago | (#41925691)

The Cray product may also be faster than the Volvo product!

Theoretically up to? (0)

Anonymous Coward | about a year and a half ago | (#41925069)

What defines the upper bound for these systems? To some degree, isn't the limit mostly price and an algorithm that is parallel enough, in the sense that you could keep adding nodes? Is there some limit to the number of nodes the interconnects and software can address, or do they assume some reasonable upper bound on price or size, or assume a certain category of algorithms so they can estimate when communication between nodes becomes a bottleneck?

Re:Theoretically up to? (1)

corvair2k1 (658439) | about a year and a half ago | (#41925201)

It's the number of nodes that can be connected into a single machine multiplied by the theoretical peak performance of each node (implying zero actual communication). The number of nodes can be limited by a range of things, from how many nodes are addressable by the networking hardware to an #ifdef on the maximum number of nodes the software is willing to support.
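
In other words (a trivial sketch; max_nodes and node_peak_tflops are placeholders, not the limits of any real product):

#include <stdio.h>

int main(void)
{
    /* "Theoretically up to" = addressable nodes * per-node theoretical peak,
       assuming zero communication cost. */
    const long max_nodes = 65536;          /* e.g. whatever the interconnect can address */
    const double node_peak_tflops = 1.5;   /* per-node peak, placeholder value */

    printf("theoretical upper bound: %.1f Pflops\n",
           max_nodes * node_peak_tflops / 1000.0);   /* 98.3 Pflops for these placeholders */
    return 0;
}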

A question (0)

Anonymous Coward | about a year and a half ago | (#41925367)

Do these supercomputers run simultaneous jobs or is it one thing at a time? I'm wondering if process scheduling and context switching would cause the computer to spin its wheels. Also, multiprocessing is probably not all that beneficial for the kinds of simulations run on these supercomputers.

What will they use it for? (0)

Anonymous Coward | about a year and a half ago | (#41925689)

Do they just keep running SPEC benchmarks on it until they get No. 1?
Or do they have actual useful applications ready for this thing?
Just buy a rack of FPGAs and be done with it.

it has begun... all hail our combine overlords.. (0)

Anonymous Coward | about a year and a half ago | (#41925937)

...until the Freeman saves us...

Super computers
Black mesa
Cascade
Resonance cascade
2012...

I'm just... Saiyan...

"Just 5%"??????? (1)

rubycodez (864176) | about a year and a half ago | (#41927559)

For a company with a market cap of less than half a billion to have made 1 in 20 of the Top500 is an extraordinary achievement. IBM -> $215 billion, HPQ -> $27 billion.

Re:"Just 5%"??????? (1)

RicktheBrick (588466) | about a year and a half ago | (#41928571)

Now let's ask how much power this computer will need. Say it can do a billion flops per watt. 100 petaflops is 100,000 trillion flops; a trillion flops is 1000 billion flops, so a trillion flops takes 1000 watts at a billion flops per watt, and 100,000 trillion flops would take 100 million watts. So let's hope they can do at least 50 billion flops per watt, which would mean 20 watts per trillion flops, or 2 million watts total. At 10 billion flops per watt it would be 5 times that, or 10 million watts. Now assume this machine could do the computing for 1,000,000 users at a time, instead of those 1,000,000 users using 1,000,000 computers. If each of those users' computers used only 100 watts, the 1,000,000 users would use 100 million watts. Evenly divided, 100 petaflops would still mean 100 billion flops per user. And if each computer cost only $500, a million of them would cost 500 million dollars; I would hope this supercomputer costs less than that. So I am saying that this supercomputer should cost far less than an equal number of personal computers and run at a small fraction of their power requirements. I have looked at the power requirements of a 19 inch monitor: about $4 a year, so even 1,000,000 of them would not add too much to the power requirements. So 100 companies with only 10,000 computers each could build one of these and save money.
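
The power arithmetic in that comment, laid out as a sketch (the efficiency and wattage figures are the ones the parent assumes, not measured values):

#include <stdio.h>

int main(void)
{
    const double target_flops = 100e15;    /* 100 petaflops */

    /* Megawatts needed at various energy efficiencies (flops per watt). */
    const double gflops_per_watt[] = { 1.0, 10.0, 50.0 };
    for (int i = 0; i < 3; i++) {
        double megawatts = target_flops / (gflops_per_watt[i] * 1e9) / 1e6;
        printf("%4.0f Gflops/W -> %6.1f MW\n", gflops_per_watt[i], megawatts);   /* 100, 10, 2 MW */
    }

    /* The comparison in the comment: a million 100 W desktops draw 100 MW,
       and 100 Pflops shared a million ways is still 100 Gflops per user. */
    printf("1,000,000 desktops at 100 W: %.0f MW\n", 1e6 * 100.0 / 1e6);
    printf("per-user share of 100 Pflops: %.0f Gflops\n", target_flops / 1e6 / 1e9);
    return 0;
}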

Re:"Just 5%"??????? (1)

Meeni (1815694) | about a year and a half ago | (#41929723)

$500 million is approximately the entire budget over the lifetime of the computer (including the electric bill, which is increasingly becoming the dominant cost to amortize). Typical build cost is around $100M.

However, there is a false dichotomy in your comparison. The supercomputer is not designed to perform the job of a million workstations. It is designed to perform a single task that could not be done on any other machinery. Just as you cannot build a supertanker in a million bathtubs but need a shipyard, you cannot simulate the entire climate of the earth (or protein folding, or DNA analysis, or nuclear explosions, or whatever) on a million workstations. You need a machine that has the network and storage/memory capacity to tackle the grotesquely enormous problem you want to solve.

Re:"Just 5%"??????? (1)

Anonymous Coward | about a year and a half ago | (#41930011)

It's not really an achievement but a business model :-) They have 17% of the top 100; that is just their "sweet spot".
Cray is the Ferrari of computing...

n nodes of 1/n capacity (0)

Anonymous Coward | about a year and a half ago | (#41928503)

Eh, I don't get it. It's a cluster. So what? How is this more impressive than n clusters of 1/n capacity? It's not like you or I are going to get to schedule all CPUs simultaneously. My guess is, nobody gets to.

Numbers can be misleading.... (0)

Anonymous Coward | about a year and a half ago | (#41929989)

Cray has only 5-6% of the Top500, but 17% of the top 100, surpassed only by IBM.
And HP is only "hanging in there", quite far behind: not only do they have just 6% of the top 100, there is nothing in there with really recent, innovative tech, and it's obvious that they are more or less leaving this market...

This interestingly leaves only IBM and Cray as "top very, very high end vendors", with Hitachi, Fujitsu and Bull as distant third-level vendors...

Choice of CPUs (1)

unixisc (2429386) | about a year and a half ago | (#41930041)

Why does Cray still stick to Xeons? This would have been a perfect application for Itanium III, and they would have hit their petaflop goals more easily.

sandy storm (0)

Anonymous Coward | about a year and a half ago | (#41984421)

Let the Sandy storm do its work on this machine: Sandy the storm lands on the Sandy Bridge of the Intel chips and sets off a digital storm on the bridge.
