Slashdot: News for Nerds

30,000-Core Cluster On Amazon EC2

Unknown Lamer posted more than 2 years ago | from the still-needs-a-few-more-cores dept.

Cloud 59

Joining the ranks of accepted submitters, hooligun writes with an article in Ars Technica about a rather large cluster built on EC2. From the article: "The details are impressive: 3,809 compute instances, each with eight cores and 7GB of RAM, for a total of 30,472 cores, 26.7TB of RAM and 2PB (petabytes) of disk space. Security was ensured with HTTPS, SSH and 256-bit AES encryption, and the cluster ran across data centers in three Amazon regions in the United States and Europe."

59 comments

use? (0)

Anonymous Coward | more than 2 years ago | (#37458688)

what exactly was it used for?

Re:use? (-1)

Anonymous Coward | more than 2 years ago | (#37458700)

what exactly was it used for?

Mining BitCoins.

Re:use? (0)

Anonymous Coward | more than 2 years ago | (#37458848)

There's no way. That would have made it into the title and the summary; you know how we love BitCoin.

Re:use? (0)

ge7 (2194648) | more than 2 years ago | (#37458704)

RTFA. It says for viagra and other pharma stuff.

Re:use? (2, Funny)

bigjocker (113512) | more than 2 years ago | (#37458776)

They are using it to pump the economy. The heat produced by this cluster must be removed with extra air conditioning systems, increasing the demand for power and for air conditioning units, thus creating new jobs and incentivizing research into new energy sources.

Re:use? (0, Offtopic)

jsnipy (913480) | more than 2 years ago | (#37458890)

to run Crysis

Re:use? (1)

0123456 (636235) | more than 2 years ago | (#37458914)

to run Crysis

Ray-traced :).

Re:use? (1)

hedwards (940851) | more than 2 years ago | (#37459030)

Hacking the Gibson?

Re:use? (1)

gregrah (1605707) | more than 2 years ago | (#37463898)

I love the fact that there are at least 5 answers above mine, and no one has actually RTFA, so no one actually knows.

"2PB (petabytes) of disk space" (-1)

Anonymous Coward | more than 2 years ago | (#37458718)

Wait, I thought PB meant "Polar Bears"...!

Re:"2PB (petabytes) of disk space" (-1)

Anonymous Coward | more than 2 years ago | (#37459474)

Wrong again, it actually means Peanut Butter.

Imagine (-1, Redundant)

l-ascorbic (200822) | more than 2 years ago | (#37458744)

A Beowu... Never mind.

Why explain Petabytes (0)

Anonymous Coward | more than 2 years ago | (#37458780)

If you don't know the scale from yocto to yotta, then you need to hand in your geek card.

Re:Why explain Petabytes (1)

tehcyder (746570) | more than 2 years ago | (#37467704)

If you don't know the scale from yocto to yotta, then you need to hand in your geek card.

How many digits do you know pi to? I always say that if you don't know at least the first thousand, you're no geek, and should have your geek card forcibly removed at gunpoint.

First time accepted submitter Beowulf (1)

clyde_cadiddlehopper (1052112) | more than 2 years ago | (#37458858)

Imagine the possibilities. /. on steroids.

HTTPS for security? (0)

ChipMonk (711367) | more than 2 years ago | (#37458894)

Let's hope their European nodes didn't use any certs from Diginotar.

But at least they weren't using RSA tokens for authentication.

Re:HTTPS for security? (0)

Anonymous Coward | more than 2 years ago | (#37459092)

Are you serious or just trolling? What does private infrastructure + HTTPS have to do with public certificates? Yeah, nothing.

Re:HTTPS for security? (0)

Anonymous Coward | more than 2 years ago | (#37461800)

Security was ensured with HTTPS, SSH and 256-bit AES encryption

This reads more like marketing copy than a Slashdot story. These are just technologies/techniques. How they're used determines security. Everybody has SSH, yet some of them get hacked.

Re:HTTPS for security? (0)

Anonymous Coward | more than 2 years ago | (#37463120)

Let's hope the USA nodes aren't under the jurisdiction of the...oh. =/

$1279 per hour (2)

certron (57841) | more than 2 years ago | (#37458900)

Before anyone else asks what I was about to, the full title of the article is: $1,279-per-hour, 30,000-core cluster built on Amazon EC2 cloud

How does that compare to the cost-per-core-hour for other Amazon EC2 offerings? Is this a value meal deal or just a lot of burgers?

Re:$1279 per hour (3, Informative)

mikeytag (1835928) | more than 2 years ago | (#37459006)

The article said each instance had 7GB of memory and 8 cores. That would translate to the High-CPU Extra Large Instance Type:

High-CPU Extra Large Instance 7 GB of memory, 20 EC2 Compute Units (8 virtual cores with 2.5 EC2 Compute Units each), 1690 GB of local instance storage, 64-bit platform
Source: http://aws.amazon.com/ec2/ [amazon.com]

That instance type will run you $0.68/hour standard or $0.24/hour spot. (US-East Pricing) (Spot pricing allows you to take advantage of unused EC2 instances at a discount. Also worth noting is that spot pricing changes over time.)

30,000 cores equates to 3,750 instances across different regions. Here is the breakdown on hourly pricing for standard and spot. (Reality is it was probably a mixture of both and the pricing for different regions varies).

Standard US-East: $2,550/hour
Spot US-East: $900/hour

The exact mix of machines in each region wasn't specified but $1,279/hour sounds about right if there is a mix of standard vs spot across different regions.
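
The arithmetic in this comment can be reproduced with a short sketch. The two rates are the US-East prices quoted above; `spot_fraction` and the helper names are hypothetical, since the article did not say how many instances were spot versus standard or how they were spread across regions.

```python
import math

CORES_PER_INSTANCE = 8
STANDARD_RATE = 0.68  # $/hour, US-East standard price quoted in the comment
SPOT_RATE = 0.24      # $/hour, US-East spot price quoted in the comment (changes over time)

def instances_needed(cores):
    """Number of 8-core instances required to reach the given core count."""
    return math.ceil(cores / CORES_PER_INSTANCE)

def hourly_cost(instances, spot_fraction):
    """Blended hourly cost; spot_fraction is an assumed mix, not a figure from the article."""
    rate = spot_fraction * SPOT_RATE + (1.0 - spot_fraction) * STANDARD_RATE
    return instances * rate

def spot_fraction_for(target_hourly, instances):
    """Spot share that would produce target_hourly at single-region US-East prices."""
    rate = target_hourly / instances
    return (STANDARD_RATE - rate) / (STANDARD_RATE - SPOT_RATE)

n = instances_needed(30000)
print(n, round(hourly_cost(n, 0.0)), round(hourly_cost(n, 1.0)))
print(spot_fraction_for(1279, n))
```

Under these single-region assumptions, the last line gives the spot share that would land on the $1,279/hour figure; the real mix may well have differed.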

Re:$1279 per hour (0)

Anonymous Coward | more than 2 years ago | (#37459722)

Once you get past the initial administrative costs, commercial clusters usually only discount for guaranteed subscriptions or making your jobs lower-priority in the queues. I've never heard of a cluster giving bulk discounts, perhaps because it's rare for a reasonably-designed cluster to lack customers. If anything, a large cluster user would cause more headaches for the people running the cluster -- sort of like how larger dinner groups are charged a large gratuity because they cause a headache in coordination and making sure other customers are well-served.

Re:$1279 per hour (1)

geekthecat (546223) | more than 2 years ago | (#37463882)

It's more like a SH!t burger! Wow! The performance is not impressive at all, yet the money they're making is, because the servers they are using are probably IBM POWER7. The article states that last June 7,000 cores on EC2 were capable of ranking at 232 on the Top 500 list of supercomputers with a performance of 41.82 Teraflops. So let's look over the list and compare how a 7,000-core POWER7 system does. We see at rank 50 a 6,912-core POWER7 system with a performance of 212.12 Teraflops.

Now let's up the cores to 30,000: (30/7 = 4.28571), so 4.28571 * 41.82 = 179.22857 Teraflops for an INTEL i7 solution, and (30/6.9 = 4.34783), so 4.34783 * 212.12 = 922.2617 Teraflops for an IBM POWER7 solution. Wow! IBM's POWER7 has over 5 times the performance of INTEL's i7.

So the cost of 30,000 POWER7 cores would be (30,000/16 = 1,875), which equates to 1,875 PS702 blade servers at $16,544.00 each, so the total cost would be (1,875 * $16,544.00 = $31,020,000.00), roughly $31 million. The cost of 30,000 INTEL i7 cores would be (30,000/12 = 2,500), so 2,500 HP ProLiant BL2x220 blade servers would cost (2,500 * $15,059 = $37,647,500), roughly $37 million.

Now if AMAZON charges $1,279 per hour for 179.22857 Teraflops from 30,000 cores of i7 processors, then 30,000 cores of POWER7 would give 922.2617 Teraflops, an increase in performance by a factor of (922.2617/179.22857 = 5.14573), so 5 times the performance, 5 times the profit. Profit would be (5 * $1,279.00 = $6,395.00) per hour, so $6,395.00 * 24 hours * 365 days = $56,020,200.00 using IBM's POWER7, or ($56,020,200.00 / 5.14573 = $10,886,735.2154) about $11 million using INTEL's i7 processors. So in conclusion you'll pay more for INTEL's i7 products and they're 5 times slower than IBM's POWER7. Just goes to show how many idiots there are in IT.
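
For anyone who wants to check the numbers in this comment, here is a small sketch of the same calculation. It assumes, as the comment does, that performance scales linearly with core count (real benchmark results need not scale this way), and it uses the comment's rounding of 6,912 cores to 6,900. The function name is made up for illustration. Note that 1,875 blades at $16,544 each comes to $31,020,000.

```python
def scale_linearly(tflops, cores, target_cores):
    """Extrapolate benchmark performance assuming perfect linear scaling (the comment's assumption)."""
    return tflops * target_cores / cores

# Figures quoted in the comment
ec2_tflops = scale_linearly(41.82, 7000, 30000)      # EC2 cluster, rank 232
power7_tflops = scale_linearly(212.12, 6900, 30000)  # POWER7 system, rank 50 (6,912 cores rounded to 6,900)
ratio = power7_tflops / ec2_tflops

# Hardware cost using the blade prices and cores-per-blade quoted in the comment
power7_cost = (30000 // 16) * 16544
intel_cost = (30000 // 12) * 15059

print(round(ec2_tflops, 2), round(power7_tflops, 2), round(ratio, 2))
print(power7_cost, intel_cost)
```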

security for sure (0)

Anonymous Coward | more than 2 years ago | (#37458904)

Security ensured by HTTPS, SSH and 256-bit AES? That's alright then, no need to worry about security any more.

Re:security for sure (0)

Anonymous Coward | more than 2 years ago | (#37459080)

Not very smart are you?

You can only get data in/out via HTTPS, the data was encrypted with 256-bit AES, and the only means of login was SSH with public keys.

Assuming (yea, assuming) the HTTPS daemon and SSH installation were bulletproof, that's a pretty tough nut to crack.

Re:security for sure (1)

GameboyRMH (1153867) | more than 2 years ago | (#37461210)

Of course if Amazon had to, they could rip the storage encryption key from the VM's RAM...

Isn't EC2 really a cluster? (1)

pz (113803) | more than 2 years ago | (#37458990)

Help me understand something here ... isn't EC2 really one gargantuan cluster far bigger than 30,000 cores? So why is it news that it ran a big job? Was there some significant step forward in software that allowed features that were not previously available on EC2?

Re:Isn't EC2 really a cluster? (1)

Happy Finish (722598) | more than 2 years ago | (#37459088)

Help me understand something here ... isn't EC2 really one gargantuan cluster far bigger than 30,000 cores? So why is it news that it ran a big job? Was there some significant step forward in software that allowed features that were not previously available on EC2?

TFA is angled more at the fact that anyone can go out and rent something like this for their own ends.

Windows 8 (0)

andyring (100627) | more than 2 years ago | (#37458994)

Finally - a computer that can fully handle Windows 8!

Re:Windows 8 (1)

Hotweed Music (2017854) | more than 2 years ago | (#37459060)

You mean that operating system that hasn't come out yet that runs on 1 GB of RAM and 800 MHz?

Re:Windows 8 (0)

Anonymous Coward | more than 2 years ago | (#37459062)

It can't even do that - no display ;-)

Re:Windows 8 (0)

Anonymous Coward | more than 2 years ago | (#37459166)

Cluster or cluster-ing? You can't do what you already are. Windows 8 should make that apparent.

Re:Windows 8 (0)

Rizimar (1986164) | more than 2 years ago | (#37459200)

But it still can't run Crysis

Re:Windows 8 (1)

Fortunato_NC (736786) | more than 2 years ago | (#37460294)

I know I'm feeding the troll but...

I'm running the Windows 8 developer preview (64-bit) on a five and a half year old laptop. Granted, I kicked the RAM up to 4GB ($44 shipped from NewEgg) and replaced the Core Duo with a Core 2 Duo (a T5600, $25 used on fleabay buy it now), but it runs well at 1900x1200 on hardware I basically rescued from the dumpster. You need to update your stock lines and stop mindlessly bashing.

Impressive (0)

Anonymous Coward | more than 2 years ago | (#37459020)

I assume the next step is Permutation City?

But, what was their password? (1)

G3ckoG33k (647276) | more than 2 years ago | (#37459124)

But, what was their password? So many details about that computer, but no password...

See what you can do (1)

Osgeld (1900440) | more than 2 years ago | (#37459174)

when you dodge taxes

Link a percentage of the top 500 together! (1)

Commontwist (2452418) | more than 2 years ago | (#37459272)

How powerful would a link-up of multiple clouds and, say, ten percent of the top 500 supercomputers be? That would be one massive number cruncher.

Re:Link a percentage of the top 500 together! (0)

Anonymous Coward | more than 2 years ago | (#37459330)

Not very powerful, a single horse can pull weight.

Re:Link a percentage of the top 500 together! (1)

mscman (1102471) | more than 2 years ago | (#37460158)

Cloud computing and the Top500 computers are comparing different things. Generally, "Clouds" cannot efficiently run codes you would run on a Top500 machine, and vice-versa. They are large machines serving different purposes.

Re:Link a percentage of the top 500 together! (1)

F.Ultra (1673484) | more than 2 years ago | (#37460660)

They are actually set up quite similarly. The key difference is that the cloud usually uses virtualization while the supercomputers don't, so there is about a 5-10% slowdown which you have to compensate for by using more nodes.

Re:Link a percentage of the top 500 together! (0)

Anonymous Coward | more than 2 years ago | (#37463994)

Communication between nodes is very important for a lot of jobs that run on the supers. It's hard enough getting low latency / high bandwidth communications between all the nodes on a super. Linking them in the cloud would really just slow them down.

Re:Link a percentage of the top 500 together! (1)

F.Ultra (1673484) | more than 2 years ago | (#37473594)

The machines that make up the cloud are set up quite like supers. It's just that they might get a slightly worse connection (Gigabit Ethernet instead of InfiniBand), but that is Amazon's choice; there is nothing stopping them from selling IB-connected nodes in their cloud. Of course in this specific example they connected more than one cloud with each other and then they had to communicate over the Internet. But that was also by choice and could be compared to when you connect different super centers, which happens from time to time.

Re:Link a percentage of the top 500 together! (1)

mscman (1102471) | more than 2 years ago | (#37468872)

No, they really aren't. I work on a top 20 machine, and can tell you that attaching this via a high-latency interconnect (read: the web) would completely kill the purpose of using this machine. And no, you cannot just "compensate by using more nodes." Amdahl's law kills that idea right out. I've worked in both "cloud computing" (back when it was known as "grid computing") and HPC or High-Performance Computing. While they are similar in some ways, they are designed to fulfill different purposes and are best suited to different job types. The codes being run by Cycle for this project are EP codes, ones that would not necessarily benefit from the top 50 machines in the world. These machines are better suited for MPP work, which depends more on low-latency, high-speed interconnects.
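
Amdahl's law, mentioned above, bounds the speedup of a job with parallel fraction p on N workers at 1 / ((1 - p) + p / N). A minimal sketch (the function name and the sample fractions are illustrative, not taken from the Cycle project):

```python
def amdahl_speedup(parallel_fraction, workers):
    """Upper bound on speedup when only parallel_fraction of the work can be spread over workers."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)

# Sample parallel fractions chosen for illustration only
for p in (1.0, 0.99, 0.95):
    print(p, round(amdahl_speedup(p, 30000), 1))
```

With even 1% of the work serial, 30,000 cores cannot deliver a 100x speedup, which is why adding nodes only pays off for embarrassingly parallel (EP) codes.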

Re:Link a percentage of the top 500 together! (1)

F.Ultra (1673484) | more than 2 years ago | (#37473564)

OK, so the nodes in the cloud are not connected via InfiniBand but by Gigabit Ethernet, but what made you think that they were connected via the web? And still I don't think that it invalidates that they are constructed quite similarly. Oh, and of course you can offer InfiniBand-clustered nodes as the cloud.

Bandwidth? (1)

oliverk (82803) | more than 2 years ago | (#37459308)

Didn't we just read that the US has fallen to #25 on the international speed list? So, is this like serving up Skynet over a 28.8 modem?

Charity? (1)

damuhatori (1203278) | more than 2 years ago | (#37459354)

They should donate a couple of hours a month to curing a disease.

Joining the ranks of accepted submitters, (0)

Khyber (864651) | more than 2 years ago | (#37459624)

Nobody gives two fucks. There's over 2 million registered UIDs on this site. Slashdot isn't some popularity contest. Quit turning Slashdot into fucking Digg or Reddit.

Re:Joining the ranks of accepted submitters, (-1)

Anonymous Coward | more than 2 years ago | (#37460352)

Caremad?

Re:Joining the ranks of accepted submitters, (0)

Anonymous Coward | more than 2 years ago | (#37468996)

Gotta agree with you here. It was bad enough with "first time accepted submitter", and now the struggle to come up with a clever new wording every time is getting very tiring.

Seriously Expensive (0)

Anonymous Coward | more than 2 years ago | (#37459678)

It is great that folks can do this kind of stuff in the cloud. However, looking at the cost structure for this cluster, it becomes very expensive, very quickly, at around $11,204,040 per year to operate assuming 24x7 operation. Also, this kind of HPC configuration is not for everyone. HPC environments can require high speed interconnects with super low latency, like Infiniband or Myrinet which is something I doubt that Amazon has invested in due to the cost of these types of solutions. If your application utilizes message passing interface (MPI) this solution is most likely not for you.

So for one off types of jobs, Amazon may be a great choice where you do not need to make a large initial upfront investment in technology and your HPC usage is spotty at best. However if you need a long term solution that will be operating 24/7 or you have security requirements that prevent operation in the cloud, you are better off making the investment into high performance computing. Just keep in mind that this type of solution is not for everyone.
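
The yearly figure is just the hourly rate multiplied out. A quick sketch, where `utilization` is a hypothetical knob for spotty usage and not something the article discusses:

```python
HOURLY_RATE = 1279          # $/hour, from the article title
HOURS_PER_YEAR = 24 * 365

def annual_cost(hourly_rate, utilization=1.0):
    """Yearly rental cost; utilization is the assumed fraction of the year the cluster runs."""
    return hourly_rate * HOURS_PER_YEAR * utilization

print(annual_cost(HOURLY_RATE))        # 24x7 operation
print(annual_cost(HOURLY_RATE, 0.05))  # assumed 5% usage, for comparison
```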

More on the security... (0)

theendlessnow (516149) | more than 2 years ago | (#37459700)

You can verify the certificates used with DigiNotar... well.. site looks down... maybe when they are back up...

Wow (0)

kybur (1002682) | more than 2 years ago | (#37459728)

Imagine a beowulf cluster of these!

Weather prediction (0)

Anonymous Coward | more than 2 years ago | (#37460474)

Seeing how their total lack of reliability kept some webpages (meneame.net) down for about two days after that storm last month, I just hope they put it to work on weather prediction.

Why don't you (0)

Anonymous Coward | more than 2 years ago | (#37460726)

Why don't you post an html address for this machine, so we can see if it can survive being slashdotted?

Crysis? (0)

Anonymous Coward | more than 2 years ago | (#37460792)

But will it run Crysis at max settings?

Take that anonymous (0)

Anonymous Coward | more than 2 years ago | (#37461644)

A few months back, old Anonymous tried to 'take out' Amazon by using the LOIC (Low Orbit Ion Cannon) in 'hive mind' mode. They found that even with dozens of kids sending thousands of requests per second they couldn't DDoS (a.k.a. "Slashdot effect") Amazon. There was disappointment in geekdom. Lesser targets fell easily, but this one proved too strong. "We don't have enough machines," one hapless and dejected nerd wrote. Now you know why, Sparky! A single machine running LOIC can take out one or two hyperthreaded cores, but if they have thousands of hyperthreaded cores, you will need half as many machines to run the attack (for it to be effective). They never had that many recruits willing to let their IP address be collected by the feds (or enough who didn't realize that their IP address would be sniffed by the feds), and so were unable to take Amazon down. A big fat load balancer feeding 30,000 hyperthreaded cores can swallow everything the LOIC can feed it and not lose a byte or break a sweat. I heard that Amazon has 'on demand' systems that power up and respond to external requests. If 1,000 new machines started sending LOIC requests at a given time, you could temporarily DDoS Amazon (for the amount of time it takes for their servers to ramp up and handle the load). I suspect that would last only for a few minutes.

communication latency (1)

Orp (6583) | more than 2 years ago | (#37462054)

Neat, but for any job that isn't embarrassingly parallel, communication latency and speed will kill you when your nodes are spread across continents. If you're not doing any communication, well then groovy. Usually these large core servers are only 'earning their keep' when you're taking advantage of very fast interconnect hardware and doing things that can't be done by just a bunch of CPUs.

You've GOT to respect Amazon & Microsoft (0)

Anonymous Coward | more than 2 years ago | (#37475206)

Mainly because they're LITERALLY QUITE IMPERVIOUS to the "unstoppable attack online" - the DoS/DDoS!

How/Why?

Well, because they've "overbuilt" their ENTIRE infrastructure for one thing, hardware-wise, for telecommunications!

They also monitor the levels of "hits" their sites get, & IF they get too high (as they do in DDoS)? They can stall any that are coming from unrouteable addresses (think 10.x.x.x, 172.16.x.x, 192.168.x.x, etc./et al).
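
The private (RFC 1918) ranges are 10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16, so only part of 172.x.x.x and 192.x.x.x is unrouteable on the public Internet. A filter of the kind described could test source addresses roughly like this; `is_rfc1918` is a hypothetical helper, and nothing here is known to be how Amazon or Microsoft actually filter traffic.

```python
import ipaddress

# RFC 1918 private address blocks
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

def is_rfc1918(address):
    """Return True if the IPv4 address falls inside one of the RFC 1918 private blocks."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in RFC1918_NETWORKS)

for addr in ("10.1.2.3", "172.16.5.1", "172.32.0.1", "192.168.1.1", "8.8.8.8"):
    print(addr, is_rfc1918(addr))
```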

Microsoft also has settings in its IPStack that help "stall out" DDoS/DoS too:

SynAttackProtect, EnableDynamicBacklog, MaximumDynamicBacklog, MinimumDynamicBacklog, TcpMaxDupAcks, TcpMaxHalfOpen, TcpMaxHalfOpenRetried

Those ALL work IN COMBINATION with one another @ THE OPERATING SYSTEM'S IP STACK LEVEL!

(Also in combination with hardware measures noted above both MS & Amazon do, to stall off "the unstoppable attack method" (the DoS/DDoS)).

APK

P.S.=> It's the "how & why" you NEVER see Amazon OR Microsoft getting news that "anonymous/lulzsec" (& the like) "took down MS/Amazon via DoS/DDoS"...

(Because you KNOW that'd be "big news" IF it went down, of course... especially around here with all the "Pro-*NIX" sentiment (from the sockpuppet FUD spreading trolls that keep 100 user accounts to attempt to fool others with that bullshit))... apk
