
IBM Building 120PB Cluster Out of 200,000 Hard Disks

Soulskill posted more than 2 years ago | from the go-big-or-go-home dept.

Data Storage 290

MrSeb writes "Smashing all known records by some margin, IBM Research Almaden, California, has developed hardware and software technologies that will allow it to strap together 200,000 hard drives to create a single storage cluster of 120 petabytes (120 million gigabytes). The data repository, which currently has no name, is being developed for an unnamed customer, but with a capacity of 120PB, its most likely use will be a storage device for a governmental (or Facebook) supercomputer. With IBM's GPFS (General Parallel File System), over 30,000 files can be created per second, and with massive parallelism, and no doubt thanks to the 200,000 individual drives in the array, single files can be read or written at several terabytes per second."


290 comments

What's it for? (2)

yomammamia (1916736) | more than 2 years ago | (#37218564)

A billionaire's porn collection?

Re:What's it for? (1)

Scareduck (177470) | more than 2 years ago | (#37218760)

What's it for? No surprise, domestic spying.

Re:What's it for? (5, Funny)

Given M. Sur (870067) | more than 2 years ago | (#37218864)

What's it for? No surprise, domestic spying.

I think you mean "protecting your freedoms, fellow patriot."

Re:What's it for? (-1)

Anonymous Coward | more than 2 years ago | (#37218914)

What's it for? No surprise, domestic spying.

It is like the government wants to see a bunch of fatties saying and doing a bunch of really stupid things.

Why else would you want to spy on Americans?

Re:What's it for? (1)

swan5566 (1771176) | more than 2 years ago | (#37218790)

Satellite companies and government agencies are one sector that could use this. They gather terabytes' worth of new data every day.

Re:What's it for? (2)

yomammamia (1916736) | more than 2 years ago | (#37218886)

Could be a company that intends to rent out space to such agencies and for such uses or for cloud computing (amazon).

Re:What's it for? (1)

Hatta (162192) | more than 2 years ago | (#37218930)

Why would a billionaire need porn?

Re:What's it for? (0)

Anonymous Coward | more than 2 years ago | (#37219056)

Since when is *need* an impetus for whatever billionaires acquire and collect?

- T

Re:What's it for? (1)

Hatta (162192) | more than 2 years ago | (#37219144)

I'm just suggesting that billionaires would have a better option than porn. Why collect porn when you could collect porn stars?

Depressing (0)

Anonymous Coward | more than 2 years ago | (#37218572)

Anyone else find it depressing that the two top suspects for the use of this system are Facebook and presumably a spy agency?

Can humanity come up with no better use for the biggest iron than a bunch of frivolous, narcissistic ad profiling and covert spying on people living in an allegedly free country?

No wonder F@H doesn't post more progress. Our hardware is going towards people sharing their naked bong photos and government spooks cataloging your naked bong photos.

Re:Depressing (2)

m50d (797211) | more than 2 years ago | (#37218702)

I can see the likes of the LHC or the AEA using something like this - they generate enough data. But if it were a "good guy" why would they keep it secret?

Re:Depressing (4, Insightful)

PPH (736903) | more than 2 years ago | (#37218736)

Facebook and presumably a spy agency?

You're repeating yourself.

emo? (1)

luis_a_espinal (1810296) | more than 2 years ago | (#37218952)

Anyone else find it depressing that the two top suspects for the use of this system are Facebook and presumably a spy agency?

Can humanity come up with no better use for the biggest iron than a bunch of frivolous, narcissistic ad profiling and covert spying on people living in an allegedly free country?

No wonder F@H doesn't post more progress. Our hardware is going towards people sharing their naked bong photos and government spooks cataloging your naked bong photos.

You are trying too hard looking for something to be upset about (in a very attention-whorish manner to boot.)

Re:emo? (1)

luis_a_espinal (1810296) | more than 2 years ago | (#37219162)

And just to prove my point.

Can humanity come up with no better use for the biggest iron than a bunch of frivolous, narcissistic ad profiling and covert spying on people living in an allegedly free country?

Yes. It ain't that hard to come to that answer, you know? The Slashdot story half-seriously hints at either a government agency (NSA) or somebody like Facebook. And obviously, in Emo fashion, you took it as a statement about humanity. It's more a statement about you.

I find this type of opinion rather simplistic, as other opportunities in large-scale application engineering abound:

  1. Data collection and simulation done by the DoE or DoT (not necessarily just a DoE-related agency)
  2. A High-Energy particle collider
  3. Big-Iron for large-scale Online Transaction Processing (think Airline reservation systems)
  4. Algo-Trading
  5. Pharma
  6. Or even IBM's own venture into, God knows, web service platform providers or online searching in direct competition with Google, MS or Amazon.

With the exception of the first two, all the other potential clients could request anonymity. It would be nice to know for what purpose this behemoth is being built. It would be even better if people could rub a pair of neurons together and come up with similar sample lists (it's not rocket science) as opposed to going ZOMG humanity sux plz hold me!

I wonder.. (2)

eexaa (1252378) | more than 2 years ago | (#37218586)

...about the sound and torque generated when all these disks start to spin-up.

Re:I wonder.. (0)

Anonymous Coward | more than 2 years ago | (#37218650)

And the heat, assuming they're using all their old Hitachi Deskstar drives.

Re:I wonder.. (1)

ae1294 (1547521) | more than 2 years ago | (#37218750)

And the heat, assuming they're using all their old Hitachi Deskstar drives.

That sounds like a plot to a disaster movie... "Sir, the cluster won't shut down! We're looking at a full melt down!"

Re:I wonder.. (0)

Anonymous Coward | more than 2 years ago | (#37218722)

...as if millions of magnetic heads suddenly cried out in terror...

Re:I wonder.. (1)

crow (16139) | more than 2 years ago | (#37218798)

If the torque were an issue (which it's not), you could mount the drives in alternating directions to balance them out.

Re:I wonder.. (2)

eexaa (1252378) | more than 2 years ago | (#37219046)

My geek nature disapproves such torque-negating behavior. Instead, it totally wants to see the petabytes spin at some insane RPM, cancelling the gravity and possibly crushing some enemies.

Paranoid much? (1)

skids (119237) | more than 2 years ago | (#37218594)

its most likely use will be a storage device for a governmental (or Facebook) supercomputer.

Actually, given the explosion of data storage needs in the bio-informatics area, its most likely use would be in storing DNA sequences for research purposes.

Re:Paranoid much? (1)

ByOhTek (1181381) | more than 2 years ago | (#37218660)

The human genome can effectively be stored in about 750MB (each base being only 2 bits). The largest genomes are only about 10x that size. IIRC the FASTA files for it take only about 3GB uncompressed.

Even with specific protein sequences, etc., I think that's a bit excessive for the bio-informatics field.

Also, I'm not sure if even the NIH could afford that kind of storage cluster.
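
A quick sanity check of the figures above, as a minimal Python sketch. The genome length of roughly 3 billion bases is an assumed round number, not something stated in the comment, and the FASTA estimate ignores header lines and newlines.

```python
# Back-of-the-envelope genome storage at a fixed number of bits per base.
# HUMAN_GENOME_BASES is an assumed round figure (~3 billion bases).
HUMAN_GENOME_BASES = 3_000_000_000
BITS_PER_BASE = 2  # A, C, G and T fit in 2 bits


def packed_size_mb(bases: int, bits_per_base: int = BITS_PER_BASE) -> float:
    """Size in decimal megabytes of a sequence packed at a fixed bit width."""
    return bases * bits_per_base / 8 / 1_000_000


def fasta_size_gb(bases: int) -> float:
    """Rough uncompressed FASTA size in decimal gigabytes: one ASCII byte per base."""
    return bases / 1_000_000_000


if __name__ == "__main__":
    print(packed_size_mb(HUMAN_GENOME_BASES))  # packed size in MB
    print(fasta_size_gb(HUMAN_GENOME_BASES))   # FASTA size in GB
```

Under those assumptions the packed size comes out at 750 MB and the plain-text size at about 3 GB, which matches the numbers quoted.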

Re:Paranoid much? (1)

yomammamia (1916736) | more than 2 years ago | (#37218756)

"The human genome"? That's a bit of a misnomer. With compression and differential storage however the point is still valid.

Re:Paranoid much? (3, Informative)

Anonymous Coward | more than 2 years ago | (#37218840)

Modern genome compression techniques only store the edits needed to convert the reference genome to your genome. And the diff file is just around 24 MB per person. I am an ex-bioinformatician.

Re:Paranoid much? (1)

tomknight (190939) | more than 2 years ago | (#37218842)

Data requirements are doubling faster than disk storage capabilities. We're needing to find ways of dealing with this, but ideally without simply asking for more money for more disks. I've just been told a new academic here will need about 200TB in a few months. I can see my (fairly small set) of Bioinformatics researchers needing a PB before the end of next year.

Re:Paranoid much? (2)

biodata (1981610) | more than 2 years ago | (#37218858)

Our modest lab turns out roughly 100GB a week of finished sequence, from a single sequencer, which is only a very small fraction of the temporary disk storage needed along the way to get to finished sequence. Genome centres with many machines will turn out an order of magnitude (or two) more, and believe me, these machines are kept busy week after week. Once we have finished sequences, the assembly process adds a multiple to this. Yes, a genome is only XMB, but when you have to effectively sequence it 40 times to get the overlaps you need to assemble the thing, it soon mounts up. The sequencer machine companies are now touting similar scale machines on the basis that any lab can afford one to do their own sequencing. Sequence volumes have been outstripping Moore's law for some time now, and it isn't going to stop anytime soon. That said, I think Facebook and their CIA funders are probably more likely to have the money for this than anyone doing anything useful for humanity.

Fill 'er up (4, Funny)

mmarlett (520340) | more than 2 years ago | (#37218596)

All I know is that if you put it on my computer, I'll have it filled in two years and have no idea what's actually on it.

Must Be (0)

Anonymous Coward | more than 2 years ago | (#37218610)

downloading too much from TL again.

When that thing crashes (1)

jader3rd (2222716) | more than 2 years ago | (#37218670)

When that thing crashes somebody is going to be mad. I wonder how long restoring from backup is going to take.

How are they going to power that thing? (0)

Anonymous Coward | more than 2 years ago | (#37218688)

If I'm not mistaken one hard drive needs about 12~14W, so assuming that half of those are under load at a time how are they going to power that thing?

Not counting with all the needed AC and support computers, network, etc...
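
For scale, here is a rough Python sketch of the drive power alone, using only the numbers in the comment (200,000 drives, 12 to 14 W each, half under load). Counting idle drives as drawing nothing is a simplifying assumption of this sketch, and cooling, servers and networking are left out entirely.

```python
DRIVES = 200_000
WATTS_PER_DRIVE = (12, 14)  # per-drive range quoted in the comment


def array_power_mw(drives: int, watts: float, loaded_fraction: float = 1.0) -> float:
    """Megawatts drawn by the loaded drives only (idle drives counted as zero)."""
    return drives * loaded_fraction * watts / 1_000_000


if __name__ == "__main__":
    for watts in WATTS_PER_DRIVE:
        half = array_power_mw(DRIVES, watts, loaded_fraction=0.5)
        full = array_power_mw(DRIVES, watts)
        print(f"{watts} W/drive: {half:.1f} MW at half load, {full:.1f} MW at full load")
```

So the disks alone would land somewhere between about 1.2 MW and 2.8 MW, depending on how many are busy.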

Would be a good fit for CERN LHC (2)

Tynin (634655) | more than 2 years ago | (#37218714)

My understanding is that the LHC generates so much data, that most of it is discarded immediately without going to disk. Seems like this would be a good solution to there data problems.

Re:Would be a good fit for CERN LHC (0)

Anonymous Coward | more than 2 years ago | (#37218778)

Sites with LHC data (CERN and associated institutes) already use about 200PB of storage, barely enough for the current mode of operation.

Re:Would be a good fit for CERN LHC (-1)

Anonymous Coward | more than 2 years ago | (#37218836)

A good solution to THEIR data problems.

Re:Would be a good fit for CERN LHC (1)

Tynin (634655) | more than 2 years ago | (#37219062)

A good solution to THEIR data problems.

Irregardless, grammer and spelin ain't no science, its a art form... and for all intensive porpoises, I lost power last night do to the slight'est bit of wind from whether system Irene (I live in south florida) from about ~6:30PM till ~3:00AM... at lease my generator worked, and FPL was on my road buy 8PM. Not sure why I came into work... I'm so tierd.

p0rn (0)

Anonymous Coward | more than 2 years ago | (#37218720)

Or it is built for someone's porn collection.

Not the government. (4, Interesting)

girlintraining (1395911) | more than 2 years ago | (#37218724)

It's not the government guys, at least not the cloak and dagger kind. They're too paranoid to let you know how much data they can store. They also don't want you to know that even with all that data, they're still only able to utilize a fraction of it. People are still going through WWII wire intercepts *today*. No, the problem in the intelligence community is making the data useful and organized as efficiently as possible, not collecting it.

That leaves only one real option: Scientific research. Look at how much data the Large Hadron Collider produces in a day...

Re:Not the government. (4, Insightful)

DrgnDancer (137700) | more than 2 years ago | (#37218946)

This is generally something I have a hard time convincing people of. I've worked for spooky organizations. Not at the highest levels or on the most secret projects, but in the general vicinity. The government is not monitoring you. Not because they lack the legal capability (though they do, and that is mostly, but not always, respected), but because they lack the technical ability. There are only so many analysts, only so much computer time, only so much storage. Except in cases of explicit corruption or misuse of resource, those analysts, that computer time, and that storage is not being wasted on monitoring Joe and Jane average.

I'm not going to say that there aren't abuses by the people who have access to some of this stuff; they are human and weak like the rest of us and are often tempted to take advantage of their situation I'm sure. In general however, unless you've done something that got a warrant issued for your information, the government doesn't care. They just don't have the resources to be big brother, even if they want to be.

Re:Not the government. (1)

Anonymous Coward | more than 2 years ago | (#37219014)

Entirely besides the point. They could, therefore it can be abused, therefore there must be EXTREME oversight. Period.

If the only difference between today and a surveillance state is some manpower, then all it takes is a change in policy. I don't trust them that much, I do now, and always will, advocate that the organs of state secrecy be dismantled. We don't need a state to begin with, these groups, even less so.

In terms of what the people need, it is little more than an ineffective jobs program.

Re:Not the government. (1)

DrgnDancer (137700) | more than 2 years ago | (#37219304)

It's not beside the point, it's the practical side of the point. This doesn't mean we should ignore questions of morality, how much power is too much, how much monitoring is appropriate, etc... It just means that while these philosophical questions are both interesting and relevant, you don't really need to be worried about the practical implications day by day. Practically, the government *can't* watch you all the time, or really at all, unless you are the subject of some investigation worth those resources (or someone is doing something they shouldn't be). That doesn't mean we shouldn't seek to limit and control them from a legal and regulatory perspective, but it does mean you probably don't need to worry about spies in your attic.

Re:Not the government. (0)

Anonymous Coward | more than 2 years ago | (#37219050)

>> People are still going through WWII wire intercepts *today*

This is true. But you neglect to mention that the people doing this are historians.

I propose a name for it ... (1)

tomhudson (43916) | more than 2 years ago | (#37218740)

FTFS:

The data repository, which currently has no name, is being developed for an unnamed customer,

It's the tech equivalent of Prince - it's "the data repository with no name." We can denote it with some sort of unicode glyph that slashdot will mangle.

And of course it has amazingly fast read speeds - if each drive has a 32 meg cache, that's 6.4 terabytes just for the cache.

BTW, it's for the ^@#%^&^+++NO CARRIER

1.21 Jigawatts (0)

Anonymous Coward | more than 2 years ago | (#37218754)

So if I had that kind of power, would I want to power a 120PB cluster or a flux capacitor. Decisions, decisions.

Re:1.21 Jigawatts (1)

Abstrackt (609015) | more than 2 years ago | (#37218828)

I'd go with the flux capacitor personally. Then you can go back in time, invest in IBM, Microsoft, Google, and Apple when shares are still cheap and buy the 120PB cluster. Assuming you drive a DeLorean anyway.

Proof of corporate favoritism by government (0, Offtopic)

OzPeter (195038) | more than 2 years ago | (#37218792)

The government happily stands by when a major corporation announces that it has 120 petabytes (ie petafiles - my emphasis) under its control, yet if the average joe schmo even thinks about how they'd like a petafile or two at home the FBI, CIA, TSA, ICE (and every other TLA) hauls his ass off to jail and etches a scarlet letter on his forehead.

Such harassment by the government of simple people who aren't hurting anyone else needs to be stopped. Think of the children -- how are they going to cope when their own father/uncle/priest gets charged with accessing petafiles? They'll be the laughing stick of their peers!

Re:Proof of corporate favoritism by government (0)

Anonymous Coward | more than 2 years ago | (#37218966)

What are you talking about? I'm disenfranchised with the control as well, but who has been taken to jail for owning large amounts of data?

Re:Proof of corporate favoritism by government (0)

Anonymous Coward | more than 2 years ago | (#37219010)

it was a really bad joke... peda/peto... took me a while to understand as well.

Re:Proof of corporate favoritism by government (1)

rubycodez (864176) | more than 2 years ago | (#37219142)

the government is too busy with its War on Terrabytes to worry about the petafiles

Loading times (1)

ifrag (984323) | more than 2 years ago | (#37218806)

Perhaps this cluster can load Deus Ex : Human Revolution levels in a reasonable amount of time!

Re:Loading times (0)

Anonymous Coward | more than 2 years ago | (#37219280)

What? Levels take just a few seconds to load on my average gaming build.

Good job for a HS kid... (1, Interesting)

spagthorpe (111133) | more than 2 years ago | (#37218812)

Run around with a shopping cart and swap out drives as they fail. Kind of like they did back in first computer days with vacuum tubes.

Constant failures? (1)

LordNimon (85072) | more than 2 years ago | (#37218854)

With 200,000 hard drives, won't there always be at least one hard drive that is failing? You'll need an IT guy 24/7 swapping out the failed drives. As soon as he swaps out one drive, another one will fail. It just seems kinda ridiculous.

Re:Constant failures? (0)

Anonymous Coward | more than 2 years ago | (#37218896)

No worries, they're using deskstars, so they'll all fail at once.

Re:Constant failures? (2)

SuperQ (431) | more than 2 years ago | (#37218912)

This is what MTBF is all about. "Enterprise" drives are rated at 1.2 million hours MTBF. 1,200,000 hours / 200,000 drives = 6 hours between drive failures. Not too bad, only 4 a day.
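
The arithmetic above can be sketched in a few lines of Python. This assumes the simplest possible model, a constant failure rate with independent drives, so the expected gap between failures across the whole array is just the MTBF divided by the drive count. Real drive populations tend not to behave this neatly, so treat it as an estimate only.

```python
MTBF_HOURS = 1_200_000  # vendor rating quoted in the comment
DRIVES = 200_000


def hours_between_failures(mtbf_hours: float, drives: int) -> float:
    """Expected hours between failures across the array (constant failure rate assumed)."""
    return mtbf_hours / drives


def failures_per_day(mtbf_hours: float, drives: int) -> float:
    """Expected number of drive failures per 24 hours."""
    return 24 / hours_between_failures(mtbf_hours, drives)


if __name__ == "__main__":
    print(hours_between_failures(MTBF_HOURS, DRIVES))  # hours between failures
    print(failures_per_day(MTBF_HOURS, DRIVES))        # failures per day
```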

Re:Constant failures? (1)

bigredradio (631970) | more than 2 years ago | (#37218940)

Since they can't backup to tape, maybe they will convert their old tape library to swap out hard drives.

Re:Constant failures? (1)

Jeng (926980) | more than 2 years ago | (#37218956)

I would guess that would be the reason for the water cooling, to increase the drives' reliability.

Also from the article it sounds like they may have more than 200,000 hard drives hooked up, but only use 200,000 at a time so the computer can automatically begin recreating the dead drive as soon as it occurs.

Re:Constant failures? (1)

fuzzyfuzzyfungus (1223518) | more than 2 years ago | (#37219338)

I'm assuming that IBM has better plumbers than I do; because "reliability" is not the first word that comes to mind when somebody suggests water-cooling 200,000 hard drives...

Google? (1)

JustAnotherIdiot (1980292) | more than 2 years ago | (#37218882)

This just kinda strikes me as who would need this. Backing up the entire internet has to take up some space.

Re:Google? (0)

Anonymous Coward | more than 2 years ago | (#37219192)

According to http://www.datacenterknowledge.com/archives/2009/05/14/whos-got-the-most-web-servers/, Google probably has around 900k servers, and it's well known that they distribute disks to "normal" servers rather than concentrating in RAID configurations. Assuming the disks average 500GB, that's around 450PB of storage.

Also, a drive failure every hour or so. :)
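
A small Python sketch of that estimate. One disk per server and a 1.2 million hour MTBF are assumptions here (the MTBF figure is borrowed from elsewhere in this thread, not from the linked article), and the failure interval uses the same constant-failure-rate simplification.

```python
SERVERS = 900_000        # estimate cited in the comment
GB_PER_DISK = 500        # average disk size assumed in the comment
MTBF_HOURS = 1_200_000   # assumed rating, taken from another comment in this thread


def total_storage_pb(disks: int, gb_per_disk: float) -> float:
    """Total raw capacity in decimal petabytes."""
    return disks * gb_per_disk / 1_000_000


def hours_between_failures(disks: int, mtbf_hours: float) -> float:
    """Expected hours between disk failures across the fleet."""
    return mtbf_hours / disks


if __name__ == "__main__":
    print(total_storage_pb(SERVERS, GB_PER_DISK))       # capacity in PB
    print(hours_between_failures(SERVERS, MTBF_HOURS))  # hours between failures
```

With these inputs the capacity is 450 PB and the expected gap between failures is about an hour and twenty minutes.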

Your mission, should you choose to accept it. (1)

Anonymous Coward | more than 2 years ago | (#37218894)

1) Download the internets
2) re-host the internets
3) ????
4) I really don't know. I'm scared.

Needs maintenance (0)

Anonymous Coward | more than 2 years ago | (#37218928)

Can anyone give an estimate how many disks have to be replaced every day? Can (are) big disk arrays be built so that replacements can be automated?

Re:Needs maintenance (1)

owlstead (636356) | more than 2 years ago | (#37219052)

What do you mean: can big disk arrays be built so that replacements can be automated? Of course they can be built, it would not even be that hard. Well, as long as you don't put drive/server production and delivery of the components or auto assembly in the automated system. I could not find one on Google; I guess on such a large drive array, you can afford a human to replace some disks now and then. Humans are more flexible and more prone to see other problems occurring as well.

Re:Needs maintenance (1)

rubycodez (864176) | more than 2 years ago | (#37219102)

even in "small" disk arrays the replacements are automated with hot spares. of course you periodically replenish the hot spare pool, but one doesn't need to go running every time a disk fails

600GB drives? (1)

Hsensei (1055922) | more than 2 years ago | (#37218942)

120 million divided by 200,000 = 600. Even on an enterprise scale they could get a lot better densities.

Re:600GB drives? (1)

IDK (1033430) | more than 2 years ago | (#37219104)

One word: Redundancy.

120petabyte*5/3/200000 = 1TB
with 2 redundancy disks per 5 disks
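
Spelled out as a Python sketch: the 3-data-plus-2-redundancy grouping is the commenter's guess, not a published detail of the IBM design, and 120 PB is treated as usable capacity.

```python
USABLE_PB = 120
DRIVES = 200_000


def drive_size_gb(usable_pb: float, drives: int, data_disks: int, group_size: int) -> float:
    """Per-drive capacity in GB needed so that data_disks out of every group_size hold data."""
    raw_gb = usable_pb * 1_000_000 * group_size / data_disks
    return raw_gb / drives


if __name__ == "__main__":
    print(drive_size_gb(USABLE_PB, DRIVES, data_disks=1, group_size=1))  # no redundancy
    print(drive_size_gb(USABLE_PB, DRIVES, data_disks=3, group_size=5))  # 2 redundancy disks per 5
```

Without redundancy each drive only needs 600 GB; with 2 redundancy disks per group of 5 the figure rises to 1 TB.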

Re:600GB drives? (1)

maxwell demon (590494) | more than 2 years ago | (#37219120)

I'm pretty sure there's some redundancy built in, so you need more than 120 PB of raw disk storage to provide 120 PB of usable storage. If they don't add redundancy, they will have an unpleasant experience as soon as the first hard disk fails (and with 200,000 of them, this will be very soon).

Re:600GB drives? (0)

Anonymous Coward | more than 2 years ago | (#37219292)

120 million divided by 200,000 = 600. Even on an enterprise scale they could get a lot better densities.

you're assuming no redundancy at all in the array, highly unlikely considering the volume of data they want to store that they would risk the whole array dying from a single drive failure.

Re:600GB drives? (1)

PezJunkie42 (837065) | more than 2 years ago | (#37219324)

Depends on the type of drive. Current 15k RPM SAS drives don't go much bigger than that... Also, (as TFA mentions) once you factor in some kind of overhead for redundancy, you're probably talking about 1TB drives. (Assuming this is 120 PB of *usable* capacity.)

120 petabytes? That's amazing (0)

Anonymous Coward | more than 2 years ago | (#37218944)

That's almost enough to install Vista

Media storage? (0)

Anonymous Coward | more than 2 years ago | (#37218976)

How about some sort of gigantic media library... all porn jokes aside. Netflix? Apple? Isn't Walmart getting into the streaming business? Or some new "cloud" server?

Not so impressive as a floppy RAID (1, Informative)

erroneus (253617) | more than 2 years ago | (#37219012)

If they could make a 120PB cluster using floppy disks, I would be much more entertained by this.

Re:Not so impressive as a floppy RAID (0)

Anonymous Coward | more than 2 years ago | (#37219214)

Stack them on top of each other and we do not need a space shuttle to reach the ISS....

Re:Not so impressive as a floppy RAID (1)

mauhiz (1751522) | more than 2 years ago | (#37219238)

That would require on the order of 10^11 floppy drives (120 PB at 1.44 MB each), and they probably would eat all of Earth's energy throughput.
Disclaimer : I would be quite entertained too. Even more with 5''1/4 drives.

The important question... (0)

Anonymous Coward | more than 2 years ago | (#37219058)

What's the failures per minute estimation? How many full time hard-drive replacers will they need?


fsck time will span 2 presidential cadences (0)

Anonymous Coward | more than 2 years ago | (#37219086)

At least the data on this monster will be totally safe.
The fsck alone will be started at each new president inauguration,
and nobody will have access to the data for the next 8 years.

In addition, to approve financing for the outside storage tapes,
Congress will need to increase the debt limit again.

Rainbow tables (0)

Anonymous Coward | more than 2 years ago | (#37219264)

The open rainbow tables project announces....120PB of tables :)

It's them or SETI has come back.

lots of aluminum (1)

buback (144189) | more than 2 years ago | (#37219362)

Someone should manufacture industrial sized hard drives for this type of application. Like full height x2, so you could cram 30 platters in there.
