PetaBox: Big Storage in Small Boxes

timothy posted more than 9 years ago | from the always-impressive dept.

Data Storage 295

An anonymous reader writes "LinuxDevices.com is reporting that a Linux-based system comprising more than a petabyte of storage has been delivered to the Internet Archive, the non-profit organization that creates periodic snapshots of the Internet. The PetaBox products, made by Capricorn Technologies, are based on Via mini-ITX motherboards running Debian or Fedora Linux. The IA's PetaBox installation consists of about 16 racks housing 600 systems with 2,500 spinning drives, for a total capacity of roughly 1.5 petabytes, according to the article. Now to strap one of those puppies to my iPod!" The Internet Archive continues to astound.

295 comments

First Post (-1, Offtopic)

Anonymous Coward | more than 9 years ago | (#12878999)

w00t

speaking of boxes, (-1, Flamebait)

Anonymous Coward | more than 9 years ago | (#12879000)

the nifgger i have imprisoned in my room is getting restless. what should i do? is the box-holding paradigm finally worn out? what deos slashdot suggest for next generation nigger restraining devices?

Re:speaking of boxes, (-1, Flamebait)

Anonymous Coward | more than 9 years ago | (#12879017)

I think I can help, but can you just clarify what a nifgger is please?

Re:speaking of boxes, (-1, Troll)

Anonymous Coward | more than 9 years ago | (#12879152)

It's overripe, you waited too long. Mini black holes seem a promising technology to keep them in a good condition for a much longer time. However it's only in the experimental phase right now. I suggest shipping the nigger right now while you still can get some cash for it.

Good to see. (5, Funny)

Anonymous Coward | more than 9 years ago | (#12879004)

For all the jokes out there about people 'downloading the internet' it's good to know someone is actually doing it.

Re:Good to see. (4, Funny)

FireballX301 (766274) | more than 9 years ago | (#12879029)

Who the heck cares about the rest of the internet, can this thing hold all the pr0n?

Re:Good to see. (2, Insightful)

bigberk (547360) | more than 9 years ago | (#12879145)

people from my univ might recognize this... there was a famous guy in our engineering faculty who, back in the 90s, had written some kind of an automated porn downloading app. It was running on their UNIX servers but he left it running unattended. apparently he had no quota because within a few days he had filled up the entire system storage with porn, several hundreds of megabytes worth which was very substantial back then.

I had a similar experience, I was playing around on irc back when we were swapping video files through DCC. apparently some downloading got out of hand and paged the admin, who contacted me and politely pointed out that I had a process running wild and filling /tmp... oops, must be an experiment gone wrong I had to say

Re:Good to see. (1)

-brazil- (111867) | more than 9 years ago | (#12879305)

Reminds me of that little program I had to write for a class whose job it was to print all the permutations of the numbers from 1 to n (n being a parameter). Happy that I'd found a very efficient solution I tested it with n=3, n=4, and all was fine. Then, for kicks, I started it with n=20. I knew that it would take a while, so I did a back-of-an-envelope calculation of how much RAM it might need to assemble the result... and came up with something like 20GB. Which was more than all the HDs of the 50 computers in the pool together had at that time. Oops.
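The back-of-an-envelope estimate above can be redone as a short sketch. It assumes the program writes each permutation as one line of space-separated decimal numbers; that output format and the function name are illustrative assumptions, not details from the post. Under that assumption the total for n=20 comes out far larger than the 20GB recalled above.

```python
import math

def permutation_output_bytes(n: int) -> int:
    """Bytes needed to store every permutation of 1..n as text.

    Assumed format (hypothetical): one permutation per line,
    numbers separated by single spaces, ASCII encoding.
    """
    digits = sum(len(str(k)) for k in range(1, n + 1))  # digits in one line
    chars_per_line = digits + (n - 1) + 1               # plus separators and newline
    return math.factorial(n) * chars_per_line

for n in (3, 4, 20):
    print(n, math.factorial(n), permutation_output_bytes(n))
```

The number of lines grows as n!, so streaming each permutation to output as it is generated, rather than assembling the whole result in RAM, is the only approach that scales at all.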

Re:Good to see. (5, Funny)

Anonymous Coward | more than 9 years ago | (#12879080)

But does it run Lin... um.

How about a Beo.. oh damn

Re:Good to see. (0, Redundant)

BlackMesaLabs (893043) | more than 9 years ago | (#12879097)

In soviet russia, internet downloads YOU!

Re:Good to see. (1)

Panaphonix (853996) | more than 9 years ago | (#12879270)

In America, Google does it for no money!

Re:Good to see. (1)

slarshdot (211836) | more than 9 years ago | (#12879102)

I think this can be compared to fat chicks.

They maybe tight, but its still a large unit.

(not that i would know hehe)

Storage galore! (2, Funny)

Bananatree3 (872975) | more than 9 years ago | (#12879006)

If, If only I could get a hold of one of those, I could Rival GOOGLE! Yes! I can become the next internet craze with my super, duper search engine crawling the web! I have the space, now I just need a connection in the middle of Alaska fast enough to rival google...

You hear about the Petabox? (5, Funny)

Dancin_Santa (265275) | more than 9 years ago | (#12879009)

Michael Jackson was heard breathing a sigh of relief. He thought it was where they sent Petafiles.

R. Kelly was scrambling to find the company's phone number.

Ouch (1)

Dancin_Santa (265275) | more than 9 years ago | (#12879061)

Didn't realize the moderators were Michael Jackson supporters [milkandcookies.com] .

Which reminds me of why Michael Jackson likes twenty eight year-olds. Because there's twenty of them.

Re:You hear about the Petabox? (4, Funny)

pyrrhonist (701154) | more than 9 years ago | (#12879104)

Michael Jackson was heard breathing a sigh of relief. He thought it was where they sent Petafiles.

Hmmm, this seems almost familiar [slashdot.org] ...

Let's analyze this situation:

  • The time on our posts is exactly the same.
  • There's a difference of only 3 in the post id values.
  • I was unable to foresee the R. Kelly connection.
This can only mean one thing... You are the Kwisatz Haderach!

GET OUT OF MY MIND!!!

Re:You hear about the Petabox? (1)

pyrrhonist (701154) | more than 9 years ago | (#12879171)

The really funny part is that my post got modded redundant.

Re:You hear about the Petabox? (0, Redundant)

kyrre (197103) | more than 9 years ago | (#12879302)

The really funny part is that my post got modded redundant.

What is funny about that? It is redundant. It had already been said. The idea behind moderation is not to reward witty types for their wittiness; it is about filtering out noise for the reader. If a joke is told two times, one of them really is redundant. It would be noise.

Oh, this comment is also redundant. One can probably read all about this here in the Slashdot FAQ [slashdot.org]

If you don't agree with the redundant modifier you could always set redundant to be +/-0 points instead of -1.

It's for PetaFiles! (1, Funny)

pyrrhonist (701154) | more than 9 years ago | (#12879012)

LinuxDevices.com is reporting that a Linux-based system comprising more than a petabyte of storage has been delivered to the Internet Archive

Wait, is a petabyte sized file called a petafile?

If so, then this storage must be for all the recent Michael Jackson coverage.

Re:It's for PetaFiles! (0)

Anonymous Coward | more than 9 years ago | (#12879144)

Err, do you call a gigabyte file a gigafile? Or a megabyte file a megafile? Didn't think so.

Mandatory (2, Funny)

huntse (782732) | more than 9 years ago | (#12879016)

Imagine a beowulf cluster of these babies... ...oh, it already is one. nevermind, I'll get my coat.

archive.org (4, Interesting)

Nasarius (593729) | more than 9 years ago | (#12879020)

Internet Archive, the non-profit organization that creates periodic snapshots of the Internet.

They do a lot more than that! I've just been downloading some Warren Zevon [archive.org] shows from their Live Music Archive.

Re:archive.org (1)

BrianGa (536442) | more than 9 years ago | (#12879044)

Don't skip over the 2,851 Grateful Dead shows [archive.org] !

copyright (5, Interesting)

DualG5GUNZ (762655) | more than 9 years ago | (#12879023)

Not to sound like an advocate or anything... But how is it that the Internet Archive project resists claims of copyright infringement and the like when they have copies of entire websites in their records?

Re:copyright (3, Informative)

seifried (12921) | more than 9 years ago | (#12879050)

You can exclude them from your website using the robots.txt:

User-agent: ia_archiver
Disallow: /

For example if you go to archive.org and plug my site into the wayback machine:

We're sorry, access to http://www.seifried.org/ [seifried.org] has been blocked by the site owner via robots.txt.

and you can also request them to expunge your site from the archive.

They go out of their way to make it easy to prevent your site being copied (more so than most search engines).
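For anyone who wants to check what such a robots.txt actually blocks, here is a minimal sketch using Python's standard urllib.robotparser. The rules are the two lines quoted above; the example.org URL and the "SomeOtherBot" agent name are placeholders chosen for illustration. It only shows how a well-behaved crawler would read the file, not how the Internet Archive's own crawler is implemented.

```python
from urllib.robotparser import RobotFileParser

# The exclusion rules quoted in the comment above.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""

def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# Placeholder URL and second agent name, for illustration only.
print(is_allowed(ROBOTS_TXT, "ia_archiver", "http://www.example.org/index.html"))
print(is_allowed(ROBOTS_TXT, "SomeOtherBot", "http://www.example.org/index.html"))
```

Because the rule names only ia_archiver, other crawlers are unaffected by it.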

Re:copyright (1)

Baricom (763970) | more than 9 years ago | (#12879070)

I feel they make it too easy. IA blocks not only the present version of the site, but also every page of every past version.

I can't get older pages of a web site I operated several years ago because a robots.txt file was inadvertently added that blocks it. At the time, I didn't know about the Internet Archive, and as a result potentially years of this site's history is gone.

Re:copyright (0)

Anonymous Coward | more than 9 years ago | (#12879081)

Unfortunately I'm sure that legal pressures and the cost of legal defense will almost guarantee that the Archive never gets much more aggressive with its archiving than it is now.

Re:copyright (0)

Anonymous Coward | more than 9 years ago | (#12879179)

Not good enough. They need my express permission to use my copyrighted material, not a note from me to make it not ok.

Re:copyright (1, Funny)

Anonymous Coward | more than 9 years ago | (#12879239)

I accidentally clicked your site and somehow it copied itself into my computer's memory. Please don't sue me.

Re:copyright (-1, Flamebait)

masklinn (823351) | more than 9 years ago | (#12879247)

How about a nice big
fuck you
Wrapped in shiny ribbons?

Re:copyright (2, Interesting)

IntergalacticWalrus (720648) | more than 9 years ago | (#12879249)

If they actually did that, the archive would be worthless.

Besides, the IA only archives HTML pages, and small images in them, nothing else. If you consider your HTML content to be unreproducible copyrighted material, might I ask why the hell it is publicly accessible on the Web in the first place?

Re:copyright (2, Interesting)

spacefight (577141) | more than 9 years ago | (#12879254)

The Internet Archive is fucking up big time with their robots.txt stuff. If you exclude a site from being shown, it doesn't show anything, correct. But: If this site goes offline, the archived pages of that former site are all available, not blocked at all.

Re:copyright (1)

Leroy_Brown242 (683141) | more than 9 years ago | (#12879106)

"Historical Purposes"

Petabox? (4, Funny)

eclectro (227083) | more than 9 years ago | (#12879025)


Isn't that what naked girls climb out of to protest fur coats?

Thank you, I'll be here all week.

Mega Systems (0)

Anonymous Coward | more than 9 years ago | (#12879026)

Everyone, please feel free to chime in so I don't feel like such a goon for saying this, but damn, these big systems are a reason to live, aren't they? I mean, I saw the rack of red on the link, and it just makes me drool. It's not so much the storage, but the logistics of the thing. I mean, I get the same feeling when I watch Jurassic Park, and whats-her-name is pumping up the electrical charger to get the main switch going. Charging a switch? Welcome to flavour country.

Anywho, just wanted to expose my hard-on for hardware, my raison d'etre. Someone, give me a job at a datacenter, or a power plant. I beg you.

Re:Mega Systems (2, Funny)

name773 (696972) | more than 9 years ago | (#12879203)

large bundles of neatly organized cable... ohh man.

Re:Mega Systems (1)

poor_boi (548340) | more than 9 years ago | (#12879294)

You're a twisted, demented man. You need to have your petaphiliac tendencies treated by a squad of trained monkeys wielding high voltage cattle prods powered by UPS.

Okay, I admit it. (1)

halcyon1234 (834388) | more than 9 years ago | (#12879028)

Forget the jokes. That setup kicks the ass out of any beowulf cluster. Heh.

Redundancy (0)

Anonymous Coward | more than 9 years ago | (#12879032)

Haven't read TFA yet, but what are they doing with regard to redundancy? With that many drives whirring around more than a couple are likely to go bad over time. Do they have a set of dedicated redundant drives to serve as backups?

Re:Redundancy (1)

Eric604 (798298) | more than 9 years ago | (#12879084)

TFA only says it does NOT use RAID but JBOD (just a bunch of disks).

Umm.. (1)

coldeeze (832166) | more than 9 years ago | (#12879035)

What kind of power bill are those guys getting and is their service really worth it?

Only 1.5 petabytes? (0)

Anonymous Coward | more than 9 years ago | (#12879037)

1.5 petabytes? That's hardly enough to hold a decent porn collection.

IPod? (2, Funny)

NegativeOneUserID (812728) | more than 9 years ago | (#12879043)

Right, sure, like anyone believes that you want that much storage for music. You just want to use it for pr0n.

Re:IPod? (2, Funny)

BlackMesaLabs (893043) | more than 9 years ago | (#12879231)

Decide to use it for "Pr0n" and you're gonna NEED a beowulf cluster of them...

great usage. (4, Informative)

Bananatree3 (872975) | more than 9 years ago | (#12879045)

Seriously, I think archive.org deserves such a storage system. I have very often wanted to go back to view an archive of a website from a while ago, but the cache on Google was from yesterday. It also gives multiple archives of the website based on day, which can be quite handy, especially for news-related sites. I think they quite well deserve it.

Downloading Kazaa (-1, Offtopic)

CriminalNerd (882826) | more than 9 years ago | (#12879048)

I don't think the group that takes periodic "pictures" of the Internet ever tried taking a "picture" of the Kazaa network. XD Hmm...There must be a TON of adware and spyware on the drives too. lol "OMFG!!! I got 1,024,576 pop-ups!" "Quick! Take a picture of the Internet with them! :O"

Re:Downloading Kazaa (3, Informative)

HyperChicken (794660) | more than 9 years ago | (#12879062)

Not "periodic", continuous. Own a website? Check your logs for the user-agent "ia_archiver".
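A minimal sketch of that log check, assuming an Apache-style combined access log in which the user-agent string appears on each request line. The sample lines, addresses, and function name below are made up for illustration; ia_archiver is the agent name used in the robots.txt example elsewhere in this thread.

```python
def count_crawler_hits(log_lines, agent="ia_archiver"):
    """Count log lines whose text mentions the given user-agent (case-insensitive)."""
    needle = agent.lower()
    return sum(1 for line in log_lines if needle in line.lower())

# Hypothetical log excerpt in combined log format.
SAMPLE_LOG = [
    '192.0.2.10 - - [24/Jun/2005:06:12:01 +0000] "GET /robots.txt HTTP/1.0" 200 42 "-" "ia_archiver"',
    '192.0.2.77 - - [24/Jun/2005:06:12:09 +0000] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '192.0.2.10 - - [24/Jun/2005:06:12:15 +0000] "GET /index.html HTTP/1.0" 200 5120 "-" "ia_archiver"',
]

print(count_crawler_hits(SAMPLE_LOG))
```

For a real log, pass an open file object instead of the list, since the function only iterates over lines.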

And if Linux had a working GFS... (0)

Anonymous Coward | more than 9 years ago | (#12879063)

this would actually be useful!

Sorry, I'm just bitter after almost a decade of Sistina's promises to get their global file system working 100%. We were one of their victims, err, customers.

Modded case to come (1)

icecow (764255) | more than 9 years ago | (#12879073)

I give 72 hours tops before one of those fetish case modders makes a 'peta' case. Oh shit, I was thinking chia.

'small box' (5, Funny)

MonoSynth (323007) | more than 9 years ago | (#12879078)

So the inventor of the microprocessor dies and suddenly the definition of 'small box' for computer components is again reduced to 'fits in a big room'....

Puppies (3, Funny)

Sinner (3398) | more than 9 years ago | (#12879082)

An anonymous reader writes "LinuxDevices.com is ... according to the article. Now to strap one of those puppies to my iPod!"
I'm sorry, baby dogs? That's so last week. I've got an arctic seal pup strapped to my iPod. You should see the looks I get on the subway. Bling, baby, Bling.

Re:Puppies (1)

Leroy_Brown242 (683141) | more than 9 years ago | (#12879096)

bastard. I just coughed up some rice.

hehehe

Re:Puppies (2, Funny)

Sinner (3398) | more than 9 years ago | (#12879100)

You gonna eat that?

maybe i'll be quoted in 15 years.. (4, Funny)

qda (678333) | more than 9 years ago | (#12879087)

"nobody needs more than a perabyte of storage"

Re:maybe i'll be quoted in 15 years.. (2, Funny)

Anonymous Coward | more than 9 years ago | (#12879117)

Well, I'd hope somewhere along the line somebody will fix that typo for you. Otherwise, you'll forever be quoted as "nobody needs more than a perabyte [sic] of storage."

No RAID?! (1)

kf6auf (719514) | more than 9 years ago | (#12879088)

I am more than slightly concerned about the lack of RAID in the system. They said that they had some sort of painful experience with RAID 5 not scaling to petabyte-size storage and therefore recommend JBOD. I wouldn't expect RAID 5 to scale to petabyte-size storage, because the parity is all computed at once and in the same place, but there has to be a way around this that still allows for redundancy. Take a RAID 50, with a lot of RAID 5 arrays in the hundred-terabyte range and a RAID 0 array striping over them: it would still provide redundancy with only slightly greater inefficiency, while dividing the parity work among the smaller RAID 5 arrays. Also, $2/GB seems kind of high to me, given that hard drive prices are down to $0.33/GB and you're putting 4 in each mass-produced box.

what about redundancy? (0)

Anonymous Coward | more than 9 years ago | (#12879089)

So if the storage is JBOD then what about redundancy when a drive fails?

Electricity $$$ ? (3, Funny)

kasnol (210803) | more than 9 years ago | (#12879092)

Wow - have they calculated how much the running cost per day is? I might just stay with my iPod instead for the time being~
Haha~

Re:Electricity $$$ ? (2, Informative)

TheFlyingGoat (161967) | more than 9 years ago | (#12879120)

50kW at 10 cents per kilowatt hour = $120/day.

I doubt it draws at a constant 50kW, though. It's probably an average (was given in TFA).

My math might be completely wrong, given I don't have a clue how to calculate kilowatt hours. Is it just kW * hours_used_daily? :)

Re:Electricity $$$ ? (1)

masklinn (823351) | more than 9 years ago | (#12879261)

I doubt it draws at a constant 50kW, though. It's probably an average (was given in TFA).
I think you meant "peak", because there isn't much difference as far as price goes between a constant 50kW and an average 50kW.

And yes, to compute energy consumption (in kWh) you merely multiply the power drawn from the grid (in kW) by the consumption timeframe (in hours).

Therefore if a unit draws 50kW, it consumes 50kWh worth of energy every hour.
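The arithmetic in this sub-thread can be written out as a small sketch. The 50kW draw and the 10 cents per kilowatt hour are the figures quoted above; treating the draw as constant over 24 hours is the simplifying assumption, and the function name is illustrative.

```python
def daily_energy_cost(power_kw: float, hours: float, price_per_kwh: float) -> float:
    """Cost of running a load: energy (kWh) = power (kW) * time (h)."""
    energy_kwh = power_kw * hours
    return energy_kwh * price_per_kwh

# 50 kW for 24 hours is 1200 kWh; at $0.10 per kWh that is about $120.
cost = daily_energy_cost(power_kw=50, hours=24, price_per_kwh=0.10)
print(f"${cost:.2f} per day")
```

If 50kW is a peak rather than an average, the real bill would be lower; the formula is the same with the average draw substituted.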

1.5 Petabytes? (3, Interesting)

TheFlyingGoat (161967) | more than 9 years ago | (#12879093)

Where can you purchase 600GB drives these days? (1.5PB / 2500 drives)

The math doesn't work when you multiply the number of systems out either: 600 systems * 1.6TB/system = 960TB. That's just under a petabyte, or am I missing something?

Also, if you've got those in a RAID5 setup, you're 'only' talking about approx 800TB of usable space. That's far less than the 1.5 petabytes claimed.

800TB is a lot of space, but there must be a cheaper/easier way than purchasing 600 systems to do it.
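The capacity figures being questioned here can be checked with a few lines. This sketch uses decimal units (1 PB = 1000 TB = 1,000,000 GB), which is an assumption about how the article counts, and takes the drive and system counts from the story summary and this thread; the function names are illustrative.

```python
GB_PER_TB = 1000
TB_PER_PB = 1000

def gb_per_drive(total_pb: float, drive_count: int) -> float:
    """Drive size (GB) needed to reach total_pb with drive_count drives."""
    return total_pb * TB_PER_PB * GB_PER_TB / drive_count

def total_tb(system_count: int, drives_per_system: int, drive_gb: float) -> float:
    """Raw capacity (TB) of system_count boxes with identical drives."""
    return system_count * drives_per_system * drive_gb / GB_PER_TB

print(gb_per_drive(1.5, 2500))   # drive size implied by 1.5 PB on 2,500 drives
print(gb_per_drive(1.5, 2400))   # same, with 4 drives in each of 600 systems
print(total_tb(600, 4, 400))     # raw capacity with 400 GB drives
```

With 400GB drives the raw total is 960TB, which matches the "just under a petabyte" figure above rather than the 1.5PB in the summary.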

Re:1.5 Petabytes? (1)

AaronLawrence (600990) | more than 9 years ago | (#12879114)

You're missing something: 4 drives in each system. ->150GB.

Re:1.5 Petabytes? (2, Informative)

TheFlyingGoat (161967) | more than 9 years ago | (#12879130)

No. They say 2500 drives (actually 2400 since it's 4 per system in 600 systems), which comes out to 600GB per drive for 1.5PB.

They don't like RAID (4, Interesting)

billstewart (78916) | more than 9 years ago | (#12879324)

I was a bit puzzled by that also - the article said the things come in racks of 40 or 64TB, and 16 racks times 64TB is about 1PB, not 1.5.

Also, the article says they don't like RAID, due to bad experiences with RAID5, and the system is configured as JBOD (Just a Bunch Of Disks). It doesn't say why, or what users should do to get equivalent protection. My guess is that depending on RAID within a box means you're still vulnerable if the box's CPU or disk controller decides to scribble the disks, or the power supply decides to catch fire or short out and deliver 240VAC on the +5V line or whatever. So if you want a RAID-like set of redundancy, set up your applications or file system mounting or something to calculate the protection disk in software and hand it off to another 1U box for storage.

The overhead of the motherboards here is not that high - they're about $150-200, and support 4 disks that probably cost $200-300 each, so they're only about 20% of the cost, which is not bad. The article didn't say they're using SATA, and it sounded like it's some IDE variant instead, but if you're only using 100 Mbps Ethernet to connect to the box and not the optional GigE, it's not the bottleneck anyway. If you wanted an alternative design, you could probably do something with a couple of 4-way SATA controllers per CPU, with a lot of disks stacked vertically in a 3-4U box looking like an X-serve or something. But that wouldn't necessarily have much of an advantage.
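The idea of calculating the protection data in software and handing it to another box can be sketched with plain XOR parity, the same operation RAID 5 uses. This is only an illustration of the guess above, not a description of what Capricorn or the Internet Archive actually do; the function names and the tiny byte blocks are hypothetical.

```python
from functools import reduce

def xor_parity(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def rebuild_missing(surviving_blocks, parity):
    """Recover the one missing block from the survivors plus the parity block."""
    return xor_parity(list(surviving_blocks) + [parity])

# Three data blocks, imagined as living on three different nodes.
data = [b"abcd", b"efgh", b"ijkl"]
parity = xor_parity(data)          # would be stored on a fourth node

# Pretend the node holding data[1] died; rebuild its block from the rest.
recovered = rebuild_missing([data[0], data[2]], parity)
print(recovered == data[1])
```

Single parity like this survives the loss of any one block per group, at the cost of one extra block of storage per group.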

terrifying (0)

Anonymous Coward | more than 9 years ago | (#12879098)

1. According to the specs this thing is 600 1.6TB JBOD arrays. They must handle redundancy on top of the storage mechanism, but they don't mention it anywhere.

2. The blurb says that they have roughly 1.5PB of storage space, but by my calculations it comes out to roughly 1 PetaByte (40 servers per rack * 15 racks of systems = 600 servers; 600 servers * 4 * 400GB per server = roughly 1 PB).

No redundancy? WTF? (2, Informative)

melted (227442) | more than 9 years ago | (#12879111)

I've actually read TFA. They recommend JBOD configurations to their clients. One drive goes titsup and you've lost 400GB of data. Do they at least offer some kind of mirroring/redundancy solution to back the data up to another array?

Re:No redundancy? WTF? (1)

grimJester (890090) | more than 9 years ago | (#12879196)

I'd hate to be the guy who has to burn it all to cd-r when management realizes they need backups.

Re:No redundancy? WTF? (4, Informative)

Depili (749436) | more than 9 years ago | (#12879235)

According to archive.org (http://www.archive.org/web/petabox.php [archive.org] ) they indeed have some redundancy, but not RAID. They are operating each system as a separate node, and mirroring nodes. The above link also sheds light on other questions regarding TFA.

A Great Historical Tool (5, Insightful)

simrook (548769) | more than 9 years ago | (#12879113)

The Internet represents a great historical tool. Case in point is what happened on 9/11. Being able to go back and see the progression, paranoia, patriotism, and early Iraq/Afghanistan/bin Laden/Hussein posts and opinions on various news sites is amazing. CNN, Fox, the NY Times, all are archived several times on 9/11 on archive.org.

I for one think that archive.org should turn into some UN effort, with a mission to chronicle and store daily/timely snapshots of the internet and the culture at the time, preserving it for future generations. What a tool for future historians!

The ability to look at a large representation of society at one single critical moment in time, and being able to have first-hand sources for all that information, is something that can truly change the way history is recorded (and not in the bad newspeak ingsoc way either). In fact, a holistic archive of what happens day-to-day, in an easily accessible format, might well help written history to be more representative of actual history (instead of, say, the history Bush wants us to believe: that the Iraq war was for human rights and not WMDs). I love Foucault.

The internet archive rocks... really hope this project continues full blast.

- Peace

Re:A Great Historical Tool (-1, Troll)

Anonymous Coward | more than 9 years ago | (#12879156)

There is no United Nations. There is an international community that occasionally can be led by the only real power left in the world, and that is the United States, when it suits our interest, and when we can get others to go along.

If the UN Secretariat building in New York lost 10 stories, it wouldn't make a bit of difference.

Re:A Great Historical Tool (1)

Yjam (893817) | more than 9 years ago | (#12879314)

"by the only real power left in the world, and that is the United States, when it suits our interest, and when we can get others to go along."
The only real power in the world? Well, maybe you are right, but you should have a look at what/where/who Hitachi [hitachi.com] is (the HDDs in the PB system are Hitachi ones). And maybe you'll then see that the actual power nowadays is in Asia. Not anymore in North America or in Old Europe.

Re:A Great Historical Tool (0)

Anonymous Coward | more than 9 years ago | (#12879160)

Which is why the 21st century will be 'lost in history' thanks to DRM, Patents and CopyRights (or wrongs).

Re:A Great Historical Tool (2, Funny)

venicebeach (702856) | more than 9 years ago | (#12879183)


Yes, otherwise such cultural gems as goatse.cx would be lost into the void forever...

Re:A Great Historical Tool (2, Insightful)

Anonymous Coward | more than 9 years ago | (#12879248)

The 9/11 targets were chosen in a way everyone would notice. It's not exactly amazing that it's well reported on; it would have been if it had happened 20 years ago. But that was just a single attack. If you look at the much bigger recent events that you mention, like the war on Iraq, you'll see that there really is hardly any detailed reporting. You have a lot of propaganda by the attackers, some propaganda from the Iraqi government, and some reports by angry people getting in the middle. You still have a completely unclear view of what happened.

We already had people writing diaries and making lots of pictures in WWII. The improvement isn't that great.

The MPAA and RIAA (3, Interesting)

PrivateDonut (802017) | more than 9 years ago | (#12879115)

are going to make a killing off the IA when they have finished. It isn't like they haven't made enough money off others as it is, so they may let this one slide in the name of conserving data. On that note, is the IA downloading EVERYTHING or selectively downloading to prevent such issues as copyright infringement?

Re:The MPAA and RIAA (0)

Anonymous Coward | more than 9 years ago | (#12879154)

From the IA FAQ [archive.org]

"How can I remove my site's pages from the Wayback Machine? The Internet Archive is not interested in preserving or offering access to Web sites or other Internet documents of persons who do not want their materials in the collection. By placing a simple robots.txt file on your Web server, you can exclude your site from being crawled as well as exclude any historical pages from the Wayback Machine. Internet Archive uses the exclusion policy intended for use by both academic and non-academic digital repositories and archivists. See our exclusion policy. You can find exclusion directions at exclude.php. If you cannot place the robots.txt file, opt not to, or have further questions, email us at info at archive dot org."

Fedora... (0, Redundant)

YourMotherCalled (888364) | more than 9 years ago | (#12879116)

<funny comment about how fedora always gets dumped on but is obviously not all that bad.>

Wayback and Slashdot (4, Funny)

mcrbids (148650) | more than 9 years ago | (#12879163)

Go ahead. Try Slashdot in the wayback machine.

Slashdot has looked virtually identical since 1998!

Re:Wayback and Slashdot (1)

hostyle (773991) | more than 9 years ago | (#12879188)

How strange for a linux based community - usually famed for fixing things that aren't broken.

It doesn't matter. (0)

Anonymous Coward | more than 9 years ago | (#12879167)

Once peak oil arrives there will be a total economic collapse. Companies like Capricorn will go bankrupt as Americans just try to save enough for food.

nothing new... (0)

Anonymous Coward | more than 9 years ago | (#12879172)

I remember seeing this box at the Univ. of San Francisco Flashmobcomputing event. Brewster Kahle (founder: IA) was showing it off. I saw it a few weeks later running at IA's Presidio office. This was a while ago...

small box (0)

Anonymous Coward | more than 9 years ago | (#12879194)

Since when can 16 racks be described as small?

Okay great achievement and all, but the title is simply not pedantic enough....

Just imagine... (0, Redundant)

M3rk1n_Muffl3y (833866) | more than 9 years ago | (#12879213)

...a beowulf cluster of these.

Sorry, it had to be done.

article not clear (1)

planckscale (579258) | more than 9 years ago | (#12879214)

So are these machines (individual PCs) not hot-swappable? Taking down the entire machine because a node goes down seems extreme. I would think that VIA isn't pushing out enough of these chips and M-10000's to get this thing together. $2/GB is cheap. I wonder what filing system it uses?

Re:article not clear (1)

imsabbel (611519) | more than 9 years ago | (#12879263)

It's a cluster, so of course you can take one node out.
But it's only the raw meat. In order to really use it, you need a storage solution taking care of things like redundancy, node restore, etc.

What's wrong with hot swap and RAID 5? (1)

fgrieu (596228) | more than 9 years ago | (#12879228)

Quoting http://linuxdevices.com/news/NS2659179152.html [linuxdevices.com]

"We experimented with hot-swap, but found it caused as many problems as it solved. It actually induced failures, so we backed away."
(we) "tried then backed away from RAID, instead opting to recommend JBOD"
"We had a painful experience with RAID 5, which does not scale well to petabyte-level storage."

Why the hell are the reports of these guys so far from what the accepted industry practice is, according to IT magazines?

Re:What's wrong with hot swap and RAID 5? (1)

Lussarn (105276) | more than 9 years ago | (#12879250)

Maybe because, as they say, RAID 5 (or at least the implementation they were using) didn't scale well to petabyte levels. They could of course have done many smaller RAID 5 arrays and still kept redundancy. Don't know why they didn't.

Re:What's wrong with hot swap and RAID 5? (2, Interesting)

imsabbel (611519) | more than 9 years ago | (#12879273)

Because you are comparing apples to oranges.

They don't use hot swap and RAID 5 for the same reason Google doesn't run on mainframes:
It's just cheaper to let a higher-level logic take care of that stuff instead of strapping redundancy on every node...
Why hot swap if it isn't needed? The rest of the node will be mirrored somewhere else, so for the cost of fitting out everything with HS bays you could get 5 or 10% more nodes...
Same for RAID 5: good high-performance RAID 5 controllers would increase the system cost by 50% or something. And then it's not less expensive than just mirroring nodes.

Re:What's wrong with hot swap and RAID 5? (2, Interesting)

tim_uk (123339) | more than 9 years ago | (#12879275)

Why the hell are the reports of these guys so far from what the accepted industry practice is, according to IT magazines?

GOK, I have 3PB of storage synchronised across two data centres here, all in 7+1 RAID5. Mostly self-healing too: if a drive pops, then a spare drive in the same array builds itself into that stripe set, enabling hot replacement of the dead drive.

I would love to know what their "painful experience" was!

Using JBOD for this seems a tad courageous, to say the least.

And then, of course, there's backup...

Courageous? Try insane. (1)

Otto (17870) | more than 9 years ago | (#12879337)

I can't think of a single reason to use a JBOD setup when you could just as easily use RAID 0.

If you don't need redundancy, great, fine, you can be redundant elsewhere. I'm down with that. But RAID 0 is so easy to implement as opposed to a JBOD setup and works so much better that there's essentially no reason to ever use JBOD except pure laziness.

I mean, with either one, if you lose a drive, you lose the array, but at least with RAID 0 you get the benefits of striping in both read and write operations, basically doubling your throughput speed.

Re:What's wrong with hot swap and RAID 5? (1)

masklinn (823351) | more than 9 years ago | (#12879288)

Why the hell are the reports of these guys so far from what the accepted industry practice is, according to IT magazines?
Different needs have different solutions. IA probably doesn't need perfect 24/7 uptime.

They don't have "industry constraints", therefore don't need "industry practices"

What's in a name? (1)

MadCow42 (243108) | more than 9 years ago | (#12879298)

A friend of mine used to work for Sony... he swears this is a true story:

Sony had a petabyte tape backup system they wanted to sell into North America... called the "Peta-file". Thankfully, Sony NA managed to have the name changed prior to its introduction here.

So, PetaBox is slightly better... slightly. :)

MadCow.

tried that... (0)

Anonymous Coward | more than 9 years ago | (#12879301)

Built a VIA-based storage cluster as a test some time back. Surprised that they have decided to go into production. Two hard drives on one IDE channel is not a good idea, VIA boards are not highly reliable, and 100Mb Ethernet is just too slow if you want to copy the contents of one machine.

Also, they haven't really worked on the software side: it's just a bunch of machines you have to rsync to, which really gets to be a pain to manage when you have that many.

And in time for morning Slashdotting, too! (1)

Armadni General (869957) | more than 9 years ago | (#12879322)

Capricorn Technologies says it has completed delivery of more than a petabyte of storage to the Internet Archive, a non-profit organization based in San Francisco that creates periodic snapshots of the Internet. Capricorn's PetaBox products are based on Via mini-ITX boards running Debian or Fedora Linux, and deliver the lowest cost-per-GB and cost-of-ownership available, the company claims.

Capricorn started as a project within the Internet Archive (IA) to develop inexpensive storage devices based on Linux and commodity PC components. The project was spun out in June of 2004, resulting in the formation of Capricorn Technologies. The company has since supplied its PetaBox products to a number of universities, research centers, libraries, and national archives, both within the US and overseas, according to CEO C.R. Saikley. The IA remains Capricorn's largest customer, however, Saikley says.

The IA is an online digital library with very large collections of audio, video, texts, web sites, and software. For example, it claims to host footage of more than 20,000 live concerts, and snapshots of the Internet dating back to 1996, accessible through the well-known Wayback Machine, which currently hosts over 40 billion web pages.

The IA's PetaBox installation comprises about 16 racks housing 600 systems with 2,500 spinning drives, for a total capacity of roughly 1.5 petabytes. Despite its large size, the IA's PetaBox installation draws only about 50kW of power, Saikley says, and is maintained by one full- and one half-time person who spend a disproportionate amount of time working on older systems. "We've improved reliability considerably," Saikley claims.

The IA systems boot Debian or Fedora Linux from a central PXE boot server, and are remotely monitored using nagios. "The beauty of nagios is that it is so readily extensible," says Saikley. "If the register exists on the board, nagios can figure out how to read it. We typically provide hard disk temperatures, cpu temperatures, ping response, capacity utilization, that sort of thing."
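As an illustration of the extensibility Saikley describes, a Nagios check is just a program that prints one status line and exits with a conventional code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The sketch below is not Capricorn's plugin; the function name, the `DISKTEMP` label, and the 45/55 degree thresholds are assumptions chosen for the example.

```python
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3  # conventional Nagios plugin exit codes

def check_temperature(temp_c, warn=45.0, crit=55.0):
    """Return (exit_code, status_line) for a disk temperature in Celsius.

    The thresholds are illustrative defaults, not vendor recommendations.
    """
    if temp_c is None:
        return UNKNOWN, "DISKTEMP UNKNOWN - no reading available"
    if temp_c >= crit:
        code, label = CRITICAL, "CRITICAL"
    elif temp_c >= warn:
        code, label = WARNING, "WARNING"
    else:
        code, label = OK, "OK"
    # Text after the pipe is performance data in label=value;warn;crit form.
    line = f"DISKTEMP {label} - {temp_c:.1f} C | temp={temp_c:.1f};{warn};{crit}"
    return code, line
```

A real plugin would obtain the reading from the hardware (for example from SMART data), print the status line, and call `sys.exit()` with the returned code so Nagios can pick up the state.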

The PetaBox can also be managed by Linux cluster management software, according to Saikley.

The PetaBox

Capricorn claims that its PetaBox storage devices provide the lowest ownership cost and cost-per-GB available. The company offers 40- and 64-terabyte models comprised of racks with 40 1U systems. The 1U systems are available in 1- and 1.6-terabyte models that are essentially the same but for hard-drive capacity. Both systems run Debian or Fedora Linux on Via mini-ITX motherboards.

The PetaBox is based on Via mini-ITX motherboards

Each 1U system includes a Via M-10000 mini-ITX board with a 1GHz Via C3 processor and 512MB of RAM, expandable to 1GB. Each includes four Hitachi ATA hard drives with 8MB caches and a claimed 8.5ms of typical latency.

Saikley says Capricorn did extensive testing to qualify hard drives for capacity, reliability, and cost, finally choosing Hitachi. "Although Hitachi does not offer an 'enterprise' or '24x7' SATA drive, our testing found their drives to be as reliable as anything out there, enterprise distinction or not," Saikley said.

The 1U PetaBox units (shown stacked in a rack, on the right) include all I/O on the front panel, reducing the need to access the back panel while maximizing its cooling capacity. Drives are housed in EZ-Latch bays that can be easily changed after the 1U unit has been removed from the rack and had its cover removed. "We experimented with hot-swap, but found it caused as many problems as it solved. It actually induced failures, so we backed away. But you still have to make it easy to replace disks," Saikley said.

Similarly, Saikley says Capricorn tried then backed away from RAID (redundant arrays of inexpensive disks), instead opting to recommend JBOD (just a bunch of disks) configurations to most of its clients. "We had a painful experience with RAID 5, which does not scale well to petabyte-level storage," Saikley notes.

PetaBox options include a 16 x 2 LCD display and gigabit Ethernet (10/100 is standard). The PetaBox is configured by default to boot from a USB key, then from a PXE boot server, and finally from the local hard drive. However, boot order can easily be changed in the BIOS.

Each 1.6-terabyte 1U system draws 80 Watts of power (typical), or about 50 Watts per terabyte, according to Capricorn. Each measures 17.25 x 18 x 1.72 inches (43.8 x 45.7 x 4.4 cm), and weighs 18 lbs, 12 oz (8.5 kg).
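The power figures above can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: it assumes every one of the IA's 600 systems draws the typical 80 Watts quoted for the 1.6-terabyte model, which is unlikely to hold exactly for an installation that includes older machines.

```python
WATTS_PER_NODE = 80   # typical draw of one 1.6 TB 1U system (from the article)
TB_PER_NODE = 1.6     # capacity of that system in terabytes
IA_NODES = 600        # systems in the Internet Archive installation

# Power per terabyte for a single 1U system.
watts_per_tb = WATTS_PER_NODE / TB_PER_NODE

# Rough total draw if all 600 systems were the 80 W model.
ia_total_kw = IA_NODES * WATTS_PER_NODE / 1000
```

The per-terabyte result matches the quoted 50 Watts, and the fleet estimate of 48 kW is in line with the "about 50kW" Saikley gives for the whole installation.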

According to Saikley, Capricorn is currently positioning itself for increased production levels, following recent improvements to its manufacturing process. "We have been constantly improving the efficiency and effectiveness of our manufacturing processes. By positioning ourselves for increased production levels, we are better able to pursue our relentless commitment to driving the cost of storage down."

Availability

The PetaBox is available now, priced at approximately $2/GB, in 40- and 64-terabyte capacities. Further details are on the company's website.
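For a sense of scale, the quoted price works out as follows. This is a rough sketch that assumes decimal terabytes (1 TB = 1000 GB) and takes the approximate $2/GB figure at face value; `rack_price` is a made-up helper, not anything from Capricorn's price list.

```python
PRICE_PER_GB = 2.0   # approximate USD per gigabyte (from the article)
GB_PER_TB = 1000     # assumption: decimal terabytes

def rack_price(terabytes):
    """Approximate list price in USD for a rack of the given capacity."""
    return terabytes * GB_PER_TB * PRICE_PER_GB

price_40tb = rack_price(40)   # 40-terabyte rack
price_64tb = rack_price(64)   # 64-terabyte rack
```

So a 40-terabyte rack comes to roughly $80,000 and a 64-terabyte rack to roughly $128,000 under these assumptions.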

NAS or SAN or ??? (1)

joib (70841) | more than 9 years ago | (#12879328)

I read the article and the company's website, but I couldn't find out how you're supposed to access all this data. It's hardly practical for every node to export its own NFS share, is it? Is it supposed to use some kind of cluster file system such as (Open)GFS?

Or is the user expected to do some kind of in-house thingy, like google or (presumably) the internet archive?

I read and I thought... (0, Offtopic)

manojar (875389) | more than 9 years ago | (#12879343)

I saw the topic and I thought, what the hell do those animal guys have to do with Slashdot...?