Slashdot: News for Nerds

Internet Archive Gets 4.5PB Data Center Upgrade

timothy posted more than 5 years ago | from the shipping-containers-are-rad dept.

Data Storage 235

Lucas123 writes "The Internet Archive, the non-profit organization that scrapes the Web every two months in order to archive web page images, just cut the ribbon on a new 4.5 petabyte data center housed in a metal shipping container that sits outside. The data center supports the Wayback Machine, the Web site that offers the public a view of the 151 billion Web page images collected since 1997. The new data center houses 63 Sun Fire servers, each with 48 1TB hard drives running in parallel to support both the web crawling application and the 200,000 visitors to the site each day."

235 comments

Where do they store 4.5TB off site (5, Interesting)

wjh31 (1372867) | more than 5 years ago | (#27336227)

one would assume that something like this does regular off-site back-ups, which must add up to a hell of a lot. Could someone with experience in such matters shed a little insight into the logistics of backing up such a vast system?

Re:Where do they store 4.5TB off site (3, Informative)

fuzzyfuzzyfungus (1223518) | more than 5 years ago | (#27336293)

TFA indicates that they have a mirror at the library of Alexandria. Unless things have changed since last I read about them, the mirroring is pretty much it. The Internet Archive does very impressive work; but they don't have that much money. No Real Big Serious Enterprise tape silos here.

Re:Where do they store 4.5TB off site (3, Insightful)

medelliadegray (705137) | more than 5 years ago | (#27336791)

i find it impressive they have all that hardware for a mere 200k users a day.

Re:Where do they store 4.5TB off site (2, Informative)

Anonymous Coward | more than 5 years ago | (#27336811)

In Brewster Kahle's December 2007 TED talk he mentions a third mirror in the Netherlands.
http://www.ted.com/index.php/talks/brewster_kahle_builds_a_free_digital_library.html [ted.com]

As he puts it, the Archive is mirrored on 'a fault line, a flood plain, and in the Middle East'.

Funny thing is I can't find another reference to the Netherlands mirror. The Bibliotheca Alexandrina site mentions a plan to eventually have four sites (California, Alexandria, Europe, and Asia), but that's it. Anyone know what happened with the Netherlands site?

Re:Where do they store 4.5TB off site (5, Funny)

LiquidCoooled (634315) | more than 5 years ago | (#27336301)

one would assume that something like this does regular off-site back-ups, which must add up to a hell of a lot. Could someone with experience in such matters shed a little insight into the logistics of backing up such a vast system?

floppy disks.
lots of floppy disks.

Re:Where do they store 4.5TB off site (2, Funny)

Bearhouse (1034238) | more than 5 years ago | (#27336405)

Not reliable enough.
I suggest that this important resource be backed up to punched cards.
This would also enable handy comparisons in units that us oldies understand, such as ELOCs
(Equivalent Library of Congress).
I'd calculate it myself, but seem to have mislaid my slide rule...

Re:Where do they store 4.5TB off site (3, Funny)

nextekcarl (1402899) | more than 5 years ago | (#27336631)

I'd suggest also using stone slabs. Water can do serious damage to paper, and don't get me started on fire hazards. Good old Stone Slabs resist both of those really well. I'm not sure what the write speed is, however, so you'll probably need to hire many stonecutters to work in parallel.

Re:Where do they store 4.5TB off site (1)

jd (1658) | more than 5 years ago | (#27336775)

When you get right down to it, any hard-coded data on silicon is just data on a stone slab. Since you can compile SystemC into a hardware spec, you can write stone slabs as fast as you can generate C.

Re:Where do they store 4.5TB off site (2, Funny)

houghi (78078) | more than 5 years ago | (#27336619)

If they need much more, I have some AOL disks laying around that they can use.

Re:Where do they store 4.5TB off site (4, Funny)

MichaelSmith (789609) | more than 5 years ago | (#27336335)

It's like the two USB hard disks I use for backups. Pick up the container and swap it with the container from secure storage.

Re:Where do they store 4.5TB off site (4, Interesting)

DigiShaman (671371) | more than 5 years ago | (#27336365)

Umm, how many forklifts and 18 wheelers does it take to swap out 4.5 petabytes worth of data each day?

Re:Where do they store 4.5TB off site (1)

Wingman 5 (551897) | more than 5 years ago | (#27336695)

Jeeze how much bandwidth would you need to fill that thing in one day... runs to google.
(4.5 petabyte) / (1 day) = 54.6133333 GBps
According to List of device bandwidths [wikipedia.org] the closest things to filling it in one day are:
  • HyperTransport 3.1 (3.2 GHz, 32-pair) 409,600 Mbit/s 50 GB/s
  • PC3-16000 DDR3-SDRAM (triple channel) 480.4 Gbit/s 48.4 GB/s
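
The 54.6 figure above can be reproduced with a short calculation. This is a minimal sketch, assuming the calculator used binary units (1 PB = 2^50 bytes, 1 GB = 2^30 bytes); the function and constant names are illustrative only.

```python
# Sustained rate needed to move a given volume in a given time.
# Assumption: binary units (1 PB = 2**50 bytes, 1 GB = 2**30 bytes),
# which is what reproduces the 54.6 figure quoted in the comment.

PIB = 2**50
GIB = 2**30
SECONDS_PER_DAY = 24 * 60 * 60


def required_rate_gib_per_s(petabytes: float, seconds: float) -> float:
    """Return the sustained rate in GiB/s needed to move `petabytes` in `seconds`."""
    return petabytes * PIB / seconds / GIB


rate = required_rate_gib_per_s(4.5, SECONDS_PER_DAY)
print(f"{rate:.2f} GiB/s")
```

With decimal units instead (10^15 bytes per PB, 10^9 bytes per GB) the same division gives roughly 52 GB/s, so the choice of unit convention shifts the answer by a few percent.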

Re:Where do they store 4.5TB off site (1)

wealthychef (584778) | more than 5 years ago | (#27336779)

Can you say, Parallelism?

Fool (0)

Anonymous Coward | more than 5 years ago | (#27336997)

The internets isn't like a truck! It's a series of tubes!

Re:Where do they store 4.5TB off site (4, Funny)

MrEricSir (398214) | more than 5 years ago | (#27336355)

It's simple, the backups are compressed -- they simply remove all those useless zeroes from the binary data.

Re:Where do they store 4.5TB off site (2, Funny)

TheGratefulNet (143330) | more than 5 years ago | (#27336689)

It's simple, the backups are compressed -- they simply remove all those useless zeroes from the binary data.

in music today, there is a so-called 'loudness war' and I think I've discovered what it is: they're removing the zeroes, thinking that 'all ones' will make the music even louder!

I wonder if it's reversible? where do the zeroes go? can they be unzeroed? we should try to find them.

Re:Where do they store 4.5TB off site (1)

Sponge Bath (413667) | more than 5 years ago | (#27336709)

Followed by run length encoding of the remaining ones.

Re:Where do they store 4.5TB off site (1)

CannonballHead (842625) | more than 5 years ago | (#27336373)

[subject correction]
PB, not TB... hehe.

Re:Where do they store 4.5PB off site (1, Offtopic)

Chosen Reject (842143) | more than 5 years ago | (#27336887)

[subject correction]
PB, not TB... hehe.

Re:Where do they store 4.5TB off site (1)

clarkkent09 (1104833) | more than 5 years ago | (#27336393)

It's 4.5PB, which is a whole different thing, and TFA says it's mirrored at the library of Alexandria, Egypt. I guess that counts as off-site :)

Re:Where do they store 4.5TB off site (5, Funny)

commodore64_love (1445365) | more than 5 years ago | (#27336447)

They'd better have it backed up. Last time the Alexandria library burned down, we lost about one thousand years of collected information from ancient Greece and Rome. Ooopsie.

library of Alexandria, Egypt (0)

Anonymous Coward | more than 5 years ago | (#27336477)

Didn't that burn down a few thousand years ago?

They store 4.5PB in Egypt! (4, Funny)

CannonballHead (842625) | more than 5 years ago | (#27336401)

The Internet Archive also works with about 100 physical libraries around the world whose curators help guide deep Internet crawls. The Internet Archive's massive database is mirrored to the Bibliotheca Alexandrina, the new Library of Alexandria in Egypt, for disaster recovery purposes.

Re:They store 4.5PB in Egypt! (4, Funny)

Anonymous Coward | more than 5 years ago | (#27336719)

Egypt could be a good choice. The area is fairly famous for reliable persistent storage. From papyrus scrolls to stone engravings, things tend to keep there better than most places. There really aren't many other geographical areas on earth that can claim the same kind of data retention rates over the time periods they've dealt with. Though despite their impeccable track record with avoiding hardware failures, they've done significantly worse when it comes to data loss due to theft and/or hackers/pirates.

The one curious part about that choice is that the library at Alexandria is the one notable case where mass amounts of data were irreparably lost. So it's odd that they'd choose to entrust their data to that specific institution. Perhaps they felt that since it's under new management, the previous problems will have been resolved.

However, had the choice been mine, I would have chosen to store my offsite data in Luxor. Its data retention was quite good, and included one data store that was preserved in its entirety for over 3000 years. As an added benefit, it seems that they've opened a second location [luxor.com] that's significantly more convenient for the IA since there's no overseas transmission to worry about.

Re:Where do they store 4.5TB off site (1, Funny)

Anonymous Coward | more than 5 years ago | (#27336511)

They have Charlie Babbitt on their staff. No need to replicate.

You can ship it over OC-192... (4, Interesting)

Ungrounded Lightning (62228) | more than 5 years ago | (#27336571)

... one would assume that something like this does regular off-site back-ups, which must add up to a hell of a-lot,..

As I recall from one of Brewster's talks: Part of the idea was that you can install redundant copies of this data center around the world and keep 'em synced.

You can ship 4.5 petabytes over a single OC-192 link in about 71 days.
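
As a rough check on that estimate, here is a minimal sketch that assumes the nominal OC-192 line rate of 9953.28 Mbit/s and ignores framing and protocol overhead. The helper names are made up for illustration. At the full line rate the transfer comes out shorter than 71 days, so the quoted figure presumably assumes a lower effective throughput; the post does not say which.

```python
# Back-of-the-envelope transfer time over a single OC-192 link.
# Assumptions: nominal line rate, no protocol overhead, link fully dedicated.

OC192_LINE_RATE_BPS = 9_953_280_000  # 9953.28 Mbit/s
SECONDS_PER_DAY = 86_400


def transfer_days(num_bytes: float, rate_bps: float) -> float:
    """Days needed to send `num_bytes` at `rate_bps` bits per second."""
    return num_bytes * 8 / rate_bps / SECONDS_PER_DAY


def implied_rate_bps(num_bytes: float, days: float) -> float:
    """Average bit rate implied by sending `num_bytes` in `days` days."""
    return num_bytes * 8 / (days * SECONDS_PER_DAY)


decimal_pb = 4.5 * 10**15  # 4.5 PB in decimal units
binary_pb = 4.5 * 2**50    # 4.5 PB in binary units

print(f"decimal PB at line rate: {transfer_days(decimal_pb, OC192_LINE_RATE_BPS):.1f} days")
print(f"binary PB at line rate:  {transfer_days(binary_pb, OC192_LINE_RATE_BPS):.1f} days")
print(f"rate implied by 71 days: {implied_rate_bps(decimal_pb, 71) / 1e9:.2f} Gbit/s")
```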

Re:You can ship it over OC-192... (5, Funny)

TheGratefulNet (143330) | more than 5 years ago | (#27336659)

You can ship 4.5 petabytes over a single OC-192 link in about 71 days.

yeah, but just at the 70th day, someone will pick up the phone and the whole thing will have to be resent.

Re:You can ship it over OC-192... (0)

Anonymous Coward | more than 5 years ago | (#27336969)

You can ship 4.5 petabytes over a single OC-192 link in about 71 days.

Assuming you have OC-192s to every location over trans-oceanic distances dedicated only to this.

The initial sync would probably be done locally with a trunked 10 GigE, and then ship the duplicate container to the back up location. Then send deltas over your trans-oceanic links.

Syncing snapshots (and only sending the deltas) is very easy with ZFS (which is presumably what they're using if they're on Solaris 10).

Re:Where do they store 4.5TB off site (1)

TheGratefulNet (143330) | more than 5 years ago | (#27336627)

one would assume that something like this does regular off-site back-ups

there are BIG fat cables you connect, wait 3 seconds, then do a massively parallel 'dd if= ...'

Re:Where do they store 4.5TB off site (2, Insightful)

pedrop357 (681672) | more than 5 years ago | (#27336711)

4.5TB isn't that bad. Heck, we have 1TB tapes right now. 5 of them can be carried in a small bag.

It's the 4.5PB that the Internet Archive could use that's hard to store offsite. 4500 1TB tapes can be pretty unruly.

Re:Where do they store 4.5TB off site (3, Interesting)

Anonymous Coward | more than 5 years ago | (#27336905)

one would assume that something like this does regular off-site back-ups, which must add up to a hell of a lot. Could someone with experience in such matters shed a little insight into the logistics of backing up such a vast system?

Create snapshot of zpool (think LVM VG):
# zfs snapshot mydata@2009-03-24

Send snapshot to remote site:
# zfs send mydata@2009-03-24 | ssh remote "zfs recv mydata@2009-03-24"

Create a new snapshot the next day:
# zfs snapshot mydata@2009-03-25

Send only the incremental changes between the two:
# zfs send -i mydata@2009-03-24 mydata@2009-03-25 | ssh remote "zfs recv mydata@2009-03-25"

Now this looks a lot like rsync, but the difference is that rsync has to traverse the file system tree (directories and files), while ZFS only has to look at the 'birth time' (think ctime) of each block of data (not even the full file metadata) to see if it's newer than the first snapshot. If you're talking about tens (or hundreds) of thousands of directories, and an order of magnitude more files, that's a lot of overhead if nothing has changed. For 48 TB raw (what a Sun X4500 can have), ZFS can see nothing has changed in a few minutes.

Creation of snapshots is instantaneous and there is no overhead in them (except that the space from deleted files isn't reclaimed / reused). There are people who create them every five seconds, and sync with a remote server--so at most you would lose five seconds worth of data if your disk died.

All changes are also ACID, so if you start your send-recv, and the transmission dies part way through, the receiving end won't have a partial copy of the latest snapshot--it's all or nothing of the last good change.
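
The birth-time comparison described above can be illustrated with a toy model. This is only a sketch of the idea, not how ZFS is implemented: the `Block` class, the `birth_txg` field (standing in for the 'birth time') and the flat list scan are all assumptions made for illustration.

```python
# Toy model of an incremental send: pick blocks by when they were written,
# without looking at files or directories at all.
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    block_id: int
    birth_txg: int  # hypothetical "birth time": the write generation of the block


def incremental_blocks(blocks, from_snapshot_txg, to_snapshot_txg):
    """Return blocks written after the first snapshot, up to and including the second."""
    return [b for b in blocks if from_snapshot_txg < b.birth_txg <= to_snapshot_txg]


pool = [Block(0, 10), Block(1, 10), Block(2, 57), Block(3, 98)]

# The first snapshot was taken at generation 50, the second at generation 100.
delta = incremental_blocks(pool, from_snapshot_txg=50, to_snapshot_txg=100)
print([b.block_id for b in delta])  # [2, 3]
```

If nothing was written between the two snapshots the selection is simply empty, which is the cheap "nothing has changed" case the comment contrasts with rsync's tree walk.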

Story is meaningless without LOC measurement (5, Funny)

Dr_Banzai (111657) | more than 5 years ago | (#27336233)

I have no idea how much 4.5 PB is until it's given in units of Libraries of Congress.

Re:Story is meaningless without LOC measurement (5, Interesting)

Wingman 5 (551897) | more than 5 years ago | (#27336347)

from http://www.lesk.com/mlesk/ksg97/ksg.html [lesk.com] The 20-terabyte size of the Library of Congress is widely quoted and as far as I know is derived by assuming that LC has 20 million books and each requires 1 MB. Of course, LC has much other stuff besides printed text, and this other stuff would take much more space.

1. Thirteen million photographs, even if compressed to a 1 MB JPG each, would be 13 terabytes.
2. The 4 million maps in the Geography Division might scan to 200 TB.
3. LC has over five hundred thousand movies; at 1 GB each they would be 500 terabytes (most are not full-length color features).
4. Bulkiest might be the 3.5 million sound recordings, which at one audio CD each, would be almost 2,000 TB.

This makes the total size of the Library perhaps about 3 petabytes (3,000 terabytes).

so 230 libraries by the old standard or 1.5 by the new standard
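
The arithmetic behind those two conversions, as a minimal sketch using only the estimates quoted above (variable names are illustrative). Note that 4.5 PB / 20 TB is 225 in decimal units; the "230" matches counting 1 PB as 1024 TB.

```python
# Library of Congress size estimates quoted from the Lesk page, in terabytes.
estimates_tb = {
    "books": 20,
    "photographs": 13,
    "maps": 200,
    "movies": 500,
    "sound recordings": 2000,
}

total_tb = sum(estimates_tb.values())
print(f"sum of estimates: {total_tb} TB (rounded to about 3 PB in the quote)")

archive_tb_decimal = 4.5 * 1000
archive_tb_binary = 4.5 * 1024

print(f"old standard (20 TB), decimal: {archive_tb_decimal / 20:.0f} LoC")
print(f"old standard (20 TB), binary:  {archive_tb_binary / 20:.0f} LoC")
print(f"new standard (3 PB):           {archive_tb_decimal / 3000:.1f} LoC")
```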

Re:Story is meaningless without LOC measurement (2, Insightful)

dln385 (1451209) | more than 5 years ago | (#27336763)

from http://www.lesk.com/mlesk/ksg97/ksg.html [lesk.com] The 20-terabyte size of the Library of Congress is widely quoted and as far as I know is derived by assuming that LC has 20 million books and each requires 1 MB. Of course, LC has much other stuff besides printed text, and this other stuff would take much more space.

1. Thirteen million photographs, even if compressed to a 1 MB JPG each, would be 13 terabytes.
2. The 4 million maps in the Geography Division might scan to 200 TB.
3. LC has over five hundred thousand movies; at 1 GB each they would be 500 terabytes (most are not full-length color features).
4. Bulkiest might be the 3.5 million sound recordings, which at one audio CD each, would be almost 2,000 TB.

This makes the total size of the Library perhaps about 3 petabytes (3,000 terabytes).

so 230 libraries by the old standard or 1.5 by the new standard

Compress each audio file to a 5 MB MP3. That's 17.5 TB. Total size would be 750 terabytes.

So the data would be 6 LOC.
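
A minimal sketch of that recomputation, assuming decimal units (1 TB = 1,000,000 MB) and the other estimates from the quoted text unchanged; the variable names are illustrative.

```python
# Redo the Library of Congress total with audio stored as 5 MB MP3s.
recordings = 3_500_000
mp3_mb = 5
audio_tb = recordings * mp3_mb / 1_000_000  # MB -> TB, decimal units

other_tb = 20 + 13 + 200 + 500  # books, photographs, maps, movies
total_tb = other_tb + audio_tb

archive_tb = 4.5 * 1000
print(f"audio: {audio_tb} TB, total: {total_tb} TB")
print(f"4.5 PB is about {archive_tb / total_tb:.1f} LoC")
```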

Re:Story is meaningless without LOC measurement (2, Insightful)

merreborn (853723) | more than 5 years ago | (#27336925)

Bulkiest might be the 3.5 million sound recordings, which at one audio CD each, would be almost 2,000 TB.

You compressed the video, and the photographs, but not the audio? And why do you need a full CD for every sound recording? Surely many of them are far shorter than a full CD?

Re:Story is meaningless without LOC measurement (3, Informative)

commodore64_love (1445365) | more than 5 years ago | (#27336399)

83 terabyte in the LOC, so 4.5 petabytes == 54 Libraries of Congress

4.5 petabytes == 4500 terabyte hard drives, times $75 each == ~$340,000 == how much taxpayers spend, each hour, to maintain the LOC
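
The two conversions above, as a minimal sketch. The 83 TB size and the $75 drive price are the figures from the post itself, not independently checked values.

```python
# Figures taken from the post: LoC size and a per-drive price.
LOC_TB = 83
DRIVE_PRICE_USD = 75

archive_tb = 4500  # 4.5 PB as 4500 one-terabyte drives

libraries = archive_tb / LOC_TB
drive_cost = archive_tb * DRIVE_PRICE_USD

print(f"{libraries:.1f} Libraries of Congress")
print(f"${drive_cost:,} in bare drives")
```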

Re:Story is meaningless without LOC measurement (1)

commodore64_love (1445365) | more than 5 years ago | (#27336463)

P.S. My "83 terabyte" quote comes directly from the Library of Congress statistics, mid-2008.

Re:Story is meaningless without LOC measurement (1)

clarkkent09 (1104833) | more than 5 years ago | (#27336469)

Bah, LOC is outdated. 4.5PB = 1 Shipping Container

Re:Story is meaningless without LOC measurement (1)

v1 (525388) | more than 5 years ago | (#27336667)

a new 4.5 petabyte data center

4.5 PB? Is that the best you can do? sheesh, amateurs....

Though it also did surprise me they only get 200,000 hits/day. I expected the WayBack Machine [archive.org] to get a lot more traffic than that.

Re:Story is meaningless without LOC measurement (1)

clarkkent09 (1104833) | more than 5 years ago | (#27336741)

I think that's 200K unique visitors. According to Alexa, archive.org was the 386th most visited site on the internet last week, which is not to be sneezed at.

Mental images of libraries (1)

macraig (621737) | more than 5 years ago | (#27336581)

Riiiight... because you happen to have a really really good mental image of exactly how many rooms/shelves/books/pages are stored in the Library of Congress!

(Which incidentally doesn't happen to be static, BTW; yo momma's LoC ain't the same size as my LoC.)

Storage Envy (5, Funny)

jacksinn (1136829) | more than 5 years ago | (#27336241)

Does lusting after all their space make me a peta-phile?

Re:Storage Envy (1)

fm6 (162816) | more than 5 years ago | (#27336411)

Yes.

Re:Storage Envy (-1, Offtopic)

commodore64_love (1445365) | more than 5 years ago | (#27336419)

Yes.

Does lusting after the Disney girls (like Emily Osment) make me a pedophile? Nope. Just a man. ;-)

Re:Storage Envy (1)

Rude Turnip (49495) | more than 5 years ago | (#27336701)

Why don't you have a seat over there...

Re:Storage Envy (0)

Anonymous Coward | more than 5 years ago | (#27336825)

I do believe you have won.

Own the internet! (5, Funny)

Anonymous Coward | more than 5 years ago | (#27336243)

so all one need to do to "own the internet" is to drive a big rig and ... lift the container off their parking lot?

Re:Own the internet! (5, Funny)

peragrin (659227) | more than 5 years ago | (#27336357)

well if you plug in a laser printer you can print off a hard copy for your boss.

Slight problem? (5, Funny)

girlintraining (1395911) | more than 5 years ago | (#27336249)

I can now theoretically steal "the internet" with a flatbed truck and a lift. There's something to be said for conventional data centers: They're rather hard to load onto a truck and drive off with.

Not "THE" but "A" internet... (1)

denzacar (181829) | more than 5 years ago | (#27336653)

You would be stealing A backup copy of THE Internet. An incomplete one at that, but still quite extensive.

Now... If you were somehow able to steal that copy AND break [youtube.com] the internet [youtube.com] ... your stolen internet may be considered THE internet.

Re:Slight problem? (4, Interesting)

rackserverdeals (1503561) | more than 5 years ago | (#27336767)

Here's a video tour [youtube.com] of one if you need it for reference.

Don't forget to turn off the water and unplug the ethernet cables. Just be very careful with the power cords.

Re:Slight problem? (3, Funny)

fightinfilipino (1449273) | more than 5 years ago | (#27336841)

so the Internet really is a big truck, hauling all of our lulz and our memes across the globe.

take THAT, Ted Stevens!

Re:Slight problem? (1)

diablovision (83618) | more than 5 years ago | (#27336863)

Nothing is stopping you from putting it inside a building, cementing it into its foundation, or surrounding it with appropriately weaponized sharks.

Re:Slight problem? (1)

bigsteve@dstc (140392) | more than 5 years ago | (#27336981)

But don't forget to wrap the container in cling-film before you drop it into the shark tank.

Minor problem with weaponized sharks (0)

Anonymous Coward | more than 5 years ago | (#27336993)

Nothing is stopping you from putting it inside a building, cementing it into its foundation, or surrounding it with appropriately weaponized sharks.

The presence of weaponized sharks implies the need for a moat. Somehow I doubt the city and county governments would appreciate its construction on the premise, as the presence of the said sharks would preclude passing it off as a swimming pool.

Housed in a metal shipping container.... (1)

MichaelSmith (789609) | more than 5 years ago | (#27336261)

Well I hope it is bolted down.

Re:Housed in a metal shipping container.... (1)

stonedcat (80201) | more than 5 years ago | (#27336391)

And let's hope those bolts go into something solid, like not dirt.

Only one new datacenter? (0)

Anonymous Coward | more than 5 years ago | (#27336271)

Just imagine what you could do with a beowulf cluster of 4.5 PB datacenters. You could create regular archives of the internet archives!

(As a webserver administrator, I can't stress how important it is to keep backups.)

4.5 PB... yumm (0)

Anonymous Coward | more than 5 years ago | (#27337015)

Just imagine what you could do with a beowulf cluster of 4.5 PB datacenters. You could create regular archives of the internet archives!

Actually, I was thinking the largest collection of pr0n the world has ever seen (to date.)

Nice use for a bunny-rabbit (1)

davecb (6526) | more than 5 years ago | (#27336275)

Yes, "thumper" refers to the rabbit. I have a Sun Managed Storage slide somewhere about how data tends to, er, multiply...

--dave

What about 1996 and earlier? (4, Interesting)

commodore64_love (1445365) | more than 5 years ago | (#27336277)

Are there any resources that let us see websites from 1996, 95, 94, or 93? I would love to revisit the web as it appeared when I first discovered it (1994 at psu.edu).

Re:What about 1996 and earlier? (4, Funny)

Tumbleweed (3706) | more than 5 years ago | (#27336431)

I would love to revisit the web as it appeared when I first discovered it (1994 at psu.edu).

No, you wouldn't.

Re:What about 1996 and earlier? (2, Funny)

Matheus (586080) | more than 5 years ago | (#27336433)

The entire internet prior to 1996 is archived on an old PC that I'm currently trying to get the 5GB disk restored on.. why I've kept all that old porn for so long completely escapes me tho. :)

Re:What about 1996 and earlier? (2, Informative)

Profane MuthaFucka (574406) | more than 5 years ago | (#27336569)

Because after 1996 women shaved all their hair off due to a mistaken belief that men prefer their women to look like little girls. We don't, we like the big bushes, and that is why you must save that porn for the good of mankind.

Re:What about 1996 and earlier? (0, Offtopic)

Hurricane78 (562437) | more than 5 years ago | (#27336941)

Speak for yourself. I like my pussy shaven. But as someone who never licks his girlfriend's pussy, you would not know why that is, would you?

Re:What about 1996 and earlier? (1, Informative)

scottrocket (1065416) | more than 5 years ago | (#27336801)

Yes, "The Wayback Machine", at archive.org. Coincidentally, I was there just last night, looking at a January '98 Slashdot.

Re:What about 1996 and earlier? (1)

scottrocket (1065416) | more than 5 years ago | (#27336895)

Oops. '93-'96 - apparently in my universe, '98 is before '93.

That is a shit ton of space (1)

webheaded (997188) | more than 5 years ago | (#27336281)

Unfortunately the Wayback Machine will still be slower than hell. :p

Re:That is a shit ton of space (1)

jd (1658) | more than 5 years ago | (#27336813)

Fortunately, Hell has now been upgraded to 2 mb/s, thanks to British Telecom.

They had it on South Park (1)

chadimus (818049) | more than 5 years ago | (#27336283)

It sometimes takes the form of a giant blue linksys router. So that we may better worship it.

In reality... (1)

tacarat (696339) | more than 5 years ago | (#27336383)

The internet is only about 2TB once you've removed all the redundant copies of 2g1c and goatse.cx.

Re:In reality... (1)

EdZ (755139) | more than 5 years ago | (#27336475)

Then you lose any data that may be stored in the arrangement of those many redundant, redistributed and reencoded copies. Distributed steganography, if you will

In Other News (5, Informative)

Erik Fish (106896) | more than 5 years ago | (#27336407)

Incidentally: FileFront [filefront.com] is closing in five days, taking with it any files that aren't hosted elsewhere.

I am told that many of the Half-Life mods [filefront.com] hosted there are not available anywhere else, so get while the getting is good...

Re:In Other News (1)

bluesatin (1350681) | more than 5 years ago | (#27336533)

It's a sad sad day when FileFront shuts its doors.

Please don't make us start downloading things from FilePlanet again, it makes me cry a little inside.

)':

Never underestimate the bandwidth ... (4, Insightful)

Ungrounded Lightning (62228) | more than 5 years ago | (#27336449)

... of a 4.5 petabyte datacenter in a shipping container in transit.

Re:Never underestimate the bandwidth ... (1)

Wingman 5 (551897) | more than 5 years ago | (#27336527)

(4.5 petabyte) / (1 year) = 153.114984 MBps
  now that would be some bad lag.
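
For reference, the per-year figure depends slightly on what counts as a year. A minimal sketch, assuming binary units (2^50 bytes per PB, 2^20 bytes per MB); the 365.2422-day year is an assumption that appears to match the number quoted above.

```python
# Average rate of a 4.5 PB container that takes one year to arrive.
PIB = 2**50
MIB = 2**20
SECONDS_PER_DAY = 86_400


def rate_mib_per_s(petabytes: float, days: float) -> float:
    """Average rate in MiB/s when `petabytes` are delivered over `days` days."""
    return petabytes * PIB / (days * SECONDS_PER_DAY) / MIB


tropical_year = rate_mib_per_s(4.5, 365.2422)
calendar_year = rate_mib_per_s(4.5, 365)

print(f"365.2422-day year: {tropical_year:.3f} MiB/s")
print(f"365-day year:      {calendar_year:.3f} MiB/s")
```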

Re:Never underestimate the bandwidth ... (0)

Anonymous Coward | more than 5 years ago | (#27336601)

I'd reply in a year and get modded +5 Funny, except /. closes discussions after a short time.

30 comments... (0)

Anonymous Coward | more than 5 years ago | (#27336459)

and not a single "finally a place big enough to store all of my porn" reference? Y'all are slacking tonight.

on a very slightly serious note, how much content would be referenced by, say, TPB? Sure the trackers are small, but that's got to be huge.

63 x 48 = 3024Tb (3, Insightful)

eotwawki (1515827) | more than 5 years ago | (#27336501)

So where does the 4.5PB come into this?

Re:63 x 48 = 3024Tb (1)

glitch23 (557124) | more than 5 years ago | (#27336603)

The article doesn't make it clear so I can only guess that the missing storage capacity is part of some SAN. Maybe the 48 1TB hard drives are only local storage (obviously) but are in addition to some existing SAN that they didn't mention in this particular article. Either that or the article is just wrong about the 4.5PB database.

Re:63 x 48 = 3024Tb (1)

SirLoadALot (991302) | more than 5 years ago | (#27336749)

Good point. My best guess would be that they are actually 1.5 TB drives. That would get the numbers about right.

Re:63 x 48 = 3024Tb (1)

NickW1234 (1313523) | more than 5 years ago | (#27336787)

That was my assumption as well. Of course, that's not accounting for any redundancy.

Re:63 x 48 = 3024Tb (4, Informative)

spinkham (56603) | more than 5 years ago | (#27336809)

TFA says "...eight racks filled with 63 Sun Fire x4500 servers with dual- or quad-core x86 processors running Solaris 10 with ZFS. Each Sun server is combined with an array of 48 1TB hard drives." (emphasis mine)

I would guess this means there's a x4500 with 24TB in local disks, and 48TB in attached storage per machine. (24+48)*63 does give us the quoted number

Re:63 x 48 = 3024Tb (1)

rackserverdeals (1503561) | more than 5 years ago | (#27336877)

I don't think that's right. Sun's site has a video tour of it. Haven't finished it yet but it's here [sun.com] .

Re:63 x 48 = 3024Tb (1)

rackserverdeals (1503561) | more than 5 years ago | (#27336909)

The new datacenter is only 3PB. I guess the total storage, with the old data centers is 4.5 PB.

So 48x63 gives you 3PB of raw storage. I'm guessing they're using less because I can't imagine them running it in RAID 0.

Whoopsie! (1)

Profane MuthaFucka (574406) | more than 5 years ago | (#27336509)

That wasn't the ribbon, it was the power cord! Someone's going to be embarrassed!

3PB or 4.5PB (0)

Anonymous Coward | more than 5 years ago | (#27336543)

I guess /.'s readers can no longer multiply, but 63 servers * 48TB/server = 3024TB =~ 3PB.

I'm guessing they had 1.5PB already?

Andy

  P.S. yes, I'm looking for a class 8 truck and a set of hydraulic jacks... but before I steal the Internet Archive, as a consumer, I DEMAND that the entire thing fit in my shirt pocket, and have an Apple logo on it!!!

63 x 48 =3024 (0, Redundant)

eotwawki (1515827) | more than 5 years ago | (#27336557)

So where does the 4.5PB come into this?

Math (3, Informative)

PowerKe (641836) | more than 5 years ago | (#27336579)

63 servers * 48 disk of 1 TB = 3024 TB. According to the announcement [archive.org] on the archive.org 3 Petabytes would be right.
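
The competing capacity figures in this thread come down to two small multiplications. A minimal sketch follows; the 24 TB of local disk per server is a guess made elsewhere in the discussion, not a confirmed configuration.

```python
# Raw capacity under the two readings discussed in the thread.
SERVERS = 63
DISKS_PER_SERVER = 48
DISK_TB = 1

raw_tb = SERVERS * DISKS_PER_SERVER * DISK_TB
print(f"63 servers x 48 x 1 TB = {raw_tb} TB (about {raw_tb / 1000:.1f} PB raw)")

# Alternative reading: 24 TB local plus a 48 TB attached array per server.
GUESSED_LOCAL_TB = 24  # assumption from another comment, unverified
alt_tb = SERVERS * (GUESSED_LOCAL_TB + DISKS_PER_SERVER * DISK_TB)
print(f"63 x (24 + 48) TB = {alt_tb} TB (about {alt_tb / 1000:.1f} PB raw)")
```

Both numbers are raw capacity; any redundancy in the pool layout would reduce the usable space.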

Re:Math (0)

Anonymous Coward | more than 5 years ago | (#27336617)

And so, 1 library of congress.

Slashdotted (1)

thefolkmetal (970306) | more than 5 years ago | (#27336641)

I don't know if I'm the only one who read it this way, but the summary makes it seem like these servers have a bit of a job on their hands as it is, what with hosting the site and doing their web-crawling/archiving...and we slashdotted this thing? We're going to blow that little metal building up.

"Sun Fire" (3, Informative)

fm6 (162816) | more than 5 years ago | (#27336647)

The new data center houses 63 Sun Fire servers

That's not very specific. "Sun Fire" is a brand that for a while got applied to all of Sun's rack-mount servers (except for NEBS-compliant servers, which were and are called "Sun Netra"). A little confusing, of course, which is why they've started calling new SPARC boxes "Sun SPARC Enterprise" to differentiate them from those mangy x64 "Sun Fire" systems. Except that there are still SPARC systems called "Sun Fire", so I guess the confusion factor didn't get any better...

Anyway, the specific server being used here is the Sun Fire X4500 [sun.com] , a system with no less than 48 1 TB disks in a 4U space. Notice that this model is EOLed; presumably iarchive got a deal on some remaindered machines.

The shipping container is something we've seen before [slashdot.org] .

Re:"Sun Fire" (2, Informative)

ximenes (10) | more than 5 years ago | (#27336831)

Since they're using one of Sun's modular datacenters that is actually on the Sun campus, I would imagine that they got some financial incentives / support from Sun for all of this.

The X4500 is EOL as you mention, although it was still sold a few months back. It lives on as the X4540, which really isn't that different; the main thing is it's moved to a newer Opteron processor type and is a fair bit cheaper. So they didn't really miss out on anything.

It's kind of interesting to me that they went this route, as opposed to a bunch of servers talking to a bunch of storage separately. This seems to be an exact use case for the X4500-type system, which as far as I'm aware is pretty unique.

they cut the ribbon? (1)

unfunk (804468) | more than 5 years ago | (#27336649)

They cut the ribbon? How are they supposed to access that much data unless they buy a new one?

Re:they cut the ribbon? (1)

jd (1658) | more than 5 years ago | (#27336833)

Easy. Ribbon's only good for short-distance parallel links. If they've got backups in Egypt, they must be using serial cables.

Load distributions (1)

Just Some Guy (3352) | more than 5 years ago | (#27336783)

From TFA (yeah, I know):

a Web site that gets about 200,000 visitors a day or about 500 hits per second on the 4.5 petabyte database.

So they get all 200,000 hits in a 7-minute window? I picture a sysadmin going insane for a few moments then napping in a hammock for the rest of the day.

Re:Load distributions (1)

diablovision (83618) | more than 5 years ago | (#27337017)

hit != visitor
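
The two readings of the article's numbers can be compared with a few lines of arithmetic. This is a minimal sketch, assuming the 500 hits per second is a sustained average over the whole day, which the article does not state.

```python
# "200,000 visitors a day" versus "500 hits per second".
VISITORS_PER_DAY = 200_000
HITS_PER_SECOND = 500
SECONDS_PER_DAY = 86_400

# Reading 1: one hit per visitor, so the daily traffic fits in a short window.
window_minutes = VISITORS_PER_DAY / HITS_PER_SECOND / 60
print(f"one hit per visitor: all traffic in {window_minutes:.1f} minutes")

# Reading 2: 500 hits/s sustained all day, spread across the visitors.
hits_per_visitor = HITS_PER_SECOND * SECONDS_PER_DAY / VISITORS_PER_DAY
print(f"sustained rate: about {hits_per_visitor:.0f} hits per visitor per day")
```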

LOCs? (-1, Redundant)

Anonymous Coward | more than 5 years ago | (#27336869)

How many Libraries of Congress is that?

mood 0p (-1, Flamebait)

Anonymous Coward | more than 5 years ago | (#27336871)
