
Google Sorts 1 Petabyte In 6 Hours

Soulskill posted more than 4 years ago | from the sort-of-fast dept.


krewemaynard writes "Google has announced that they were able to sort one petabyte of data in 6 hours and 2 minutes across 4,000 computers. According to the Google Blog, '... to put this amount in perspective, it is 12 times the amount of archived web data in the US Library of Congress as of May 2008. In comparison, consider that the aggregate size of data processed by all instances of MapReduce at Google was on average 20PB per day in January 2008.' The technology making this possible is MapReduce, 'a programming model and an associated implementation for processing and generating large data sets.' We discussed it a few months ago. Google has also posted a video from their Technology RoundTable discussing MapReduce."
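
For readers new to the model, here is a minimal single-machine sketch of the map/group/reduce flow. It is illustrative only; Google's actual implementation distributes these phases across thousands of machines and is not public, and all function names here are made up.

    # Toy MapReduce-style word count (single process, illustrative only).
    from collections import defaultdict

    def map_fn(document):
        # Map phase: emit a (word, 1) pair for every word.
        for word in document.split():
            yield (word, 1)

    def reduce_fn(word, counts):
        # Reduce phase: fold all values for one key into one result.
        return (word, sum(counts))

    def mapreduce(documents):
        groups = defaultdict(list)
        for doc in documents:                 # "map" over the inputs
            for key, value in map_fn(doc):
                groups[key].append(value)     # "shuffle": group values by key
        return [reduce_fn(k, vs) for k, vs in groups.items()]

    print(mapreduce(["the quick brown fox", "the lazy dog"]))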


Kudos to Google (5, Funny)

Anonymous Coward | more than 4 years ago | (#25865203)

for knowing how important the Library of Congress metric is to us nerds!


GRRRR (-1)

Anonymous Coward | more than 4 years ago | (#25865785)

The CIA / NSA / DHS homeland security types must be happy then.

And thanks to whoever made it so I can't post here; they don't even deserve an email about it. It's been longer than 24 hours.

And why was my comment about some jerks replacing Ubuntu with Solaris marked a waste of time? If you want Ubuntu with Ubuntu apps, it makes more sense to get the Ubuntu Linux kernel than some other OS.

And your system being hard to read means that I'M NOT HUMAN????

Re:Kudos to Google (1)

Zencyde (850968) | more than 4 years ago | (#25865875)

Who cares how many Libraries of Congress it takes? I want to know how long it will take to hack the Gibson!

Re:Kudos to Google (5, Funny)

canuck57 (662392) | more than 4 years ago | (#25865929)

for knowing how important the Library of Congress metric is to us nerds!

But at least now we know Google can sort out petafiles.

Re:Kudos to Google (4, Funny)

shutdown -p now (807394) | more than 4 years ago | (#25866571)

Bah! To pay true homage, they need to add it to the list of units in Google Calc!

Re:Kudos to Google (1)

tyrione (134248) | more than 4 years ago | (#25867713)

True, but once the Library of Congress's entire collection has been digitized, then they can brag about the comparison. I doubt they'll want to, though, seeing as that will be far larger than a petabyte of data.

Unit conversion (4, Funny)

Zarhan (415465) | more than 4 years ago | (#25865255)

Yay! We finally have unit conversion from 1 LoC to bytes! So...20 PB = 6LoC, means that 1 LoC = 3,333... PB :)

Re:Unit conversion (3, Informative)

xZgf6xHx2uhoAj9D (1160707) | more than 4 years ago | (#25865383)

Don't you mean 1PB = 12LoC?

Re:Unit conversion (4, Informative)

Neon Aardvark (967388) | more than 4 years ago | (#25865429)

No, 1 PB = 12 LoC, so 1 LoC = 0.0833... PB

Also, I'd like to make some kind of swimming pool reference.

Re:Unit conversion (2, Interesting)

Anonymous Coward | more than 4 years ago | (#25865609)

Assuming it was written out in binary in a font that allows 1 digit per 2 mm, the data would stretch about 1.6 x 10^13 m (1 PB = 8 x 10^15 bits), or roughly 10^11 times the perimeter of an Olympic-sized swimming pool.

Re:Unit conversion (1)

owlnation (858981) | more than 4 years ago | (#25865951)

No, 1 PB = 12 LoC, so 1 LoC = 0.0833... PB. Also, I'd like to make some kind of swimming pool reference.

Yes, but how much is that in football fields?

Re:Unit conversion (1, Interesting)

Anonymous Coward | more than 4 years ago | (#25866067)

Yes, but how much is that in football fields?

You silly sod, you can't measure something in football fields! There's internationalization to take into account!

Canadian football fields are 100x59m, American football fields are 109x49m, and the rest of the world doesn't even play the same game on a football field. And THEIR sport has a standard range, anywhere from 90-120m by 45-90m (Thank you wikipedia [wikipedia.org] ).

You've now introduced variable-variables! We can't get an absolute number!

Re:Unit conversion (1)

Yvan256 (722131) | more than 4 years ago | (#25866743)

Can we get an absolute variable instead?

Re:Unit conversion (1)

Tubal-Cain (1289912) | more than 4 years ago | (#25866781)

I vote for i.

Re:Unit conversion (2, Informative)

ewanm89 (1052822) | more than 4 years ago | (#25867369)

Well, Americans don't even play *foot*ball with their feet.

Re:Unit conversion (1, Funny)

xZgf6xHx2uhoAj9D (1160707) | more than 4 years ago | (#25867871)

This is an excellent point. No American football player has used his feet since the NFL adopted hoverchairs into the rules in 1974.

Re:Unit conversion (2, Informative)

Zarhan (415465) | more than 4 years ago | (#25866033)

Oh darn. Clearly I was converting pound-congresses to kilos first.

Re:Unit conversion (1)

RancidPeanutOil (607744) | more than 4 years ago | (#25867727)

can we work elephant volume into it as well? Assuming a spherical elephant of course... QED

Re:Unit conversion (1)

neoform (551705) | more than 4 years ago | (#25865551)

What format are they using for the books when calculating the size of the LoC?

Raw text?

PDF?

JPEG?

...

BCA's? (0)

Anonymous Coward | more than 4 years ago | (#25865703)

Can we convert that to number of bad car analogies?

Re:Unit conversion (1)

UltraAyla (828879) | more than 4 years ago | (#25865751)

I like your thinking, but would like to modify it (I realize it was a joke). Considering the rate at which LoC archives data, we should put some datestamps on it so that, including the other correction, 1PB = 12 081123LoC. Just a thought

That's Easy (4, Interesting)

Lord Byron II (671689) | more than 4 years ago | (#25865299)

Consider a data set of two numbers, each 0.5 petabytes in size. It should only take a few minutes to sort them, and there's even a 50% chance the data is already sorted.

Re:That's Easy (5, Insightful)

Blakey Rat (99501) | more than 4 years ago | (#25865335)

I came here to post the same thing. If they sorted a petabyte of floats, that might be pretty impressive. But if they're sorting 5-terabyte video files, their software really sucks.

Not enough info to judge the importance of this.

Re:That's Easy (5, Informative)

farker haiku (883529) | more than 4 years ago | (#25865387)

I think this is the data set. I could be wrong though. The article (yeah yeah) says that

In our sorting experiments we have followed the rules of a standard terabyte (TB) sort benchmark.

Which led me to this page [hp.com] that describes the data (and it's available for download).
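
For context, the standard terabyte-sort benchmark that page describes uses fixed-size 100-byte records sorted on (if I recall the spec correctly) a 10-byte key. A toy generator in that spirit; this is a sketch only, the benchmark's official generator is more elaborate:

    # Generate benchmark-style records: 10-byte random key + 90-byte payload.
    import os

    RECORD_LEN, KEY_LEN = 100, 10

    def make_record():
        return os.urandom(KEY_LEN) + b"x" * (RECORD_LEN - KEY_LEN)

    records = [make_record() for _ in range(100000)]   # ~10 MB toy data set
    records.sort(key=lambda r: r[:KEY_LEN])            # sort on the key prefix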

How is this flamebait? (-1, Offtopic)

Anonymous Coward | more than 4 years ago | (#25865575)

Mods, he probably should have read the article. If he did, he'd have noticed that what they sorted were tons of 100-byte records. But how does that make it "Flamebait?" People who mod like this should be identified and listed in a Slashdot "Hall of Shame" of some sort and never allowed to mod or meta-mod again. Think that's extreme? Look at how blatantly this moderator failed.

Here's the definition of "Flamebait" from the Slashdot Faq [slashdot.org] :

Flamebait -- Flamebait refers to comments whose sole purpose is to insult and enrage. If someone is not-so-subtly picking a fight (racial insults are a dead giveaway), it's Flamebait.

Now, someone please tell me how the parent post fits these criteria. I dare the moderator to explain himself or herself. You won't, of course, because that would require balls and the ability to admit that you made a mistake. Y'know, things that respectable people have.

Posted anonymously because I fully expect that, instead of understanding that preventing/correcting such blatant incompetence is the best way to avoid rants like this one, the other mods will instead play shoot-the-messenger and take out their impotent frustrations on me either because I pointed out this stupidity or because I wasn't care-bear nice about it (because that's so much more important than truth, right?).

Re:How is this flamebait? (-1, Offtopic)

Hurricane78 (562437) | more than 4 years ago | (#25865707)

You should know that people always block out anything that would damage their reality or self-respect too much. It does not matter if it's true. This is because otherwise they would go insane. Ask a trained psychologist about it.

If you really want him to change his mind, you must phrase your words in a way that allows him to still accept himself and his reality. The best way is to tell him he can make his life even better by changing his mind, and that he wasn't wrong but fell for a trick, or something like that. Yes, it's distorting reality. But it's his distorted reality that you're using. And do you want it to work, or just to flame? :)

I know our instinct tells us to counterattack like you did. But that does not work if he does not respect your opinion and does not listen to you. The approach above, on the other hand, works nicely if done right. You can even make new friends out of enemies and fix their distortions somewhat.

By the way: every human has distortions. But an alpha male knows how to convince others that his views are the best. ;)

Re:How is this flamebait? (1, Interesting)

iwein (561027) | more than 4 years ago | (#25865803)

There is a thing called meta-humor; I'll give you an example:

You got baited into a flame in a very elaborate scheme to mock your intelligence (or lack thereof).

There is no meta-flamebait category, so you're proving the mods right, I'd say.

I hope this helps.

Re:That's Easy (1)

mR.bRiGhTsId3 (1196765) | more than 4 years ago | (#25865657)

I dunno, it depends on what criteria they're using to sort video files. If by file name, then yeah, not so impressive; but if they're sorting based on a measure of the relevance of the contents, my jaw would drop and my eyes would pop out.

Re:That's Easy (5, Informative)

Anonymous Coward | more than 4 years ago | (#25865389)

From TFA: they sorted "10 trillion 100-byte records"

Re:That's Easy (4, Funny)

sakdoctor (1087155) | more than 4 years ago | (#25865407)

And yet Google doesn't even convert petabytes to Libraries of Congress in the Google calculator.
Or perhaps I got the syntax wrong.

Re:That's Easy (4, Funny)

sakdoctor (1087155) | more than 4 years ago | (#25865419)

Huh? This isn't the parent post I was trying to reply to.

Re:That's Easy (1)

nebulus4 (799015) | more than 4 years ago | (#25865475)

Consider a data set of just one number, about 1 petabyte in size; it shouldn't really take much time to sort, since we already know the data is sorted. Perfect excuse for using 4000 computers to beta-test Duke Nukem Forever.

Re:That's Easy (0)

Anonymous Coward | more than 4 years ago | (#25866135)

I suppose in your model those numbers are stored on DVDs and you're moving them around with a truck.

Re:That's Easy (1)

JamesP (688957) | more than 4 years ago | (#25866895)

Chances are they'll now ask candidates interviewing there how to do it in half the time with one-tenth of the machines...

Re:That's Easy (1)

jtgd (807477) | more than 4 years ago | (#25867223)

For it to take minutes to sort two numbers, they would have to be identical for the first few gigabytes. If they differed in the first byte, it would take only a microsecond to sort them.
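
The parent's point, concretely: a lexicographic comparison only costs time proportional to the common prefix. A sketch (the file paths and chunk size are hypothetical):

    # Compare two huge byte streams; stops at the first differing chunk.
    def compare_streams(path_a, path_b, chunk_size=1 << 20):
        with open(path_a, "rb") as a, open(path_b, "rb") as b:
            while True:
                ca, cb = a.read(chunk_size), b.read(chunk_size)
                if ca != cb:
                    return -1 if ca < cb else 1   # differ inside this chunk
                if not ca:                        # both exhausted: equal
                    return 0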

Need to benchmark against the best sorts (4, Insightful)

Animats (122034) | more than 4 years ago | (#25865371)

Sorts have been parallelized and distributed for decades. It would be interesting to benchmark Google's approach against SyncSort [syncsort.com] . SyncSort is parallel and distributed, and has been heavily optimized for exactly such jobs. Using map/reduce will work, but there are better approaches to sorting.
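
For the curious, most parallel external sorts, commercial or otherwise, are variations on sample sort: pick splitters from a sample, partition records into key ranges, and sort each range independently. A single-process sketch of the idea; this is illustrative, not SyncSort's or Google's actual algorithm:

    # Sample-sort sketch: splitters from a sample define key ranges; each
    # bucket could then be sorted on a separate machine, and concatenating
    # the sorted buckets yields a globally sorted result.
    import random
    from bisect import bisect_right

    def sample_sort(records, parts=4, sample_size=64):
        sample = sorted(random.sample(records, min(sample_size, len(records))))
        step = max(1, len(sample) // parts)
        splitters = sample[step::step][:parts - 1]
        buckets = [[] for _ in range(parts)]
        for r in records:
            buckets[bisect_right(splitters, r)].append(r)
        out = []
        for b in buckets:
            out.extend(sorted(b))
        return out

    print(sample_sort([random.randrange(1000) for _ in range(20)]))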

Re:Need to benchmark against the best sorts (1)

perlchild (582235) | more than 4 years ago | (#25865639)

And Google is trying to make money off MapReduce (as an API of sorts), so are you surprised they're using their massive influence over the market, especially geeks, to heighten awareness of their product?

On the other hand, what they're trying to prove is MapReduce's worth as a workload divider (how to break up 20PB for sorting), not necessarily how optimal it is in this particular situation. They have a better test/sample of MapReduce, but it's a trade secret (how it's used to index the pages for Google search), so they can't release that. I imagine they'll run another test until they get a big name signing up to use MapReduce as an API.

Re:Need to benchmark against the best sorts (1, Insightful)

Anonymous Coward | more than 4 years ago | (#25867679)

I guess it's up to SyncSort to run a benchmark and publish the results, no?

Re:Need to benchmark against the best sorts (2, Interesting)

Pinball Wizard (161942) | more than 4 years ago | (#25867845)

Parallel/distributed sorting doesn't eliminate the need for map/reduce; it just helps spread the problem set across machines.

Here's the thing, though: it's the distribution of the problem set and the combining of the results that is the hard part, not map/reduce.

Map and reduce are simple functional programming primitives. With map, you apply a function to a list, which can contain either atomic values or other functions. With reduce, you take a single function (like add or multiply, for instance) and use it to condense the list into a single value or object.

That's my understanding of map/reduce from my functional language classes in school, and it's exactly how Google describes it. I don't really see what the big deal is with map/reduce in itself.

Like I said, it's distributing the problem among thousands of machines that is the hard part.
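
A sketch of the routing step the parent is pointing at; in a real system each shard travels over the network to its worker, but the core idea is just hashing keys to machines (all names here are illustrative):

    # The "shuffle": route every (key, value) pair to a worker by hashing
    # the key, so all values for a given key land on the same machine.
    def partition(pairs, n_workers):
        shards = [[] for _ in range(n_workers)]
        for key, value in pairs:
            shards[hash(key) % n_workers].append((key, value))
        return shards

    print(partition([("a", 1), ("b", 2), ("a", 3)], 4))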


What about vista? (-1, Flamebait)

gapagos (1264716) | more than 4 years ago | (#25865399)

Yes but... can they run Vista?

Finally... (5, Funny)

aztektum (170569) | more than 4 years ago | (#25865403)

I will be able to catalog my pr0n in my lifetime:

Blondes, Brunettes, Red heads, Beastial^H^H^H^H^H "Other"

tagging (4, Interesting)

Hao Wu (652581) | more than 4 years ago | (#25865509)

I will be able to catalog my pr0n in my lifetime:

It's not enough to sort by blond, black, gay, scat, etc. Some categories are combinations that don't fit into a hierarchy.

That is where tagging comes in. Sorting can be done on-the-fly, with no one category intrinsically more important.

Re:tagging (5, Funny)

gardyloo (512791) | more than 4 years ago | (#25865671)

pr0n for Geeks, volume 18: Sorting On-the-Fly

Re:Finally... (2, Funny)

Pugwash69 (1134259) | more than 4 years ago | (#25865539)

How do you catalogue the topics? I mean "Clown" and "Monkey" are so different, but something with both elements could be difficult to sort.

Re:Finally... (1)

Fumus (1258966) | more than 4 years ago | (#25866061)

For the love of puppies. Learn to spell "bestiality". Half the population can't spell it right :/

Re:Finally... (0)

Anonymous Coward | more than 4 years ago | (#25866487)

"For the love of puppies" indeed, you sick fuck.

Re:Finally... (0)

Anonymous Coward | more than 4 years ago | (#25867131)

and the other half doesn't want to...

One ups Yahoo & Hadoop (3, Interesting)

DaveLatham (88263) | more than 4 years ago | (#25865443)

It looks like Google saw Yahoo crowing about winning the 1 TB sort contest using Hadoop [yahoo.net] and decided to one-up them!

Let's see if Yahoo responds!

Re:One ups Yahoo & Hadoop (1)

Anpheus (908711) | more than 4 years ago | (#25865667)

Hadoop uses MapReduce :) From their site:

Hadoop implements MapReduce, using the Hadoop Distributed File System (HDFS) (see figure below.) MapReduce divides applications into many small blocks of work. HDFS creates multiple replicas of data blocks for reliability, placing them on compute nodes around the cluster. MapReduce can then process the data where it is located.

Re:One ups Yahoo & Hadoop (1)

gfody (514448) | more than 4 years ago | (#25865993)

MapReduce isn't something invented by Google. It's a design pattern.

Re:One ups Yahoo & Hadoop (3, Informative)

Patrick May (305709) | more than 4 years ago | (#25866295)

It's older than design patterns. Lisp has provided map and reduce functions for literally decades. It's a standard functional programming idiom.

Re:One ups Yahoo & Hadoop (1)

iwein (561027) | more than 4 years ago | (#25865903)

With a larger dataset, scaling efficiency becomes more important than raw sorting efficiency. Sorting 1PB is different from sorting 1TB.

Since we're relating things to human proportions today, I'll compare your comparison to comparing a 100m sprint to a marathon. Apply storytelling skills and score.

Sort? Sort what? (1, Insightful)

mlwmohawk (801821) | more than 4 years ago | (#25865453)

One quadrillion bytes, or 1 million gigabytes.

How big are the fields being sorted? Is it an exchange sort or a reference sort?

It is probably very impressive, but without a LOT of details, it is hard to know.

Re:Sort? Sort what? (5, Informative)

nedlohs (1335013) | more than 4 years ago | (#25865507)

I realize this is Slashdot..., but maybe you could glance at the article, which states:

10 trillion 100-byte records

Re:Sort? Sort what? (1)

mlwmohawk (801821) | more than 4 years ago | (#25865849)

10 trillion records across 4,000 computers comes to 2.5 billion records per computer.

It took 6 hours for a computer to sort 2.5 billion records? 250 GB?

Yawn.

Re:Sort? Sort what? (2, Insightful)

nedlohs (1335013) | more than 4 years ago | (#25866023)

You do have to merge them all back together at the end...

But I'm sure you can do better tonight.

Re:Sort? Sort what? (0, Flamebait)

mlwmohawk (801821) | more than 4 years ago | (#25866311)

You do have to merge them all back together at the end...

Technically speaking, that's not true. In fact, you wouldn't want to.

Assuming some sort of search paradigm, you'd keep the records on their 4,000 separate servers, each server handling its own search functionality, and *only* merge the results of the searches as needed, caching them in the web layer.
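
The merge-as-needed idea the parent describes can be expressed as a lazy k-way merge over pre-sorted per-server streams; a sketch using Python's standard library (the shard contents are hypothetical):

    # Merge many sorted streams lazily; only as many results as requested
    # are actually pulled from the per-server iterators.
    import heapq
    from itertools import islice

    server_results = [[1, 5, 9], [2, 6, 10], [3, 4, 8]]  # pre-sorted shards
    merged = heapq.merge(*server_results)                # lazy k-way merge
    print(list(islice(merged, 5)))                       # first 5 hits only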

Re:Sort? Sort what? (2, Insightful)

chaim79 (898507) | more than 4 years ago | (#25866029)

Right, so it's 250 GB sorted in 6 hours... now where do the sorting and integration of the 4,000 250 GB blocks of sorted data come in? :)

Re:Sort? Sort what? (1)

mlwmohawk (801821) | more than 4 years ago | (#25866391)

right, so it's 250gb sorted in 6 hours... now where does the sorting and integration of the 4000 250gb blocks of sorted data come in?

You wouldn't merge it into one set; you'd keep it all on the individual servers and only merge the results as needed.

Re:Sort? Sort what? (1)

neoform (551705) | more than 4 years ago | (#25865643)

Odds are they're using the mythical "google algorithm", so they're probably going to keep what they're doing quiet.

Re:Sort? Sort what? (0)

Anonymous Coward | more than 4 years ago | (#25865693)

I believe it was a Bubble-Sort of some type...O(n^2) FTW!

Re:Sort? Sort what? (0)

Anonymous Coward | more than 4 years ago | (#25865727)

One quadrillion bytes, or 1 million gigabytes. How big are the fields being sorted. Is it an exchange sort or a reference sort?

It's a bubble sort [wiktionary.org] .

Re:Sort? Sort what? (5, Funny)

Dpaladin (890625) | more than 4 years ago | (#25865771)

Sorting a petabyte sounds pretty impressive, but I don't think it was a whole yotta work.

Re:Sort? Sort what? (0)

Anonymous Coward | more than 4 years ago | (#25866625)

*groan*

Re:Sort? Sort what? (0)

Anonymous Coward | more than 4 years ago | (#25867547)

That's about the size of emacs, right?

Its About Time.... (2, Funny)

Anonymous Coward | more than 4 years ago | (#25865535)

Finally... a system with enough power to run Vista efficiently.

Re:Its About Time.... (3, Informative)

poetmatt (793785) | more than 4 years ago | (#25865685)

Are you sure? It wasn't marked Vista capable.

Re:Its About Time.... (3, Funny)

peragrin (659227) | more than 4 years ago | (#25865745)

Not only that, the extra processors aren't covered under the EULA and require special extra licenses.

Not impressive... (4, Funny)

g0dsp33d (849253) | more than 4 years ago | (#25865573)

Not a big deal, that's just the data they have on you.

Is it new data (1)

moteyalpha (1228680) | more than 4 years ago | (#25865641)

As memory gets cheaper and I can store more locally, what I really need to know is whether data is unique or new to me. I can read Frits P0st a million times and never get tired of it. There was a very good article on Slashdot the other day and it got over 2000 comments, some of which were very insightful and useful. I need a way to know for myself what is new to me. It would be nice if the browser interacted more with Google to help me with that. I just looked, and RTFM is indexed 4.5 million times, which of course includes xkcd #293, and that is really all I need to know.

0s and 1s (2, Funny)

johno.ie (102073) | more than 4 years ago | (#25865689)

That's a lot of computing power to use just to get 4,000,000,000,000,000 0s and 4,000,000,000,000,000 1s.

nice one, Google... (2, Funny)

Tastecicles (1153671) | more than 4 years ago | (#25865709)

...fancy doing my mp3 collection?

Libraries of congress? (2, Insightful)

TinBromide (921574) | more than 4 years ago | (#25865715)

First of all, this isn't a straight-up "Libraries of Congress" (better known, and mentioned in prior posts, as a LoC). It's the web-archiving arm of the LoC. I call for the coining of a new term, WASoLoC (Web Archival System of Library of Congress), which can be defined as X * Y^Z = 1 WASoLoC, where X is some medium that people can relate to (books, web pages, documents, tacos, water, etc.), Y is a volume (libraries, Internets, encyclopedias, end to end from A to B, swimming pools, etc.), and Z is some number that marketing drones come up with because it makes them happy in their pants.

Honestly, how am I supposed to know what "the amount of archived web data in the US Library of Congress as of May 2008" looks like!? I've been to the Library of Congress, I've seen it, it's a metric shit-ton of books (1 shit-ton = shit * assloads^fricking-lots), but I have no clue what the LoC is archiving, what rate they're going at, or what the volume of it is.

Wow (1)

ice_nine6 (1149219) | more than 4 years ago | (#25865721)

That must have taken a lot of monkeys.

clever strategy (2)

stimpleton (732392) | more than 4 years ago | (#25865769)

Good.

They clearly have the ability to respond to emergencies. And this puts it out there that they can...

e.g.:
1) A foot-and-mouth outbreak in cattle
2) A supplement to census data
3) Finding information on dissidents/traitors (bloggers)

Re:clever strategy (0)

Anonymous Coward | more than 4 years ago | (#25866227)

4) Profit!

20,111 Servers ?? (1, Interesting)

johnflan (1413981) | more than 4 years ago | (#25865823)

With a little bit of Excel: if it takes 4,000 servers 362 minutes to sort a 1PB job, it would take about 20,111 servers running 1,440 minutes (24 hours) to sort 20PB (if plain sorting were all they were doing). And just as a side note, from their numbers, one of their servers can process about 741.5 MB per minute!
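
The parent's arithmetic, redone explicitly (binary units assumed, since those reproduce the 741.5 MB/minute figure; note the linear-scaling assumption, which a reply below calls out):

    # Per-server throughput and servers needed for 20 PB/day at that rate.
    PB = 2 ** 50                                   # binary petabyte, in bytes
    per_server_per_min = PB / 4000 / 362           # bytes/minute per server
    print(per_server_per_min / 2 ** 20)            # ~741.5 MB per minute
    servers_for_20pb = 20 * PB / (per_server_per_min * 1440)
    print(servers_for_20pb)                        # ~20,111 servers for 24 h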

Re:20,111 Servers ?? (3, Insightful)

chaim79 (898507) | more than 4 years ago | (#25866071)

Yah, but you gotta wonder at the computing cost of integrating all those datasets into one complete sorted block of data. It could be that those servers can sort at 1 GB per minute but the overhead for combining is 25% of the computing time.

Re:20,111 Servers ?? (2, Informative)

johnflan (1413981) | more than 4 years ago | (#25866173)

Agreed, but even if it takes 40,000 servers, with losses and extra overhead, to handle their daily workload, it makes you wonder what their other estimated 410,000 servers are doing (2006 estimate).

Re:20,111 Servers ?? (3, Insightful)

smallfries (601545) | more than 4 years ago | (#25866291)

Oh dear: 4000*362 ~= 1440*20111/20, so you assumed that the sorting would scale linearly. Fail.

just in perspective... (2, Interesting)

wjh31 (1372867) | more than 4 years ago | (#25865969)

I make this about 48 GB/s. My hard drive manages about 20 MB/s; even my mid-range RAM manages only ~6.4 GB/s, and top-end RAM reaches only ~13 GB/s (according to Wikipedia). So even ignoring the ability to process that much data in that time, the ability simply to move that much data is quite impressive (at time of writing; this may not hold one year down the line).
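
Checked explicitly (binary units assumed; the per-machine figure is added for scale):

    # Aggregate and per-machine throughput for 1 PB in 6 h 2 min on 4,000 machines.
    seconds = 6 * 3600 + 2 * 60          # 21,720 s
    aggregate = 2 ** 50 / seconds        # bytes/second across the cluster
    print(aggregate / 2 ** 30)           # ~48.3 GB/s in aggregate
    print(aggregate / 4000 / 2 ** 20)    # ~12.4 MB/s per machine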

Holy shit... (1)

Taken07 (1395851) | more than 4 years ago | (#25866069)

That's a lot of data...

I assume it was... (0)

Anonymous Coward | more than 4 years ago | (#25866549)

bubblesort?

Is this our standard of measurement? (1)

moniker127 (1290002) | more than 4 years ago | (#25866883)

Just as I measure my distance to work (452.75 football fields), I now measure the data on my computer in Libraries of Congress?

Amazing feat... (5, Funny)

Duncan3 (10537) | more than 4 years ago | (#25866967)

Today from Google, the god of all things and doer of all things good in the universe, many millions of dollars in computer equipment were able to sort lots of things, in about the amount of time you would think it would take for millions of dollars of equipment to sort things.

In other news, a woodchuck was found chucking wood as fast as a woodchuck could chuck wood.

Congrats Google, you have a HUGE data set, and an even bigger wallet.

MapReduce = map + reduce (3, Interesting)

Bitmanhome (254112) | more than 4 years ago | (#25867089)

If you feel the urge to play with MapReduce (or read the paper), you don't need a fancy Linux distro [apache.org] to do it. MapReduce is essentially the map() and reduce() functions, exactly as implemented in Python. Granted, Google's implementation can work with absurdly large data sets, but for small data sets, Python is all you need.

Re:MapReduce = map + reduce (3, Informative)

boyter (964910) | more than 4 years ago | (#25867461)

True, but not quite the point. The map and reduce functions are, as you say, implemented in Python (and a great many other languages), but what makes MapReduce special is that you replace the map function with one that distributes the work out to other computers. Because any map function can be applied in parallel, you get a speed boost proportional to however many machines you have (dependent on network speeds, etc.).

So yeah, you can do it in Python, but you aren't going to be breaking any records until you implement your own infrastructure that lets you span it out to thousands of computers. The nice thing is that you don't need to write any new code to take advantage of the speed when you do.
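
A toy illustration of that swap, using processes on one machine in place of a cluster (a sketch only; real MapReduce also handles partitioning, data locality, and failures):

    # Same program, two "map" strategies: the built-in serial map and a
    # process pool's parallel map. reduce() is unchanged either way.
    from functools import reduce
    from multiprocessing import Pool
    from operator import add

    def square(x):
        return x * x

    if __name__ == "__main__":
        data = list(range(1, 6))
        serial = reduce(add, map(square, data))             # single process
        with Pool(4) as pool:
            parallel = reduce(add, pool.map(square, data))  # 4 workers
        print(serial, parallel)                             # 55 55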

Re:MapReduce = map + reduce (1)

Varun Soundararajan (744929) | more than 4 years ago | (#25867577)

Also, one of the biggest problems in distributing tasks is *handling failures*. MR handles them as part of the library, which greatly simplifies distributed computing tasks.

Re:MapReduce = map + reduce (2, Informative)

Pinball Wizard (161942) | more than 4 years ago | (#25867889)

Exactly. There is nothing special about map and reduce.

Here's an example. Map and reduce are functional programming tools that work with lists, so we'll start with a simple list:

1 2 3 4 5

Now we'll take a function, x^2, and map it over the list. The list becomes:

1 4 9 16 25

Now we'll apply a reduce function to combine the list into a single value. I'll use "+" to keep it simple. We end up with:

55

And that is pretty much all there is to map and reduce.
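
The same walk-through in executable form (using functools.reduce, where the built-in ended up in Python 3):

    from functools import reduce

    xs = [1, 2, 3, 4, 5]
    mapped = list(map(lambda x: x ** 2, xs))     # [1, 4, 9, 16, 25]
    total = reduce(lambda a, b: a + b, mapped)   # ((((1+4)+9)+16)+25) = 55
    print(mapped, total)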

But.... (1)

VonSkippy (892467) | more than 4 years ago | (#25867221)

It really only took two hours; the rest of the time was spent stuffing in paid ads.
