Facebook's Corona: When Hadoop MapReduce Wasn't Enough

Soulskill posted about a year and a half ago | from the named-after-a-desire-to-launch-hadoop-into-the-sun dept.

Facebook 42

Nerval's Lobster writes "Facebook's engineers face a considerable challenge when it comes to managing the tidal wave of data flowing through the company's infrastructure. Its data warehouse, which handles over half a petabyte of information each day, has expanded some 2500x in the past four years, and that growth isn't going to end anytime soon. Until early 2011, those engineers relied on a MapReduce implementation from Apache Hadoop as the foundation of Facebook's data infrastructure. Still, despite Hadoop MapReduce's ability to handle large datasets, Facebook's scheduling framework (in which a large number of task trackers handle duties assigned by a job tracker) began to reach its limits. So Facebook's engineers went to the whiteboard and designed a new scheduling framework named Corona." Facebook is continuing development on Corona, but they've also open-sourced the version they currently use.


Misleading headline (0)

Anonymous Coward | about a year and a half ago | (#41933717)

What do you mean "Hadoop MapReduce isn't enough"? It's the same fucking framework with a better scheduler.

Re:Misleading headline (0)

Anonymous Coward | about a year and a half ago | (#41934153)

It needed a better scheduler; what was there wasn't enough.

Do you just knee-jerk rage for internet karma, or do you legitimately have reading comprehension problems?

Re:Misleading headline (4, Funny)

ArcadeMan (2766669) | about a year and a half ago | (#41934245)

And why the fuck should I care about Windows 8 tablets? You are not making any sense!

Re:Misleading headline (0)

ArcadeMan (2766669) | about a year and a half ago | (#41934257)

Oh, shut the fuck up you idiot!

Look, ponies!

Re:Misleading headline (1)

ArcadeMan (2766669) | about a year and a half ago | (#41934287)

No, I'M Spartacus!

Re:Misleading headline (0)

Anonymous Coward | about a year and a half ago | (#41934799)

What was there was plenty enough; they were having problems because they were using it all wrong. So they decided to spend a lot of time and effort to build things differently instead of working out how to do them properly with what was there.

Re:Misleading headline (1)

Anonymous Coward | about a year and a half ago | (#41935163)

Much as those who were holding their iPhone wrong were at fault?

Seriously, the Job Tracker just didn't scale well and applications had to worry about it - that's a broken architecture, not a broken application or deployment. Blaming the application or deployment for serious fundamental architectural flaws of the platform is much like blaming an application programmer in 1980 for using a=a+1 which a compiler happened to implement less efficiently than a++ or even a+=1 (or, for you old timers, a=+1 not to be confused with a= +1).

I don't know how Corona relates to YARN, but both should be a big improvement over the Job Tracker based architecture - even if they still may not technically scale linearly to n (for VERY large n) nodes.
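
For anyone following along, here is a toy sketch of the single-job-tracker design being argued about: one central object hands out every task, and each task tracker has to ask it for work. The class name, the `heartbeat` method, and the task names are all made up for illustration; this is not Hadoop's or Corona's actual API, just a picture of why everything funnels through one component.

```python
from collections import deque


class JobTracker:
    """Toy central scheduler (hypothetical): the only component that hands out tasks."""

    def __init__(self, tasks):
        self.pending = deque(tasks)
        self.assignments = {}  # task name -> name of the tracker running it

    def heartbeat(self, tracker_name, free_slots):
        """Called by a task tracker; returns the tasks it should run next."""
        assigned = []
        while self.pending and len(assigned) < free_slots:
            task = self.pending.popleft()
            self.assignments[task] = tracker_name
            assigned.append(task)
        return assigned


# Every node in the cluster talks to the same tracker instance.
tracker = JobTracker(f"map-{i}" for i in range(5))
first = tracker.heartbeat("node-a", free_slots=2)
second = tracker.heartbeat("node-b", free_slots=2)
third = tracker.heartbeat("node-a", free_slots=2)
```

With thousands of nodes and many concurrent jobs, every one of those `heartbeat` calls lands on the same object, which is roughly the bottleneck the story summary describes.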

Re:Misleading headline (0)

Anonymous Coward | about a year and a half ago | (#41935643)

"Hadoop MapReduce is not enough" is a misleading headline when they are still using "Hadoop Mapreduce".

I'm not enraged, I don't care about internet karma, I'm not even logged in. You seem to be severely retarded though.

Junk. (0)

Anonymous Coward | about a year and a half ago | (#41933719)

"Its data warehouse, which handles over half a petabyte of information each day, has expanded some 2500x in the past four years"
Too bad that's 99.9% junk I don't care about.

Re:Junk. (2)

isopropanol (1936936) | about a year and a half ago | (#41933859)

But between you and 1000 other people who care about slightly different sets, much of it is stuff that someone cares about.

Re:Junk. (4, Insightful)

Daniel Dvorkin (106857) | about a year and a half ago | (#41933959)

Too bad that's 99.9% junk I don't care about.

But between you and 1000 other people who care about slightly different sets, much of it is stuff that someone cares about.

This. 99.9% (at least) of the entire internet is junk that any one person doesn't care about. But every bit has someone who cares about it (or did at one time) or it wouldn't be there.

Well. I opened the story expected some reflexive Facebook-bashing, and I wasn't disappointed. When are people going to realize that FB's just another internet company with a reasonably successful business model, and worthy of neither adulation nor hatred?

Re:Junk. (1)

Daniel Dvorkin (106857) | about a year and a half ago | (#41934031)

s/expected/expecting/

[sigh] I do so wish Slashdot would allow editing posts, at least for a limited time (say, until they've been moderated or replied to). C'mon, even Facebook can manage that. ;)

Re:Junk. (1)

rtaylor (70602) | about a year and a half ago | (#41934619)

This. 99.9% (at least) of the entire internet is junk that any one person doesn't care about.

I've done a crawl of a few billion pages.

No person at all cares about 99% of the content available on the internet. In fact, nearly that much is completely unreadable and was machine generated gibberish (real words, not sentences) in an attempt to fool Google and other search engines.

There are a few servers which host millions of subdomains with millions of manufactured pages under each subdomain.

In short, it's far worse than 99.9% of the entire internet being of little use.

Re:Junk. (2)

martin-boundary (547041) | about a year and a half ago | (#41938545)

When are people going to realize that FB's just another internet company with a reasonably successful business model, and worthy of neither adulation nor hatred?

Wrong. FB is worthy of hatred because what they do is inherently evil. They spy on people, and sell off that information.

The "it's just a job/business" excuse doesn't work when the job/business is evil. For example, when the local Mafia goons come to collect protection money, it's "just a job" for them right? Nothing personal. They're just regular people who are trying to make ends meet, like everybody else. Don't hate them. Wrong, it's evil, and the goons display a singularly bad sense of judgement in accepting to do this kind of work.

Similarly, spies are evil, James Bond notwithstanding. They steal secrets, and betray people in the process. And FB are a spying organization. They treat users' rights as a joke, and due to their size and ubiquity, are substantially responsible for the state of privacy on the internet today.

Re:Junk. (1)

ilsaloving (1534307) | about a year and a half ago | (#41944069)

What you say is sort of true, but I disagree that it is inherently evil. Evil implies a malicious intent. At worst, it's simply sociopathic. Facebook is doing what it's doing so that it can make money, and its methods aren't even remotely secret. They would have no power at all if it wasn't handed to them gleefully by people.

Further it's disingenuous to compare them to the mafia and similar, for one simple reason. The mafia does what it does against people who are unwilling participants. Facebook on the other hand, is being fed almost limitless information by people giving it up willingly.

I don't see a problem with Facebook for the same reason I don't see a problem with 419 scammers. Only incredibly stupid and greedy people fall for 419 scammers. People who post their most intimate details on Facebook are either similarly stupid, or just don't think what they're posting is of value to anyone. Which would be true if you're posting pictures of kittens. But as many have already found out, that picture of you drunk at a party can get you fired. In either case, it's the people themselves that orchestrated their own demise.

So calling Facebook evil is like calling a crocodile evil because you were stupid enough to walk your dog right on the edge of a swamp known to house crocodiles, and a crocodile came out and ate your dog. The crocodile didn't do anything any other crocodile wouldn't do, and Facebook isn't doing anything that bajillions of other companies aren't already doing (case in point, google for Target and pregnant daughter).

What they do can't be stopped because not enough people care enough to actually try. All you can do is walk with your eyes open and avoid problems where possible.

Re:Junk. (1)

John Bokma (834313) | about a year and a half ago | (#41933941)

And yet you just made an effort to post similar junk to Slashdot....

Re:Junk. (4, Funny)

Revotron (1115029) | about a year and a half ago | (#41933991)

Yes, Facebook sure would be a lot more successful if 99.9% of people's posts got deleted and replaced with an on-screen notification that reads,

This post has been removed because it is of no interest to Anonymous Coward. Please try posting things more in line with the following categories:

1. Linux
2. Open-source software
3. Richard M Stallman
4. OMG!!! PONIES!!!

Re:Junk. (0)

Anonymous Coward | about a year and a half ago | (#41935447)

4. I, for one, look forward to an April first in which /. properly pay homage to Pinkie Pie.
3. Maybe someone could draw an epic enough beard on her and we can all pretend it's RMS?
2. Speaking of Pinkie, did you know Pinkie Pie has pwned chrome twice? Maybe if I twist this enough, it can be a "many-eyes" post about why OSS is better? I mean, if you've got _ponies_ breaking your browser, you've obviously done something wrong.
1. got nothin'

Re:Junk. (0)

Anonymous Coward | about a year and a half ago | (#41934197)

Luckily there's that 0.1% that you do care about, which measures to about half a terabyte each day. Plenty of interesting information for you there.

Re:Junk. (2)

SolitaryMan (538416) | about a year and a half ago | (#41935435)

So, you care about (1 - 0.999) * 500 TB = 500 GB of Facebook information every day??? Dude, where do you get the time?
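
The arithmetic above checks out if you take the summary's "half a petabyte" as 500 TB in decimal units. A quick sketch, with the variable names being my own:

```python
TB_PER_PB = 1000  # assumption: decimal units, 1 PB = 1000 TB
GB_PER_TB = 1000  # assumption: decimal units, 1 TB = 1000 GB

daily_volume_tb = 0.5 * TB_PER_PB  # "over half a petabyte" per day, rounded down
junk_fraction = 0.999              # the "99.9% junk" claim

cared_tb = (1 - junk_fraction) * daily_volume_tb
cared_gb = cared_tb * GB_PER_TB

# Floating point leaves a tiny rounding error, so round for display.
print(f"{cared_gb:.0f} GB per day")
```

So the non-junk 0.1% is still about half a terabyte a day.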

Re:Junk. (1)

nurb432 (527695) | about a year and a half ago | (#41938043)

You may not care, but the people doing datamining to find new ways to push ads at us or find the next serial killer care greatly. You know, the ones that actually pay the bills.

First beer! (0)

Anonymous Coward | about a year and a half ago | (#41933803)

With lime. Had to do it.

you insensitive Clod! (-1)

Anonymous Coward | about a year and a half ago | (#41934361)

consider worthwhile impaired its alL; i8 order to go Never heeded she had no fear many users of BSD deeper into the DISEASES. THE

What? (0)

Anonymous Coward | about a year and a half ago | (#41934715)

In layman's terms:

What is Hadoop?
What is MapReduce?

From the article, I derive that it is a scheduling framework. What the hell is a scheduling framework?

Re:What? (3, Informative)

Em Adespoton (792954) | about a year and a half ago | (#41935723)

Hadoop: massive data storage system framework... "Apache Hadoop is an open-source software framework that supports data-intensive distributed applications"
MapReduce: a way of managing distributed clusters of data sets... "MapReduce is a programming model for processing large data sets, and the name of an implementation of the model by Google. MapReduce is typically used to do distributed computing on clusters of computers"

Scheduling framework: a framework for providing optimal scheduling of something such that events are handled in an optimal manner.

Or, to put it another way:
http://lmgtfy.com/?q=hadoop [lmgtfy.com]
http://lmgtfy.com/?q=mapreduce [lmgtfy.com]
http://lmgtfy.com/?q=scheduling+framework [lmgtfy.com]
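
If the quoted definition of MapReduce is too abstract, the usual teaching example is a word count. The sketch below runs in a single Python process and only illustrates the programming model (map, group by key, reduce); the function names are my own and it does not use Hadoop or any real cluster API.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # On a real cluster, each document (or chunk of one) would be mapped
    # on a different machine; here everything runs locally, in sequence.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["the quick brown fox", "the lazy dog", "The end"])
```

The point of the model is that the map calls are independent of each other, and so are the reduce calls, which is what lets a framework spread them over many machines. Deciding which machine runs which call is the scheduling part.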

Re:What? (0)

Anonymous Coward | about a year and a half ago | (#41936835)

Well shit, if we have to google half the article terminology, why not just google the article itself?

Thanks For Nothing (0)

Anonymous Coward | about a year and a half ago | (#41937115)

Mmm. Snarky sarcasm, industry lingo, and obtuse responses. Thanks for nothing.

Hadoop: massive data storage system framework... "Apache Hadoop is an open-source software framework that supports data-intensive distributed applications"

So it's some sort of file system or database? Or is it simply yet another programming language abstraction layer upon other programming language abstraction layers?

MapReduce: a way of managing distributed clusters of data sets... "MapReduce is a programming model for processing large data sets, and the name of an implementation of the model by Google. MapReduce is typically used to do distributed computing on clusters of computers"

Then Facebook's map reduce is what, a programming method, as described above? It's made out by the article to sound like something more tangible than a methodology. Even reading the Wikipedia article doesn't make anything clear in layman's terms.

Thanks again.

Re:Thanks For Nothing (1)

Em Adespoton (792954) | about a year and a half ago | (#41937459)

No snark intended... no sarcasm given. The terms describe things that are technical. If you want something more generic, I could go as far as "Database management architecture" and "database communication architecture" but that dumbs things down to the point where it adds nothing to the discussion. If you don't understand what a database is and how it works (and that we're talking about database management here), you're going to find this entire article over your head, not just the industry buzzwords.

Kind of like if we were discussing an article dealing with some new algorithm for sequencing DNA or some new tool being used by biochemists -- background reading is required to gain anything useful from the article -- explaining the buzzwords won't mean much.

Slashdot. Stuff that mutters.

Oh! I See. (0)

Anonymous Coward | about a year and a half ago | (#41941709)

So, my inability to derive that a massive data storage system framework is a "Database management architecture" in this case is a demonstration of my inability to comprehend technical stuff.

Sort of like your inability to understand that Let Me Google That For You with obtuse results is pure snark when someone asks for a layman's description. You didn't have to respond if you thought that the response was pointless or that you lacked sufficient understanding to explain it in a coherent fashion. Yet, you felt compelled to provide a smart-ass (not smart) response.

So, by your example, your inability to understand the technicalities of language and your absence of social skills makes it nearly impossible for you to understand the textbook definition of snarky. Allow me to try to explain it to you, by example.
http://www.lmgtfy.com/?q=define+snarky [lmgtfy.com]

Re:Oh! I See. (0)

Anonymous Coward | about a year and a half ago | (#41946837)

BURN!

java (1)

Anonymous Coward | about a year and a half ago | (#41934747)

after paging through the code a bit, i found it interesting that they use java in their implementation (not just corona, but hadoop as well). i was wondering why, and after some googling found this link [nabble.com] which helped explain the situation a bit clearer.

pretty interesting stuff. but id be willing to bet googles map reduce is written in c/c++

Re:java (0)

Anonymous Coward | about a year and a half ago | (#41935223)

Because life is more fun if you have to worry about nodes taking naps while garbage collecting every so often and developers wouldn't know what to do with their spare time if they didn't have to spend time on those issues?

(Seriously, the link above provides several much more rational explanations!)

Re:java (1)

Rakishi (759894) | about a year and a half ago | (#41937721)

Hadoop is not real time, it's a batch processing system, no one gives a damn if a node spends 50ms garbage collecting or not every so often.

Re:java (1)

rioki (1328185) | about a year and a half ago | (#41954721)

Until you process petabytes of data and suddenly it makes a difference of a couple of hours per run. All the cool dynamic web technology is really nice and empowering, but once you start hitting real traffic, it makes sense to invest in more efficient core systems. See G-WAN [gwan.com] for how to do it right.

Facebook (3, Interesting)

gman003 (1693318) | about a year and a half ago | (#41934991)

I have to admit, while I hate using Facebook, and hate most of their business practices, I like how they're not just writing new infrastructure software, but are open-sourcing it all. I don't think it quite makes up for everything else, but it helps.

Wow! 2500X (0)

Anonymous Coward | about a year and a half ago | (#41935263)

They went from a couple GB to 5 TB. Impressive!!!

Half a PB of data flows through the infrastructure, but how much of that actually has to be stored? That's the real question.

Song to celebrate major version after 8 (0)

Anonymous Coward | about a year and a half ago | (#41935593)

9 Coronas [madmusic.com]

How many IT projects... (1)

Synerg1y (2169962) | about a year and a half ago | (#41937061)

Have been code-named corona these last few years?? Seems like every org's got a project named corona nowadays.

Re:How many IT projects... (1)

VortexCortex (1117377) | about a year and a half ago | (#41939433)

Have been code-named corona these last few years?

The only one I can think of involves me remotely managing a server from the beach with only a lime wedge and cold beer.

Hard to believe (0)

Anonymous Coward | about a year and a half ago | (#41937717)

Hard to believe how much technology goes into such a shitty website. Even right now over 90% of profile images aren't loading for me.

I have little sympathy for them (1)

stickyboot (845510) | about a year and a half ago | (#41940053)

They could start by actually deleting deleted content. Seems simple to me. Let's hope their shortsightedness continues when everyone jumps ship for the next social fad, and continuing this rat race becomes far too costly.

Re:I have little sympathy for them (1)

Lazy Jones (8403) | about a year and a half ago | (#41942381)

They could start by actually deleting deleted content

They could, but why should they put themselves at a disadvantage over Google, every other corporation that buys such data and the NSA, who all most certainly do not delete stuff in the way you'd like them to?
