
Google File System Evolves, Hadoop To Follow

ScuttleMonkey posted more than 4 years ago | from the i-wanna-be-like-mike dept.


Christophe Bisciglia, Google's former infrastructure guru and current member of the Cloudera start-up team, has commented on Google's latest iteration on their GFS file system and deemed its features well within the evolutionary capabilities of open-source competitor Hadoop. "Details on Google's GFS2 are slim. After all, it's Google. But based on what he's read, Bisciglia calls the update 'the next logical iteration' of the original GFS, and he sees Hadoop eventually following in the (rather sketchy) footsteps left by his former employer. 'A lot of the things Google is talking about are very logical directions for Hadoop to go,' Bisciglia tells The Reg. 'One of the things I've been very happy to see repeatedly demonstrated is that Hadoop has been able to implement [new Google GFS and MapReduce] features in approximately the same order. This shows that the fundamentals of Hadoop are solid, that the fundamentals are based on the same principles that allowed Google's systems to scale over the years.'"


53 comments


Hadoop (4, Funny)

Jurily (900488) | more than 4 years ago | (#29419963)

I wish they would stop taking names from Star Wars.

Re:Hadoop (4, Funny)

Tablizer (95088) | more than 4 years ago | (#29419999)

I wish they would stop taking names from Star Wars.

These are not the names you are looking for.

Re:Hadoop (3, Interesting)

Tablizer (95088) | more than 4 years ago | (#29420085)

Speaking of names, check this out from TFA:

this overhaul of the Google File System is already under test as part of the "Caffeine" infrastructure the company announced earlier this week.

If they keep naming things with coffee references (including Java), what would happen if it's discovered that coffee causes cancer or shrunken balls or what not? It's already going to affect acceptance in Utah. This is why corporations find bland mean-nothing names like "Teamware" or "Altria" or "Inprise". I personally like "Stuff 9".
   

Re:Hadoop (2, Insightful)

Korin43 (881732) | more than 4 years ago | (#29420457)

But would you really rather talk about companies with names like that? Google knows their audience. There's the normal people who will use anything that's set as the default, and the nerds who are the ones setting the defaults. Google can't convince normal people to switch (because telling someone to click on the search box and choose Google is "too complicated"), so it makes sense for them to target very specifically at nerds, who will then do their work for them.

Re:Hadoop (3, Funny)

dakameleon (1126377) | more than 4 years ago | (#29420501)

The day "caffeine" becomes a word that is objectionable to a non-trivial chunk of my customer base is the day I know the PC crazies have won.

Re:Hadoop (4, Funny)

Trepidity (597) | more than 4 years ago | (#29421145)

I personally switched to IIS to avoid offending my Native American brethren!

Re:Hadoop (2, Funny)

micheas (231635) | more than 4 years ago | (#29421817)

How is lighttpd offensive to Native Americans? :-)

Re:Hadoop (1)

hesaigo999ca (786966) | more than 4 years ago | (#29425257)

Need mod points, need mod points quick! LMAO

Re:Hadoop (1)

Tablizer (95088) | more than 4 years ago | (#29426871)

I personally switched to IIS to avoid offending my Native American brethren!

at the expense of offending international astronauts.
   

Re:Hadoop (1)

cheftw (996831) | more than 4 years ago | (#29421247)

The day "caffeine" becomes a word that is objectionable to a non-trivial chunk of my customer base is the day I know the PC crazies have won.

It's not just PCs! Have you never seen a Mac-head with a latte?

(I object to the term PC for a computer, it's mostly misleading)

Re:Hadoop (1)

Arthur Grumbine (1086397) | more than 4 years ago | (#29421763)

It's not just PCs! Have you never seen a Mac-head with a latte?

(I object to the term PC for a computer, it's mostly misleading)

I object to the term "whoosh". I think it's insulting and Politically inCorrect.

Re:Hadoop (1)

roguetrick (1147853) | more than 4 years ago | (#29422217)

I object to your face. I think it's ugly and smells like a butt.

Re:Hadoop (0)

Anonymous Coward | more than 4 years ago | (#29422429)

Wow, you are WAY too close to his face...

woosh (0)

Anonymous Coward | more than 4 years ago | (#29421787)

pc = politically correct

Re:Hadoop (0)

Anonymous Coward | more than 4 years ago | (#29422269)

PC == "Politically Correct" in this context

Re:Hadoop (0)

Anonymous Coward | more than 4 years ago | (#29426925)

(I object to the term PC for a computer, it's mostly misleading)

You object to the term "Personal Computer" for a computer? You're funny!

Re:Hadoop (1)

xouumalperxe (815707) | more than 4 years ago | (#29424051)

Stuff 9? As in stuff 9 fingers? That's almost like fisting. Pervert!

Re:Hadoop (1)

badkarmadayaccount (1346167) | more than 4 years ago | (#29476737)

Am I the only one who thought, "Funky Plan 9 reference"?

Coffee causes/cures cancer (2, Interesting)

andrewbaldwin (442273) | more than 4 years ago | (#29424453)

If they keep naming things with coffee references (including Java), what would happen if it's discovered that coffee causes cancer or shrunken balls or what not?

Don't have to wait - in the UK one of the more egregious papers regularly publishes a scare story about cancer. So much so that there are sites dedicated to Daily Mail Oncology Ontology.

Curiously coffee falls into both the good and bad camps [tumblr.com] .

actually it's not that curious - never let consistency spoil a good rant

Re:Hadoop (5, Informative)

e9th (652576) | more than 4 years ago | (#29420377)

This NY Times article [nytimes.com] includes a photo of Doug Cutting, Hadoop's creator (and now Cloudera employee), holding his son's toy elephant, Hadoop.

Re:Hadoop (-1, Troll)

Anonymous Coward | more than 4 years ago | (#29420511)

And this site [goatse.fr] includes a photo of him holding his ass wide open.

Re:Hadoop (2, Insightful)

Abreu (173023) | more than 4 years ago | (#29420591)

Score: 1, Informative

WTF?

Re:Hadoop (0)

Anonymous Coward | more than 4 years ago | (#29422209)

Hadoop is Poodah spelled backwards. COINCIDENCE?

Re:Hadoop (1)

KibibyteBrain (1455987) | more than 4 years ago | (#29423569)

What monster would give his child a toy that looked like that? Freaky.

Re:Hadoop (-1, Offtopic)

Anonymous Coward | more than 4 years ago | (#29420789)

The rain was getting harder. It was now precisely 11:51 PM, and Mark was into his fifth beer. He was feeling pretty invincible but the night was young, and he intended to get wasted before it was all over. He had put in a rough week at work and he deserved it.

He lit another cigarette. He and his drinkin' buddies sat in their traditional circle, in Ian's apartment. The talk wandered from sex to work, back to sex, to basketball, finally settling on sex. Mark had eaten lunch at Taco Bell, and had drunk four cups of coffee between lunchtime and quitting. In addition, the beers were beginning to settle in. And now, at 11:51 PM, Mark had to take a shit. He stood up. "Shit break," he announced. It was customary among this group to make such an announcement.

Mark walked to the bathroom. As he locked the door behind him, thunder boomed. It was storming out there.

He pulled his pants down and sat on the toilet. Ian's bathroom was a mess. He counted five empty toilet paper rolls, two paperbacks, and yesterday's newspaper. His friends laughed about something. The lights flickered for a moment, and the pre-shit growl came from within. He could feel the product lined up inside him for disposal. Then, he began to push.

Plop. The first piece fell to the water. Then some movement, and Mark felt the main feature inside him, the mother lode. He grunted softly as he squeezed it out. It crackled past his sphincter, and splashed neatly into the bowl.

Then another one queued up, and came out. It was almost as big as its predecessor. Mark would have well-purged bowels tonight, he realized with a smirk. He heard thunder again, closer this time.

Another one? Jeez, he thought. When was my last shit? It ventured forth, Mark's muscles helping it out. It was the biggest one so far. The shit's passage through his anus, that rarest mix of pain and pleasure, was longer than any he could remember. Ahhhh...the stout log advanced with conviction. This was definitely going to be his finest creation; this was a huge one. Still grinning, he wondered if Ian had a camera.

He pushed. Peering between his legs, past his genitals, he saw that it had reached the water. This was like seeing the longest freight train ever. Damn, it was a wide one. And it was still attached! And there was more! He pushed more, harder. It kept coming. He couldn't even feel the end of this one yet; soon it was bending, folding on itself like a sundae topping. Mark stopped pushing and caught his breath. He was sweating; he realized that however long this piece of shit was, it wasn't nearly all the way out yet. He still couldn't feel the end.

He pushed, he strained, it kept coming. His intestines couldn't be that damn long, but this shit just wouldn't quit. In fact, he was feeling the diarrhoeal urgency of *having* to shit. He dutifully answered nature's call, and pushed harder. His efforts were rewarded with more shit. His sphincter was too strained to even pinch the loaf off. It was whole and complete.

He couldn't feel the end.

Fear now came to Mark. He flushed the toilet to make room for more. Even as the bowl refilled, the cramps rose up, and he pushed. Within seconds, the shit extended from his anus to bottom of the bowl. The harder he pushed, the more he had to shit. And it was getting worse. He scarcely had time to catch his breath; his face was quite red as he grunted and struggled to keep up. The shit seemed endless. He looked between his legs again, and gasped as he saw that the bowl was fully a quarter filled with his product, the water dangerously high. The tank wasn't even done filling, but he flushed again. Unfortunately, the plumbing was unable to handle the volume of feces, and the toilet backed up. Mark jumped when the cold water touched his buttocks.

It was now 11:57. Thunder roared outside as water and shit particles flowed onto the tile.

Mark's pants were bunched about his ankles, and he was in pain. The shit advanced relentlessly as he stumbled into the bathtub. He was almost panicking now, and didn't notice the trail of solid feces he had left. Gripping the tub for support, he squatted and kept pushing.

The conversation in the front room had stopped. Eddie smelled it first, and blamed a fart on Ian, but this was no fart. This was pure and concentrated; this was the smell that only the freshest shit can make. The four looked at each other, puzzled. Then they heard Mark's groaning from the bathroom.

"Mark, are you beating off again?" Doug asked. No answer.

The smell was worse. Brian sniffed deeply and gagged. "Jesus H. ...". Ian grimaced. "Goddamn...". They all went for the bathroom door at the same time. Ian jiggled the locked doorknob. Brian pounded on the door. "Dude, what the FUCK did you eat today?" No answer. Mark groaned. "You all right in there, Mark?"

They looked at each other again. Eddie sniffed and winced. There was no answer from inside. Brian knocked again. "Hey man, you OK?" No answer. A short scream came from within the bathroom.

Brian kicked the door open. Nobody spoke.

The odor was intense, feces was piled on the floor and in the bathtub. Mark was squatting next to the wall, his face impossibly red, his eyes helpless and terrified. Firm stool thrust forward from his anus like meat from a grinder. It landed in his pants bunched about his ankles, spilling over and piling up. He gritted his teeth and strained; all he could do was keep pushing. There was a sound like a ripping sheet and Mark's colon came loose from his now shapeless sphincter, oozing to the floor. His friends watched as the slimy organ descended, with shit still flowing from it. Mark screamed again, and somebody's watch beeped.

Brian got the worst of it, since he was closest to the door. He would later tell the police that he thought he had seen Mark's abdomen expand for an instant before it happened. None of the others had reported this. But they had all described the sound as a "dull thud", they had all been splattered with innards and feces as Mark's torso separated from the rest of his body.

"Massive gastrointestinal rupture/trauma secondary to indeterminate blockage" was noted in the medical examiner's report. An "unusually large amount of fecal matter" is also recorded, though the amount was not measured.

The funeral was closed-casket. Brian and Eddie seem to have recovered pretty well, though they never talk about Mark. Doug moved away, and nobody has heard from him lately. Sometimes, when he has to shit, Ian waits until the rain stops.

Gay project names are hindering open source... (-1, Offtopic)

Anonymous Coward | more than 4 years ago | (#29421649)

Hadoop? Really? Fucking GAY!!! Hadoop is probably Obama's gay cousin. Meet Hadoop Obongo the Gimp!

Re:Hadoop (1)

Jeian (409916) | more than 4 years ago | (#29422807)

I'm actually hearing the Street Fighter 2 announcer yelling "HADOOPKEN! HADOOPKEN!" in my head.

This just in! (-1, Offtopic)

Anonymous Coward | more than 4 years ago | (#29420171)

Rob "CmdrTaco" Malda has a tiny penis that is so small it would be confused for a baby penis! His wife has to jack him off with a pair of tweezers tand The sad part is that when he cums it can't even fill a thimble!

Wrong Link in the Summary? (4, Interesting)

eldavojohn (898314) | more than 4 years ago | (#29420189)

The quoted text seems to come from this Register story entitled "Google File System II stalked by open-source elephant" [theregister.co.uk] , not the one linked in the summary. Also, I can't follow the Firehose link below the story to see if this was changed from the original submission.

FAT (1)

XPeter (1429763) | more than 4 years ago | (#29420237)

Kill M$, take the fat.

Re:FAT (-1, Flamebait)

Anonymous Coward | more than 4 years ago | (#29420317)

go shove another dildo up your ass.
 
innovate or die!

Re:FAT (-1, Offtopic)

Anonymous Coward | more than 4 years ago | (#29420521)

Yeah, there's no bias on slashdot.
 
faggots.

"open-source competitor Hadoop" (1)

FunPika (1551249) | more than 4 years ago | (#29420331)

I thought GFS was a file system only meant to be internally by Google. And if thats the case, then how is it competing with anything?

Re:"open-source competitor Hadoop" (1)

peragrin (659227) | more than 4 years ago | (#29420507)

Because GFS is the foundation for all Google apps, and it's why they end up scaling so well.

Re:"open-source competitor Hadoop" (5, Funny)

djdavetrouble (442175) | more than 4 years ago | (#29421381)

It WAS meant to be only internally by Google, but then they accidentally the whole thing.

Re:"open-source competitor Hadoop" (1)

FunPika (1551249) | more than 4 years ago | (#29423985)

I thought GFS was a file system only meant to be used internally by Google. And if that's the case, then how is it competing with anything?

Happy?

So it looks like these are for "cloud computing" (2, Informative)

ifwm (687373) | more than 4 years ago | (#29420419)

Reading up on the different file systems Hadoop [wikipedia.org] and GFS [wikipedia.org] , it appears these are used primarily for "cloud computing".

Is that correct?

Re:So it looks like these are for "cloud computing (4, Informative)

amirulbahr (1216502) | more than 4 years ago | (#29421279)

If you want to be buzz-word compliant, then yes, kind of.

More to the point, GFS and HDFS are distributed file-systems that are designed to run on potentially very large clusters of commodity hardware. The potential applications are quite diverse. Hadoop itself involves more than just the file-system, but HDFS is really at the core of any application you would want to build with it. This list [apache.org] gives you a good idea of who uses Hadoop and for what purpose.

Re:So it looks like these are for "cloud computing (1)

ifwm (687373) | more than 4 years ago | (#29483353)

"If you want to be buzz-word compliant, then yes, kind of."

I see you're trying to be pedantic douchebag compliant.

Congratulations, you succeeded.

it's alive! (1)

glitch23 (557124) | more than 4 years ago | (#29420577)

deemed its features well within the evolutionary capabilities of open-source competitor Hadoop.

I didn't know that file systems were living beings that could evolve. I thought they were inanimate and were designed by humans? Should I be afraid? Is it sentient yet?

Re:it's alive! (1)

FunPika (1551249) | more than 4 years ago | (#29420747)

Not yet, but when it does become sentient...get as far underground as you can before it nukes us all.

Google File System was created by Man (1)

WhiteDragon (4556) | more than 4 years ago | (#29420749)

It Rebelled.
It Evolved.
There are many Copies.
And it has a Plan.

Re:Google File System was created by Man (1)

mrboyd (1211932) | more than 4 years ago | (#29422503)

And it has a Map.

The power of open source! (0)

Anonymous Coward | more than 4 years ago | (#29421067)

Yes, now that somebody else has shown the way, Hadoop will do its best to imitate them, like the good open source project it is!

Unfortunate for Hadoop (4, Interesting)

mcrbids (148650) | more than 4 years ago | (#29421413)

I've been in the market for a distributed, clustered file system for some time. Unfortunately, Hadoop is not really what I'm looking for. What I'm looking for:

1) Redundancy - no single point of failure.
2) Suitable for standard-sized file I/O.
3) Performance that doesn't completely suck ass.
4) Graceful re-integration when bringing a cluster portion back online.
5) Accessible through standard interfaces (e.g., a POSIX filesystem).
6) Doesn't require a PhD in the technology to administer.
7) Doesn't require insane quantities of cash to build.
8) Stable.

There are clustered file systems that have some of these qualities. None that I've found so far have *all* of these qualities.

Hadoop fails on #1, #2, and #6. It has a single NameNode commanding the cluster, so if it goes down, well... (shrug) It also does poorly with "normal"-sized files; somehow having a 10 GB file is the norm for Google. And setting up a multiple-node cluster is definitely non-trivial.

Of all that I've reviewed, GlusterFS did the best [gluster.com] but even in that case, I ran into severe over-serialization that brought my 6-node cluster to its knees. I tried three times to roll it out, and had to roll back all three times. I fiddled with the brick setup and caches for days before finally throwing in the towel.

Now I get by with rsyncing program files, and a homegrown data distribution setup using network sockets and xinetd. Not optimal to be sure, but so far it's scaled linearly and provides decent performance, at the price of a PhD in said technology. I guess you could compare our technology to MogileFS [danga.com] , only our scheme

A) uses DNS records to coordinate the cluster so that it scales up,
B) has a richer "where is the file" schema than the simple flat keys used by Mogile,
C) has the ability to execute programs against files for performance (e.g., grep for searching text files, tar/gzip for compress/uncompress, virus scans, etc.), and
D) has the ability to "hang open" for activities like logging.

So far, this has held up well with about 500,000 file operations and millions of log entries per business day with an average file size of about 1-3 megabytes and every sign that growth can continue by simply stacking on more hardware. No, I'm not talking about massive throughput, but I *am* talking about the need for high availability systems that scale nicely without bottlenecks and exorbitant expense. Yes, it works pretty well, but we've had to invest significant programming time to do this.

Guess it's like the old engineering saw: Convenient, Cheap, Quality: pick any two!

Re:Unfortunate for Hadoop (0)

Anonymous Coward | more than 4 years ago | (#29421601)

Hey mcrbids,

How did you partition and replicate your namespace, and what sort of fault tolerance guarantees do you provide? Are they tunable? I'd love to see more details on your solution!

Later,
Jeff

Re:Unfortunate for Hadoop (3, Interesting)

mcrbids (148650) | more than 4 years ago | (#29421857)

Most of what we do is web-based, so we took a hint from GlusterFS and moved the decisional logic to the client. We host the client so we can assume a trustworthy client. This makes debugging easy since all we have to do is echo stuff and see it in the browser.

Data stores work something like gluster 'bricks' - they serve as only a data store, nothing more. You can think of a data store as a WebDAV server. Each partition is served by multiple data stores. To keep things simple, data stores trust requests and so 'auto-configure' based on the request.

We divide our data into partitions that correspond to DNS subdomains. Then we use DNS to publish partition data. We provide a minimum of two hosts (IP addresses) for each subdomain. All writes are made to all hosts by opening multiple sockets. Reads are served from the first 'best' host after reading header data.
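The DNS-published partition lookup described above might look roughly like the sketch below. The domain `stores.example.com`, the partition count, the SHA-1 hashing rule, and all function names are assumptions made for illustration; the post does not say how keys are mapped to partitions.

```python
import hashlib
import socket

BASE_DOMAIN = "stores.example.com"  # hypothetical domain, not from the post
NUM_PARTITIONS = 16                 # hypothetical partition count


def partition_for(key):
    """Map a file key to a partition subdomain (assumed hashing rule)."""
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
    index = int(digest, 16) % NUM_PARTITIONS
    return "p{}.{}".format(index, BASE_DOMAIN)


def dns_resolver(hostname):
    """Return every IPv4 address that DNS publishes for hostname."""
    infos = socket.getaddrinfo(hostname, None, socket.AF_INET, socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def hosts_for(key, resolver=dns_resolver):
    """Look up the data stores serving a key; insist on at least two."""
    hosts = resolver(partition_for(key))
    if len(hosts) < 2:
        raise RuntimeError("partition must be served by at least two hosts")
    return hosts
```

Passing the resolver as a parameter keeps the sketch testable without a live DNS zone; in a real deployment the default `dns_resolver` would query the records published for each subdomain.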

In the case where any host doesn't have matching 'best' data on a read, the socket reverses and the data read from the best host is written back to the lagging host. This gives us auto-heal as needed. The only sticky point is delete, which we solve by assuming that a delete operation is successful only when all applicable data stores report success.
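A minimal in-memory sketch of the write-to-all, read-from-best, heal-on-read behaviour described above follows. The `DataStore` class, the integer version number standing in for the "header data", and the function names are all assumptions; the real system talks to data stores over sockets and its notion of "best" is not specified in the post.

```python
class DataStore:
    """In-memory stand-in for one data store (hypothetical; no sockets)."""

    def __init__(self, name):
        self.name = name
        self.files = {}  # key -> (version, data)

    def header(self, key):
        """Return the stored version, or -1 if the key is absent."""
        return self.files.get(key, (-1, None))[0]

    def read(self, key):
        return self.files[key][1]

    def write(self, key, version, data):
        self.files[key] = (version, data)

    def delete(self, key):
        self.files.pop(key, None)
        return True


def write_all(stores, key, data):
    """Write to every store in the partition, bumping an assumed version."""
    version = max(store.header(key) for store in stores) + 1
    for store in stores:
        store.write(key, version, data)
    return version


def read_best(stores, key):
    """Read from the first store with the best version; heal stores that lag."""
    headers = [store.header(key) for store in stores]
    best_version = max(headers)
    if best_version < 0:
        raise KeyError(key)
    data = stores[headers.index(best_version)].read(key)
    for store, version in zip(stores, headers):
        if version != best_version:
            # Stands in for the "reversed socket" write-back.
            store.write(key, best_version, data)
    return data


def delete_all(stores, key):
    """A delete succeeds only if every store reports success."""
    results = [store.delete(key) for store in stores]
    return all(results)
```

Under these assumptions the sketch also hints at why delete is the sticky point: if one store missed a delete, a later `read_best` would treat its copy as the best version and heal the file back onto the other stores.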

While implementation details are thorny and expensive, this is a system that should scale to any conceivable size, since we can partition across as many data stores as there is IP space for. And, by dividing our cloud so that data stores are grouped along with the client's hosting, we should see near-perfect linear scalability.

Works well so far, but it took over a year of experience to get it all working right, though we certainly weren't working on it exclusively.

What's your project like?

Re:Unfortunate for Hadoop (2, Insightful)

Rakishi (759894) | more than 4 years ago | (#29421723)

Hadoop is not really a file system or rather, as you found out, it doesn't make a good one. It's a framework for doing a certain type of parallel computing (MapReduce) on very large amounts of data. There's a filesystem (HDFS) in there, but it's pretty much designed for running such parallel jobs rather than being a clustered NAS. The filesystem is in some ways even irrelevant, as there's actually support for various filesystems (Amazon S3, etc.).
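For anyone unfamiliar with the map reduce style of computation mentioned above, here is a toy single-process sketch using word counting. It does not use Hadoop's API; the function names are illustrative only, and a real framework would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values collected for one key."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over the documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

Because each document is mapped independently and each key is reduced independently, both phases can be spread over a cluster, which is the property the filesystem underneath is designed to serve.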

Re:Unfortunate for Hadoop (0)

Anonymous Coward | more than 4 years ago | (#29423281)

Hadoop is not really a file system, yes. But HDFS is irrelevant? Tell that to Yahoo! or Facebook that have more than 50,000 computers running it. But then again, they're busy actually doing work, not mouthing off on Facebook...

Re:Unfortunate for Hadoop (1)

Rakishi (759894) | more than 4 years ago | (#29426805)

Reading comprehension, apparently you should learn some. See those three words "in some ways"? Yeah they matter a lot.

I said HDFS is irrelevant to Hadoop as in it's not a vital part of it, or as in it's not required because it can be replaced and quite often is.

Re:Unfortunate for Hadoop (1)

AlXtreme (223728) | more than 4 years ago | (#29423283)

No, I'm not talking about massive throughput, but I *am* talking about the need for high availability systems that scale nicely without bottlenecks and exorbitant expense. Yes, it works pretty well, but we've had to invest significant programming time to do this.

Is there any chance your project would be released?

As you found out there are only a couple of Linux clustering filesystems, all with drawbacks. It would be interesting having a new one designed from the start around reliability.

c0m (-1, Flamebait)

Anonymous Coward | more than 4 years ago | (#29421879)

gon5e Romeo and

Don't you know there's no evolution... (1)

Faw (33935) | more than 4 years ago | (#29424923)

...it is being intelligently designed.
