Hadoop 1.0 Released

timothy posted more than 2 years ago | from the doowop-doobie-dee-do-hadoop-whaeeeee dept.

Software 38

darthcamaro writes "There has been a tonne of hype about Big Data and specifically Hadoop in recent years. But until today, Hadoop was not a 1.0 release product. Does it matter? Not really, but it's still a big milestone. The new release includes a new web interface for the Hadoop filesystem, security, and HBase database support. 'At this point we figured that as a community we can support this release and be compatible for the foreseeable future. That makes this release an ideal candidate to be called 1.0,' Arun C. Murthy, vice president of Apache Hadoop, said."

What is Hadoop? (5, Informative)

Anonymous Coward | more than 2 years ago | (#38589606)

From Wikipedia:

Apache Hadoop is a software framework that supports data-intensive distributed applications under a free license.[1] It enables applications to work with thousands of nodes and petabytes of data. Hadoop was inspired by Google's MapReduce and Google File System (GFS) papers.

Hadoop is a top-level Apache project being built and used by a global community of contributors,[2] written in the Java programming language. Yahoo! has been the largest contributor[3] to the project, and uses Hadoop extensively across its businesses.[4]

Hadoop was created by Doug Cutting,[5] who named it after his son's toy elephant.[6] It was originally developed to support distribution for the Nutch search engine project.[7]

...in case you're as ignorant as I am. Post anonymously to avoid karma whoring.

Re:What is Hadoop? (-1)

Anonymous Coward | more than 2 years ago | (#38589654)

Anyone as "ignorant" as you could have easily searched for it as you did. All you did was allow people to actually be ignorant, more than you pretend to be.

Re:What is Hadoop? (1)

Imbrondir (2367812) | more than 2 years ago | (#38589750)

Why stop there? Google is just a robot instead of a human that provides the information at your convenience. No! A thorough quest searching all the lands and oceans, involving 2 competing pirate clans, a fair lady and a magic unicorn, all spanning 7 years is the only thing that will allow you to not be ignorant.

Either that or you're confusing "ignorant" with "lazy"

Re:What is Hadoop? (1)

Anonymous Coward | more than 2 years ago | (#38589958)

I'm confused, so tell me. Was the submitter ignorant of the fact that many people here had no idea what Hadoop was or too lazy to include a quick description?

Re:What is Hadoop? (0)

Anonymous Coward | more than 2 years ago | (#38590098)

All you did was allow people to actually be ignorant

You seem to be ignorant of the meaning of ignorant.

Re:What is Hadoop? (-1)

Anonymous Coward | more than 2 years ago | (#38590136)

God you're a fag. TFS assumes we just know what this stupid thing is... It's not worth googling for me so props to the AC that took one for the team and fuck you for being worthless.

Re:What is Hadoop? (1)

Icyfire0573 (719207) | more than 2 years ago | (#38592776)

this guy! the most insulting way possible to say be thankful for someone else adding the information you needed to add to the discussion.

Re:What is Hadoop? (0)

Anonymous Coward | more than 2 years ago | (#38590162)

Au contraire, my dear Twatson.

You see, many a web surfer would not have reached for the goog, but just kept on skimming along.

Most honorable GP Coward dug out this tidbit of information and allowed the ignoramii mindlessly gliding by to learn a little something instead of nothing.

Re:What is Hadoop? (0)

Anonymous Coward | more than 2 years ago | (#38592216)

Posting that saved 10sec per viewer who otherwise would've had to load wikipedia and search for it.

10sec per each viewer is a lot. He did more than you did with your pointless whine.

Re:What is Hadoop? (0)

Anonymous Coward | more than 2 years ago | (#38589690)

Thanks for clarifying. I was thinking it was that thing that kicks people off the internet in France

Re:What is Hadoop? (1)

Pieroxy (222434) | more than 2 years ago | (#38589976)

Thanks for clarifying. I was thinking it was that thing that kicks people off the internet in France

That's Hadopi. Pretty bad. Nobody's been kicked out yet, they're still pondering the sanity of it all I guess. Anyways, the law is in effect.

Of course, it's dead easy to circumvent it since they only monitor the eMule network, on which you can barely find anything anymore anyways.

Re:What is Hadoop? (0)

Anonymous Coward | more than 2 years ago | (#38589820)

I thought the name came from an answer given by a well-known Perl hacker:

"Got objects? Sure, we've hadoop since Perl 5."

Re:What is Hadoop? (0)

Anonymous Coward | more than 2 years ago | (#38593486)

Thank you.
I still cannot believe this happens so often, that neither the submitter nor the editor takes the time to add such simple links to Wikipedia to the article.
It's like they want to piss us off.

What the heck is Hadoop? (5, Funny)

Anonymous Coward | more than 2 years ago | (#38589664)

Um, what the heck is Hadoop? A filesystem? Linux distro? Database software? Something to do with web servers? Throw me a bone here, man. Why does this 'Big Data' need capitalization?

And most importantly, why did they go with the British spelling for 'tonne'? Is this a product of the UK?

What, read the article? Are you mad?

Re:What the heck is Hadoop? (1)

Alioth (221270) | more than 2 years ago | (#38594104)

A tonne and a ton are different things. A tonne is a metric measure, 1000kg, a ton may be a short or long ton, and is some odd number of lbs (in the order of 2000 lbs or so).
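For anyone who wants the actual numbers, here is a small sketch of the conversion. It assumes the usual definitions (short ton = 2000 lb, long ton = 2240 lb, 1 lb = 0.45359237 kg); the constant and function names are just made up for illustration.

```python
# Mass of each unit in kilograms, from the standard definitions.
LB_TO_KG = 0.45359237            # international avoirdupois pound
TONNE_KG = 1000.0                # metric tonne
SHORT_TON_KG = 2000 * LB_TO_KG   # US (short) ton
LONG_TON_KG = 2240 * LB_TO_KG    # imperial (long) ton


def kg_to_lb(kg):
    """Convert kilograms to pounds."""
    return kg / LB_TO_KG


for name, kg in [("tonne", TONNE_KG),
                 ("short ton", SHORT_TON_KG),
                 ("long ton", LONG_TON_KG)]:
    print(f"{name}: {kg:.2f} kg = {kg_to_lb(kg):.1f} lb")
```

So a tonne lands between the two tons, a little closer to the long one.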

Re:What the heck is Hadoop? (0)

Anonymous Coward | more than 2 years ago | (#38607814)

And all this time I've thought that the imperialists (as in measurement) have been using a nice round metric unit instead of say, barrel eq. of water or sth :)
Thank god I'm not an engineer.

Better late than never (5, Informative)

abigor (540274) | more than 2 years ago | (#38589692)

It was actually released over a week ago, but I guess the announcement got lost over the holidays. I am actually a bit surprised they did a 1.0 version before solving the "NameNode is a single point of failure" problem with HDFS. I know for a fact that big companies (one of which was a client) are sometimes hesitant to deploy Hadoop because of this.

In theory, you can also use Hadoop with purportedly more robust distributed file systems, like KFS (Kosmos File System, I think it's called). I've never seen this in the wild though.

Re:Better late than never (0)

Anonymous Coward | more than 2 years ago | (#38590420)

You know that this 1.0 thing was updated from 0.20.205.
Both 0.20.205 and 1.0 are beta, not stable and not for production.
If you want to use Hadoop in production, you have to use 0.20.203.

And there's Hadoop 0.23. This is also beta, but has Federated HDFS, Namenode-HA and MapReduce2.0.

They will release Hadoop 0.23 as Hadoop2.0 in 2012 Q3, maybe.

Re:Better late than never (0)

Anonymous Coward | more than 2 years ago | (#38592914)

Anonymous Coward, you are mostly correct. Hadoop became 1.0 mostly out of greed by the various commercial interests in an attempt to sell to enterprises. There were a lot of plans to fix a lot of things for a 1.0 release that were pushed aside in order to get more $$.

Re:Better late than never (1)

Anonymous Coward | more than 2 years ago | (#38590602)

Something else I thought that got missed over the holidays was this:

HPCC Systems From LexisNexis Breaks World Record on Terasort Benchmark [msn.com]

Pretty amazing when you consider how little code it took to run the 100GiB sort, never mind that it was faster than hadoop using 1/5 of the hardware. Being able to read in from disk, perform network calls, compute the sort, and write back down to disk just over 1 gigabyte a second is BLAZING fast.

Re:Better late than never (1)

allenw (33234) | more than 2 years ago | (#38592882)

... except 100 gigabytes is not 1 terabyte.

Re:Better late than never (0)

Anonymous Coward | more than 2 years ago | (#38596416)

... except 100 gigabytes is not 1 terabyte.

Terasort doesn't imply 1 terabyte. It is a type of sort operation. Hadoop posted a world record doing a 100 gigabyte terasort in 130sec, and HPCC Systems ran the same 100 gigabyte terasort and did it in 98sec.
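To put those two times side by side, here is a quick back-of-the-envelope sketch. It takes the figures quoted in this thread at face value (100 gigabytes, 130 sec and 98 sec) and treats a gigabyte as a plain unit; the function name is made up for illustration.

```python
def throughput_gb_per_s(gigabytes, seconds):
    """Average end-to-end sort throughput in gigabytes per second."""
    return gigabytes / seconds


# Figures as quoted in the comments above, not independently checked.
hadoop = throughput_gb_per_s(100, 130)
hpcc = throughput_gb_per_s(100, 98)

print(f"Hadoop: {hadoop:.2f} GB/s")
print(f"HPCC:   {hpcc:.2f} GB/s")
print(f"HPCC finished {130 / 98:.2f}x faster")
```

That works out to roughly 0.77 GB/s against 1.02 GB/s, which matches the "just over 1 gigabyte a second" claim a few posts up.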

Version 2.0 out in six weeks (1, Funny)

Anonymous Coward | more than 2 years ago | (#38589858)

Now that it has released 1.0, it can increase version numbers Mozilla-style.

Re:Version 2.0 out in six weeks (1)

Icyfire0573 (719207) | more than 2 years ago | (#38592790)

To Infinity.... and beyond?

Losing security for speed (0)

Anonymous Coward | more than 2 years ago | (#38590316)

I am a little ignorant of it, because reading a PDF I downloaded is a world away from experience. However, the chief systems architect is (at least) conceptually enthralled, especially with how easily load can be distributed. (I am not keen on replication and use it sparingly, because setting the rules for merge replication is a real minefield in a fast-evolving environment.) I personally think the strict adherence to structure that our regular database enforces has been nothing but a godsend when a buggy release of software has interacted with the database. I also think that in this instance we shouldn't set any trends and should let others iron out the bugs, because as a company data is everything to us, so unusually I am the conservative one. I'll need convincing long before I can convince myself.

Re:Losing security for speed (1)

Sarten-X (1102295) | more than 2 years ago | (#38590462)

If your data's integrity is absolutely necessary, Hadoop (or more specifically HBase, which is the part most closely analogous to a database) is probably not for you. On the other hand, if you're working with statistics or any other application where an error affects your product trivially, you may find the speed is worth it, bearing in mind that changes are broadcast across the cluster "eventually". The strengths and weaknesses of Hadoop are different from a traditional database enough that I'd caution your architect against migrating an existing application just for the sake of speed. There are numerous pitfalls and headaches down that path.

No Point, Really (0)

Anonymous Coward | more than 2 years ago | (#38590328)

Off-topic, but this reminds me of something i found out yesterday: "HABOOBS" is a playable word in Words With Friends. Seriously, try it.

Re:No Point, Really (1)

Ramin_HAL9001 (1677134) | more than 2 years ago | (#38592136)

Haboob (n) A violent and oppressive wind blowing in summer, esp. in Sudan, bringing sand from the desert

I imagine Haboob will be the Apache foundation's non-Java version of Hadoop. Seriously, if big data is the application, better to run it on metal, not on a virtual machine.

Doesn't matter. (1)

kelemvor4 (1980226) | more than 2 years ago | (#38590356)

"There has been a tonne of hype about Big Data and specifically Hadoop in recent years. But until today, Hadoop was not a 1.0 release product. Does it matter? Not really

Wasn't there a slogan about "news for nerds, stuff that matters" around here somewhere?

Re:Doesn't matter. (1)

MurukeshM (1901690) | more than 2 years ago | (#38591504)

Wasn't there a slogan about "news for nerds, stuff that matters" around here somewhere?

Key word being was. Or is it still around?

Hadoop is a distributed computing platform (4, Informative)

Anonymous Coward | more than 2 years ago | (#38590358)

Seems a fair number of you are unaware of what Hadoop is.

Hadoop is a platform that enables distributed computing. Specifically, it supports map/reduce programming in a manner similar to Google's MapReduce, except that it is open source. It supports distributing data for redundancy and/or scalability (in other words, you can have multiple copies of each data item on multiple computers, or you can split a data set across multiple computers, or both, with the data set sharded across multiple machines but with copies of each shard on multiple machines).

There is a distributed filesystem built on top of hadoop called HDFS. There is a distributed key/value store (somewhat analogous to a database...actually, scratch that, it's a distributed hash map) called HBase. There are also a number of distributed computing libraries built on top of Hadoop, like Mahout (for machine learning), Hive (for ad-hoc querying of large data sets), and Pig (another distributed computing model that some consider to be easier than map/reduce).

The whole setup provides a distributed computing model similar to Google's distributed environment, supporting very large clusters, map/reduce programming, and distributed storage of very large and/or sparse matrices and tables.
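If the map/reduce idea is still fuzzy, here is a toy single-process sketch in Python. It is not Hadoop's API (Hadoop jobs are normally written in Java); every function name here is made up for illustration, and the "shards" are just lists standing in for pieces of a data set that would live on different machines.

```python
from collections import defaultdict


def map_phase(shard):
    """Map step: emit a (word, 1) pair for every word in one shard."""
    for line in shard:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(shards):
    """Run map over every shard, shuffle, then reduce each key."""
    pairs = (pair for shard in shards for pair in map_phase(shard))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Two shards standing in for data split across two machines.
shards = [["big data big hype"], ["hadoop big data"]]
counts = word_count(shards)
print(counts)
```

In a real cluster the map calls run in parallel on the machines holding each shard and the shuffle moves data over the network; this sketch only shows the shape of the computation.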

Re:Hadoop is a distributed computing platform (0)

Anonymous Coward | more than 2 years ago | (#38596230)

There is a distributed filesystem built on top of hadoop called HDFS. ...

I would say that Hadoop runs on top of HDFS. That is, the Hadoop layer expects to use a distributed filesystem which has some location awareness (allowing processing tasks to be sent to the nodes containing the data) and a level of resilience to individual datanodes failing (there is at least 2-way replication of data assumed in a typical HDFS configuration). The HDFS API expected by Hadoop can be implemented by other filesystems, such as Kosmos File System, mentioned above.
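A rough sketch of what location awareness plus 2-way replication buys you, in Python. This is only an illustration of the idea, not how HDFS actually places blocks or how Hadoop schedules tasks; the function names, node names and the round-robin placement are all assumptions.

```python
def place_blocks(blocks, nodes, replication=2):
    """Assign each block to `replication` distinct nodes, round-robin.

    Assumes replication <= len(nodes), otherwise replicas would repeat.
    """
    placement = {}
    for i, block in enumerate(blocks):
        placement[block] = [nodes[(i + r) % len(nodes)]
                            for r in range(replication)]
    return placement


def schedule(block, placement, live_nodes):
    """Send the task to a live node that already holds the block.

    Returns None when every replica of the block is on a failed node.
    """
    for node in placement[block]:
        if node in live_nodes:
            return node
    return None


nodes = ["node-a", "node-b", "node-c"]
placement = place_blocks(["blk-0", "blk-1", "blk-2"], nodes)

# node-a has failed: the task for blk-0 falls back to its second replica.
print(schedule("blk-0", placement, {"node-b", "node-c"}))
```

The point is that the work goes to the data rather than the other way round, and losing one datanode does not lose the block.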

Re:Hadoop is a distributed computing platform (1)

FremlinsMan (2450290) | more than 2 years ago | (#38599400)

....

There is a distributed filesystem built on top of hadoop called HDFS.

...

I would say that Hadoop runs on top of HDFS. That is, the Hadoop layer expects to use a distributed filesystem which has some location awareness (allowing processing tasks to be sent to the nodes containing the data) and a level of resilience to individual datanodes failing (there is at least 2-way replication of data assumed in a typical HDFS configuration). The HDFS API expected by Hadoop can be implemented by other filesystems, such as Kosmos File System, mentioned above.

Apologies for posting this comment again, but I had the suspicion that posting anonymously was hiding my comment...

It's too much for average user... (0)

Anonymous Coward | more than 2 years ago | (#38591616)

Hadoop is solid, has a lot of features, big companies use it. It seems great... And yet I prefer GridGain due to its simplicity, ease of use and development speed. It's like EJB vs service coded in PHP - for most pages/services EJB is just overkill.

Apache ZooKeeper (1)

vikingpower (768921) | more than 2 years ago | (#38593844)

ZooKeeper is a subproject of Hadoop ( and BookKeeper a sub-subproject, so to say ). I have been using both for a while now, and must say I am astonished about their resilience. Great products. Moreover, ZooKeeper is a valiant attempt at solving one of computer science's oldest standing problems: leader election in a ring. Hooray Hadoop, keep the good work going !

I hear... (0)

Anonymous Coward | more than 2 years ago | (#38596692)

it runs on GNU HURD as well.

Hadoop (0)

Anonymous Coward | more than 2 years ago | (#38620592)

Why not JHadoop? It's good to know it's another made up word for java stuff.
