
VMware's Serengeti Brings Hadoop To Virtual, Cloud Environments

Soulskill posted more than 2 years ago | from the lions-to-be-added-in-future-patch dept.

Cloud 28

Nerval's Lobster writes "VMware's Serengeti is a new open-source project for deploying Apache Hadoop in virtual and cloud environments. Serengeti 0.5 is available as a free download under the Apache 2.0 license. It has been designed as distro-neutral, with support for Apache Hadoop 1.0, CDH3, Hortonworks 1.0 and Greenplum HD 1.0. Of course, VMware isn't the only company seeking to leverage the increased interest in Hadoop. In June alone, midsize IT vendors such as Datameer, Karmasphere, and Hortonworks have all announced platforms that utilize the framework in some way. Research firm IDC recently predicted that worldwide revenues from Hadoop and MapReduce will hit $812.8 million in 2016, up from $77 million in 2011."


28 comments


first (-1)

Anonymous Coward | more than 2 years ago | (#40312511)

first

Re:first (-1)

Anonymous Coward | more than 2 years ago | (#40312553)

second

number ordering (-1)

Anonymous Coward | more than 2 years ago | (#40312585)

... will hit $812.8 million in 2016, up from $77 million in 2011.

Nothing to do with the story, just want to voice a pet peeve of mine.

Does anyone else find the above format annoying? I prefer it when things are written "it changed from a value of X [in 20aa], to Y [in 20yy]."

The style of using "will go to Y, from a value of X" always seemed to me to go against the 'natural' flow: source->destination; start->finish.

Anyway, just had to get that off my chest.

Re:number ordering (0)

Anonymous Coward | more than 2 years ago | (#40317347)

Agreed. I initially read that they were losing money, until I re-read it.

I have to admit... (3, Funny)

busyqth (2566075) | more than 2 years ago | (#40312793)

So, if I've got only one server, then up to now, I would have to just run an application on that server.
But now, with only a little overhead, I can pretend to be running the same application in a distributed manner on a cluster, even though it's actually still running on the single server.
I have to admit this is pretty awesome.

Re:I have to admit... (2)

ArsonSmith (13997) | more than 2 years ago | (#40313025)

Or you have a smaller cluster of big boxes running a big cluster of smaller boxes (VMs).

Re:I have to admit... (1)

Anonymous Coward | more than 2 years ago | (#40313125)

Just imagine, you can build a beowulf cluster with just one computer, instead of dozens!

Modup. (1)

PerfectionLost (1004287) | more than 2 years ago | (#40313631)

Sadly, no mod points today.

Re:I have to admit... (1)

tom17 (659054) | more than 2 years ago | (#40313687)

So there are a lot of new names and jargon in the summary that I am not yet familiar with, but could you not just do this before using virtual machines?

I am sure I am missing the bigger picture here somehow...

Re:I have to admit... (3, Interesting)

abigor (540274) | more than 2 years ago | (#40314011)

Yes, of course you can manually set up Hadoop in whatever environment, but it's a pain and generally speaking management is annoying. This new project appears to alleviate at least some of that, making it easy to remotely deploy and manage a Hadoop cluster. At least, that's what I got from the demo video - there's probably more to it.

Regarding Hadoop, I'm always surprised by its popularity given the relative fragility of HDFS (the NameNode is a single point of failure; other distributed filesystems have beaten this problem) and the dubious, beta-like quality of the tools built on top of it (Pig, etc.)

Re:I have to admit... (1)

Rakishi (759894) | more than 2 years ago | (#40314385)

Regarding Hadoop, I'm always surprised by its popularity given the relative fragility of HDFS (the NameNode is a single point of failure; other distributed filesystems have beaten this problem) and the dubious, beta-like quality of the tools built on top of it (Pig, etc.)

So what's the alternative you recommend?

Re:I have to admit... (1)

Tsiangkun (746511) | more than 2 years ago | (#40314891)

pick a different storage backend.

Re:I have to admit... (1)

Rakishi (759894) | more than 2 years ago | (#40315099)

So you don't actually know of any that are competitive. Got it, thanks for playing.

Re:I have to admit... (1)

codepunk (167897) | more than 2 years ago | (#40316745)

Be careful what you ask, next they will be telling you to just run it on mysql since it works great for their wordpress site.

Who is really using this stuff? (0)

Anonymous Coward | more than 2 years ago | (#40313973)

Maybe I'm just getting old, but how many applications for distributed computing actually exist? Can't most jobs be done on a single computer, or are programmers just getting too lazy to optimize?

Shiny - High Revenue (2)

Lord Grey (463613) | more than 2 years ago | (#40313977)

From TFS:

Research firm IDC recently predicted that worldwide revenues from Hadoop and MapReduce will hit $812.8 million in 2016, up from $77 million in 2011.

Notice that the revenue is directed toward the few companies supporting and extending Hadoop. If you're working for one of those companies, congratulations. If you're working for one of the companies that is spending its money on this new shiny thing, you're probably in for a ride (one way or another). The technology is definitely good, I'll grant you that. But it is not the solution (or, not a very good solution) for many of the problems IT/data shops have. It really seems that a lot of people are jumping on the Hadoop bandwagon because "everyone else is getting it" and not because it will solve particular, concrete, existing problems. Or, it will solve exactly one relatively small, concrete, existing problem while erecting a complex infrastructure that must be supported for several years, making it more of a PITA than a solution.

Anyway, back to my original point: I think this revenue citation is more of an indication of a technology bubble and successful marketing than anything else. The price IT will pay for that bubble will probably far exceed the original cost.

Re:Shiny - High Revenue (3, Interesting)

Rakishi (759894) | more than 2 years ago | (#40314433)

As someone who actually uses Hadoop, you're so far off the mark you've hit a bystander in the head. Dealing with large amounts of data is a major PITA. If you don't understand that then you must never have worked with anything but trivial data sets. Hadoop fixes much of it, period. Without having to spend insane amount of money on databases, DBAs and still not being able to scale properly. It's not optimal but it works, it scales and it's flexible.

That's why companies are moving to it.

Re:Shiny - High Revenue (1)

Anonymous Coward | more than 2 years ago | (#40314825)

+1

Once you have collections of hundreds of millions of objects and need to work with a trillion properties, everything that you know about working with data stops working.

Hadoop is not being adopted because it is fun and trendy, but because it addresses real needs now with resources that are attainable.

Re:Shiny - High Revenue (0)

BitZtream (692029) | more than 2 years ago | (#40316781)

Hadoop fixes much of it, period. Without having to spend insane amount of money on databases, DBAs and still not being able to scale properly.

That just indicates you lack the skills and knowledge to deal with big data sets.

Congratulations, you're not the exact same type of person Microsoft aims to sell to.

Re:Shiny - High Revenue (3, Insightful)

Rakishi (759894) | more than 2 years ago | (#40317433)

Really, so what would you use to deal with 10 petabytes of data while going through 1 petabyte of it per day? I'm sure Google, eBay, LinkedIn, Twitter, Yahoo and Facebook have no idea what they're doing. Hahaha. Get back to me when you're running a multi-billion dollar company.

That just indicates you lack the skills and knowledge to deal with big data sets.

Keep thinking that. I'm going to be over here enjoying my guaranteed job security with occasional breaks to beat hordes of recruiters away. No, seriously, keep thinking it: the more idiots like you out there, the better my job security will be.

See, I've actually got experience dealing with big data which I'm guessing is a lot more than you can say. I've talked to companies that think like you, it's downright hilarious to watch their jaws drop when you casually mention how trivial going through data is for us on Hadoop. A month long data project for them is a ten minute query for me that I let run for a few hours. I've personally played in both worlds, you can keep the alternative while I get actual work done.

Re:Shiny - High Revenue (1)

Lord Grey (463613) | more than 2 years ago | (#40317475)

As someone who also works with large amounts of data every day, I know exactly what I'm talking about. You may want to reread what I actually wrote.

Hadoop is a decent technology and is one approach to dealing with "Big Data" problems. There are other products out there, and for the most part they have all been around a lot longer than Hadoop. The problems all these products address have been around for quite some time, as most people know.

So what is the difference at this point in time? Did everyone's data suddenly get fat or something? No. What has happened is that Google published their version of a map and reduce algorithm (with ideas for dealing with associated things like storage), someone else built an open source engine around it, and some other people started publicizing it. There is no problem with any of this.

But then some companies mistakenly believe that their one-million-row MySQL database is "Big Data" and get their IT staff to adopt this shiny -- and it is shiny, you know, compared to the older systems -- technology for their OMG Huge Database. This is what I was talking about. This is a misapplication of technology. It's as bad as using a poorly-tuned Oracle RAC on true "Big Data" databases. Sure, it works. But it's the wrong solution and eventually the company pays a much bigger price than they originally thought.

My bet is that most of Hadoop's growth is due to the marketing and "me too" effects rather than true technological need.
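
For readers new to the map and reduce idea mentioned in the comment above, here is a minimal single-process sketch in Python. It is not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) are made up for illustration, and a real cluster would spread each phase across many nodes and persist intermediate data to disk.

```python
from collections import defaultdict


def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce in sequence on an in-memory list of records."""
    pairs = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run_job(["big data", "Big cluster"]))
```

The point of the split is that each `map_phase` call looks at only one record and each `reduce_phase` call looks at only one key, so both steps can in principle run in parallel on separate machines. On a one-million-row table, as the comment argues, that machinery is unlikely to buy much.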

Re:Shiny - High Revenue (1)

Rakishi (759894) | more than 2 years ago | (#40317565)

Hadoop is a decent technology and is one approach to dealing with "Big Data" problems. There are other products out there, and for the most part they have all been around a lot longer than Hadoop. The problems all these products address have been around for quite some time, as most people know.

So what are these alternatives? I like how people keep mentioning "alternatives" but never state them by name. Afraid of their actual flaws being ripped apart I guess. Always a vague "other options" statement.

Hadoop is inexpensive, flexible and well supported. It's cheaper overall than paying for some silly clustered RDBMS licence which is optimized to solve a problem you don't actually care about. If you don't realize the specific set of problems Hadoop excels at solving then, frankly, you really don't understand the space. The fact that you think Hadoop is being used to solve the same problems as an RDBMS pretty much says everything about your experience in this area. Hint: Hadoop is not a database, it's a batch processing system/data warehouse.

My bet is that most of Hadoop's growth is due to the marketing and "me too" effects rather than true technological need.

Keep thinking that if you want to, I'll take my actual experience with these companies and their problems.

Re:Shiny - High Revenue (1)

Rakishi (759894) | more than 2 years ago | (#40317711)

Also, a million rows of data? Most any decent web startup that does data is probably running at a million rows of data per day. Minimum. Maybe closer to 100 million once they get around to collecting everything and get a few companies or users on board. Especially once you remove silly, idiotically low restrictions on scaling and storage (unless you spend $$$$$). Got more data? Add more nodes, problem solved, get on with running the company. And they want to run complex analysis over the last year of data including, potentially, resource-intensive machine learning. And they want to do it easily with maximum flexibility.

Like I said in another post, I've dealt with both worlds and you can keep your utterly limited systems if you want. When I want to do something with the data and the answer is "No" for any non-legal reason, well that's a show stopper. I don't care if there's a more "elegant solution" to some problem; if Hadoop solves the bottleneck most easily then Hadoop is the best tool for the job unless there are other issues it causes. I want to get work done, not play armchair philosopher.

Re:Shiny - High Revenue (0)

Anonymous Coward | more than 2 years ago | (#40322483)

Agreed. Hadoop is a great tool when applicable, but likely over-prescribed by people more impressed with the bandwagon than the actual merits (and shortcomings) of the design.

There aren't many Googles and Yahoos out there, so don't go thinking that's the norm by any means. And to the previous comment, even a 500 million row database isn't big by any means (further reinforcing your point). Hadoop (and big data at large) really is letting people be lazy with processing on the front end, which is an interesting paradigm.

Big Data is the new Cloud, which was the new Virtualization: good technologies, over-applied in their infancy.

Dumb question (0)

Anonymous Coward | more than 2 years ago | (#40315117)

I've been out of the virtualization world for a long time so this may be a dumb question, but could someone walk me through the following:

Virtualization was created to share I/O and storage utilization of one hardware setup with multiple O/S installations thereby making sure that I/O and storage were being fully utilized with great management tools on top.

Hadoop was created to split the handling of gobs (yes gobs) of I/O and storage across multiple O/S / Apache installations thereby utilizing every available bit of bandwidth across multiple hardware setups (with bonuses for redundancy etc.).

How does virtualizing O/S's for Hadoop installations make any sense when virtualization just adds I/O overhead for each hardware setup that Hadoop on a single O/S would fully utilize anyway? Pardon my ignorance but it doesn't make sense to me. Is it the speed at which you can drop in a functioning installation? One VMware instance per hardware setup?

Real I/O on Virtual machines (1)

Blaskowicz (634489) | more than 2 years ago | (#40315261)

I've not tried the tech I will mention yet, but if your motherboard includes an IOMMU you can use physical networking and storage controllers on a virtual machine. It works as long as the "passed-through" device is PCIe or PCI, be it onboard or on a card. So you can have racks of physical servers, with for instance one VM on each used as a node for your distributed file system. Virtualization is still useful for using the remainder of your physical server's capacity for other purposes. Or so I imagine it to be, as an armchair datacenter IT worker.

(PLUS ONE INFORMATIVE) (-1)

Anonymous Coward | more than 2 years ago | (#40315809)

will not work. And forwar3s we must

Microsoft has it as well (0)

Anonymous Coward | more than 2 years ago | (#40318581)

http://www.hadooponazure.com/
