Comments


Book Review: Puppet 3 Beginner's Guide

sagecreek Re:Slashvertisement my ass (81 comments)

Slashdot has specific guidelines for structuring and posting book reviews. Anyone is free to write a review and offer it for consideration by Slashdot's editorial staff. I tried out the "Puppet 3 Beginner's Guide" as a Puppet beginner. Then I offered my review as a guide for others who might be (1) considering Puppet and (2) curious about what the author has put inside his book--so they can decide if they want to spend money to buy it or not. You are free to think that somehow rises to the level of a dark, sinister "slashvertisement" conspiracy. And, likewise, I am free to LMAO.

about a year ago

Book Review: Hadoop Beginner's Guide

sagecreek Re:LMAO... HFDS? (57 comments)

It helps to have friends in high places. The HDFS correction has been made. So all who LMAO'ed because of my typo are now free to LYABO. Thanks for pointing out the mistake.

about a year and a half ago

Book Review: Hadoop Beginner's Guide

sagecreek Re:LMAO... HFDS? (57 comments)

Yup, that was my bad. It IS Hadoop Distributed File System (HDFS) and NOT Hadoop File Distribution System (HFDS). I had it right in front of me and still typed it wrong from some of my notes. I'll see if I can get it fixed. Thanks.

about a year and a half ago

Hadoop Beginner's Guide

sagecreek Book Review: Hadoop Beginner's Guide (1 comments)

Author: Garry Turkington Pages: 374 Publisher: Packt Publishing Rating: 9/10 Reviewer: Si Dunn ISBN: 978-1-84951-730-0 Summary: Shows how to use Hadoop software in Big Data settings.

about a year and a half ago

Submissions


Book Review: Puppet 3 Beginner's Guide

sagecreek sagecreek writes  |  about a year ago

sagecreek (2860541) writes "Puppet 3 Beginner’s Guide, by John Arundel. 184 pages. Packt Publishing. ISBN 978-1-78216-124-0. Rating: 8 out of 10.

If you are in charge of a small network with just a few servers, you may still be doing configuration management primarily by hand. And you may take particular pride in maintaining that “artisan” role.

After all, it’s mostly up to you to set up new users and their machines, fix current problems, manage the servers and their software, create databases and their user accounts, and try to keep the network and user configurations as uniform as possible despite running several different brands--and vintages--of hardware and software.

However, warns infrastructure consultant John Arundel, “[b]eyond ten or so servers, there simply isn’t a choice. You can’t manage an infrastructure like this by hand. If you’re using a cloud computing architecture, where servers are created and destroyed minute-by-minute in response to changing demand, the artisan approach to server crafting just won’t work.”

In his new book, Puppet 3 Beginner’s Guide, Arundel emphasizes: “Manual configuration management is tedious and repetitive, it’s error-prone, and it doesn’t scale well. Puppet is a tool for automating this process.”

Actually, among “UNIX-like systems,” there are at least three major configuration management (CM) packages — Puppet, Chef, and CFEngine — plus some other competitors, Arundel notes. He calls them “all great solutions to the CM problem...it’s not very important which one you choose as long as you choose one.” But he hopes, of course, you will check out Puppet and his new, well-written how-to book.

Puppet 3 Beginner’s Guide is structured to help system administrators “start from scratch...and learn how to fully utilize Puppet through simple, practical examples.”

Arundel’s book places important emphasis on the rapidly closing “divide between ‘devs,’ who wrangle code, and ‘ops,’ who wrangle configurations. Traditionally, the skills sets of the two groups haven’t overlapped much,” he notes. “It was common until recently for system administrators not to write complex programs, and for developers to have little or no experience of building and managing servers.”

Today, he points out, system admins are “facing the challenge of scaling systems to enormous size for the web, [and] have had to get smart about programming and automation.” Meanwhile, “[d]evelopers, who now often build applications, services, and businesses by themselves, couldn’t do what they do without knowing how to set up and fix servers.”

Therefore, “[t]he term ‘devops’ has begun to be used to describe the growing overlap between these skill sets,” Arundel emphasizes. “Devops write code, herd servers, build apps, scale systems, analyze outages, and fix bugs. With the advent of CM systems, devs and ops are now all just people who work with code.”

Arundel’s 184-page Puppet 3 Beginner’s Guide has 10 chapters that are smoothly structured with numerous headings, subheadings, short paragraphs, code examples, and other illustrations. He generated his code examples on the Ubuntu 12.04 LTS “Precise” distribution of Linux, but he also explains how to install the software on “Red Hat Linux, CentOS, or another Linux distribution that uses the Yum package system.”

Chapter 1, “Introduction to Puppet,” explains the software’s basic architecture and shows how Puppet deals with large-scale configuration management problems.



In Chapter 2, “First Steps with Puppet,” the author details how to install Puppet, create a simple manifest, and apply it to a machine. He also offers some basic Puppet language examples.
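To make the idea of applying a manifest concrete, here is a minimal Python sketch of desired-state configuration, assuming a machine can be modeled as a simple dictionary of resources. It is not Puppet code or Puppet’s API; `apply_manifest()` and the resource names are hypothetical. What it illustrates is idempotence: applying the same manifest a second time changes nothing.

```python
# Minimal sketch of desired-state configuration (NOT Puppet's API).
# apply_manifest() and the resource names are hypothetical stand-ins.

def apply_manifest(manifest, system):
    """Bring `system` (resource name -> current state) in line with
    `manifest` (resource name -> desired state).
    Returns the list of changes made; an empty list means nothing to do."""
    changes = []
    for name, desired in manifest.items():
        current = system.get(name)
        if current != desired:          # only touch what differs
            system[name] = desired
            changes.append(f"{name}: {current!r} -> {desired!r}")
    return changes


manifest = {
    "package[nginx]": "installed",
    "file[/etc/motd]": "Welcome to web01",
    "service[nginx]": "running",
}
system = {"package[nginx]": "absent", "service[nginx]": "stopped"}

first_run = apply_manifest(manifest, system)   # converges the machine
second_run = apply_manifest(manifest, system)  # already converged
print(len(first_run), len(second_run))         # 3 0
```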

Chapter 3, “Packages, Files, and Services,” focuses on “how to use these key resource types...and how they work together” and presents “a complete and useful example based on the Nginx web server.”
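Packages, files, and services work together partly through ordering: a configuration file should not be written before its package is installed, and a service should start only after its configuration is in place. This Python sketch (not Puppet syntax; the resource names are hypothetical) uses the standard library’s `graphlib` module, available in Python 3.9 and later, to compute such an order from declared dependencies.

```python
# Hypothetical sketch of dependency ordering among resources, similar in
# spirit to a manifest declaring that a file requires its package and a
# service requires both. Requires Python 3.9+ for graphlib.
from graphlib import TopologicalSorter

# Each key maps to the set of resources that must be applied before it.
dependencies = {
    "package[nginx]": set(),
    "file[/etc/nginx/nginx.conf]": {"package[nginx]"},
    "service[nginx]": {"package[nginx]", "file[/etc/nginx/nginx.conf]"},
}

order = list(TopologicalSorter(dependencies).static_order())
print(order)
# ['package[nginx]', 'file[/etc/nginx/nginx.conf]', 'service[nginx]']
```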

In Chapter 4, “Managing Puppet with Git,” Arundel shows “a simple and powerful way to connect machines together using Puppet, and to distribute your manifests and work on them together collaboratively using the version control system Git.”

The emphasis in Chapter 5, “Managing Users,” is on “good practices for user administration” and implementing them with Puppet. The chapter also covers “how to control access using SSH and manage user privileges using sudo.”

The topics covered in Chapter 6, “Tasks and Templates,” include using “Puppet’s resource types to run commands, schedule regular tasks, and distribute large trees of files.” Also covered: “how to insert values dynamically into files using templates.”
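Puppet 3’s templates are written in Ruby’s ERB syntax. As a rough analogy only, this Python sketch uses `string.Template` to show how per-node values get inserted into a shared configuration file; the template text and the `render_vhost()` helper are hypothetical.

```python
# Analogy only: Puppet 3 templates use Ruby's ERB. Python's string.Template
# stands in here to show per-node values filled into a shared config file.
# The template contents and render_vhost() are hypothetical.
from string import Template

vhost_template = Template(
    "server {\n"
    "    listen $port;\n"
    "    server_name $hostname;\n"
    "    root $docroot;\n"
    "}\n"
)

def render_vhost(hostname, port=80, docroot="/var/www"):
    """Fill in the shared template with one node's values."""
    return vhost_template.substitute(hostname=hostname, port=port, docroot=docroot)

print(render_vhost("web01.example.com"))
```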

In Chapter 7, “Definitions and Classes,” Arundel explains “how to organize Puppet code into reusable modules and objects. We’ll see how to create definitions and classes, and how to pass parameters to them.”

Chapter 8, “Expressions and Logic,” dives deeper into Puppet code. It “shows how to control flow using conditional statements and logical expressions, and how to build arithmetic and string expressions. It also covers operators, arrays, and hashes.”

Chapter 9, “Reporting and Troubleshooting,” deals with what the author terms “the practical side of working with Puppet,” including diagnosing and solving common problems, debugging the software’s operations, and understanding Puppet’s error messages.

The final section, Chapter 10, “Moving on Up,” wraps up with a range of topics, including how to make Puppet code “more elegant, more readable, and more maintainable.” Arundel also offers “links and suggestions for further reading.” And he describes nine projects to help you “improve your skills and your infrastructure at the same time.” The projects, he says, “provide a series of stepping-stones from your first use of Puppet to a completely automated environment.”

Puppet’s maker, Puppet Labs, offers some virtual-machine options for learning the software: (1) a VMX version recommended for VMware Fusion and VMware Workstation; and (2) an OVF version recommended for VirtualBox “and all other non-VMware virtualization software.” Puppet Labs also offers Puppet Enterprise, which is free for up to 10 nodes.

Along with Linux, Puppet will run on several other platforms, including Windows and Mac OS X, but you will find little help for those in Arundel’s book. You will need to use Puppet Labs’ online Mac or Windows documentation. And Windows may not be the best choice. As the documentation notes: “Windows nodes can’t act as puppet masters or certificate authorities, and most of the ancillary Puppet subcommands aren’t supported on Windows.”

It can take a bit of work to get Puppet installed and configured. But once you have it running in a Linux environment, John Arundel’s new book can be a solid guide to helping you become both a proficient Puppet user and a more efficient, knowledgeable, and versatile system administrator."

Book Review: Creating Mobile Apps with jQuery Mobile

sagecreek sagecreek writes  |  about a year ago

sagecreek (2860541) writes "You can judge this book, at least in part, by the lengthy tagline on its cover: “Learn to make practical, unique, real-world sites that span a variety of industries and technologies with the world’s most popular mobile development library.”

jQuery might not be your favorite framework on the long, long list of JavaScript possibilities. But Shane Gliser unabashedly describes himself as a jQuery “fanboy...if it’s officially jQuery, I love it.”

Gliser is an experienced mobile developer and blogger who operates Roughly Brilliant Digital Studios. He also has some background in mobile UX (user experience), and both qualities show in this smoothly written, well-illustrated, 234-page how-to book that focuses on jQuery Mobile, a “touch-optimized” web framework for smartphones and tablets.

Don’t be surprised when you extract the book’s code examples and related items from a ZIP file that is almost 100 MB in size. Gliser covers a lot of ground, and he covers it well in his 10 chapters, each of which contains a project.

The first thing you don’t do in Chapter 1, “Prototyping jQuery Mobile,” is work at a computer. In the true spirit of UX, Gliser briefly has you work with a pen and some 3x5 note cards. (Remember those?) Your initial goal is to roughly sketch out some designs for a jQuery Mobile website for a new pizzeria. But why the ancient technology? “We are more willing to simply throw out a drawing that took less than 30 seconds to create,” Gliser writes. And: “Actually sketching by hand uses a different part of the brain and unlocks our creative centers.” Furthermore, those on your team who are not coders can contribute comments, suggestions, and corrections to the emerging design.

In Chapter 2, “A Mom-and-Pop Mobile Website,” you step over to your computer with Chapter 1’s paper prototype in hand. You start converting the sketched design “into an actual jQuery Mobile (jQM) site that acts responsively and looks unique.” You also begin building “a configurable server-side PHP template,” and you work with custom fonts, page curl effects using CSS, and other aspects of creating and optimizing a mobile site.

“Mobile is a very unforgiving environment,” Gliser cautions, “and some of the tips in this section will make more difference than any of the ‘best coding practices.’” Indeed, he wants you to be aware of optimization “at the beginning. You are going to do some awesome work and I don’t want you or your stakeholders to think it’s any less awesome, or slow, or anything else because you didn’t know the tricks to squeeze the most performance out of your systems. It’s never too early to impress people with the performance of your creations.”

Chapter 3, “Analytics, Long Forms, and Front-end Validation,” moves beyond “dynamically link[ing] directly into the native GPS systems of iOS and Android.” Instead, Gliser shows how to work with Google static maps, Google Analytics, long and multi-page forms, and jQuery Validate. As for static maps, he says, “Remember to always approach things from the user’s perspective. It’s not always about doing the coolest thing we can.” Indeed, a static map may be all the user needs to decide whether to drive to a business, such as a pizzeria, or just call for delivery. And, as for Google Analytics: “Every website should have analytics. If not, it’s difficult to say how many people are hitting your site, if we’re getting people through our conversion funnels, or what pages are causing people to leave our site.”

Meanwhile, desktop users are familiar with (and frequently irritated by) long forms and multi-page forms. Lengthy forms can be real deal-breakers for users trying to negotiate them on mobile devices. The author presents some ways to shorten long forms and break them “into several pages using jQuery Mobile.” And he emphasizes the importance of using the jQuery Validate plug-in to add validation to any page that has a form, so the user can see quickly and clearly that an entry has a problem.
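The plug-in itself runs in the browser, in JavaScript. To show the rule-based idea, here is a hypothetical Python analog loosely modeled on the plug-in’s rule names (`required`, `email`, `minlength`); the `validate()` function and its messages are assumptions, not the plug-in’s code. Re-checking input on the server like this is also a sensible backstop, because client-side validation can be bypassed.

```python
# Hypothetical Python analog of rule-based form validation, loosely modeled
# on jQuery Validate's rule names. Not the plug-in's code.
import re

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # deliberately simple check

def validate(values, rules):
    """Return {field: message} for every field that fails one of its rules."""
    errors = {}
    for field, field_rules in rules.items():
        value = values.get(field, "").strip()
        if field_rules.get("required") and not value:
            errors[field] = "This field is required."
        elif value and field_rules.get("email") and not EMAIL_RE.match(value):
            errors[field] = "Please enter a valid email address."
        elif value and len(value) < field_rules.get("minlength", 0):
            errors[field] = f"Please enter at least {field_rules['minlength']} characters."
    return errors

rules = {
    "name": {"required": True, "minlength": 2},
    "email": {"required": True, "email": True},
    "phone": {"minlength": 7},   # optional, but checked if filled in
}
print(validate({"name": "A", "email": "pizza-at-example.com"}, rules))
```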

The focus in Chapter 4, “QR Codes, Geolocation, Google Maps API, and HTML5 Video,” is on handling concepts that can be “applied to any business that has multiple physical locations.” Gliser uses a local movie theater chain as his development example. It is “considering throwing its hat into the mobile ring,” so a site is created that makes use of QR codes, geolocation, Google Maps, and linking to YouTube movie previews. Then, he shows how to use embedded video to keep users on the movie chain’s site rather than sending them off to YouTube.
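For a business with several locations, one basic task is finding the site nearest the user once geolocation has supplied coordinates. This Python sketch is not the book’s code; it only illustrates the distance calculation using the haversine great-circle formula, and the theater names and coordinates are invented.

```python
# Hypothetical sketch: pick the nearest of several business locations once
# the user's latitude/longitude are known. Names and coordinates are made up.
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def nearest(user_lat, user_lon, locations):
    """Return (name, distance_km) of the closest location."""
    name, (lat, lon) = min(
        locations.items(),
        key=lambda item: haversine_km(user_lat, user_lon, *item[1]),
    )
    return name, haversine_km(user_lat, user_lon, lat, lon)

theaters = {
    "Downtown": (41.8781, -87.6298),
    "Northside": (42.0451, -87.6877),
    "Westfield": (41.8500, -88.3126),
}
print(nearest(41.88, -87.63, theaters))
```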

In Chapter 5, the goal is “to create an aggregating news site based off social media.” So the emphasis shifts to “Client-side Templating, JSON APIs, and HTML5 Web Storage.” Notes Gliser: “Honestly, from a purely pragmatic perspective, I believe that the template is the perfect place for code. The more flexible, the better. JSON holds the data and the templates are used to transform it. To draw a parallel, XML is the data format and XSL templates are used to transform. Nobody whines about logic in XSL; so I don’t see why it should be a problem in JS templates.”

Next, he shows how to patch into Twitter’s JSON API to get “the very latest set of trending topics” and “whittle down the response to only the part we want...and pass that array into JsRender for...well...rendering” in a manner that will be “a lot cleaner to read and maintain” than looping through JSON and using string concatenation to make the output.
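Here is a rough Python sketch of that “whittle down, then template” approach. The JSON payload is a hypothetical, trimmed-down trends response whose real format and access rules may differ, and `string.Template` stands in for JsRender. This sketch does no HTML escaping, which a real template should handle.

```python
# Sketch of "keep only the part we want, then hand it to a template".
# The payload shape is hypothetical; string.Template stands in for JsRender.
# No HTML escaping is done here.
import json
from string import Template

raw = """[{"trends": [{"name": "#jqm", "url": "http://example.com/1"},
                      {"name": "#html5", "url": "http://example.com/2"}]}]"""

ITEM = Template('<li><a href="$url">$name</a></li>')

def render_trends(payload):
    """Keep only the trends array and render one list item per trend."""
    trends = json.loads(payload)[0]["trends"]        # the part we want
    items = "\n".join(ITEM.substitute(t) for t in trends)
    return f'<ul data-role="listview">\n{items}\n</ul>'

print(render_trends(raw))
```

Compared with building the markup by string concatenation inside a loop, the template keeps the HTML shape in one readable place.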

Other topics in Chapter 5 include programmatically changing pages in jQuery Mobile, understanding how jQuery Mobile handles generated pages and Document Object Model (DOM) weight management, and working with RSS feeds. Gliser points out that there is still “a lot more information out there being fed by RSS feeds than by JSON feeds.” The chapter concludes with looks at how to use HTML5 web storage (it’s simple, yet it can get “especially tricky on mobile browsers”), and how to leverage the Google Feed API. Explains Gliser: “The Google Feeds (sic) API can be fed several options, but at its core, it’s a way to specify an RSS or ATOM feed and get back a JSON representation.”
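The Google Feed API’s core job, as Gliser describes it, was turning an RSS or ATOM feed into JSON. A minimal local equivalent in Python, using only the standard library’s XML parser, might look like this; the sample feed and the output keys are assumptions, not that API’s actual response format.

```python
# Minimal local RSS-to-JSON conversion with the standard library.
# The feed content and output keys are hypothetical.
import json
import xml.etree.ElementTree as ET

RSS = """<rss version="2.0"><channel>
  <title>Local Music News</title>
  <item><title>New album out</title><link>http://example.com/a</link></item>
  <item><title>Tour dates</title><link>http://example.com/b</link></item>
</channel></rss>"""

def rss_to_json(rss_text):
    """Parse an RSS 2.0 document and return a JSON string of its items."""
    channel = ET.fromstring(rss_text).find("channel")
    feed = {
        "title": channel.findtext("title"),
        "entries": [
            {"title": item.findtext("title"), "link": item.findtext("link")}
            for item in channel.findall("item")
        ],
    }
    return json.dumps(feed)

print(rss_to_json(RSS))
```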

Chapter 6 jumps into “the music scene. We’re going to take the jQuery Mobile interface and turn it into a media player, artist showcase, and information hub that can be saved to people’s home screens,” Gliser writes. He proceeds to show how “ridiculously simple it can be to bring audio into your jQuery Mobile pages.” And he explains how to use the HTML5 manifest “and a few other meta tags” to save an app to the home screen. Furthermore, he discusses how to test mobile sites using “Google Chrome (since its WebKit) or IE9 (for the Windows Phone)” as browsers that are shrunken down to mobile size. “Naturally, this does not substitute for real testing,” he cautions. “Always check your creations on real devices. That being said, the shrunken browser approach will usually get you 97.5 percent of the way there. Well...HTML5 Audio throws that operating model right out the window.”

Since “mobile phones are quickly becoming our photo albums,” Gliser’s Chapter 7, “Fully Responsive Photography,” begins with creating a basic gallery using Photoswipe. Then, in a section focused on “supporting the full range of device sizes,” he shows how to start using responsive web design (RWD), “the concept of making a single page work for every device size.” The issues, of course, range from image sizes and resolutions to text sizes and character counts per line, on screens ranging from smartphones and tablets to much larger displays.
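One small piece of responsive design is choosing an image file that is large enough for the screen without wasting bandwidth. This Python sketch assumes a hypothetical set of pre-rendered image widths and multiplies the viewport’s CSS width by the device pixel ratio; it illustrates the decision only and is not code from the book or from Photoswipe.

```python
# Hypothetical sketch of one responsive-image decision: choose the smallest
# pre-rendered image that still covers the screen at its pixel density.

AVAILABLE_WIDTHS = [320, 640, 960, 1280, 1920]  # illustrative widths, ascending

def pick_image_width(viewport_css_px, device_pixel_ratio=1.0):
    """Return the smallest available width >= the physical pixels needed."""
    needed = viewport_css_px * device_pixel_ratio
    for width in AVAILABLE_WIDTHS:
        if width >= needed:
            return width
    return AVAILABLE_WIDTHS[-1]          # fall back to the largest we have

print(pick_image_width(320, 2.0))   # high-density phone screen -> 640
print(pick_image_width(1024))       # desktop-sized window -> 1280
```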

In Chapter 8, “Integrating jQuery Mobile into Existing Sites,” three topics are key: (1) “Detecting mobile — server-side, client-side, and the combination of the two”; (2) “Mobilizing full site pages — the hard way”; and (3) “Mobilizing full site pages — the easy way.” Gliser avoids some potential “geek war” controversies over “browser sniffing versus feature detection” when detecting mobile devices. He zeroes in first on detection using WURFL for “server-side database-driven browser sniffing.” He also shows how to do JavaScript-based browser sniffing, which he concedes may be “the worst possible way to detect mobile but it does have its virtues,” especially if your budget is small and you want to exclude older devices that can’t handle some new JavaScript templating. He also describes JavaScript-based feature detection using Modernizr, plus some other feature-detection methods.
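To make “browser sniffing” concrete, here is a deliberately crude Python sketch of server-side user-agent matching. WURFL instead matches user-agent strings against a large device database; the regular expression below is a hypothetical stand-in and would misclassify some devices (tablets whose user agents contain “Mobile,” for example).

```python
# Crude, hypothetical server-side browser sniffing by user-agent substring.
# Far less reliable than a device database such as WURFL.
import re

MOBILE_UA = re.compile(
    r"Mobile|Android|iPhone|iPod|Windows Phone|BlackBerry", re.IGNORECASE
)

def is_mobile(user_agent):
    """Guess whether a request comes from a mobile browser."""
    return bool(MOBILE_UA.search(user_agent or ""))

print(is_mobile("Mozilla/5.0 (iPhone; CPU iPhone OS 6_0 like Mac OS X) Mobile Safari"))  # True
print(is_mobile("Mozilla/5.0 (Windows NT 6.1; WOW64) Chrome/27.0 Safari/537.36"))        # False
```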

As for mobilizing full-site pages “the hard way,” he states that there is really “only one good reason: to keep the content on the same page so that the user doesn’t have one page for mobile and one page for desktop. When emails and tweets and such are flying around, the user generally doesn’t care if they’re sending out the mobile view or the desktop view and they shouldn’t.” He focuses on how “it’s pretty easy to tell what parts of a site would translate to mobile” and how to add data attributes to existing tags “to mobilize them. When jQuery’s libraries are not present on the page, these attributes will simply sit there and cause no harm. Then you can use one of our many detection techniques to decide when to throw the jQM libraries in.”

Mobilizing full-site pages “the easy way” involves, in his view, “nothing easier and cleaner than just creating a standalone jQuery Mobile page...and simply import the page we want with AJAX. We can then pull out the parts we want and leave the rest.” His code samples show how to do this.

Chapter 9, “Content Management Systems and jQM” looks at the pros and cons of using three different content management systems (CMS) with jQuery Mobile: WordPress, Drupal, and Adobe Experience Manager. “The key to get up and running quickly with any CMS is, realizing which plugins and themes to use,” Gliser writes. “For WordPress, I would not recommend a jQuery Mobile plugin. As I was experimenting for this chapter, it broke the admin interface and was, in general, a miserable experience. However, there are several jQuery Mobile themes that will serve you well. Some are free, some paid.” He explains how to use mobile theme switchers.

Meanwhile, Drupal offers some standard plugins that provide contact forms, CAPTCHA, and custom database tables and forms, and enable you to “create full blown web apps, not just brochureware sites.” But: “The biggest downside to Drupal is that it has a bit of a learning curve if you want to tap its true power. Also, without some tuning, it can be a little slow and can really bloat your page’s code,” he says.

As for Adobe Experience Manager (AEM), Gliser merely introduces it as a “premier corporate CMS” and a “major CMS player that comes with complete jQuery Mobile examples.” He doesn’t show “how to install, configure, or code for AEM. That’s a subject for several training manuals the size of this book.” He adds: “If you work for a company that can afford AEM, you’ll already be well-versed in the mobile implementation. The power this platform gives to content authors is astounding.”

Chapter 10, the final chapter, is titled “Putting It All Together — Flood.FM.” Using what you’ve learned in the book, including paper prototyping the interfaces, you create “a website where listeners will be greeted with music from local, independent bands across several genres and geographic regions.”

Along the way, Gliser introduces Balsamiq, “a very popular UX tool for rapid prototyping.” He discusses using Model-View-Controller (MVC), Model-View-ViewModel (MVVM), and Model-View-Whatever (MV*) development structures with jQuery Mobile. He shows how to work with the Web Audio API, and he illustrates how to prompt users to download the Flood.FM app to their home screens. He finishes up with brief discussions of accelerometers, cameras, “APIs on the horizon,” plus “To app or not to app, that is the question” and whether you should compile an app or not. Finally, he demonstrates PhoneGap Build, the “cloud-based build service for PhoneGap.”

Shane Gliser’s book does indeed cover a lot of ground, clearly and with good examples. If you insist on picking nits, I can report that an occasional dash is missing or a comma sometimes shows up out of place, such as this example in Chapter 2: “A practice is only best until a new practice, [misplaced comma] comes along that is better.” In the printed book’s table of contents, there are style and spelling glitches in the heading for Chapter 3. “Analytics, long forms, and frontend validation” should be “Analytics, Long Forms, and Front-end Validation.” And, in Chapter 5, Gliser refers to the “Google Feeds API” when it’s actually “Google Feed API.” But the term “Google Feeds API” is commonly misused by developers on Stack Overflow and other sites.

I am not a mobile developer. I am a tech writer, frequent book reviewer, and occasional coder. I have played with some of the code examples in this book, but I have not tried them all. So I can’t say if there are code glitches. However, the book was reviewed before publication by at least four software professionals with impressive resumes.

Aside from occasional spots where the text needed tighter editing, this book is, in my view, well written and rich with information, examples, sources, and tips for working effectively with jQuery Mobile. I intend to put it to good use as I continue learning."


Book Review: Hadoop Beginner's Guide

sagecreek sagecreek writes  |  about a year and a half ago

sagecreek writes "Hadoop is an open-source, Java-based framework for large-scale data processing. Typically, it runs on big clusters of computers working together to crunch large chunks of data. You also can run Hadoop in single-node mode on a Linux machine, Windows PC, or Mac, to learn the technology or do testing and debugging. The Hadoop framework, however, is not quickly mastered. Apache’s Hadoop wiki cautions: “If you do not know about classpaths, how to compile and debug Java code, step back from Hadoop and learn a bit more about Java before proceeding.” But if you are reasonably comfortable with Java, the well-written Hadoop Beginner’s Guide by Garry Turkington can help you start mastering this rising star in the Big Data constellation.

Dr. Turkington is vice president of data engineering and lead architect for London-based Improve Digital. He holds a doctorate in computer science from Queen’s University Belfast in Northern Ireland. His Hadoop Beginner’s Guide provides an effective overview of Hadoop and hands-on guidance in how to use it locally, in distributed hardware clusters, and out in the cloud.

Packt Publishing provided a review copy of the book. I have reviewed one other Packt book previously.

Much of the first chapter is devoted to “exploring the trends that led to Hadoop's creation and its enormous success.” This includes brief discussions of Big Data, cloud computing, Amazon Web Services, and the differences between “scale-up” (using increasingly larger computers as data needs grow) and “scale-out” (spreading the data processing onto more and more machines as demand expands).

“One of the most confusing aspects of Hadoop to a newcomer,” Dr. Turkington writes, “is its various components, projects, sub-projects, and their interrelationships.”

His 374-page book emphasizes three major aspects of Hadoop: (1) its common projects; (2) the Hadoop Distributed File System (HDFS); and (3) MapReduce.

“Common projects,” he explains, “comprise a set of libraries and tools that help the Hadoop product work in the real world.”

The HDFS, meanwhile, “is a filesystem unlike most you may have encountered before.” As a distributed filesystem, it can spread data storage across many nodes. “[I]t stores files in blocks typically at least 64 MB in size, much larger than the 4-32 KB seen in most filesystems.” The book briefly describes several features, strengths, weaknesses, and other aspects of HDFS.
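Some quick arithmetic shows what large blocks mean in practice. This Python sketch uses the 64 MB block size quoted above; the replication factor of 3 is HDFS’s usual default but is configurable, so treat it as an assumption.

```python
# Back-of-the-envelope HDFS block arithmetic. The 64 MB block size is from
# the review; replication=3 is an assumed (configurable) default.
import math

MB = 1024 * 1024

def hdfs_blocks(file_size_bytes, block_size=64 * MB):
    """Number of blocks a file occupies; the last block may be partial."""
    return math.ceil(file_size_bytes / block_size)

def raw_storage(file_size_bytes, replication=3):
    """Bytes consumed across the cluster. A partial last block uses only
    its actual size on disk, so this is just size times replication."""
    return file_size_bytes * replication

print(hdfs_blocks(1024 * MB))                      # 1 GB file -> 16 blocks
print(hdfs_blocks(200 * MB))                       # 200 MB -> 4 blocks (last holds 8 MB)
print(hdfs_blocks(200 * MB, block_size=4 * 1024))  # same file, 4 KB blocks -> 51200
```

Far fewer, larger blocks means far less metadata for the cluster to track per file, which suits HDFS’s focus on big, sequentially read files.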

Finally, MapReduce is a well-known programming model for processing large data sets. Typically, MapReduce is used with clusters of computers that perform distributed computing. In the “Map” portion of the process, a single problem is split into many subtasks that are then assigned by a master computer to individual computers known as nodes (and there can be sub-nodes). During the “Reduce” part of the task, the master computer gathers up the processed data from the nodes, combines it, and outputs the answer to the original problem. (MapReduce libraries are now available for many programming languages, and Hadoop is a widely used open-source implementation of the model.)
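The map, shuffle, and reduce steps can be simulated in a single Python process with the classic word-count example. This is only a sketch of the model: real Hadoop jobs are typically written in Java against Hadoop’s `Mapper` and `Reducer` classes and run across many nodes, whereas here each “node” is just a function call.

```python
# Single-process simulation of the MapReduce model (word count).
# Not Hadoop code: the "cluster" is just function calls.
from collections import defaultdict

def map_phase(line):
    """Mapper: emit (word, 1) for each word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reducer: combine all values for one key."""
    return key, sum(values)

def word_count(lines):
    intermediate = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())

print(word_count(["the quick fox", "The lazy dog", "the end"]))
```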

“The developer focuses on expressing the transformation between source and result data sets, and the Hadoop framework manages all aspects of job execution, parallelization, and coordination,” Dr. Turkington notes. He calls this “possibly the most important aspect of Hadoop. The platform takes responsibility for every aspect of executing the processing across the data. After the user defines the key criteria for the job, everything else becomes the responsibility of the system.”

In this 11-chapter book, the first two chapters introduce Hadoop and explain how to install and run the software.

Three chapters are devoted to learning to work with MapReduce, from beginner to advanced levels. And the author stresses: “In the book, we will be learning how to write MapReduce programs to do some serious data crunching and how to run them on both locally managed and AWS-hosted Hadoop clusters.” [“AWS” is “Amazon Web Services.”]

Chapter 6, titled “When Things Break,” zeroes in on Hadoop’s “resilience to failure and an ability to survive failures when they do happen.” As the author notes, “much of the architecture and design of Hadoop is predicated on executing in an environment where failures are both frequent and expected.” But node failures and numerous other problems still can arise, so the reader is given an overview of potential difficulties and how to handle them.

The next chapter, “Keeping Things Running,” lays out what must be done to properly maintain a Hadoop cluster and keep it tuned and ready to crunch data.

Three of the remaining chapters show how Hadoop can be used elsewhere within an organization’s systems and infrastructure, by personnel who are not trained to write MapReduce programs.

Chapter 8, for example, provides “A Relational View on Data with Hive.” What Hive provides is “a data warehouse that uses MapReduce to analyze data stored on HDFS,” Dr. Turkington notes. “In particular, it provides a query language called HiveQL that closely resembles the common Structured Query Language (SQL) standard.”

Using Hive as an interface to Hadoop “not only accelerates the time required to produce results from data analysis, it significantly broadens who can use Hadoop and MapReduce. Instead of requiring software development skills, anyone with a familiarity with SQL can use Hive,” the author states.
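To see why SQL familiarity is enough, consider a classic MapReduce task such as counting words: in a SQL-like language it becomes a single `GROUP BY` query. This Python sketch uses SQLite from the standard library as a stand-in for Hive; the table and data are hypothetical, and Hive would compile an equivalent HiveQL query into MapReduce jobs over data stored in HDFS.

```python
# SQLite stands in for Hive: a word count expressed as one GROUP BY query.
# Table name and data are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (word TEXT)")
conn.executemany(
    "INSERT INTO words VALUES (?)",
    [(w,) for w in "the quick fox the lazy dog the end".split()],
)

rows = conn.execute(
    "SELECT word, COUNT(*) AS n FROM words GROUP BY word ORDER BY n DESC, word"
).fetchall()
print(rows)
```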

But, as Chapter 9 makes clear, Hive is not a relational database, and it doesn’t fully implement SQL. So the text and code examples in Chapter 9 illustrate (1) how to set up MySQL to work with Hadoop and (2) how to use Sqoop to transfer bulk data between Hadoop and MySQL.

Chapter 10 shows how to set up and run Flume NG, a distributed service that collects, aggregates, and moves large amounts of log data from applications to Hadoop’s HDFS.

The book’s final chapter, “Where to Go Next,” helps the newcomer see what else is available beyond the Hadoop core product. “There are,” Dr. Turkington emphasizes, “a plethora of related projects and tools that build upon Hadoop and provide specific functionality or alternative approaches to existing ideas.” He provides a quick tour of several of the projects and tools.

A key strength of this beginner’s guide is in how its contents are structured and delivered. Four important headings appear repeatedly in most chapters. The “Time for action” heading singles out step-by-step instructions for performing a particular action. The “What just happened?” heading highlights explanations of “the working of tasks or instructions that you have just completed.” The “Pop quiz” heading, meanwhile, is followed by short, multiple-choice questions that help you gauge your understanding. And the “Have a go hero” heading introduces paragraphs that “set practical challenges and give you ideas for experimenting with what you have learned.”

Hadoop can be downloaded free from the Apache Software Foundation’s Hadoop website.

Dr. Turkington’s book does a good job of describing how to get Hadoop running on Ubuntu and other Linux distributions. But while he assures that “Hadoop does run well on other systems,” he notes in his text: “Windows is supported only as a development platform, and Mac OS X is not formally supported at all.” He refers users to Apache’s Hadoop FAQ wiki for more information. Unfortunately, few details are offered there. So web searches become the best option for finding how-to instructions for Windows and Macs.

Running Hadoop on a Windows PC typically involves installing Cygwin and OpenSSH, so you can simulate a Linux environment. But other options can be found via sites such as Hadoop Wizard and Hadoop on Windows with Eclipse.

To install Hadoop on a Mac running OS X Mountain Lion, you will need to search for websites that offer how-to tips. Here is one example.

There are other ways to get access to Hadoop on a single computer, using other operating systems or virtual machines. Again, web searches are necessary. The Cloudera Enterprise Free product is one virtual-machine option to consider.

Once you get past the hurdle of installing and running Hadoop, Garry Turkington’s well-written, well-structured Hadoop Beginner’s Guide can start you moving down the lengthy path to becoming an expert user.

You will have the opportunity, the book's tagline states, to "[l]earn how to crunch big data to extract meaning from the data avalanche.”

(Si Dunn is an author, screenwriter, and technology book reviewer.)"


Hadoop Beginner's Guide

sagecreek sagecreek writes  |  about a year and a half ago

sagecreek writes "Hadoop is an open-source, Java-based framework for large-scale data processing. Typically, it runs on big clusters of computers working together to crunch large chunks of data. You also can run Hadoop in single-node mode on a Linux machine, Windows PC, or Mac, to learn the technology or do testing and debugging. The Hadoop framework, however, is not quickly mastered. Apache’s Hadoop wiki cautions: “If you do not know about classpaths, how to compile and debug Java code, step back from Hadoop and learn a bit more about Java before proceeding.” But if you are reasonably comfortable with Java, the well-written Hadoop Beginner’s Guide by Garry Turkington can help you start mastering this rising star in the Big Data constellation.

Dr. Turkington is vice president of data engineering and lead architect for London-based Improve Digital. He holds a doctorate in computer science from Queens University of Belfast in Northern Ireland. His Hadoop Beginner’s Guide provides an effective overview of Hadoop and hands-on guidance in how to use it locally, in distributed hardware clusters, and out in the cloud.

Packt Publishing provided a review copy of the book. I have reviewed one other Packt book previously.

Much of the first chapter is devoted to “exploring the trends that led to Hadoop's creation and its enormous success.” This includes brief discussions of Big Data, cloud computing, Amazon Web Services, and the differences between “scale-up” (using increasingly larger computers as data needs grow) and “scale-out” (spreading the data processing onto more and more machines as demand expands).

“One of the most confusing aspects of Hadoop to a newcomer,” Dr. Turkington writes, “is its various components, projects, sub-projects, and their interrelationships.”

His 374-page book emphasizes three major aspects of Hadoop: (1) its common projects; (2) the Hadoop File Distribution System (HFDS); and (3) MapReduce.

“Common projects,” he explains, “comprise a set of libraries and tools that help the Hadoop product work in the real world.”

The HFDS, meanwhile, “is a filesystem unlike most you may have encountered before.” As a distributed filesystem, it can spread data storage across many nodes. “[I]t stores files in blocks typically at least 64 MB in size, much larger than the 4-32 KB seen in most filesystems.” The book briefly describes several features, strengths, weaknesses, and other aspects of HFDS.

Finally, MapReduce is a well-known programming model for processing large data sets. Typically, MapReduce is used with clusters of computers that perform distributed computing. In the “Map” portion of the process, a single problem is split into many subtasks that are then assigned by a master computer to individual computers known as nodes (and there can be sub-nodes). During the “Reduce” part of the task, the master computer gathers up the processed data from the nodes, combines it and outputs a response to the problem that was posed to be solved. (MapReduce libraries are now available for many different computer languages, including Hadoop.)

“The developer focuses on expressing the transformation between source and result data sets, and the Hadoop framework manages all aspects of job execution, parallelization, and coordination,” Dr. Turkington notes. He calls this “possibly the most important aspect of Hadoop. The platform takes responsibility for every aspect of executing the processing across the data. After the user defines the key criteria for the job, everything else becomes the responsibility of the system.”

In this 11-chapter book, the first two chapters introduce Hadoop and explain how to install and run the software.

Three chapters are devoted to learning to work with MapReduce, from beginner to advanced levels. And the author stresses: “In the book, we will be learning how to write MapReduce programs to do some serious data crunching and how to run them on both locally managed and AWS-hosted Hadoop clusters.” [“AWS” is “Amazon Web Services.”]

Chapter 6, titled “When Things Break” zeroes in on Hadoop’s “resilience to failure and an ability to survive failures when they do happen.much of the architecture and design of Hadoop is predicated on executing in an environment where failures are both frequent and expected.” But node failures and numerous other problems still can arise, so the reader is given an overview of potential difficulties and how to handle them.

The next chapter, “Keeping Things Running,” lays out what must be done to properly maintain a Hadoop cluster and keep it tuned and ready to crunch data.

Three of the remaining chapters show how Hadoop can be used elsewhere within an organization’s systems and infrastructure, by personnel who are not trained to write MapReduce programs.

Chapter 8, for example, provides “A Relational View on Data with Hive.” What Hive provides is “a data warehouse that uses MapReduce to analyze data stored on HFDS,” Dr. Turkington notes. “In particular, it provides a query language called HiveQL that closely resembles the common Structured Query Language (SQL) standard.”

Using Hive as an interface to Hadoop “not only accelerates the time required to produce results from data analysis, it significantly broadens who can use Hadoop and MapReduce. Instead of requiring software development skills, anyone with a familiarity with SQL can use Hive,” the author states.

But, as Chapter 9 makes clear, Hive is not a relational database, and it doesn’t fully implement SQL. So the text and code examples in Chapter 9 illustrate (1) how to set up MySQL to work with Hadoop and (2) how to use Sqoop to transfer bulk data between Hadoop and MySQL.

Chapter 10 shows how to set up and run Flume NG. This is a distributed service that collects, aggregates, and moves large amounts of log data from applications to Hadoop's HFDS.

The book’s final chapter, “Where to Go Next,” helps the newcomer see what else is available beyond the Hadoop core product. “There are,” Dr. Turkington emphasizes, “a plethora of related projects and tools that build upon Hadoop and provide specific functionality or alternative approaches to existing ideas.” He provides a quick tour of several of the projects and tools.

A key strength of this beginner’s guide is in how its contents are structured and delivered. Four important headings appear repeatedly in most chapters. The “Time for action” heading singles out step-by-step instructions for performing a particular action. The “What just happened?” heading highlights explanations of “the working of tasks or instructions that you have just completed.” The “Pop quiz” heading, meanwhile, is followed by short, multiple-choice questions that help you gauge your understanding. And the “Have a go hero” heading introduces paragraphs that “set practical challenges and give you ideas for experimenting with what you have learned.”

Hadoop can be downloaded free from the Apache Software Foundation’s Hadoop website.

Dr. Turkington’s book does a good job of describing how to get Hadoop running on Ubuntu and other Linux distributions. But while he assures that “Hadoop does run well on other systems,” he notes in his text: “Windows is supported only as a development platform, and Mac OS X is not formally supported at all.” He refers users to Apache’s Hadoop FAQ wiki for more information. Unfortunately, few details are offered there. So web searches become the best option for finding how-to instructions for Windows and Macs.

Running Hadoop on a Windows PC typically involves installing Cygwin and openSSH, so you can simulate using a Linux PC. But other choices can be found via sites such as Hadoop Wizard and Hadoop on Windows with Eclipse".

To install Hadoop on a Mac running OS X Mountain Lion, you will need to search for websites that offer how-to tips. Here is one example.

There are other ways get access to Hadoop on a single computer, using other operating systems or virtual machines. Again, web searches are necessary. The Cloudera Enterprise Free product is one virtual-machine option to consider.

Once you get past the hurdle of installing and running Hadoop, Garry Turkington’s well-written, well-structured Hadoop Beginner’s Guide can start you moving down the lengthy path to becoming an expert user.

You will have the opportunity, the book's tagline states, to "[l]earn how to crunch big data to extract meaning from the data avalanche.”

(Si Dunn is an author, screenwriter, and technology book reviewer.)"
