Nagios 3: Enterprise Network Monitoring

jgoguen writes "Nagios, originally known as Netsaint, has been a long-time favourite for network and device monitoring due to its flexibility, ease of use, and efficiency. Nagios provided, and still provides today, a low-cost, versatile alternative to commercial network monitoring applications. Nagios 3 takes a huge step forward compared to Nagios 2, providing improved flexibility, ease of use and extensibility, all while also making significant performance enhancements. Due to its extensibility and ease of use, no device or situation has yet been found that cannot be monitored using Nagios and a pre-made or custom script, plug-in or enhancement." Read on for the rest of jgoguen's review.
Nagios 3: Enterprise Network Monitoring
Author: Max Schubert, Derrick Bennett, Jonathan Gines, Andrew Hay, John Strand
Pages: 339
Publisher: Syngress
Rating: 8
Reviewer: jgoguen
ISBN: 978-1-59749-267-6
Summary: Making Nagios 3 work for you and your business.

The first chapter is devoted to the new features in Nagios 3. It covers the major changes, including changes to data storage options and locations, checks, configuration objects, and macros, along with operational, performance, and usability enhancements. Users upgrading from Nagios 2, or users already familiar with it, will gain the most from this chapter. New users will still find it valuable, however, since many of the changes touch major features of Nagios. Users working from configuration file samples written for Nagios 2 will also save a great deal of time by checking this chapter for changes, because using Nagios 2 configuration files unmodified prevents them from taking advantage of some new Nagios 3 features. Users who will only be writing plug-ins and scripts for their local Nagios deployment might not find Chapter 1 very useful.
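As a rough illustration of what macros do (a sketch, not Nagios's own implementation): Nagios expands `$NAME$` tokens such as `$HOSTADDRESS$` and `$ARG1$` in a command definition before running the check. The `expand_macros` function, the plugin path, and the threshold values below are hypothetical, and real Nagios supports many more macro forms than this sketch handles.

```python
import re

# A macro is an uppercase name wrapped in dollar signs, e.g. $HOSTADDRESS$ or $ARG1$.
MACRO_PATTERN = re.compile(r"\$([A-Z0-9_]+)\$")


def expand_macros(command_line: str, macros: dict) -> str:
    """Substitute known $NAME$ macros; in this sketch unknown ones are left as-is."""

    def substitute(match):
        name = match.group(1)
        return macros.get(name, match.group(0))

    return MACRO_PATTERN.sub(substitute, command_line)


if __name__ == "__main__":
    # Hypothetical command definition and values for one host check.
    command = "$USER1$/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$"
    values = {
        "USER1": "/usr/local/nagios/libexec",
        "HOSTADDRESS": "192.0.2.10",
        "ARG1": "100.0,20%",
        "ARG2": "500.0,60%",
    }
    print(expand_macros(command, values))
    # prints: /usr/local/nagios/libexec/check_ping -H 192.0.2.10 -w 100.0,20% -c 500.0,60%
```

Leaving unknown macros untouched is a choice made for the sketch so that a typo stays visible in the expanded command line.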

Chapters 2 and 3 deal with scaling Nagios to work efficiently in large deployments. The authors first show how to design a Nagios configuration for a large organization. Every Nagios administrator should follow this approach, not only those in large organizations, because a well-designed configuration for a small organization will scale easily as the organization grows. I was impressed to see that the authors stress the importance of end users' input when designing configurations; administrators who ignore this advice put the success of Nagios in their organization at risk. Several diagrams help explain the relationships between the Nagios configuration objects. A good amount of detail is provided on letting groups within an organization have semi-independent control over how Nagios interacts with their hosts and services, and how Nagios alerts their staff. The authors have included numerous configuration file snippets, which allow a Nagios administrator to create a configuration file very quickly and then tweak its parameters to suit local requirements.
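Much of that quick tweaking rests on Nagios object inheritance: an object's `use` directive names a template, and any property the object does not set itself is taken from the template (templates carry `register 0` so Nagios does not treat them as real objects). The resolver below is a simplified sketch of that rule, assuming a single template per object; Nagios 3 also accepts several comma-separated templates, which this sketch does not handle. All object names are hypothetical.

```python
# Properties that control inheritance itself and are not passed down in this sketch.
NOT_INHERITED = {"use", "name", "register"}


def resolve(key: str, objects: dict, _chain: tuple = ()) -> dict:
    """Return the effective properties of objects[key], following its `use` chain."""
    if key in _chain:
        raise ValueError("template loop: " + " -> ".join(_chain + (key,)))
    obj = objects[key]
    parent = obj.get("use")
    # Start from the template's effective properties (or nothing at the top of the chain).
    effective = resolve(parent, objects, _chain + (key,)) if parent else {}
    # Properties set locally override anything inherited.
    for prop, value in obj.items():
        if prop not in NOT_INHERITED:
            effective[prop] = value
    return effective


if __name__ == "__main__":
    objects = {
        "generic-host": {"name": "generic-host", "register": "0",
                         "check_command": "check-host-alive",
                         "notification_period": "24x7"},
        "office-host": {"name": "office-host", "register": "0",
                        "use": "generic-host", "notification_period": "workhours"},
        "web01": {"use": "office-host", "host_name": "web01",
                  "address": "192.0.2.11"},
    }
    print(resolve("web01", objects))
```

Here `web01` inherits `check_command` from the top-level template, while the middle template's `notification_period` overrides the generic one: the same "shared defaults, local tweaks" pattern the configuration snippets rely on.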

Scaling the Nagios graphical user interface (GUI) follows a very simple concept: "less is more." Although the specific details here deal with Nagios, the idea applies equally to anyone displaying information they expect users to actually pay attention to. Users should be able to see as much as they want (limited by resources and permissions), but by default be shown only what they need to know about. For example, the marketing system administrator probably does not need to know when the development disk image server goes down, while the development system administrator would be very interested. User accounts let the administrator give each group access to Nagios filtered through its fine-grained permissions system, so users see only what is relevant to them by default without first having to select a particular area. User accounts also keep users who only need to view Nagios from having full administrative control, and allow each user's actions to be recorded. A patch included in the book's download package adds read-only accounts to Nagios as well, which is useful for organizations that want to let certain users (or groups) view Nagios without making any changes. For example, an organization's help desk could use Nagios to determine quickly whether users cannot reach a service because of an outage or whether further troubleshooting is needed.
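The default filtering described above can be pictured roughly like this: when authentication is enabled, the Nagios CGIs show a logged-in user the hosts and services for which that user is a contact, unless `cgi.cfg` grants a global permission such as `authorized_for_all_hosts`. The sketch below models only that idea; the `Host` class, the user names, and the `see_all` set are hypothetical, and contact groups and service-level rules are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    contacts: set = field(default_factory=set)  # users who receive this host's alerts


def visible_hosts(user: str, hosts: list, see_all: frozenset = frozenset()) -> list:
    """Return the names of hosts `user` sees by default."""
    if user in see_all:
        # Users with a global permission (e.g. NOC staff) see everything.
        return [h.name for h in hosts]
    # Everyone else sees only the hosts they are a contact for.
    return [h.name for h in hosts if user in h.contacts]


if __name__ == "__main__":
    hosts = [
        Host("dev-disk-images", {"dev_admin"}),
        Host("marketing-web", {"marketing_admin"}),
    ]
    print(visible_hosts("marketing_admin", hosts))  # prints ['marketing-web']
    print(visible_hosts("noc", hosts, see_all=frozenset({"noc"})))
```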

The authors go on to discuss clustering, failover, and the future of the Nagios GUI. I'm not convinced these topics belong in a chapter devoted to scaling the Nagios GUI, since they mostly concern scaling the entire Nagios deployment. Regardless, they are all very important, especially when an organization relies heavily on Nagios. Clustering lets each remote site have a local Nagios instance monitoring its hosts and devices, rather than requiring a central instance to monitor remote hosts and services. Monitoring remote sites centrally is slower because every check has to cross the WAN links, and it also has security implications, because the checks must be allowed through between the central instance and each remote location. The authors don't discuss the security side of clustering, but it is something every Nagios administrator (and everyone else!) should keep in mind. The clustering section deals primarily with the rationale behind clustering and how to configure the local and remote instances of Nagios properly, but it also includes a good deal of information that a less experienced Nagios administrator might overlook. Most notable is the discussion of how to display service status when a service is reachable from the master server but not from a remote instance. While Nagios can translate the remote instance's check result so that it is displayed from the master's perspective, it may be more desirable for the master's GUI to display results from the perspective of the server that ran the check.

After implementing clustering, some sort of fallback mechanism is required. Failover and redundancy are the two main choices, and the authors discuss them next. They don't spend much time on redundancy, since it requires each redundant Nagios instance to perform its own set of checks, which can significantly raise the load on both the monitored hosts and the network in general. Given the problems redundancy can introduce, even this brief treatment is more attention than most administrators should give it. Failover is a much better solution, and the authors do a great job of covering how to set it up properly. As usual, they remind readers of details that are easily overlooked, especially when you're trying to get Nagios back up and running after the master server crashes.
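The core decision in a failover setup can be sketched as follows. This is not the book's configuration: `StandbyMonitor` and its threshold are hypothetical, and a real setup would act on the decision, for example by having an event handler on the standby submit Nagios external commands such as `ENABLE_NOTIFICATIONS`. The consecutive-failure threshold exists so that one lost heartbeat does not trigger a takeover.

```python
from dataclasses import dataclass


@dataclass
class StandbyMonitor:
    """Hypothetical failover logic for a standby Nagios server.

    The standby watches the master's heartbeat and only takes over
    (e.g. enables its own notifications) after `threshold` consecutive
    failures.
    """

    threshold: int = 3
    missed: int = 0
    active: bool = False  # True while the standby is acting as master

    def record_heartbeat(self, master_ok: bool) -> bool:
        """Record one heartbeat result and return whether the standby is active."""
        if master_ok:
            # Master is healthy again: reset the counter and hand control back.
            self.missed = 0
            self.active = False
        else:
            self.missed += 1
            if self.missed >= self.threshold:
                self.active = True
        return self.active


if __name__ == "__main__":
    standby = StandbyMonitor(threshold=3)
    for ok in [True, False, False, False, True]:
        print(ok, standby.record_heartbeat(ok))
```

Handing control back after a single good heartbeat is the simplest policy; some administrators may prefer several good heartbeats, or a manual step, before the master resumes notifying.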

Chapters 4 and 5 discuss Nagios plug-ins, add-ons, and enhancements. These chapters alone are worth the price of the book because of how much time they can save. It's much faster to copy a script and make minor tweaks than to reinvent the wheel, and between the scenarios covered here and the Nagios user community, very little hasn't been done already. Whether you want to test command-line interfaces, CPU usage, memory utilization, bandwidth utilization, HTML pages, LDAP services, or even specialized hardware, there's probably already a plug-in for it, and the book itself includes plug-ins for most common scenarios. The add-ons are equally varied, providing ways to monitor hosts across security zones, configure read-only displays that live in a different security zone from Nagios, interface with Cacti, and even read alerts aloud. Still more scenarios can be handled by other scripts provided by the Nagios community.
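For readers who have not written a plug-in before, the contract is small: print one status line (optionally followed by `|` and performance data) and exit with 0 (OK), 1 (WARNING), 2 (CRITICAL), or 3 (UNKNOWN). The sketch below is a hypothetical disk-usage check, not one of the book's plug-ins; the 80% and 90% thresholds and the `used_pct` label are arbitrary choices.

```python
import shutil

# Exit codes defined by the Nagios plugin conventions.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(percent_used: float, warn: float, crit: float) -> int:
    """Map a usage percentage to a state; higher values are worse."""
    if percent_used >= crit:
        return CRITICAL
    if percent_used >= warn:
        return WARNING
    return OK


def check_disk(path: str, warn: float = 80.0, crit: float = 90.0):
    """Return (exit_code, status_line) for the filesystem holding `path`."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - {exc}"
    if usage.total == 0:
        return UNKNOWN, f"DISK UNKNOWN - {path} reports zero size"
    percent = 100.0 * usage.used / usage.total
    state = evaluate(percent, warn, crit)
    # Text before '|' is for humans; text after it is performance data
    # in the form label=value[UOM];warn;crit;min;max.
    return state, (
        f"DISK {STATE_NAMES[state]} - {path} is {percent:.1f}% used"
        f" | used_pct={percent:.1f}%;{warn};{crit};0;100"
    )


def main(argv) -> int:
    path = argv[1] if len(argv) > 1 else "/"
    code, line = check_disk(path)
    print(line)
    return code
```

To run it as a real plug-in, import `sys` and end the script with `if __name__ == "__main__": sys.exit(main(sys.argv))`, so that Nagios receives the exit code as the check state.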

Chapter 6 details how to integrate Nagios into an enterprise environment. It goes into just enough detail to get Nagios working with a large number of third-party services, such as LDAP authentication, Cacti, Puppet, and Splunk. The emphasis is always on the human element: how to use Nagios to help the help desk and NOC staff do their jobs more efficiently and effectively, and how to gain maximum support for Nagios within the organization. The importance of the human element, in all its forms, cannot be overstated, and the authors have done a wonderful job of outlining how to make Nagios an integral part of an organization. Much of the material towards the end of the chapter, especially the section on smaller Network Operations Centres, could be used by anyone looking for ways to help a small group work together effectively.

Chapter 7 is another chapter with a lot of content that applies easily outside a Nagios environment. It begins by reminding you to know your network and to watch out for session hijack attacks, then shows you how to use Nagios to do both. Nagios can't replace a competent network administrator, but it can make their life easier, and the authors show how the configuration you've already built in Nagios can reveal a potential session hijack attack, and how building it forces you to know your network properly. Nagios requires you to know not only how your network is built and which devices are in use, but also what constitutes normal conditions for all your devices and services.

Nagios can also assist with regulatory compliance, an area that is very important to companies, especially those operating in the United States. The authors outline how a company could use Nagios to help it comply with Sarbanes-Oxley (SOX) using COBIT or COSO, the Payment Card Industry (PCI) Data Security Standard (DSS), Director of Central Intelligence Directive (DCID) 6/3, and the Department of Defense (DoD) Information Assurance Certification and Accreditation Process (DIACAP). Nagios alone isn't enough for compliance (at the very least, detailed documentation will also be required), but the authors give a good overview of how Nagios can help with each of these regulations.

The final chapter brings the rest of the book together by walking through a full Nagios configuration for a fictional Fortune 500 corporation. Most of the chapter covers the pre-deployment stage, but that doesn't mean it has little to teach about deploying Nagios. Pre-deployment is a major hurdle when introducing Nagios to an organization, and the authors show how to turn this challenge into a series of simple steps that increase the chances of Nagios succeeding in your organization. From the very beginning, you can see how involving the customer early and starting small, along with everything else, become parts of a process. Although it's specific to Nagios, the process followed here could easily be adapted to deploying any sort of monitoring service. The remainder of the chapter covers how you might integrate Nagios into a Fortune 500 company, finishing the book with some good general advice for integrating Nagios.

Despite all the book's strengths, there is some room for improvement. In Chapter 2, it might have been more effective to outline the relationships between the Nagios configuration objects before discussing configuration planning. I found it much easier to think about a configuration for a large organization once I knew how Nagios' configuration objects relate to each other.
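One relationship that shapes both planning and day-to-day behaviour is the `parents` directive, which describes the network path from the Nagios server to each host. When a host's check fails, Nagios uses it to decide between DOWN and UNREACHABLE: if at least one parent is UP, the host is DOWN; if every parent is DOWN or UNREACHABLE, the host is UNREACHABLE; a failing host with no parents is simply DOWN. The function below is a minimal sketch of that rule, assuming the set of hosts whose own checks failed is already known and that the parent map contains no loops; the host names are hypothetical.

```python
UP, DOWN, UNREACHABLE = "UP", "DOWN", "UNREACHABLE"


def classify(parents: dict, failing: set) -> dict:
    """Assign UP, DOWN, or UNREACHABLE to every host.

    parents maps each host to the list of its parent hosts (empty for hosts on
    the Nagios server's own segment); failing holds hosts whose own check failed.
    """
    states = {}

    def state(host):
        if host not in states:
            host_parents = parents.get(host, [])
            if host not in failing:
                states[host] = UP
            elif not host_parents or any(state(p) == UP for p in host_parents):
                # Failed, and at least one parent is UP: the host itself is down.
                states[host] = DOWN
            else:
                # Failed, but every parent is DOWN or UNREACHABLE: the fault may be upstream.
                states[host] = UNREACHABLE
        return states[host]

    for host in set(parents) | set(failing):
        state(host)
    return states


if __name__ == "__main__":
    topology = {"core-router": [], "office-switch": ["core-router"],
                "web01": ["office-switch"], "db01": ["office-switch"]}
    print(classify(topology, failing={"office-switch", "web01", "db01"}))
```

With the switch down, only the switch is reported as DOWN and the hosts behind it as UNREACHABLE, which keeps one failure from looking like three.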

Throughout the book, the authors have included configuration file snippets, scripts, and example script output in the main text. While all of these are quite useful and serve to enhance the book, I think it would have been better if these were all included in an appendix instead, perhaps keeping only the relevant parts of configuration snippets in the main text for clarification.

At the end of Chapter 3, the sections on the future of Nagios and the CGI front end are informative and interesting, but they would be better placed in a separate chapter on the potential future of Nagios in general. Combined with the other major areas of Nagios, they would provide more than enough material for a full chapter on their collective futures.

Overall, this is a great book for anyone using Nagios as more than a casual user, and is still very informative for the casual user. A few of these chapters alone would be worth the price of the whole book.

Disclaimer: I worked with one author when I was asked to review this book.


Comments:
  • Nagios is great (Score:5, Insightful)

    by kimvette ( 919543 ) on Wednesday October 08, 2008 @02:24PM (#25303197) Homepage Journal

    Nagios is great but even version 3 is by no means easy to configure. Like all too many F/OSS projects, the documentation is lacking or even incorrect in spots, and supplied examples barely scratch the surface of what the application can do.

    I've been running it and it's great - I have it monitoring a bunch of servers (email, hosting, backup, file, etc.) with custom scripts and it works great -- once it's configured.

    • Re: (Score:3, Insightful)

      by kimvette ( 919543 )

      Oops, submitted too early.

      Nagios is especially helpful in a smaller environment where you have limited personnel; as long as Nagios is up and running, you can have it email, page, or text you so that you know there's an issue without having personnel monitor everything manually, and it provides a decent log via the web interface.

      My main point is this: if this book is as good as the reviewer indicates, it should be very well worth buying if you need a F/OSS server monitoring solution.

      • It's even more useful in a large environment - our setup currently monitors 1900 pieces of hardware, each with a few services on them. Without some sort of monitoring system, it would be near impossible to notice if one of those goes down, especially if it's customer hardware which people in the company aren't using.

    • Get Hyperic or one of the other "Small 4" monitoring apps. Never look back.
      • by afidel ( 530433 )
        Hyperic looks cool, but the support costs are insane: ~$800 per 2 CPUs per year? I only paid around $1,600 for WhatsUp and pay a fraction of that per year in maintenance to monitor up to 1,000 devices. It's not perfect, but I can deal with some warts for that kind of savings!
          • There's a free version of the software. You lose some things like forming groups, etc., but it is much better than WhatsUp, as it can scan on a per-service/script basis. The free version also loses out on "remediation," such as handling a condition like "If Apache is not up, run this cleanup, then start Apache and notify this group." Also, Hyperic can give you discounts and even charges half for non-production environments.
          • by afidel ( 530433 )
            You can set the polling interval on a per service or monitor basis in WhatsUp as well. It also includes remediation in the base price (though it doesn't include any functionality other than restart service out of the box, but that's usually custom anyways). Their discounts would have to be like 90% to be cost competitive, do you have any idea what their site licensing costs are like? I don't feel like being pestered by a sales puke for information that should be right on their website.
            • I have not used WhatsUp for years, and even then just a demo. I liked Hyperic for the free version, and in some environments, especially the OpenView and Tivoli ones, the $800 per 2 CPUs would be about a 70% discount.
          • You lose some things like forming groups

            That's actually a pretty big loss, if you ask me.

            I'm no fan of WhatsUp, but I use it, and it's quite flexible when it comes to monitoring and "remediation" if you dabble in JScript or VBScript.

            I tried Zenoss and found it superior to Hyperic (IMHO), but sorely lacking in documentation, with not-so-active community forums. This was last year; I guess it's time to give it (and OpenNMS) another try.

            Is there an opensource product that centralizes network and systems monitoring

        • Re: (Score:2, Insightful)

          by perldork ( 1381373 )

          Agreed. To me, time and money spent integrating Nagios (or any other free OSS product) into an organization is much better spent than money on licenses and on paying support people for a commercial product, who then get not only your money but also the benefit of the knowledge learned from the experience, instead of your company or group getting that information.

          Even that wouldn't be sooo bad except that many commercial companies don't even share that knowledge in a way that ot

    • by Fez ( 468752 ) *

      The initial configuration of Nagios can be quite a pain, but as I said in another post farther down the page here, with judicious use of templates, it is now very easy to manage once configured.

    • OpenNMS is better (Score:2, Informative)

      by viridari ( 1138635 )

      I don't know why OpenNMS [opennms.org] doesn't get more credit, maybe because it's a Java app, but it's a damned good one.

      Get a very basic OpenNMS configuration going, point it at a range of IP addresses, and it will auto-discover most of what's out there. If you've got your SNMP agents up and running properly, it'll automatically start checking the more important OIDs for you and graphing them with an RRD back end. Most of the setup can be done through the web interface instead of through vi. You don't have to res

    • Nagios is great but even version 3 is by no means easy to configure. Like all too many F/OSS projects, the documentation is lacking or even incorrect in spots, and supplied examples barely scratch the surface of what the application can do.

      Hmmm, I have to say I was pleasantly surprised by the documentation. We had Argus running here for monitoring for a while, and I finally got tired of its very obscure docs and its bugs. Nagios has been an entirely different experience.

      And the Nagios mailing list is very well-read, it seems to me.

      I've been running it and it's great - I have it monitoring a bunch of servers (email, hosting, backup, file, etc.) with custom scripts and it works great -- once it's configured.

      Yes, I will admit it took a bit of time to get the hang of it. But I also remember it took a bit of time when I first tackled BIND and Apache way back when! And I agree, we all sleep better at night with Nagios ar

      • Re: (Score:3, Funny)

        by gbjbaanb ( 229885 )

        And I agree, we all sleep better at night with Nagios around (except when I hear a whoop-whoop :)

        you have it sending alerts to the Police?! I'm not sure that's what the IT guys had in mind when they said '24 hr emergency response'. :)

    • If you haven't already, take a look at Zenoss [zenoss.com]. Aside from having a pretty well-designed UI (which, as I get older, I'm beginning to feel deserves more credit in the usefulness department), it supports SNMP by default (I'm not a big fan of clients unless I REALLY need them), *plus* it supports Nagios plugins.

      And I'm not trying to steal any thunder here, I think Nagios is a great option.
  • not good. (Score:3, Insightful)

    by Lord Ender ( 156273 ) on Wednesday October 08, 2008 @02:27PM (#25303247) Homepage

    Is it extensible? Is it easy to use? I didn't get it the first time, better repeat it a few more times...

    My personal experience is that Nagios is probably the LEAST easy to use of any piece of software, period. I hope they changed it in a major way, because last time I tried to use it I was forced to dig through configuration files and learn syntax just to get the thing to see if some server was responding to pings.

    • Re: (Score:2, Informative)

      by Anonymous Coward

      Just get a good front end for Nagios, like GroundWork Open Source. That will make configuration loads easier. (Posted as AC 'cause my password is so good I can't remember it.)

    • Re:not good. (Score:4, Insightful)

      by walt-sjc ( 145127 ) on Wednesday October 08, 2008 @03:14PM (#25304109)

      Oh please. It's NOT THAT HARD!!!! For what it does, it's fairly simple actually. Compared to any other package of similar capability, it's quite average in terms of difficulty actually. No worse than something like Exim or Apache. Just think of each server as a vhost and each service as a location directive

      • by dubl-u ( 51156 ) *

        Oh please. It's NOT THAT HARD!!!! For what it does, it's fairly simple actually. Compared to any other package of similar capability, it's quite average in terms of difficulty actually. No worse than something like Exim or Apache.

        The difference with something like Exim or Apache is that the tricky concepts you need to understand are mostly external constraints. SMTP is weird and complex. Serving files via HTTP and connecting to web apps is slightly less weird, but much more complex.

        A basic install of Nagios, on the other hand, is doing something pretty simple and straightforward. But at least with the 2.x series it was an unnecessarily giant pain in the ass to configure because you had to understand the Nagios-specific way of looki

    • by amorsen ( 7485 )

      Nagios is probably the easiest-to-use network monitoring system. That doesn't mean it's particularly easy, just that the others are worse. It breaks down when the network has trouble, though; if a significant number of hosts are unreachable, it takes forever for Nagios to figure it out. That tends to be exactly when you need it the most.

    • by Qzukk ( 229616 )

      forced to dig through configuration files and learn syntax

      As opposed to what, punching the numbers into a pretty little GUI like one of many [lilacplatform.com]?

    • Re: (Score:3, Informative)

      by mindstrm ( 20013 )

      If all you want is a tool to ping a few servers, nagios is overkill.

      My gut reaction is that if nagios configs seem too complicated, you likely have never had to roll out real enterprise monitoring.

      Our Nagios install monitors thousands of things, many of them custom tests.
      (Transaction volumes, application response times, cron job status, files...). It can be made to be the focal point for all the "stuff" the people responsible for monitoring company IT operations need to know about.

    • Check out GroundWork [groundworkopensource.com]. It's basically Nagios + a fairly easy-to-use web interface. We've been using it up at my work for over a year and it works great.
    • Re:not good. (Score:4, Insightful)

      by isorox ( 205688 ) on Wednesday October 08, 2008 @07:00PM (#25306907) Homepage Journal

      Is it extensible?

      Yes, what can't you monitor with nagios?

      Is it easy to use?

      You should see our 2nd line people, if they can use it, anyone can.
      1) Big red problem appears on page
      2) They click the link to the logging system which does an asset-based search showing recent problems.
      3) They click the link to the wiki page for that host, which hints at how to fix it.
      4) Red thing goes away

      There's a difference between *use* and *configure*. Nagios is the easiest monitoring system we've ever used in our department. It's pretty easy to configure too when you know what you're doing (one config file per device host, one directory per logical division of devices, one perl script to splat out the devices, one subversion repository to version track everything)

      I hope they changed it in a major way, because last time I tried to use it I was forced to dig through configuration files and learn syntax just to get the thing to see if some server was responding to pings.

      So? What use is a monitoring program that tells you that? If you want to do decent monitoring, you want to monitor the systems, not the devices those systems happen to run on.

      It's a steep learning curve, but have you ever configured apache from scratch? Let alone bind or sendmail.

      • Apache, BIND, and Sendmail are not easy to configure. If someone were hyping their "ease of use" on here, I would criticize them, as well.

        • by isorox ( 205688 )

          Apache, BIND, and Sendmail are not easy to configure. If someone were hyping their "ease of use" on here, I would criticize them, as well.

          Yet all three are used by your grandma every day, they are very easy to use. Easy to maintain too.

    • My personal experience is that Nagios is probably the LEAST easy to use of any piece of software, period. I hope they changed it in a major way, because last time I tried to use it I was forced to dig through configuration files and learn syntax just to get the thing to see if some server was responding to pings.

      Nagios is not that difficult, especially v2.

      The key to a good Nagios rollout is to start small. As in, a few contacts, a few services, and a single host. Learn what the various objects are.
  • Cacti Users (Score:2, Informative)

    by cmorford ( 906819 )
    I've used Nagios, but found Cacti and haven't looked back. Any other Cacti users out there? I found Cacti much easier to set up than Nagios, and fairly extensible for the advanced user.
    • Re: (Score:1, Informative)

      by Anonymous Coward
      Try ClearSite... it's what Cacti is to MRTG. http://clearsite.sourceforge.net/coming-soon.html [sourceforge.net] Linux only for now, but the developers are very nice and will share a newer version if you contact them. -theWiseWan
    • by amorsen ( 7485 )

      Cacti isn't very useful for alerting, just as Nagios really doesn't work well for graphing (one of its more annoying shortcomings).

    • by Fez ( 468752 ) *

      We use both. Cacti for graphing, Nagios for monitoring and paging.

      Nagios 3 did change a bit for the better. However, because they removed MySQL support I had to rewrite large portions of the existing configuration.

      In the process, I made much better use of templating and now each host config is in its own file, and Nagios will load all files in given directories thanks to the cfg_dir= directive.

      For example, all of my servers are in etc/nagios/servers/(servername).cfg, routers are in etc/nagios/routers/(route

    • Re:Cacti Users (Score:5, Informative)

      by thanasakis ( 225405 ) on Wednesday October 08, 2008 @03:19PM (#25304179)

      You are comparing apples with oranges, nagios is for service monitoring, cacti is for diagrams.

      • Not necessarily. You can install the monitor plugin (cactiusers.org) in Cacti and get all the alerts you want. While I suppose it doesn't come installed by default, it definitely combines graphing and alerting into a single package that works well.
    • by sarabob ( 544622 )

      I found Cacti a nightmare to configure, setting up custom graphs is comically complicated (why can't I just use rrd syntax rather than clicking buttons?) and we always end up with three data sources for the same things. Support for SNMPv3 is patchy, and we needed to jump through hoops to get the graphs to cope with multiple cpus (cpu usage over 100%? It wraps back to zero...).

      • by Fez ( 468752 ) *

        I've had similar problems with trying to make a custom graph in Cacti. For example, in MRTG, to make a graph that simply added two OIDs, you just set the source to OID1+OID2, and you're done.

        Just try doing that in Cacti. You'll learn more about graph templates and CDEFs than you ever cared to know in the process...

        • by Fweeky ( 41046 )

          Tried Munin [linpro.no]? I was quite impressed when I installed it and found it'd auto-detected a whole bunch of locally graphable stuff.

    • by mindstrm ( 20013 )

      A big install would use both.. as they are very different tools.

      Nagios is a monitoring & alert framework.

      Cacti is a graphing framework...

      Does cacti have some ability to do problem detection, notification, escalation, acknowledgement, resolution, and trend reporting?

      • To some extent, yes. There are some plugins at cactiusers.org, one being a monitor plugin. It may not be as extensive as Nagios, but it's a start.
  • Try Pandora FMS. It does the same as Nagios, is open source but only requires a minimum knowledge of shell scripting to get it working and can monitor everything you can think of inside (using an agent) or outside a host. I monitor about 100 hosts with it and have about 1200 data points every 5-10 minutes (temperatures, network packets, processes etc.) but it scales much larger (using MySQL as backend) even on simple hardware.

  • by Anonymous Coward

    I actually managed to get a girlfriend. And she is a real hottie. I don't want to sound paranoid, but I need to monitor her to make sure she is not cheating, and keeping her AV up to date, you know what I mean.

    So when I read this, "no device or situation has yet been found that cannot be monitored using Nagios and a pre-made or custom script, plug-in or enhancement", I thought this would be perfect.

    I think a Nagios plug in would be best, preferably something with sharp blades. So how do I install this in

    • by yukk ( 638002 )
      I think the device you're looking for is known as a chastity belt [wikipedia.org] but you'd need to couple it with a personal GPS tracking device [travelbygps.com] Or you could just hire a Private Eye [wikipedia.org].
      Of course, whether she continues to like you after all this is not my responsibility.
      • by isorox ( 205688 )

        I think the device you're looking for is known as a chastity belt [wikipedia.org] but you'd need to couple it with a personal GPS tracking device [travelbygps.com]

        Or you could just hire a Private Eye [wikipedia.org].

        Of course, whether she continues to like you after all this is not my responsibility.

        Indeed, Heisenberg did say that the act of monitoring something will impact it.

    • Re: (Score:3, Informative)

      by dfn_deux ( 535506 )
      I understand that your comment was made in jest, but... Nagios is a really flexible polling and alerting framework. There is nothing in Nagios that makes it specifically tailored to monitoring computers or services. For example, there is no reason the HostAddress directive must hold an IP address or hostname; it could just as easily be a street address, phone number, SSN, etc. Likewise, there is no need for polling to actively poll; you can just as easily configure Nagios to only respond to
  • Long live Hyperic. Free and extensible without the nonsense to set up that is Nagios. You can use Nagios plugins for it if you so wish.
    • by druiid ( 109068 )

      Hyperic is incredible.. but for my uses I need the enterprise version. Paying an extremely high amount of money for only 25-30 servers is not in the cards... and thus I chose zabbix, which does enough right to be a good replacement.

      • The non enterprise can be used in a 30 server environment, but you DO give up a lot of functionality and have to re-invent things for it to work. I just wish they open sourced the lot of it, but unlikely
  • by rhizome ( 115711 ) on Wednesday October 08, 2008 @03:11PM (#25304041)

    1st Paragraph: Paraphrase of Foreword.
    2nd Paragraph: What the initial chapter(s) is (are) about.
    3rd Paragraph: What the next chapter is about.
    4th Paragraph: What the chapter after that is about.
    5th Paragraph: What the last chapter(s) is (are) about.
    6th Paragraph: Pithy criticisms for balance.
    7th Paragraph: Conclusion with the required "This book is useful if you are like me" statement, as in, "Overall, this is a great book for anyone using Nagios as more than a casual user, and is still very informative for the casual user."

  • by hax4bux ( 209237 ) on Wednesday October 08, 2008 @03:14PM (#25304089)

    This book is not a big leap over the supplied Nagios documentation. I bought it out of guilt, but I doubt I have gotten my money's worth. This is not so much a criticism of the book as praise for the supplied documentation (which is rather decent, given the topic).

    Getting Nagios (or OpenView or whatever management system you have) working is a big job, which will not be solved with a $40 book and an afternoon.

    For all of you who complain about Nagios being complicated, I hope you never see OpenView (et al).

    If you haven't seen Nagios, there is a daemon which performs the collection. The UI is browser based (Apache HTTPD CGI applications). Sometimes there are agents on remote machines to collect status like process tables, disk utilization, etc.

    Nagios is essentially a job scheduler/messaging system. Monitoring is performed by invoking little programs dedicated to collecting information, and these are easy enough to create. There are lots of hooks if you need to extend the system.
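Part of why these little programs are easy to write is that the plugin contract is small: an exit status (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) plus one line of output, optionally followed by performance data after a "|". A minimal Python disk-usage check as a sketch; the 85/95 percent thresholds are arbitrary choices for illustration:

```python
import shutil

# Nagios plugin convention: the exit status carries the state, stdout carries
# one status line with optional performance data after a "|".
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate(used_pct, warn=85.0, crit=95.0):
    """Map a disk-usage percentage to (exit_code, status_line)."""
    perfdata = f"used={used_pct:.1f}%;{warn};{crit};0;100"
    if used_pct >= crit:
        label, code = "CRITICAL", CRITICAL
    elif used_pct >= warn:
        label, code = "WARNING", WARNING
    else:
        label, code = "OK", OK
    return code, f"DISK {label} - {used_pct:.1f}% used | {perfdata}"


def main(path="/", warn=85.0, crit=95.0):
    """Check one filesystem; print the status line and return the exit code."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        print(f"DISK UNKNOWN - {exc}")
        return UNKNOWN
    code, line = evaluate(100.0 * usage.used / usage.total, warn, crit)
    print(line)
    return code
```

To run it as a real plugin, end the script with `raise SystemExit(main())` so Nagios sees the exit status.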

    Since the UI is owned by HTTPD, so is access control. Who doesn't know how to set up LDAP or an auth file for Apache? Most of the CGIs are implemented in C and are not ugly to look at.

    The agent issue is a little clouded because there are many agents to choose from. I usually just use the Net-SNMP agent because I have a lengthy SNMP background, but that is just my personal choice.

    I will stop here since the article is about a book and not Nagios. I merely wanted to address some of the criticisms of Nagios.

    • > I bought it out of guilt, but I doubt I have gotten my money's worth.

      I usually figure if I get _anything_ at all out of a book, then it's worth the price. I just bought a Puppet book [amazon.com], and just having it around for occasional skimming has gotten me familiar enough with Puppet that I'm willing to give it a whirl. And for $17.99, meh, good enough.

    • Re: (Score:3, Insightful)

      by BitZtream ( 692029 )

      For all of you who complain about Nagios being complicated, I hope you never see OpenView (et al).

      I used to run an OpenView server ... my god, getting that thing to do useful stuff was like getting a cat to listen to your commands, it can be done, but why the hell bother.

      Since that job, I've come to love Nagios (which is still complicated) because it's about a billion times easier to deal with than OpenView. Nagios IS complicated, but its job IS complicated and Nagios does a hell of a job when compared to s

  • If your environment is big enough that you can employ at least 1 person to work full-time with Nagios, then it's a great product. But out of the box it takes too much time to become usable. I'm talking about Nagios 2; I have no experience (yet) with Nagios 3.
    • Nagios is good, but people have gotta start learning its limitations. I recently brought up the issue of security with some people and their answer: Nagios... There is the right tool for the right job, and Nagios is one small tool in the admin's arsenal to solve one problem. It's being touted as a universal panacea by those with little real knowledge and it's a little scary.

    • by Fweeky ( 41046 )

      We use Nagios in a 2 man team monitoring about 30 hosts and 200 services, including quite a few custom ones. It's not that hard, once you get used to how it works.

    • If your environment is big enough that you can employ at least 1 person to work full-time with Nagios, then it's a great product. But out of the box it takes too much time to become usable. I'm talking about Nagios 2; I have no experience (yet) with Nagios 3.

      We use it for about two dozen hosts, a few hundred services, and a handful of technical support users.

      All of which took about a week to get up and running. But other than small reconfigurations when we move systems around or change the network, it's very m
  • I used Nagios for years... many, many years. As many have already pointed out, it has to be the most difficult-to-configure OSS project ever made.

    That said, it was fairly powerful once configured properly.

    The thing is, though, that it has many shortcomings. I found a much better (although not necessarily as scalable) monitoring and data-gathering solution in Zabbix. They recently released a new version as well that adds many really nice capabilities, like IPMI support.

    • WTF? LOL... (Score:3, Insightful)

      by Colin Smith ( 2679 )

      I used Nagios for years... many, many years. As many have already pointed out, it has to be the most difficult-to-configure OSS project ever made.


      R$+@$=W $@$1@$H user@thishost -> user@hub
      R$=W!$+ $@$2@$H thishost!user -> user@hub
      R@$=W:$+ $@@$H:$2 @thishost:something
      R$+%$=W $@$>3$1@$2 user%thishost

      Sendmail...

      Nagios is easy, but it only makes sense if you have dozens or hundreds of systems; for fewer, get something simpler. And it will only work if you understand how to group your hosts, services, etc.

      • by dubl-u ( 51156 ) *

        Sendmail is necessarily hard. Mail routing at the time was complicated. Now it's easier, which is why Postfix is a snap to configure for common cases, and why a lot of Sendmail admins never have to see the scary magic at the heart.

        Nagios, on the other hand, is unnecessarily hard. Especially for simple setups and novice users, the pain is ridiculously out of proportion to the gain.

        • Nagios, on the other hand, is unnecessarily hard. Especially for simple setups and novice users, the pain is ridiculously out of proportion to the gain.

          But it's not for the simple setups and novices, it even says that in the manual. What Nagios is easier for are those situations where you need to monitor some custom service.

          A monitoring system reflects the complexity of the systems and services it monitors. If you have a relatively simple network with standard services, then Nagios probably isn't required. Try Zabbix instead; it handles those situations fairly well.

  • I currently monitor about 250 hosts with Hobbit (http://sourceforge.net/projects/hobbitmon) and have had good success with it. It has trending (RRD graphs) and alerting thresholds (e.g. email at 85% full, pagers at 95% full) built in together. It is also customizable. We have created several Perl scripts that check random applications for various things that are also tracked with Hobbit. How much data NetBackup backed up last night. How many users are logged into our portal. Are the tape drives within NetBackup u
  • Zabbix (Score:5, Interesting)

    by kosmosik ( 654958 ) <kos@ko[ ]sik.net ['smo' in gap]> on Wednesday October 08, 2008 @04:04PM (#25304807)

    I like Nagios, but I can't really imagine how to apply it to a large setup (think ten thousand hosts) spread across multiple regional/organizational branches and so on.

    Also, Nagios *is* painful to set up. First of all, AFAIK there is no way to delegate administration, e.g. to organizational branches. Configuration is just a big pile of config files included from other config files, etc. There is no autodiscovery/autoconfiguration of hosts, since the Nagios team believes it is BAD, etc.

    Well, IMHO Nagios is great, but it is like a big fat pile of hacked scripts and configs. Not too elegant, but working.

    Now... I am (well, we are, in my organization) using Zabbix and I find it great. It is much better organised and more elegant than Nagios.

    In Zabbix's architecture you have well-designed atomic elements like checks, items, services (groups), etc. It also gathers fine-tuned historical data for trends and historical review. You can compact the data (lower the resolution) after a given time and so on. It is in fact a very complete monitoring framework with its own internal condition language and escalation engine. You can gather data from network checks, SNMP, custom scripts, Zabbix agents (available for most platforms), etc.
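As a rough illustration of "lowering the resolution" of old data: bucket raw samples by hour and keep only min/avg/max per bucket. This is a generic Python sketch of the idea, not Zabbix's own code, and the one-hour bucket is an assumption:

```python
from collections import defaultdict
from statistics import mean


def compact(samples, bucket_seconds=3600):
    """Collapse (unix_ts, value) samples into (bucket_start, min, avg, max).

    Generic trend compaction: the raw points in each bucket are replaced by
    three summary values, so old data costs far less space.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[ts - ts % bucket_seconds].append(value)
    return [(start, min(vals), mean(vals), max(vals))
            for start, vals in sorted(buckets.items())]
```

A caller would apply it only to samples older than its retention cutoff and keep newer samples at full resolution.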

    And it has a normal, web-based configuration instead of crude text config files. I have nothing against text files, but sometimes I don't want to open my text editor just to quickly set up an ad-hoc overview screen with maps, graphs, status displays and clocks, and you can have a few such screens rotating on the big screens in your NOC. All with mouse clicks.

    I can hand it to a sysadmin, and he or she can work with it without having to study manuals. Not everybody in your organization is a Unix hacker, you know...

    We have dozens of branch servers which are managed by local sysadmins and a farm of central servers which is managed by central staff.

    Zabbix works in a distributed manner, so a local branch can have a very detailed view of its infrastructure, and at the central level I can have a functional/business overview of the entire infrastructure and core services (like business systems, transactions, etc.), not just simple checks of whether RAID is OK. I don't care if the RAID in some server is OK. I need to know why (where, who to blame) a given service (be it MQ/WebSphere) is not working as desired.

    And it is also free, open source, and available in most Linux/Unix distributions as a standard package. So when considering an enterprise monitoring platform, do yourself a favour and also check out Zabbix.

    http://www.zabbix.com/downloads/ZABBIX%20Manual%20v1.6.pdf [zabbix.com]

    • I'll second Zabbix. It has gone through some growing pains, but I like it for its ease of use as well as its flexibility. Until this last version, it did not have good escalations or repeat notifications, which was a big problem. However, with 1.6, that has been corrected.

      One of the things I like about Zabbix was the ability to write custom checks. If you could get any script or program to spit out data, you could very easily capture that data and run checks on it. The windows client could read Windows Perf

      • Re: (Score:3, Informative)

        by kosmosik ( 654958 )

        Well for me what ruled out Nagios was:

        1. It is painful to set up. Don't get me wrong: I've put in my time on the configuration, I think I know it a little, and I can easily set it up for around 100 hosts with some templates + includes + sed magic. But that is what I can do. Not all of my staff can do it, and it really is not easy.

        2. It is not distributed. The checks can be distributed, but you cannot have, say, 20 child Nagios nodes managed by local staff and parent nodes that gather data from the children. This is

      • I'll second Zabbix. It has gone through some growing pains, but I like it for its ease of use as well as its flexibility. Until this last version, it did not have good escalations or repeat notifications, which was a big problem. However, with 1.6, that has been corrected.

        As a current user of Zabbix who happens to like it despite perceived shortcomings, I have to say that tears nearly came to my eyes when 1.6 was released. It even has a dashboard which was the feature I had missed the most from Nagios (whe

    • I tried out zabbix this morning, as I'm always keen to try and improve our monitoring (we currently run a nagios 2.x system).

      In-house, we only use RPM'd software, for ease of upgrading and maintenance, so I quickly found RHEL4 RPMs for the various bits and loaded them up. That's when the pain started: I found the post-RPM setup horribly broken, and gave up after an hour or so. The web frontend had to be hacked to display, then the DB connection script had to be hacked for it to recognise the DB (which I also h

      • Just what kind of argument is that? You installed RPMs from an unknown source on a production server and then complain that the RPMs are broken... Quite silly, really.

        If you still want to check out Zabbix and you use RHEL, I recommend the EPEL RPMs packaged by the Fedora community. They work fine for me:

        http://fedoraproject.org/wiki/EPEL [fedoraproject.org]

        But there is no automatic database creation script, since no sane admin needs such a thing.

        Installation is as simple as:

        1. Install RPMs via YUM or by hand.
        2. Create a database user and database with

  • Just now I wanted to check the Nagios documentation for a simple thing: configuration file syntax. This is basic stuff, the first thing that should be defined in a reference manual. I'd like to know how the files are processed, how to write comments, how to define multi-line commands and so on.

    Please point out if I am blind or stupid, since I really cannot find it in the manuals here:
    http://nagios.sourceforge.net/docs/3_0/toc.html [sourceforge.net]

    Also, I find the online manual quite clunky. It doesn't even have
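For what it's worth, the object files follow a simple shape: `define <type> { ... }` blocks holding one "directive value" pair per line. A rough, hypothetical Python reader makes the rules it assumes explicit; those rules are a reading of the Nagios 3 sample configs, not an official grammar:

```python
import re

# Assumed rules: '#' or ';' at the start of a line begins a comment, ';' later
# in a line starts an inline comment, and each directive is "name value" on a
# single line. Templates, "use" inheritance, escaped semicolons and
# cfg_file/cfg_dir includes are NOT handled.
DEFINE_RE = re.compile(r"^define\s+(\w+)\s*\{$")


def strip_comment(line):
    """Return the line without comments and surrounding whitespace."""
    stripped = line.strip()
    if stripped.startswith(("#", ";")):
        return ""
    return stripped.split(";", 1)[0].rstrip()


def parse_objects(text):
    """Return a list of dicts, one per 'define TYPE { ... }' block."""
    objects, current = [], None
    for raw in text.splitlines():
        line = strip_comment(raw)
        if not line:
            continue
        if current is None:
            match = DEFINE_RE.match(line)
            if match:
                current = {"_type": match.group(1)}
            continue
        if line == "}":
            objects.append(current)
            current = None
            continue
        key, *rest = line.split(None, 1)
        current[key] = rest[0].strip() if rest else ""
    return objects
```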

  • Nagios is a mess (Score:3, Insightful)

    by Kent Recal ( 714863 ) on Wednesday October 08, 2008 @05:20PM (#25305733)

    Blech, Nagios is probably the most disgusting hack currently in wide use. It was overdue for a complete rewrite after Nagios 2, but Nagios hackers don't seem to have any pain threshold. Nowadays it's not even funny anymore. Nagios has gone *way* past its expiration date. The closest analogy would be a pot of milk that has been sitting in direct sunlight for 6 months straight...

    I strongly suggest that anyone looking for a monitoring solution stays away from the dead horse and looks at the modern alternatives first. There are plenty: Munin, Cacti, Zenoss, Pandora, OpenNMS, just to name a few.

    Most importantly: Take your time before you decide and evaluate thoroughly. A monitoring solution will stick with you for a long time and migrating to a different software is usually a very painful process. Which, btw, is the main reason why so many sites still ride the dead horse...

    • You are partially right - Nagios is a bit legacy.

      But you have mentioned Munin and Cacti, and these are just simple graphing solutions. Munin is generally useless: you have only year/month/day views (or similar), and you cannot zoom into, e.g., a 2-hour range last week. Cacti is far better than Munin.

      But Cacti is also just a simple SNMP polling and graphing solution. It has some plugins, such as thresholds, but it really is not in the same class as Nagios (or, better, Zabbix).

      Nagios is an *engine* that processes me

  • For those of you that aren't particularly fond of the complexity of Nagios' configuration, check out GroundWork [groundworkopensource.com]. It's basically Nagios + a fairly easy-to-use web interface. We've been using it up at my work for over a year and it works great.
  • Hard to set up? (Score:4, Insightful)

    by isorox ( 205688 ) on Wednesday October 08, 2008 @06:54PM (#25306851)

    So Nagios is hard to set up? Probably; you can't go from zero to running in 5 minutes. It's a steep learning curve, but if you can't make the initial investment of a book (I used Building a Monitoring Infrastructure with Nagios) and a few hours, you shouldn't be monitoring things. You won't do it correctly; you may as well throw some cron jobs together.

    The first step in monitoring is working out what you want to monitor. The second step is working out what you really want to monitor. The third step is working out how you want to display problems. When you have 60 people in support working a 6-shift 24/7 pattern, you can't expect emails to be any use. "Service problems" in Nagios is fine, but there are a lot of issues that 2nd line doesn't need to know about; Solaris security patches on an intranet, for example, can wait until the 9-5 admins get in.
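In Nagios itself this split can be expressed with contact groups and timeperiod-based notification periods. Stripped of Nagios syntax, the decision looks roughly like this Python sketch; the problem classes, the team names, and the 9-to-5 weekday window are illustrative assumptions:

```python
from datetime import datetime

# Hypothetical policy modelled on the comment: service-impacting problems go
# to 24/7 second-line support at once; everything else waits for office hours.
OFFICE_START, OFFICE_END = 9, 17  # "9-5", local time (assumption)


def route(problem_class, when):
    """Return who should hear about a problem raised at datetime `when`."""
    if problem_class == "service-impacting":
        return "second-line"
    in_office_hours = (when.weekday() < 5
                       and OFFICE_START <= when.hour < OFFICE_END)
    return "day-admins" if in_office_hours else "defer"


# Example: a patch-level warning at 3 a.m. on a Wednesday is held back.
print(route("patch-level", datetime(2008, 10, 8, 3, 0)))  # prints "defer"
```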

    Nagios is painfully easy to administer, if you set it up right. Once you know what you're doing (or even know enough to be dangerous, like myself), you can deploy a new nagios installation in about 20 minutes, add a new device that follows existing rules (new web server for example) in under 5 minutes, and a new device with new plugins in half an hour.

    Nagios then grows organically. When something strange and new breaks we cobble a plugin together,

    Configuration is in plain text files, one for each device on the network. I keep these in a Subversion working copy, which gives me the ability to track changes and easily roll back any configuration problems.

    We have dozens of weird bespoke plugins: one uses WWW::Mechanize and Perl to run through a workflow on a specific webpage, another looks at the rate of growth of a JBoss logfile and the proportion of stack traces, and one logs into a remote machine and checks that jumbo pings are working through the network.
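The JBoss logfile check can be approximated as follows. The commenter's plugin is not shown, so this Python sketch is hypothetical: it assumes the plugin is handed two file-size samples and a window of recent lines (how it collects them is left out), and the thresholds are made up:

```python
# Hypothetical re-creation of a "log growth + stack-trace ratio" plugin;
# thresholds are illustrative, not taken from the comment.
OK, WARNING, CRITICAL = 0, 1, 2


def log_health(prev_size, curr_size, interval_s, recent_lines,
               max_bytes_per_s=50_000, max_trace_ratio=0.05):
    """Return (exit_code, message) from two size samples and recent lines."""
    grown = curr_size - prev_size
    if grown < 0:  # file was rotated; count only the new file
        grown = curr_size
    rate = grown / interval_s
    # Java stack-trace frames look like "\tat com.example.Foo.bar(Foo.java:42)"
    traces = sum(1 for line in recent_lines if line.lstrip().startswith("at "))
    ratio = traces / len(recent_lines) if recent_lines else 0.0
    code = OK
    if rate > max_bytes_per_s:
        code = WARNING
    if ratio > max_trace_ratio:
        code = CRITICAL
    return code, f"log growth {rate:.0f} B/s, stack-trace lines {ratio:.1%}"
```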

    We find Nagios essential for monitoring the service we provide. I don't particularly care if the server an Oracle database runs on is pingable; I care if I can log in and run "select 1 from dual" (or, usually, something more application-specific).
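The difference between "pingable" and "usable" is easy to encode as a check that actually logs in and runs a query. A minimal sketch using Python's DB-API, where `connect` is any zero-argument function returning a connection (an assumption for illustration); SQLite stands in for Oracle so the example runs anywhere:

```python
import sqlite3

OK, CRITICAL = 0, 2  # Nagios plugin exit codes


def check_query(connect, query="select 1"):
    """Log in via `connect()` and run `query`; any failure is CRITICAL."""
    try:
        conn = connect()
        try:
            cur = conn.cursor()
            cur.execute(query)
            row = cur.fetchone()
        finally:
            conn.close()
    except Exception as exc:  # driver-specific errors all mean "unusable"
        return CRITICAL, f"DB CRITICAL - {exc}"
    if row is None:
        return CRITICAL, "DB CRITICAL - query returned no rows"
    return OK, f"DB OK - query returned {row[0]!r}"


# SQLite stands in for Oracle here; with Oracle the query would be
# "select 1 from dual" and connect() would come from an Oracle driver.
status = check_query(lambda: sqlite3.connect(":memory:"))
```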

    The small system we monitor is made up of about 800 services over 190 devices.

    • Configuration is in plain text files, one for each device on the network. I have these as an subversion working copy, which gives me the ability to track changes and easily roll back any configuration problems.

      That's a big strength of Nagios (using plain text files). We use FSVS on our servers, with a SVN back-end. It's so nice to be able to track changes and do easy diffs between versions.

      (We use FSVS because it doesn't create .svn folders. It's more suited for version controlling things like /etc
  • by rossz ( 67331 ) <ogre@@@geekbiker...net> on Wednesday October 08, 2008 @07:37PM (#25307249)

    I have never once personally had any dealings with a properly implemented Nagios system. Every single time it was obviously tossed up by someone who had minimal knowledge of how to properly monitor the infrastructure.

    The biggest complaint I hear is "too many alerts". So set your dependencies properly! You say you did that but you still get 600 alerts when the router dies? That's because you told it you wanted those alerts. See that "u" in "notification_options"? That means "unreachable": you asked to be alerted when the box can't be reached. You probably wanted "d,r", not "d,r,u".
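Why the "u" matters can be shown with a toy model of parent/child reachability (the host `parents` directive). In this Python sketch, which handles single parents only, ignores recoveries, and uses hypothetical host names, a host whose check fails behind a failed parent is labelled UNREACHABLE rather than DOWN, and only the letters listed in notification_options produce alerts:

```python
def classify(parents, failed):
    """Label each host UP, DOWN or UNREACHABLE.

    parents maps host -> its single parent (None for the top of the tree);
    failed is the set of hosts whose own check failed. Assumes no cycles.
    """
    states = {}

    def state(host):
        if host not in states:
            parent = parents.get(host)
            if host not in failed:
                states[host] = "UP"
            elif parent is not None and state(parent) != "UP":
                states[host] = "UNREACHABLE"  # blocked by a failed parent
            else:
                states[host] = "DOWN"
        return states[host]

    for host in parents:
        state(host)
    return states


def notifications(states, options="d,r"):
    """Hosts that would alert, given notification_options-style letters."""
    letter = {"DOWN": "d", "UNREACHABLE": "u"}
    wanted = set(options.split(","))
    return sorted(h for h, s in states.items() if letter.get(s) in wanted)
```

With a router and two web servers behind it all failing, "d,r" yields one alert for the router, while "d,r,u" yields three.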

    The next complaint: it's so much work to add a system. Huh? It takes me about 30 seconds to add another system and all the tests I need. The trick is using host groups to automatically assign tests to a system. For example, take a generic LAMP-type server. What can we assume about it? It's running Linux, Apache, MySQL, and Perl or PHP. That's a bunch of tests right off. In my world, SNMP is assumed on all systems (because I made it that way, that's why). So we define a bunch of service checks using SNMP, but instead of using "host_name some_hostname", we use "hostgroup_name lamp-servers". Now when I add a new server, I add "hostgroups lamp-servers" to the definition, and like magic it gets all the tests I need: SNMP port responding, SSH access, Apache daemon running, MySQL daemon running, web page accessible, disk space good (defined in snmpd.conf), CPU usage, load average, plus some automatic dependencies: all SNMP tests depend on the SNMP port responding, web pages depend on the Apache daemon running, etc. I even have some simple graphing included automatically. Even the O/S icons are defined by the hostgroups. Each distro has its own hostgroup, which takes care of that detail (e.g. centos-system and ubuntu-system).

    Ten simple lines to define a new host can result in 20 service checks. I rarely need to define a new service check. And when a router goes out? One alert for the router.

    Not every system is going to be generic like this, but any time I have more than one system require a specific service check, I create a hostgroup to handle it.
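The fan-out described above, a few lines per host and a service list inherited from hostgroups, can be modelled in a short Python sketch. Everything named here (the groups, the services, the `generic-host` template) is hypothetical; in Nagios the inheritance itself comes from `hostgroup_name` on the service definitions, not from a script:

```python
# Hypothetical hostgroup -> service table modelled on the LAMP example;
# group and service names are illustrative only.
HOSTGROUP_SERVICES = {
    "lamp-servers": ["SNMP port", "SSH", "Apache daemon", "MySQL daemon",
                     "Web page", "Disk space", "CPU usage", "Load average"],
    "centos-system": [],  # e.g. only supplies the O/S icon
}


def services_for(hostgroups, table=HOSTGROUP_SERVICES):
    """Services a host inherits from its hostgroups, without duplicates."""
    result = []
    for group in hostgroups:
        for service in table.get(group, []):
            if service not in result:
                result.append(service)
    return result


def host_definition(name, address, hostgroups, template="generic-host"):
    """Render the short host block; the template name is an assumption."""
    return "\n".join([
        "define host {",
        f"    use          {template}",
        f"    host_name    {name}",
        f"    address      {address}",
        f"    hostgroups   {','.join(hostgroups)}",
        "}",
    ])
```

Adding a server then means emitting one short host block; the per-service work was done once, at the hostgroup level.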

    • The Nagios config files are atrocious. Trying to navigate through them is sometimes like an exercise in insanity.

      That being said, if you are in a large organization (e.g., a large web hosting company with multiple datacenters) that needs to monitor thousands of services on thousands of hosts, it can be done. However, you can't go mucking around in the config files all willy-nilly. You have to build a framework around them. At the hosting company I work for, we have deployed Nagios collector nodes in multipl

    • I prefer autodiscovery. When a new device comes up, I don't have to do anything. It is magically part of the correct groups, is graphing, and alerts are done without me lifting a finger. Much better than needing to connect to the box, edit the config files, and reload the config.

  • "Due to its extensibility and ease of use, no device or situation has yet been found that cannot be monitored using Nagios and a pre-made or custom script, plug-in or enhancement."

    It's so true! We put an air-actuated sensor on his pants, and now we get e-mail and SMS whenever grandpa farts. Thank you Nagios!