Nagios 3 Enterprise Network Monitoring

samzenpus posted more than 5 years ago | from the read-all-about-it dept.

Software 147

jgoguen writes "Nagios, originally known as Netsaint, has been a long-time favourite for network and device monitoring due to its flexibility, ease of use, and efficiency. Nagios provided, and still provides today, a low-cost, versatile alternative to commercial network monitoring applications. Nagios 3 takes a huge step forward compared to Nagios 2, providing improved flexibility, ease of use and extensibility, all while also making significant performance enhancements. Due to its extensibility and ease of use, no device or situation has yet been found that cannot be monitored using Nagios and a pre-made or custom script, plug-in or enhancement." Read on for the rest of jgoguen's review.

The first chapter is devoted to new features in Nagios 3. It covers the major changes implemented for Nagios 3, which include changes to data storage options and locations, checks, configuration objects, and macros, as well as operational, performance, and usability enhancements. Users upgrading from Nagios 2, or users who are already familiar with Nagios 2, will gain the most from this chapter. New users will still gain value from it, however, since a number of the changes involve some of the major features of Nagios. In addition, users who are working from configuration file samples created for Nagios 2 will save a great deal of time by checking this chapter for changes. Using Nagios 2 configuration files directly prevents users from enjoying some new features of Nagios 3. Users who will only be writing plug-ins and scripts for their local Nagios deployment might not find Chapter 1 very useful.

Chapters 2 and 3 deal with scaling Nagios to work efficiently within large deployments. The authors first show how to design a Nagios configuration for a large organization. This is something that all Nagios administrators should make use of when designing configurations, not only administrators in large organizations, because a properly done configuration for a small organization will easily scale up as the organization grows. I was impressed to see that the authors stress the importance of the end user's input when designing configurations. Administrators who ignore this piece of advice risk the success of Nagios in their organization. Diagrams help to explain the relationships between the various Nagios configuration objects. A good amount of detail is provided on allowing groups within an organization to have semi-independent control over how Nagios interacts with their hosts and services, and how Nagios alerts their staff. The authors have included numerous configuration file snippets, which allow a Nagios administrator to create a configuration file very quickly and then tweak the configuration parameters to suit local requirements.

Scaling the Nagios graphical user interface (GUI) follows a very simple concept: use a "less is more" approach. Although the specific details here deal with Nagios, the general idea is equally applicable to anyone displaying information they expect their users to actually pay attention to. In general, users should be able to see as much as they want (limited by resources and permissions) but only be shown what they need to know about by default. For example, the system administrator for marketing probably does not need to know when the development disk image server goes down, while the development system administrator would probably be very interested. User accounts let the administrator give various groups access to Nagios, filtered by its fine-grained permissions system. Users from various groups can also be shown only what they need to see by default, without having to select a particular area first. User accounts also mean that users who need to view Nagios do not need full administrative control, and they allow records of each user's actions to be kept. A patch provided with the book's download package enables read-only accounts as well, which is great for organizations that would like to grant certain users (or groups) access to view Nagios but not make any changes. As an example, an organization's help desk could use Nagios to determine quickly whether users are unable to access services because of an outage, or if further troubleshooting is necessary.

The authors continue on here to discuss clustering, failover, and the future of the Nagios GUI. I'm not convinced that these belong in a chapter devoted to scaling the Nagios GUI, since they seem to mostly deal with scaling the entire Nagios deployment. Regardless, they are all very important topics, especially when Nagios is heavily relied upon. Clustering allows remote sites to have a Nagios instance local to the site monitoring hosts and devices rather than requiring a central Nagios instance to monitor remote hosts and services. Monitoring remote hosts and services from a central instance would not only take much longer because of the WAN links between the central instance and the remote locations, but allowing those checks through would also have security implications. The authors don't discuss the security side of clustering, but it's still something that every Nagios administrator (and everyone else!) should keep in mind. The clustering section deals primarily with the rationale behind clustering and how to configure the local and remote instances of Nagios properly, but the authors include a good deal of information here that a less experienced Nagios administrator might overlook. Most notable is their discussion about the display of service status when a service is reachable from the master server but not from a remote instance. While Nagios can translate the remote instance's check result to be displayed from its own perspective, it may be more desirable to have the master Nagios GUI display the results from the perspective of the server which made the check. After implementing clustering, some sort of fallback mechanism is required. Failover and redundancy are the two main choices, and that's what the authors discuss next. They don't spend much time on redundancy, since this would require each redundant Nagios instance to perform its own set of checks, which can significantly raise the load on both the monitored hosts and the network in general.
Given the problems it can introduce, the authors have spent more time on redundancy than most administrators should spend considering it. Failover is a much better solution, and the authors do a great job of covering how to set it up properly. As usual, they make sure to remind readers of some things that are easily overlooked, especially when you're trying to get Nagios back up and running after the master server crashes.
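
The review does not say which mechanism the book uses to get results from a remote instance back to the master. One common approach in Nagios is for the remote side to submit passive check results, which the master reads as PROCESS_SERVICE_CHECK_RESULT lines from its external command file. The Python sketch below only formats such a line; the host and service names are hypothetical, and delivering the line to the master (for example through the NSCA add-on) is left out.

```python
import time

# Plug-in return codes understood by Nagios: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN.
VALID_CODES = (0, 1, 2, 3)


def passive_service_result(host, service, return_code, output, now=None):
    """Format one passive service check result as an external command line.

    'now' is an optional Unix timestamp, mainly so the function is testable.
    """
    if return_code not in VALID_CODES:
        raise ValueError("return code must be 0, 1, 2 or 3")
    timestamp = int(time.time() if now is None else now)
    # A newline would terminate the command early, so flatten the output text.
    text = output.replace("\n", " ")
    return "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s\n" % (
        timestamp, host, service, return_code, text)
```

Because the remote instance supplies the return code and output itself, the master shows the result as seen by the server that made the check, which is the perspective question the authors raise.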

Chapters 4 and 5 discuss Nagios plug-ins, add-ons, and enhancements. These chapters alone are worth the price of the book because of how much time they can save. It's much faster to copy a script and make minor tweaks than it is to try reinventing the wheel, and with the number of scenarios covered here combined with the Nagios user community there aren't very many things that haven't been done already. Whether you want to test command-line interfaces, CPU usage, memory utilization, bandwidth utilization, HTML pages, LDAP services, or even specialized hardware, there's probably already a plug-in written for it. Most common scenarios actually have a plug-in already included in this book. The available add-ons and plug-ins are equally varied, providing ways to monitor hosts across security zones, configure read-only displays that live in a security zone other than the one Nagios is in, interface with Cacti, and even read out alerts. Even more scenarios can be handled by other scripts provided by the Nagios community.
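
To give a sense of how little a custom plug-in needs, here is a minimal Python sketch. It is not taken from the book. It relies only on the standard plug-in convention: print one status line (optionally followed by "|" and performance data) and exit with 0 for OK, 1 for WARNING, 2 for CRITICAL, or 3 for UNKNOWN. The load-average check and the thresholds of 4.0 and 8.0 are assumptions chosen for illustration.

```python
import os

# Exit codes defined by the Nagios plug-in convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(value, warn, crit):
    """Map a measurement to a plug-in state; higher values are worse."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def format_output(name, value, warn, crit):
    """Return (exit_code, status_line); text after '|' is performance data."""
    state = evaluate(value, warn, crit)
    line = "%s %s - value is %.2f | %s=%.2f;%.2f;%.2f" % (
        name, LABELS[state], value, name.lower(), value, warn, crit)
    return state, line


def main():
    """Check the 1-minute load average and return the plug-in exit code."""
    try:
        load1 = os.getloadavg()[0]  # not available on every platform
    except (AttributeError, OSError):
        print("LOAD UNKNOWN - load average not available")
        return UNKNOWN
    code, line = format_output("LOAD", load1, warn=4.0, crit=8.0)
    print(line)
    return code
```

A script meant to be run by Nagios would finish by passing the result of main() to sys.exit() so that the state reaches Nagios as the process exit code.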

Chapter 6 goes into detail on how to integrate Nagios into an enterprise environment. This chapter goes into just enough detail to get Nagios configured to work with a large number of third-party services, such as LDAP authentication, Cacti, Puppet, and Splunk. Emphasis here is always placed on the human element: how to use Nagios to help the help desk and/or NOC staff do their jobs more efficiently and effectively, and how to gain maximum support for Nagios within the organization. The importance of the human element, in all its forms, simply cannot be overstated, and the authors have done a wonderful job of outlining a good way to make Nagios an integral part of an organization. A lot of the material towards the end of the chapter, especially the section on smaller Network Operation Centres, could be used by anyone looking for ways to help a small group work together effectively.

Chapter 7 is another chapter with a lot of content easily applicable outside of a Nagios environment. The chapter begins with the authors reminding you to know your network and to watch out for session hijack attacks, then shows you how to use Nagios to do both. Nagios can't replace a competent network administrator, but it can make their lives easier, and the authors show here how the configuration you've already done in Nagios can reveal a potential session hijack attack and how it forces you to properly know your network. Nagios forces you to know your network not only in terms of how it's built and what devices are in use, but also by requiring a solid handle on what constitutes normal conditions for all your devices and services.

Regulatory compliance is another area, very important to companies and especially to those operating in the United States, where Nagios can assist. The authors outline how a company could use Nagios to assist with compliance with Sarbanes-Oxley (SOX) with COBIT or COSO, Payment Card Industry (PCI) Data Security Standard (DSS), Director of Central Intelligence Directive (DCID) 6/3 and Department of Defense (DoD) Information Assurance Certification and Accreditation Process (DIACAP). Nagios alone isn't enough to be compliant; at the very least, detailed documentation will also be required. Still, the authors give a good overview of how Nagios can assist with compliance with all of these regulations.

The final chapter helps to bring the rest of the book together by walking through a full Nagios configuration for a fictional Fortune 500 corporation. The bulk of this chapter covers the pre-deployment stage of a Nagios deployment, but that doesn't mean that there isn't a lot to learn about deploying Nagios. A major hurdle towards deploying Nagios in an organization is the pre-deployment phase, and the authors outline here how to easily turn this major challenge into a series of simple steps to increase the chances of Nagios' success in your organization. From the very beginning, you can see how involving the customer early and starting small, along with everything else, becomes a part of a process. Although it's specific to Nagios, the process followed here could be easily adapted to integrating any sort of monitoring service. The remainder of the chapter is devoted to how you might integrate Nagios into a Fortune 500 company, finishing the book off with some good advice for integrating Nagios.

Despite all the book's strengths, there is some room for improvement. In chapter 2, it may have been more effective to outline the relationships between the Nagios configuration objects before discussing configuration planning. I found it much easier to think of a configuration for a large organization after knowing about how Nagios' configuration objects relate to each other.

Throughout the book, the authors have included configuration file snippets, scripts, and example script output in the main text. While all of these are quite useful and serve to enhance the book, I think it would have been better if these were all included in an appendix instead, perhaps keeping only the relevant parts of configuration snippets in the main text for clarification.

At the end of chapter 3, the sections on the future of Nagios and the CGI front end are informational and interesting, but they would be better placed in a separate chapter dealing with the potential future of Nagios in general. These and the other major areas of Nagios combined would provide more than enough material for a full chapter on their collective futures.

Overall, this is a great book for anyone using Nagios as more than a casual user, and is still very informative for the casual user. A few of these chapters alone would be worth the price of the whole book.

Disclaimer: I worked with one author when I was asked to review this book.

You can purchase Nagios 3: Enterprise Network Monitoring from amazon.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.

147 comments

Spam alert! (0, Troll)

Animats (122034) | more than 5 years ago | (#25303039)

A review, by an associate of the author, of an obscure product, with a picture of the book plastered on the front page of Slashdot. Who was paid off for that?

Re:Spam alert! (1)

Hijacked Public (999535) | more than 5 years ago | (#25303105)

Slashdot, as per the usual.

Re:Spam alert! (-1, Troll)

Anonymous Coward | more than 5 years ago | (#25303259)

The author merely tossed Samzenpus' salad. [wikipedia.org]

Convincing /. to also display the picture took a little extra coaxing - placing CmdrTaco's balls in the mouth and then going like , "hmmmmmmmmmmmmmmmmmmmmmmmmm".

Re:Spam alert! (1)

nobodylocalhost (1343981) | more than 5 years ago | (#25303225)

You sir, have no idea how right you are about Nagios... It spams, a lot. And depending on how well you know what you are doing, it will spam you with anything from a couple of mails per hour to a literal e-mail bomb, so you can't even open your e-mail client.

Re:Spam alert! (4, Insightful)

SgtAaron (181674) | more than 5 years ago | (#25304493)

You sir, have no idea how right you are about Nagios... It spams, a lot. And depending on how well you know what you are doing, it will spam you with anything from a couple of mails per hour to a literal e-mail bomb, so you can't even open your e-mail client.

I'm thinking that you may be one of those that need the book. :-) The amount and frequency of alert emails is easily configurable. And I think you need a new mail client! How about trying mutt? :)

The "notification_interval" can be set to 0 so that nagios will only send one alert, period. Now, if you have a bunch of services/hosts down you will get a lot of messages unless you've taken steps to mitigate that. But isn't that better than *not* knowing your network has run home to momma?

We've been using Nagios now for months and it may be the least buggy code running on any of our machines. Rock-solid, I tell you.

regards,

Re:Spam alert! (1)

mosinu (987941) | more than 5 years ago | (#25305479)

Actually, if you set up dependencies properly, even a major network outage won't mail bomb you too badly.

Re:Spam alert! (2, Interesting)

perldork (1381373) | more than 5 years ago | (#25308491)

Setting up dependencies is a must, use notification_delay wisely, and only send out emails or other 'push' notifications for problems that have to have immediate attention. I like to take the approach of monitoring everything that is important but only sending emails out for problems that really truly require immediate attention from on-call staff.
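
As a rough illustration of the settings discussed in this thread, the Python sketch below renders two Nagios object definitions: a service that notifies only once (notification_interval 0) after a short delay, and a service dependency that suppresses its notifications while the service it depends on is failing. The directive names are standard Nagios 3 object directives, with first_notification_delay assumed to be what the "notification_delay" above refers to. The host names, the generic-service template, and the values are hypothetical.

```python
def render_object(kind, directives):
    """Render one Nagios object definition from (directive, value) pairs."""
    width = max(len(name) for name, _ in directives)
    lines = ["define %s{" % kind]
    for name, value in directives:
        lines.append("    %s  %s" % (name.ljust(width), value))
    lines.append("}")
    return "\n".join(lines)


# A service that alerts once, and only if the problem lasts a while.
service = render_object("service", [
    ("use", "generic-service"),         # hypothetical template name
    ("host_name", "web01"),             # hypothetical host
    ("service_description", "HTTP"),
    ("check_command", "check_http"),
    ("first_notification_delay", 10),   # time units, minutes by default
    ("notification_interval", 0),       # 0 means do not re-notify
])

# Do not notify about HTTP on web01 while PING on router01 is not OK.
dependency = render_object("servicedependency", [
    ("host_name", "router01"),          # hypothetical upstream host
    ("service_description", "PING"),
    ("dependent_host_name", "web01"),
    ("dependent_service_description", "HTTP"),
    ("notification_failure_criteria", "w,u,c"),
])
```

Printing service and dependency gives text that could be pasted into an object configuration file. With one dependency per downstream service, a router outage produces one alert for the router instead of one per service behind it.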

Re:Spam alert! (5, Informative)

Ngarrang (1023425) | more than 5 years ago | (#25303235)

A review, by an associate of the author, of an obscure product, with a picture of the book plastered on the front page of Slashdot. Who was paid off for that?

Obscure product? What world have you been living in?

Re:Spam alert! (0)

Anonymous Coward | more than 5 years ago | (#25303371)

I suffer from logizomechanophobia, you insensitive clod! I can barely bring myself to type on a computer, much less network them and run a full TCP/IP stack!

Re:Spam alert! (1)

_Sprocket_ (42527) | more than 5 years ago | (#25303411)

Point taken. However, I'm not sure exactly how obscure Nagios is. In the IT circles I run around, it's pretty well known. But then again, I'm in a fairly mixed environment.

Why not? (1)

Drakkenmensch (1255800) | more than 5 years ago | (#25303435)

Some of us still have the iBall legacy reader for the .DTF (Dead Tree Format) file type!

Re:Spam alert! (5, Funny)

MrNaz (730548) | more than 5 years ago | (#25304357)

Nagios is only obscure if you are not a network admin, Linux geek or data center operator.

So the real question is; what are you doing here?

Re:Spam alert! (2, Insightful)

IceCreamGuy (904648) | more than 5 years ago | (#25304371)

...of an obscure product...

The only things I can think of that would make someone say something like this are:

You're not a systems administrator

You're new to systems administration

You're a bad systems administrator

You don't keep up with grade-A open source enterprise solutions

You work for a company that has a budget big enough that you don't ever consider open source

Re:Spam alert! (1, Insightful)

Anonymous Coward | more than 5 years ago | (#25304705)

Nagios may be many things but ease of use...??? I suspect the author is suffering from crack-induced delusion.

Re:Spam alert! (1)

perldork (1381373) | more than 5 years ago | (#25308377)

Agreed that the review does seem less than impartial but I can assure you that Syngress doesn't pay anyone enough to have money to 'pay people off.' :p

Nagios is great (4, Insightful)

kimvette (919543) | more than 5 years ago | (#25303197)

Nagios is great but even version 3 is by no means easy to configure. Like all too many F/OSS projects, the documentation is lacking or even incorrect in spots, and supplied examples barely scratch the surface of what the application can do.

I've been running it and it's great - I have it monitoring a bunch of servers (email, hosting, backup, file, etc.) with custom scripts and it works great -- once it's configured.

Re:Nagios is great (2, Insightful)

kimvette (919543) | more than 5 years ago | (#25303243)

Ooops. Submitted too early.

Nagios is especially helpful in a smaller environment where you have limited personnel; as long as nagios is up and running you can have it email, page, or text you so that you know there's an issue without having to have personnel monitoring it all manually - and it provides a decent log via the web interface.

My main point is this: if this book is as good as the reviewer indicates, it should be very well worth buying if you need a F/OSS server monitoring solution.

Re:Nagios is great (1)

Anarke_Incarnate (733529) | more than 5 years ago | (#25304139)

Get Hyperic or one of the other "Small 4" monitoring apps. Never look back.

Re:Nagios is great (1)

afidel (530433) | more than 5 years ago | (#25305257)

Hyperic looks cool but the support costs are insane, ~$800/2CPU's/year? I only paid around $1,600 for WhatsUp and pay a fraction of that per year in maintenance to monitor up to 1,000 devices. It's not perfect but I can deal with some warts for that kind of savings!

Re:Nagios is great (1)

Anarke_Incarnate (733529) | more than 5 years ago | (#25305399)

There's a free version of the software. You lose some things like forming groups, etc., but it is much better than "whatsup" as it can go on a per service/script scan. The free version also loses out on "remediation", meaning rules like "If apache is not up, run this cleanup and then start apache, and notify this group." Also, Hyperic can give you discounts and even charges half for non-production environments.

Re:Nagios is great (1)

afidel (530433) | more than 5 years ago | (#25305497)

You can set the polling interval on a per service or monitor basis in WhatsUp as well. It also includes remediation in the base price (though it doesn't include any functionality other than restart service out of the box, but that's usually custom anyways). Their discounts would have to be like 90% to be cost competitive, do you have any idea what their site licensing costs are like? I don't feel like being pestered by a sales puke for information that should be right on their website.

Re:Nagios is great (1)

Anarke_Incarnate (733529) | more than 5 years ago | (#25305631)

I have not used whatsup for years and even then, just a demo. I liked hyperic for the free version and in some environments, especially the OpenView and Tivoli ones, the $800 per 2 CPU would be about a 70% discount.

Re:Nagios is great (1)

perldork (1381373) | more than 5 years ago | (#25308573)

Yes, there are the ugly monsters. Examples:
* Tivoli, Openview, BMC anything

Then there are the less expensive monsters that came out to beat the ugly monsters by being relatively cheaper and slightly more open and useful. Examples:
* Spectrum, eHealth
* SevOne

Then there are the 'cheap but not so NOC friendly' products that came out to make basic monitoring easy. Examples:
* What's Up Gold
* Server's Alive

Then there are the OSS projects that came out in reaction to the expensive less than open commercial projects. Examples:
* Nagios
* Pandora
* Cacti
* Big Brother
* MonIT

Then there are the products that try to hit the middle ground between free/OSS and commercial but more reasonable than the monsters. Examples:
* Zabbix
* Hobbit
* ZenOss

and more expensive but less than the monsters ... example:
* Hyperic

My list is not comprehensive by any means .. so companies and consultants have a lot of different migration paths they can use to get away from the very expensive, very stovepipe NMSs that used to rule the field of monitoring to less expensive ones with some custom work allowed to free ones that require a fair amount of custom work with the tradeoff of no licensing or overpriced support options (obviously once you go down doing in-house work you incur the cost of having in-house developers maintain the work unless your organization is willing to let you contribute back to the OSS community .. and even then you will still have in house support needed).

Re:Nagios is great (1)

Anarke_Incarnate (733529) | more than 5 years ago | (#25308647)

Having been in Openview shops exclusively, I can see how powerful, and yet frickin' misguided it is. Openview is used, not because of how powerful or elegant it is (it is one, but not the other) but because "HP makes it, and everybody uses it, so we are taking less risk."

I have never been in an environment that actually used Openview and LIKED it, and the alerts they got. Oftentimes, the operator at the console sees a scroll of gibberish and just recognizes that the reds and oranges are meant to be that way, and ignores them for the most part, EXCEPT when they are meaningless.

I have been called at 3AM because the status of a node changed from NORMAL to NORMAL.......

However, when we lost 90% of our data center to over heating from a total A/C failure......well, they just assumed an outage that big was a scheduled one they were not made aware of and ignored it, due to it filling their screen up......

All the NMS software in the world can't fix lazy, stupid or overwhelmed.

Re:Nagios is great (2, Insightful)

perldork (1381373) | more than 5 years ago | (#25308439)

Agreed, time and money spent on integrating Nagios into an organization (or any other free OSS product) to me is much better time and money spent than spending money on licenses and paying support people for a commercial product who then not only get your money but also get the benefits of the knowledge learned from the experience instead of your company or group getting that information.

Even that wouldn't be sooo bad except that many commercial companies don't even share that knowledge in a way that other customers can benefit from unless they pay for consulting time ... most commercial NNM producers have horrid public forums and KBs that really only cover issues related to upgrades and licensing as opposed to lessons learned by other customers.

This of course only applies to organizations that have development/IT groups that are large enough to support custom integration efforts, I understand that there are many places who can't afford to invest in in-house development or who really do not want to learn how to do systems/application/network monitoring themselves.

Re:Nagios is great (1)

Fez (468752) | more than 5 years ago | (#25304209)

The initial configuration of Nagios can be quite a pain, but as I said in another post farther down the page here, with judicious use of templates, it is now very easy to manage once configured.

OpenNMS is better (2, Informative)

viridari (1138635) | more than 5 years ago | (#25304381)

I don't know why OpenNMS [opennms.org] doesn't get more credit, maybe because it's a Java app, but it's a damned good one.

Get a very basic OpenNMS configuration going, point it at a range of IP addresses, and it will auto-discover most of what's out there. If you've got your SNMP agents up and running properly, it'll automatically start checking the more important OID's for you and graphing them with an RRD back end. Most of the setup can be done through the web interface instead of through vi. You don't have to restart the daemon every time you add a node.

If Nagios drives you a bit batty, check out OpenNMS.

Re:OpenNMS is better (1)

Elshar (232380) | more than 5 years ago | (#25306427)

I've been wanting to try OpenNMS for years now, but it doesn't work out of the box with FreeBSD (all the java dependencies, etc..) and I could never actually get it to compile properly even after fiddling with it. It's really a shame too, I've only heard good things about it and REALLY tried to get it working with my current system (All *BSD boxes). Maybe someday I'll get a linux box that will work with it, but as of now I already have a network monitoring box with nagios, cacti, etc on it.

Re:Nagios is great (1)

SgtAaron (181674) | more than 5 years ago | (#25304611)

Nagios is great but even version 3 is by no means easy to configure. Like all too many F/OSS projects, the documentation is lacking or even incorrect in spots, and supplied examples barely scratch the surface of what the application can do.

Hmmm, I have to say I was pleasantly surprised by the documentation. We had Argus running here for monitoring for awhile and I finally got tired of its very obscure docs and its bugs. Nagios has been an entirely different experience.

And the Nagios mailing list is very well-read, it seems to me.

I've been running it and it's great - I have it monitoring a bunch of servers (email, hosting, backup, file, etc.) with custom scripts and it works great -- once it's configured.

Yes, I will admit it took a bit of time to get the hang of it. But I also remember it took a bit of time when I first tackled BIND and apache way back when! And I agree, we all sleep better at night with Nagios around (except when I hear a whoop-whoop :)

Re:Nagios is great (2, Funny)

gbjbaanb (229885) | more than 5 years ago | (#25305077)

And I agree, we all sleep better at night with Nagios around (except when I hear a whoop-whoop :)

you have it sending alerts to the Police?! I'm not sure that's what the IT guys had in mind when they said '24 hr emergency response'. :)

Zenoss (1)

msimm (580077) | more than 5 years ago | (#25305207)

If you haven't already, take a look at Zenoss [zenoss.com]. Aside from having a pretty well designed UI (which as I get older I'm beginning to feel deserves more credit in the usefulness dept), it supports SNMP by default (I'm not a big fan of clients unless I REALLY need them) *plus* it supports Nagios plugins.

And I'm not trying to steal any thunder here, I think Nagios is a great option.

Re:Nagios is great (1)

Maniacal (12626) | more than 5 years ago | (#25306469)

We use GroundWork [groundworkopensource.com]. They provide a graphical front end for the nagios configuration that takes a lot of the pain out of it. I think they only support Nagios 2 currently but we've been happy with it and it's free. They have VMware appliances as well, which gives you a zero install deploy option, making it even easier.

Re:Nagios is great (1)

mosinu (987941) | more than 5 years ago | (#25306653)

Like all too many F/OSS projects, the documentation is lacking or even incorrect in spots, and supplied examples barely scratch the surface of what the application can do.

Last time I checked you were free to update and contribute to that lack of documentation so many bitch about that is missing or incorrect....

not good. (2, Insightful)

Lord Ender (156273) | more than 5 years ago | (#25303247)

Is it extensible? Is it easy to use? I didn't get it the first time, better repeat it a few more times...

My personal experience is that Nagios is probably the LEAST easy to use of any piece of software, period. I hope they changed it in a major way, because last time I tried to use it I was forced to dig through configuration files and learn syntax just to get the thing to see if some server was responding to pings.

Re:not good. (2, Informative)

Anonymous Coward | more than 5 years ago | (#25304029)

Just get a good front end for nagios, like Groundworks open source. That will make configuration loads easier. (posted as ac 'cause my password is so good I can't remember it)

Re:not good. (3, Insightful)

walt-sjc (145127) | more than 5 years ago | (#25304109)

Oh please. It's NOT THAT HARD!!!! For what it does, it's fairly simple actually. Compared to any other package of similar capability, it's quite average in terms of difficulty actually. No worse than something like Exim or Apache. Just think of each server as a vhost and each service as a location directive

Re:not good. (0)

Anonymous Coward | more than 5 years ago | (#25304517)

Take a look at Zabbix, then tell me nagios is simple.

Unless Nagios provides a feature you need that isn't in Zabbix (escalations? Log monitoring?) I'd strongly recommend it

Re:not good. (1)

dubl-u (51156) | more than 5 years ago | (#25307665)

Oh please. It's NOT THAT HARD!!!! For what it does, it's fairly simple actually. Compared to any other package of similar capability, it's quite average in terms of difficulty actually. No worse than something like Exim or Apache.

The difference with something like Exim or Apache is that the tricky concepts you need to understand are mostly external constraints. SMTP is weird and complex. Serving files via HTTP and connecting to web apps is slightly less weird, but much more complex.

A basic install of Nagios, on the other hand, is doing something pretty simple and straightforward. But at least with the 2.x series it was an unnecessarily giant pain in the ass to configure because you had to understand the Nagios-specific way of looking at the world and handling configuration. It may be easy once you know it, but there's a steep learning curve to get you that far.

Now I'm not griping. It is free. Although I was tempted a couple of times, I never quite got around to fixing it or building a competitor. But if I had to talk somebody through setting up Apache or Nagios over the phone, I'd rather do 5 of the Apache calls than 1 of the Nagios calls.

Re:not good. (1)

amorsen (7485) | more than 5 years ago | (#25304125)

Nagios is probably the easiest to use network monitoring system. That doesn't mean it's particularly easy, just that the others are worse. It breaks down when the network has trouble though; if a significant number of hosts are unreachable it takes forever for nagios to figure it out. That tends to be exactly when you need it the most.

Re:not good. (1)

Qzukk (229616) | more than 5 years ago | (#25304249)

forced to dig through configuration files and learn syntax

As opposed to what, punching the numbers into a pretty little GUI like one of many [lilacplatform.com] ?

Re:not good. (3, Informative)

mindstrm (20013) | more than 5 years ago | (#25304429)

If all you want is a tool to ping a few servers, nagios is overkill.

My gut reaction is that if nagios configs seem too complicated, you likely have never had to roll out real enterprise monitoring.

Our Nagios install monitors thousands of things, many of them custom tests.
(Transaction volumes, application response times, cron job status, files....).. it can be made to be the focal point for all the "stuff" the people responsible for monitoring company IT operations need to know about.

Re:not good. (1)

pak9rabid (1011935) | more than 5 years ago | (#25305815)

Check out GroundWork [groundworkopensource.com] . It's basically Nagios + a fairly easy-to-use web interface. We've been using it up at my work for over a year and it works great.

Re:not good. (3, Insightful)

isorox (205688) | more than 5 years ago | (#25306907)

Is it extensible?

Yes, what can't you monitor with nagios?

Is it easy to use?

You should see our 2nd line people, if they can use it, anyone can.
1) Big red problem appears on page
2) They click the link to the logging system which does an asset-based search showing recent problems.
3) They click the link to the wiki page for that host, which hints at how to fix it.
4) Red thing goes away

There's a difference between *use* and *configure*. Nagios is the easiest monitoring system we've ever used in our department. It's pretty easy to configure too when you know what you're doing (one config file per device host, one directory per logical division of devices, one perl script to splat out the devices, one subversion repository to version track everything)

I hope they changed it in a major way, because last time I tried to use it I was forced to dig through configuration files and learn syntax just to get the thing to see if some server was responding to pings.

So? What use is a monitoring program that tells you that? If you want to do decent monitoring, you want to monitor the systems, not the devices those systems happen to run on.

It's a steep learning curve, but have you ever configured apache from scratch? Let alone bind or sendmail.

Cacti Users (2, Informative)

cmorford (906819) | more than 5 years ago | (#25303415)

I've used Nagios, but found Cacti and haven't turned back. Any other Cacti users out there? I found Cacti to be much easier to set up than Nagios and fairly extensible for the advanced user.

Re:Cacti Users (1, Informative)

Anonymous Coward | more than 5 years ago | (#25303947)

Try ClearSite... it's what Cacti is to MRTG. http://clearsite.sourceforge.net/coming-soon.html [sourceforge.net] Linux only for now, but the developers are very nice and will share a newer version if you contact them. -theWiseWan

Re:Cacti Users (1)

amorsen (7485) | more than 5 years ago | (#25304145)

Cacti isn't very useful for alerting, just as Nagios really doesn't work well for graphing (one of its more annoying shortcomings).

Re:Cacti Users (1)

Fez (468752) | more than 5 years ago | (#25304159)

We use both. Cacti for graphing, Nagios for monitoring and paging.

Nagios 3 did change a bit for the better. However, because they removed MySQL support I had to rewrite large portions of the existing configuration.

In the process, I made much better use of templating and now each host config is in its own file, and Nagios will load all files in given directories thanks to the cfg_dir= directive.

For example, all of my servers are in etc/nagios/servers/(servername).cfg, routers are in etc/nagios/routers/(routername).cfg, and so on.

If I want to add a server, I just pick a similar one and copy the file, change the name/IP/services/etc, and reload Nagios. With the older config (held over from Nagios 1.x) I had to edit half a dozen files or more just to add a single server. Thankfully those days are over!

Some of these abilities may have been in Nagios 2.x, but because the old config "Just Worked" it was not changed.
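
To make the one-file-per-host layout above concrete, here is a minimal Python sketch that writes a host definition into a per-group directory, which Nagios could then pick up through cfg_dir=. The helper names, the "generic-host" template name and the example address are assumptions for illustration, not taken from the poster's setup.

```python
from pathlib import Path

# Minimal host object. "generic-host" is an assumed template name;
# use whatever host template your own configuration defines.
HOST_TEMPLATE = """define host {{
    use        {template}
    host_name  {name}
    alias      {name}
    address    {address}
}}
"""


def render_host(name, address, template="generic-host"):
    """Return the text of a single Nagios host definition."""
    return HOST_TEMPLATE.format(template=template, name=name, address=address)


def write_host_files(hosts, base_dir):
    """Write one .cfg file per host under base_dir/<group>/<name>.cfg.

    hosts is an iterable of (group, name, address) tuples.
    Returns the list of paths written.
    """
    written = []
    for group, name, address in hosts:
        group_dir = Path(base_dir) / group
        group_dir.mkdir(parents=True, exist_ok=True)
        path = group_dir / f"{name}.cfg"
        path.write_text(render_host(name, address))
        written.append(path)
    return written


if __name__ == "__main__":
    import tempfile

    # Demo in a scratch directory so nothing touches a real install.
    demo_dir = tempfile.mkdtemp()
    for path in write_host_files([("servers", "web01", "192.0.2.10")], demo_dir):
        print(path)
        print(path.read_text())
```

With something like this, adding a server becomes one new tuple and a reload, and each generated directory would be listed with its own cfg_dir= line in the main config.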

Re:Cacti Users (4, Informative)

thanasakis (225405) | more than 5 years ago | (#25304179)

You are comparing apples with oranges, nagios is for service monitoring, cacti is for diagrams.

Re:Cacti Users (1)

cmorford (906819) | more than 5 years ago | (#25304327)

Not necessarily. You can install the monitor plugin (cactiusers.org) to cacti and get all the alerts you want. While I suppose it doesn't come installed by default, it definitely combines graphing and alerting into a single package that works well.

import Nagios data into Cacti? (0)

Anonymous Coward | more than 5 years ago | (#25306059)

You are comparing apples with oranges, nagios is for service monitoring, cacti is for diagrams.

If you're polling the devices anyway, why can't you feed the data into a database and draw charts? Why do you need two systems? It all comes from the same source.

Re:import Nagios data into Cacti? (0)

Anonymous Coward | more than 5 years ago | (#25306313)

because it's the most straightforward thing to do. 99% of the time it is very very cheap to poll, using SNMP for example. Nagios was not designed to replace mrtg, so not surprisingly the solution it gives feels bolted onto the system. Try monitoring traffic on hundreds of interfaces and see where nagios gets you. Not pretty. On the other hand, cacti does not have the capabilities of nagios when it comes to service check scheduling. Using both really scales very well. At least in my experience that is.

Re:import Nagios data into Cacti? (1)

perldork (1381373) | more than 5 years ago | (#25308345)

If you're polling the devices anyway, why can't you feed the data into a database and draw charts? Why do you need two systems? It all comes from the same source.

You don't need two tools, the PNP Plugin does RRD graphing from performance data returned from Nagios (there are other add-ons for Nagios that do this as well), PNP is just the most flexible in my opinion.

Re:Cacti Users (1)

sarabob (544622) | more than 5 years ago | (#25304217)

I found Cacti a nightmare to configure, setting up custom graphs is comically complicated (why can't I just use rrd syntax rather than clicking buttons?) and we always end up with three data sources for the same things. Support for SNMPv3 is patchy, and we needed to jump through hoops to get the graphs to cope with multiple cpus (cpu usage over 100%? It wraps back to zero...).

Re:Cacti Users (1)

Fez (468752) | more than 5 years ago | (#25304409)

I've had similar problems with trying to make a custom graph in Cacti. For example, in MRTG, to make a graph that simply added two OIDs, you just set the source to OID1+OID2, and you're done.

Just try doing that in Cacti. You'll learn more about graph templates and CDEFs than you ever cared to know in the process...

Re:Cacti Users (1)

mindstrm (20013) | more than 5 years ago | (#25304465)

A big install would use both.. as they are very different tools.

Nagios is a monitoring & alert framework.

Cacti is a graphing framework...

Does cacti have some ability to do problem detection, notification, escalation, acknowledgement, resolution, and trend reporting?

Re:Cacti Users (1)

cmorford (906819) | more than 5 years ago | (#25304585)

To some extent yes. There are some plugins at cactiusers.org, one being a monitor plugin. Now it may not be as extensive as nagios, but it's a start.

Re:Cacti Users (0)

Anonymous Coward | more than 5 years ago | (#25305513)

Fucktard ;

cacti is just a graphing tool with some extras

No way to send mail with link to SLA report (0)

Anonymous Coward | more than 5 years ago | (#25303555)

I gave up using Nagios when I found that there was no way to send an email to management linking to a preprepared (by month) Nagios SLA report.

What's the point of having a great monitoring tool if you can't share the reports in a useful and practical manner?

So I switched to Zabbix. Not happy with that either.

Re:No way to send mail with link to SLA report (1)

Neil Watson (60859) | more than 5 years ago | (#25304171)

You might look at Opennms or Zenoss. The key to choosing the best monitoring service for you is to clearly define what you want to monitor and how you would like that information presented.

If you feel Nagios is too difficult (1)

guruevi (827432) | more than 5 years ago | (#25303857)

Try Pandora FMS. It does the same as Nagios, is open source but only requires a minimum knowledge of shell scripting to get it working and can monitor everything you can think of inside (using an agent) or outside a host. I monitor about 100 hosts with it and have about 1200 data points every 5-10 minutes (temperatures, network packets, processes etc.) but it scales much larger (using MySQL as backend) even on simple hardware.

Monitor my gf (1, Funny)

Anonymous Coward | more than 5 years ago | (#25303885)

I actually managed to get a girlfriend. And she is a real hottie. I don't want to sound paranoid, but I need to monitor her to make sure she is not cheating, and keeping her AV up to date, you know what I mean.

So when I read this, "no device or situation has yet been found that cannot be monitored using Nagios and a pre-made or custom script, plug-in or enhancement", I thought this would be perfect.

I think a Nagios plug in would be best, preferably something with sharp blades. So how do I install this in my gf, and have Nagios monitor it?

Re:Monitor my gf (1)

yukk (638002) | more than 5 years ago | (#25304343)

I think the device you're looking for is known as a chastity belt [wikipedia.org] but you'd need to couple it with a personal GPS tracking device [travelbygps.com] Or you could just hire a Private Eye [wikipedia.org] .
Of course, whether she continues to like you after all this is not my responsibility.

Re:Monitor my gf (1)

isorox (205688) | more than 5 years ago | (#25306669)

I think the device you're looking for is known as a chastity belt [wikipedia.org] but you'd need to couple it with a personal GPS tracking device [travelbygps.com]

Or you could just hire a Private Eye [wikipedia.org] .

Of course, whether she continues to like you after all this is not my responsibility.

Indeed, Heisenberg did say that the act of monitoring something will impact it

Re:Monitor my gf (2, Informative)

dfn_deux (535506) | more than 5 years ago | (#25306049)

I understand that your comment was made in jest, but.... Nagios is a really flexible polling and alerting framework. There is nothing in nagios that makes it specifically tailored to monitoring computers or services. For example, there is no reason why you must use the HostAddress directive to hold an IP or a hostname, it could just as easily be a street address, phone number, SSN, etc... And likewise there is no need for polling to actively poll, you can just as easily configure nagios to only respond to passive updates. So, just for the sake of argument, if you really wanted to use nagios to track/control a human's actions and movements you could combine passive monitoring by having an investigator follow your target and supply them with either a phone number, email address, or website where they could submit a "check result" while at the same time you could do active monitoring by utilizing any number of GPS/cellular logging devices combined with a small analysis script with some thresholds. If you wanted you could even use the output of the gps to update the relative location of "nodes" on your status map... I believe one of the examples in the documentation has phone numbers for local pizza places used as HostAddresses and has a dial out script to check the average rings to answer for phone availability validation.
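
For the passive side of this, a rough Python sketch of what "submitting a check result" can look like: Nagios accepts passive service results as PROCESS_SERVICE_CHECK_RESULT lines written to its external command file. The host and service names below are made up, and the command file path is something you would take from your own install (it is passed in rather than assumed here).

```python
import time

# Standard plugin return codes.
STATES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def passive_result_line(host, service, state, output, now=None):
    """Format one passive service check result as an external command line.

    `now` lets a caller pin the timestamp; by default the current time is used.
    """
    timestamp = int(time.time() if now is None else now)
    # One command per line, so keep the plugin output on a single line.
    clean = output.replace("\n", " ")
    return (
        f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
        f"{host};{service};{STATES[state]};{clean}\n"
    )


def submit(command_file, line):
    """Append the command to the Nagios external command file (path is site-specific)."""
    with open(command_file, "a") as handle:
        handle.write(line)


if __name__ == "__main__":
    # Hypothetical host/service names, in the spirit of the pizza-place example.
    print(passive_result_line("pizza-place", "phone-answer", "WARNING", "7 rings to answer"), end="")
```

The host and service still have to exist in the Nagios config with passive checks enabled, otherwise the result is simply dropped.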

Nagios can die in a fire (1)

Anarke_Incarnate (733529) | more than 5 years ago | (#25303997)

Long live Hyperic. Free and extensible without the nonsense to set up that is Nagios. You can use Nagios plugins for it if you so wish.

Re:Nagios can die in a fire (1)

druiid (109068) | more than 5 years ago | (#25304567)

Hyperic is incredible.. but for my uses I need the enterprise version. Paying an extremely high amount of money for only 25-30 servers is not in the cards... and thus I chose zabbix, which does enough right to be a good replacement.

Re:Nagios can die in a fire (1)

Anarke_Incarnate (733529) | more than 5 years ago | (#25304781)

The non enterprise can be used in a 30 server environment, but you DO give up a lot of functionality and have to re-invent things for it to work. I just wish they open sourced the lot of it, but unlikely

Slashdot Book Review Template (3, Informative)

rhizome (115711) | more than 5 years ago | (#25304041)

1st Paragraph: Paraphrase of Foreword.
2nd Paragraph: What the initial chapter(s) is (are) about.
3rd Paragraph: What the next chapter is about.
4th Paragraph: What the chapter after that is about.
5th Paragraph: What the last chapter(s) is(are) about.
6th Paragraph: Pithy criticisms for balance.
7th Paragraph: Conclusion with the required, "This book is useful if you are like me" statement, as in, "Overall, this is a great book for anyone using Nagios as more than a casual user, and is still very informative for the casual user."

I have this book, it is not impressive. (5, Insightful)

hax4bux (209237) | more than 5 years ago | (#25304089)

This book is not a big leap over the supplied Nagios documentation. I bought it out of guilt, but I doubt I have gotten my money's worth. This is not so much a criticism of the book as praise for the supplied documentation (which is rather decent, given the topic).

Getting Nagios (or OpenView or whatever management system you have) working is a big job which will not be solved w/a $40 book and an afternoon.

For all of you who complain about Nagios being complicated, I hope you never see OpenView (et al).

If you haven't seen Nagios, there is a daemon which performs the collection. The UI is browser based (Apache HTTPD CGI applications). Sometimes there are agents on remote machines to collect status like process tables, disk utilization, etc.

Nagios is essentially a job scheduler/messaging system. Monitoring is performed by invoking little programs dedicated to collecting information, and these are easy enough to create. There are lots of hooks if you need to extend the system.
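
As an illustration of how small these "little programs" can be, here is a sketch of a disk usage check in Python. It follows the usual plugin convention (exit code 0/1/2/3 for OK/WARNING/CRITICAL/UNKNOWN, one line of output, optional performance data after a "|"). The 85/95 thresholds and the function names are arbitrary choices for the example, not anything Nagios prescribes.

```python
import shutil
import sys

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(percent_used, warn=85.0, crit=95.0):
    """Map a usage percentage onto a plugin return code."""
    if percent_used >= crit:
        return CRITICAL
    if percent_used >= warn:
        return WARNING
    return OK


def check_disk(path="/", warn=85.0, crit=95.0):
    """Return (exit_code, one-line message) for the filesystem holding `path`."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - {exc}"
    if usage.total == 0:
        return UNKNOWN, f"DISK UNKNOWN - {path} reports zero size"
    percent = 100.0 * usage.used / usage.total
    status = classify(percent, warn, crit)
    # Text before the "|" is for humans, text after it is performance data.
    message = (
        f"DISK {LABELS[status]} - {path} {percent:.1f}% used"
        f" | used_pct={percent:.1f}%;{warn};{crit}"
    )
    return status, message


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "/"
    status, message = check_disk(target)
    print(message)
    sys.exit(status)
```

Nagios only looks at the exit code and the printed line, so the same skeleton works for nearly anything you can measure from a script.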

Since the UI is owned by HTTPD, so is access control. Who doesn't know how to set up LDAP or an auth file for Apache? Most of the CGI plugins are implemented in C and are not ugly to look at.

The agent issue is a little clouded because there are many agents to choose from. I usually just use the Net-SNMP agent because I have a lengthy SNMP background, but that is just my personal choice.

I will stop here since the article is about a book and not Nagios. I merely wanted to address some of the criticisms of Nagios.

Re:I have this book, it is not impressive. (1)

tcopeland (32225) | more than 5 years ago | (#25304505)

> I bought it out of guilt, but I doubt I have gotten my moneys worth.

I usually figure if I get _anything_ at all out of a book then it's worth the price. I just bought a Puppet book [amazon.com] and just having it around for occasional skimming has gotten me familiar enough with Puppet that I'm willing to give it a whirl. And for $17.99, meh, good enough.

I like Zabbix (0)

Anonymous Coward | more than 5 years ago | (#25304205)

Nagios was ok but the mapping on Zabbix is way better. I have it running in a virtual server on top of ubuntu. Works great! Very flexible, can create your own icons, the graphs are easy to set up..

We used WhatsUp Gold before, which has a nice interface, but when you consider buying a server license, buying WhatsUp for $2,500 for 100 devices, and then dealing with MS SQL, Zabbix wins.

not in stable portage (0, Troll)

Anonymous Coward | more than 5 years ago | (#25304319)

... come back to me when nagios v3 is marked stable. :P

Re:not in stable portage (1)

Darby (84953) | more than 5 years ago | (#25305725)

... come back to me when nagios v3 is marked stable. :P

Unmask it, it works fine for me on x86_64. I upgraded over a month ago, no issues and many improvements.

Re:not in stable portage (1)

jaydubscott (897863) | more than 5 years ago | (#25308643)

umm...from the website (http://www.nagios.org/)

Latest versions:
Nagios 3.0.3 (stable)

Nagios is well known in bigger environments (1)

Krneki (1192201) | more than 5 years ago | (#25304369)

If your environment is big enough that you can employ at least 1 person to work full time with Nagios, then it's a great product. But out of the box it needs too much time to become usable. I'm talking about Nagios 2, I have no experience (yet) with Nagios 3.

Re:Nagios is well known in bigger environments (1)

thegrassyknowl (762218) | more than 5 years ago | (#25307455)

Nagios is good, but people have gotta start learning its limitations. I recently brought up the issue of security with some people and their answer: Nagios... There is the right tool for the right job, and Nagios is one small tool in the admin's arsenal to solve one problem. It's being touted as a universal panacea by those with little real knowledge and it's a little scary.

Nagios has much better competitors (2, Interesting)

druiid (109068) | more than 5 years ago | (#25304549)

I used nagios for years.. many many years. It has to be, as many have already pointed out.. the most difficult to configure OSS project ever made.

That said, it was fairly powerful once configured properly.

The thing is, though, that it has many shortcomings. I found a much better (although not necessarily as scalable) monitoring and data-gathering solution in Zabbix. They recently released a new version as well that adds many really nice capabilities like IPMI support.

WTF? LOL... (2, Insightful)

Colin Smith (2679) | more than 5 years ago | (#25305903)

I used nagios for years.. many many years. It has to be, as many have already pointed out.. the most difficult to configure OSS project ever made.


R$+@$=W $@$1@$H user@thishost -> user@hub
R$=W!$+ $@$2@$H thishost!user -> user@hub
R@$=W:$+ $@@$H:$2 @thishost:something
R$+%$=W $@$>3$1@$2 user%thishost

Sendmail...

Nagios is easy, but it only makes sense if you have dozens or hundreds of systems, for less, get something simpler, and it will only work if you understand how to group your hosts, services etc.
 

Re:WTF? LOL... (1)

dubl-u (51156) | more than 5 years ago | (#25307729)

Sendmail is necessarily hard. Mail routing at the time was complicated. Now it's easier, which is why Postfix is a snap to configure for common cases, and why a lot of Sendmail admins never have to see the scary magic at the heart.

Nagios, on the other hand, is unnecessarily hard. Especially for simple setups and novice users, the pain is ridiculously out of proportion to the gain.

Hobbit: spin off of Big Brother (1)

djs1w (567018) | more than 5 years ago | (#25304573)

I currently monitor about 250 hosts with Hobbit (http://sourceforge.net/projects/hobbitmon) and have had good success with it. It has trending (RRD graphs) and alerting thresholds (ie 85% full, email, 95% full pagers) built in together. It is also customizable. We have created several perl scripts that check random applications for various things that are also tracked with Hobbit. How much data Netbackup backed up last night. How many users are logged into our portal. Are the tape drives within Netbackup up or down? The list goes on and on. It runs on Unix and Windows, although the Windows client isn't as robust.

Zabbix (4, Interesting)

kosmosik (654958) | more than 5 years ago | (#25304807)

I like Nagios but I can't really imagine how to apply it in large (think ten thousand hosts) setup in multiple regional/organizational branches and so on.

Also Nagios *is* painful to set up. First of all, AFAIK there is no way to delegate administration, e.g. to organizational branches. Configuration is just a big pile of config files included from some other config files etc. There is no autodiscovery/autoconfiguration of hosts since the Nagios team believes it is BAD etc.

Well IMHO Nagios is great but it is like a big fat pile of hacked scripts and configs. Not too elegant but working.

Now... I am (well we are in my organization) using Zabbix and I find it great. It is much better organised/elegant than Nagios.

In Zabbix architecture you have well designed atomic elements like checks, items, services (groups), etc. It also gathers fine tuned historical data for trends and historical review. You can compact the data (lower the resolution) after a given time and so on. It is in fact a very complete monitoring framework with its own internal condition language and escalation engine. You can gather data from network checks, SNMP, custom scripts, Zabbix agents (available for most platforms) etc.

And it has normal configuration, not crude text config files. I have nothing against text files but sometimes I don't really want to open my text editor only to quickly set up an ad-hoc overview screen with maps, graphs, status displays, clocks, and you can have a few such screens rotating on your big screens in the NOC. All with mouse clicking.

I can give it as a tool to a sysadmin and he or she can work with it without having to study manuals. Not everybody in your organization is a Unix hacker you know...

We have dozens of branch servers which are managed by local sysadmins and a farm of central servers which is managed by central staff.

Zabbix works in a distributed manner so a local branch can have a very detailed view of their infrastructure and at the central level I can have a functional/business overview of the entire infrastructure and core services (like business systems, transactions etc.). Not just simple checks if RAID is OK - I don't care if RAID in some server is OK. I need to know why (where, who to blame) a given service (be it MQ/WebSphere) is not working as desired.

And also it is free, open source and available in most Linux/Unix distributions as a standard package. So when considering an enterprise monitoring platform do yourself a favour and also check Zabbix.

http://www.zabbix.com/downloads/ZABBIX%20Manual%20v1.6.pdf [zabbix.com]

Re:Zabbix (1)

knarfling (735361) | more than 5 years ago | (#25304947)

I'll second Zabbix. It has gone through some growing pains, but I like it for its ease of use as well as its flexibility. Until this last version, it did not have good escalations or repeat notifications, which was a big problem. However, with 1.6, that has been corrected.

One of the things I like about Zabbix was the ability to write custom checks. If you could get any script or program to spit out data, you could very easily capture that data and run checks on it. The windows client could read Windows Performance Counters, so a TON of custom checks were easily written. In my last job I used it to monitor an incoming feed from another company. If I didn't receive info from the company for 20 minutes, I could send out alerts for someone to check the feed. I am sure that could be done with Nagios, but it was much, much easier with Zabbix.

Re:Zabbix (2, Informative)

kosmosik (654958) | more than 5 years ago | (#25305169)

Well for me what ruled out Nagios was:

1. It is painful to set up, don't get me wrong - I've sat my time over configuration and I think I know it a little bit and I can easily set it up for like 100 hosts with some templates +includes +sed magic. But that is what I can do. Not all of my staff can do it and it really is not easy.

2. It is not distributed. The checks can be distributed. But you cannot have like 20 child Nagios nodes managed by local staff and parent nodes that gather data from children. This is a killer feature of Zabbix for me. I can send out a standard configured box/server with Zabbix to my local staff. Give them access via LDAP/AD. And tell them to configure it so it suits *their* local setup (well we have quite uncommon/unstandardized branches - historical/political reasons). Then I can gather data from their local system (they have configured it) and process it in a central place so I can have a clear overview of what is going on in the infrastructure. I really have no clue how to do it with Nagios - probably it is possible with some ninja-like-hacking but it is not something (ninja-like-hacking) you like for a big organization. You need clean, manageable stuff.

3. Zabbix can collect and really process historical data. If for some reason I wish to know how my network bandwidth evolved over the past year I can quite easily click and get some nice graphs, reports and even forecast some stuff based on various trends.

To summarize Nagios for me seems like perfect tool for sysadmin. But it is not so good for enterprise monitoring where you have quite different goals.

Third on Zabbix (0)

Anonymous Coward | more than 5 years ago | (#25305397)

I had put in a comment earlier about zabbix and it seems to be missing.

anyway, I third on zabbix. It works great and is very flexible. I really like the mapping!!

Re:Zabbix (0)

Anonymous Coward | more than 5 years ago | (#25307685)

I have to agree, Zabbix is by far one of the best *sysadmin* tools I have ever used.

Rather use a hammer than a rock (0)

Anonymous Coward | more than 5 years ago | (#25305033)

By the very nature of network administration the associated tools are going to be complex. Nagios offers functionality and control options that are appropriate given the type of operations for which most admins are responsible. I'm looking forward to digging into V.3. Like the man said, don't hate the hammer hate the house!

I use PandoraFMS (0)

Anonymous Coward | more than 5 years ago | (#25305539)

I've been using PandoraFMS 1.3.1 and now PandoraFMS 2.0, and for me it's the best monitoring tool nowadays.
Nagios, for me, has very poor reporting which is not helpful whatsoever, and the interface is awful.
What I like about PandoraFMS is the graphs and how easy it is to monitor a host from the web console; Nagios is horrible to set up.

my 2 cents

Nagios is a steaming pile of doggy-do (0)

Anonymous Coward | more than 5 years ago | (#25305635)

no, it really is the biggest pile of cow crap I've ever seen in my entire life.

it sucks ass worse than a hoover. I'm not joking.

get a *real* monitoring system

Nagios documentation (1)

kosmosik (654958) | more than 5 years ago | (#25305643)

Right now I wanted to check Nagios documentation for simple thing - configuration file syntax. This is the basic stuff. It is the first thing that should be defined in reference manual. I like to know how the files are processed. How do I do comments. How do I define multiple line commands and so on.

Please point me out that I am blind or stupid since I really cannot find it in manuals here:
http://nagios.sourceforge.net/docs/3_0/toc.html [sourceforge.net]

Also I find the online manual quite retarded/clunky. It doesn't even have a search! I wonder why they haven't used some kind of wiki (any serious wiki system has search) or similar.

Re:Nagios documentation (1)

hax4bux (209237) | more than 5 years ago | (#25305685)

Have you looked at the sample commands which come w/the distribution?

Re:Nagios documentation (2, Informative)

Spad (470073) | more than 5 years ago | (#25306081)

You'll be wanting:

http://nagios.sourceforge.net/docs/3_0/configmain.html [sourceforge.net]
http://nagios.sourceforge.net/docs/3_0/objectdefinitions.html [sourceforge.net]
http://nagios.sourceforge.net/docs/3_0/configcgi.html [sourceforge.net]

Initially. There's a lot of stuff that isn't linked directly off the TOC, which is a pain, but it can be found with a bit of digging (or download the PDF and search it).

The FAQs (http://www.nagios.org/faqs/) also have a fair amount of useful info (Such as why the bloody thing won't use GD2 without a lot of arsing around).

I'd also recommend the forums here: http://nagios.meulie.net/ [meulie.net] (Though they seem to be down at the moment).

Re:Nagios documentation (1)

kosmosik (654958) | more than 5 years ago | (#25306229)

OK but it still does not answer my question of how to split a long variable across multiple lines. Do I do it "bash" style like\
that. Or some other way?

Also the pain - I've said that the documentation is the pain. Also I prefer complete/reference documentation over FAQs/forums.

Nagios is a mess (2, Insightful)

Kent Recal (714863) | more than 5 years ago | (#25305733)

Blech, nagios is probably the most disgusting hack currently in wide use. It was overdue for a complete rewrite after Nagios2 - but nagios hackers don't seem to have any pain threshold. Nowadays it's not even funny anymore. Nagios has gone *way* over its expiration date. The closest analogy would be a pot of milk that has been sitting in direct sunlight for 6 months straight...

I strongly suggest that anyone looking for a monitoring solution stays away from the dead horse and looks at the modern alternatives first. There are plenty: Munin, Cacti, Zenoss, Pandora, OpenNMS, just to name a few.

Most importantly: Take your time before you decide and evaluate thoroughly. A monitoring solution will stick with you for a long time and migrating to a different software is usually a very painful process. Which, btw, is the main reason why so many sites still ride the dead horse...

Re:Nagios is a mess (1)

kosmosik (654958) | more than 5 years ago | (#25305883)

You are partially right - Nagios is a bit legacy.

But you have mentioned Munin and Cacti - these are just simple graphing solutions. Munin is generally useless - you have only year/month/day views (or similar), you cannot zoom into e.g. a 2 hour range last week. Cacti is totally better than Munin.

But also Cacti is just a simple SNMP polling and then graphing solution. It has some plugins such as thresholds but it really is not the same class of solution as Nagios (or better Zabbix).

Nagios is an *engine* that processes messages. Something like message broker. Cacti is something quite different.

GroundWork (1)

pak9rabid (1011935) | more than 5 years ago | (#25305895)

For those of you that aren't particularly fond of the complexity of Nagios' configuration, check out GroundWork [groundworkopensource.com] . It's basically Nagios + a fairly easy-to-use web interface. We've been using it up at my work for over a year and it works great.

Hard to set up? (3, Insightful)

isorox (205688) | more than 5 years ago | (#25306851)

So Nagios is hard to set up? Probably, you can't go from zero to running in 5 minutes. It's a steep learning curve, but if the initial investment of a book (I used building a monitoring environment with nagios) and a few hours is too much, you shouldn't be monitoring things. You won't do it correctly, you may as well throw some cron jobs together.

The first step in monitoring is working out what you want to monitor. The second step is working out what you really want to monitor. The third step is working out how you want to display problems. When you have 60 people in support working on a 6 shift 24/7 pattern, you can't expect emails to be any use. "Service problems" in nagios is fine, but there's a lot of issues that 2nd line don't need to know about -- solaris security patches on an intranet for example, can wait until the 9-5 admins get in.

Nagios is painfully easy to administer, if you set it up right. Once you know what you're doing (or even know enough to be dangerous, like myself), you can deploy a new nagios installation in about 20 minutes, add a new device that follows existing rules (new web server for example) in under 5 minutes, and a new device with new plugins in half an hour.

Nagios then grows organically. When something strange and new breaks we cobble a plugin together.

Configuration is in plain text files, one for each device on the network. I have these as a Subversion working copy, which gives me the ability to track changes and easily roll back any configuration problems.

We have dozens of weird bespoke plugins: one uses WWW::Mechanize and Perl to run through a workflow on a specific webpage, another looks at the rate of change of growth of a jboss logfile and the proportion of stack traces, and one logs into a remote machine and checks that jumbo pings are working through the network.

We find nagios essential to monitor the service we provide. I don't particularly care if the server an oracle database runs on is pingable, I care if I can log in and run "select 1 from dual" (or usually something more application specific).

The small system we monitor is made up of about 800 services over 190 devices.

nagios = headache (1)

zeki893 (916592) | more than 5 years ago | (#25306967)

The number of people complaining in this topic about the ease of use of nagios shows that nagios is lacking. Trying to figure out nagios is a waste of time when there are so many alternatives out there that are much easier to use.
e.g. cacti (mostly for graphing, but can be used for alerts using plugins), pandora FMS, groundworks, sitescope, the Dude.

Re:nagios = headache (1)

Destoo (530123) | more than 5 years ago | (#25308229)

Groundworks is actually just an interface for Nagios. It was very straightforward to set up.
the Dude is an excellent Windows alternative.
I'll try to look at Cacti.

Too many amateurs using Nagios (2, Informative)

rossz (67331) | more than 5 years ago | (#25307249)

I have never once personally had any dealings with a properly implemented Nagios system. Every single time it was obviously tossed up by someone who had minimal knowledge of how to properly monitor the infrastructure.

The biggest complaint I hear is "too many alerts". So set your dependencies properly! You say you did that but you still get 600 alerts when the router dies? That's because you told it you wanted the alerts. See that "u" in "notification_options"? That means "unreachable". You want to be alerted when the box can't be reached. You probably wanted "d,r", not "d,r,u".
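For anyone who wants to see what that looks like, here is a rough sketch of a host definition, not a drop-in config. The host names, the address, and the "generic-host" template are assumptions made up for illustration; "parents" is the standard directive that tells Nagios which device sits between it and the host, which is how it can tell "down" from "unreachable" in the first place.

```
# Hypothetical host behind a router; names and address are placeholders.
define host {
    use                     generic-host    ; assumed template
    host_name               web01
    address                 192.0.2.10
    parents                 core-router     ; assumed name of the router's host definition
    notification_options    d,r             ; notify on DOWN and RECOVERY only, not UNREACHABLE
}
```

With this in place, a dead router should put web01 into the unreachable state, and since "u" is not in the list, no notification goes out for it.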

The next complaint. It's so much work to add a system. Huh? It takes me about 30 seconds to add another system and all the tests I need. The trick is using host groups to automatically assign tests to a system. For example, using a generic LAMP type server. What can we assume about this? It's running Linux, Apache, MySQL, and Perl or PHP. That's a bunch of tests right off. In my world, SNMP is assumed on all systems (because I made it that way, that's why). So we define a bunch of service checks using SNMP, but instead of using "host_name some_hostname", we use "hostgroup_name lamp-servers". Now when I add a new server, I add "hostgroups lamp-servers" to the definition and like magic it gets all the tests I need: snmp port responding, ssh access, apache daemon running, mysql daemon running, web page accessible, disk space good (defined in snmpd.conf), CPU usage, load average, plus some automatic dependencies: all snmp tests depend on the snmp port responding. Web pages are dependent on the apache daemon running, etc. I even have some simple graphing included automatically. Even the O/S icons are defined by the hostgroups. Each distro has its own hostgroup which takes care of that detail (e.g. centos-system and ubuntu-system).

Ten simple lines to define a new host can result in 20 service checks. I rarely need to define a new service check. And when a router goes out? One alert for the router.
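A minimal sketch of the hostgroup trick, assuming the "generic-host" and "generic-service" templates and the "check_ssh" command are already defined (they are placeholders here, as are the host name and address). Only one of the many service checks is shown; the rest would follow the same pattern.

```
# The group itself.
define hostgroup {
    hostgroup_name          lamp-servers
    alias                   LAMP servers
}

# One service, attached to the group rather than to a single host.
define service {
    use                     generic-service ; assumed template
    hostgroup_name          lamp-servers
    service_description     SSH
    check_command           check_ssh       ; assumed command definition
}

# A new server picks up every lamp-servers service just by joining the group.
define host {
    use                     generic-host    ; assumed template
    host_name               web02           ; placeholder
    address                 192.0.2.11      ; placeholder
    hostgroups              lamp-servers
}
```

Adding another server then means writing only the last block again with a different name and address.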

Not every system is going to be generic like this, but any time I have more than one system require a specific service check, I create a hostgroup to handle it.

Ease of use? (1)

VoidCrow (836595) | more than 5 years ago | (#25307325)

I always found it an absolute pig to configure.