Server Monitoring Solutions?

Cliff posted about 11 years ago | from the keeping-an-eye-on-things dept.

Software 58

bwhaley asks: "The University I work for has asked me to research software solutions for server monitoring. More specifically, a piece of software that will monitor server variables such as load, swap usage, POP/IMAP processes, total processes, and all the other interesting data about a server's health. Watching these variables can give administrators advance warning about potential problems with the server. We are currently using an in-house solution written in Perl but its age is showing. I have found plenty of proprietary solutions such as HP OpenView and Sun Management Center, but these cost thousands of dollars. What solutions do Slashdot readers use? Are there any powerful open source solutions that I'm missing? Is anyone else running homegrown software that they are happy with? We are running an entirely Solaris environment but I am interested in any UNIX solution."

GS (-1, Troll)

Anonymous Coward | about 11 years ago | (#7226339)

goat sex

CompSci students? (2, Insightful)

agent dero (680753) | about 11 years ago | (#7226343)

I would suggest talking to whoever teaches computer science and software. Get the students studying this to rewrite your Perl scripts that do the same job.

That's something you can pass off as helping everybody: saving y'all money and teaching CompSci kids how to work with the computers and OSes.

Re:CompSci students? (1)

Glonoinha (587375) | about 11 years ago | (#7226878)

Oh man I was reading this thinking 'yes! Yes! YES! I have an answer!' right up until I read the last few lines about actual platforms.

I was headed towards setting up perfmon as a service and having one machine look up the values from all the other machines and either display them as graphs or save them as data, but this is obviously not the answer you were looking for.

Hey, I tried. I am only just now coming up to speed on Linux, so it will be a while before I am useful in that arena. Slashdot motto: if you can't be right, at least post early (hey, I tried).

Gotta do something while XP installs over the course of the next 37 minutes.

Re:CompSci students? (0)

Anonymous Coward | about 11 years ago | (#7250677)

Then watch as the comp sci students read all the other students emails, install their own programs, read prof emails, and hack their way through the whole system.

Big Brother (1)

macx666 (194150) | about 11 years ago | (#7226344)

Check out bb4.com [bb4.com] .

Nagios... (1)

pardey (568849) | about 11 years ago | (#7226346)

Nagios [nagios.org] might be what you're looking for. Cheers.

Re:Nagios... (1)

pardey (568849) | about 11 years ago | (#7226388)

Or not - I'll read the post more carefully next time...

Nagios (1)

ralphus (577885) | about 11 years ago | (#7226349)

Have you heard of Nagios [nagios.org] ?

Re:Nagios (3, Insightful)

Sentry21 (8183) | about 11 years ago | (#7226431)

I second Nagios. I set it up a while back as a technology test, to monitor our internal network and some remote servers (arbitrary web servers on the internet) for a lark. I got it reporting uptime, system load, swap, memory usage, processors, network load and the like on our Linux and Win2K machines, including individual network interfaces: when the wired interface on the laptop was disconnected, it paged me. That was useless for our situation, but good for multihomed machines.

It can monitor all kinds of machines, services, ports, networks, pings, traceroutes, anything. Beautiful setup, and highly recommended.

--Dan

Re:Nagios (1)

omegaman_1 (540902) | about 11 years ago | (#7233606)

I'll third nagios (www.nagios.org), as I've used it and its previous incarnation (NetSaint) in production environments. It has a very extensible setup and a very active development community. You could probably set up a limited test of its functionality on a spare box in a weekend.

Re:Nagios (1)

ajayrockrock (110281) | about 11 years ago | (#7226528)

Yup, Nagios is pretty much what you're after. I had it up and monitoring all my servers in about a day.

The only advice I can give is take the time and read the docs. They are very good and understanding what's going on will save you loads of time down the road when you want to add stuff.

later,
ajay

Nagios (1)

merphant (672048) | about 11 years ago | (#7226361)

I haven't used it but it seems like Nagios [nagios.org] is what you want. It's GPL and is supposedly very powerful.

Re:Nagios (1)

Aparthy (7792) | about 11 years ago | (#7226395)

I second nagios. We use it at work to monitor around 700 hosts and all of their services. Just don't have one machine monitor more than a few hundred hosts; it tends to fall a bit behind at times.

Big Brother (1)

Kowh (61371) | about 11 years ago | (#7226367)

Big Brother [bb4.com]

There's a vibrant community with lots of scripts [deadcat.net] to extend functionality.

It's free as in beer (but not freedom) for almost all uses, and is open source. You only have to pay if you use it to generate money.

BigSister (1)

TBone (5692) | about 11 years ago | (#7243389)

BS is a rewrite of BB4. Where BB4 uses actual shell scripts, BS uses Perl for its modules, which makes it much more "correctly" modular.

Re:Big Brother (0)

Anonymous Coward | about 11 years ago | (#7255454)

Uh, if you have to pay to use it to generate money then it isn't open source. Source code may be available but that doesn't mean it is open source.

Easy (1)

keesh (202812) | about 11 years ago | (#7226372)

Big monitor, gkrellm over remote X and someone to sit there and watch :)

Re:Easy (1)

bohlke (176080) | about 11 years ago | (#7226406)

i am just looking at my 10 remote gkrellms now :-)
it's a big bunch of information :-)

it is fun to find some patterns ;-)

Re:Easy (1)

kidlinux (2550) | about 11 years ago | (#7231756)

Better yet, run a local copy of gkrellm and connect to the remote gkrellmd. gkrellm is nice for quick glances but doesn't keep any history of what it monitors, which I imagine is part of what the poster is looking for.

It's nice to be able to analyze the historical data to make predictions and such.
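The kind of prediction kidlinux mentions can be as simple as fitting a straight line to past samples and extrapolating. Below is a minimal Python sketch, assuming samples arrive as (unix_time, value) pairs, that the metric (for example disk usage in percent) grows roughly linearly, and that the limit is an upper bound; the function names are hypothetical and not part of gkrellm or any other tool in this thread.

```python
from typing import Optional, Sequence, Tuple

Sample = Tuple[float, float]  # (unix_time_seconds, value)


def linear_fit(samples: Sequence[Sample]) -> Tuple[float, float]:
    """Ordinary least-squares fit of value = slope * t + intercept."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must cover more than one timestamp")
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    slope = cov / var_t
    return slope, mean_v - slope * mean_t


def seconds_until(samples: Sequence[Sample], limit: float) -> Optional[float]:
    """Seconds after the newest sample until the fitted trend reaches
    `limit` (treated as an upper bound), or None if the trend is flat
    or falling."""
    slope, intercept = linear_fit(samples)
    if slope <= 0:
        return None
    crossing_time = (limit - intercept) / slope
    newest = max(t for t, _ in samples)
    return max(0.0, crossing_time - newest)
```

For example, three hourly readings of 50, 55 and 60 percent give a slope of 5 points per hour, so a 90 percent limit lies about six hours (21600 seconds) past the last reading. A straight line is a crude model; it only gives a rough warning horizon.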

Re:Easy (1)

keesh (202812) | about 11 years ago | (#7231972)

Unfortunately, gkrellmd doesn't yet handle plugins entirely correctly...

top (1)

pizza_milkshake (580452) | about 11 years ago | (#7226385)

top [gnu.org] is terrific

Re:top (1)

ader (1402) | about 11 years ago | (#7227705)

OK, the version of top to which you're referring is actually here [sourceforge.net] , and it only works on Linux anyway.

top for Solaris and other Unices is here [groupsys.com] . It's great for monitoring a single system in real time, but it's not what the poster is seeking.

Ade_
/

Alarms are good, but... (1)

keiferb (267153) | about 11 years ago | (#7226438)

pretty pictures are more fun to look at! Check out cacti for all of your process/bandwidth/load/usage graphing needs. It's available at raxnet.net [raxnet.net]

Cheap mexican labor (-1, Troll)

Anonymous Coward | about 11 years ago | (#7226448)

Served us right in the past.

What about Nagios? (1)

a.koepke (688359) | about 11 years ago | (#7226453)

Nagios [nagios.org] is a great server monitoring system and seems to have what you need.

It's meant for Linux but works under most *NIX variants.

Big Sister (2, Informative)

Quixotic137 (26461) | about 11 years ago | (#7226505)

If you don't want to pay for Big Brother, take a look at Big Sister [graeff.com] . It does much of the same thing, but for free (as in beer and speech).

Re:Big Sister (1)

bakes (87194) | about 11 years ago | (#7227279)

I quite like Big Sister as well. At my last job I was using it to monitor around 50 servers, shown split into their four different functional groups.

Service failures generated emails, and we also configured it to send an SMS to us outside office hours. The servers were mostly Windows NT boxes, so when a BSOD took out a web or FTP server, we were alerted within a few minutes. The default was about 20 minutes; I had to tweak that setting. That was easy because it's all written in Perl (with the exception of some of the interfaces to the Windows performance counters, I think).

I also added extra links to run scripts that show network activity graphs from MRTG for the switches. It was a pretty sweet setup once I had it the way I wanted.

Big Sister can check for a response on a TCP port, check for running processes, memory or swap space, monitor the run queue length, file system free space, or most other things you need, plus you can add your own checks easily. You can also configure thresholds so you can be notified when they are reached.

It's obviously not as pretty as the many-multiple-thousands-of-dollars solutions, but it's pretty good.
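Two of the checks described above, counting running processes and comparing a reading against thresholds, fit in a few lines. This Python sketch is not Big Sister's code (Big Sister is Perl); it assumes a POSIX `ps` that accepts `-e -o comm=` and treats the warning and critical levels as upper bounds. The function names are illustrative.

```python
import os
import subprocess
from collections import Counter


def count_processes(ps_output: str) -> Counter:
    """Count processes by command name in `ps -e -o comm=` output.
    Full paths (which some ps implementations print) are cut to the basename."""
    counts: Counter = Counter()
    for line in ps_output.splitlines():
        name = line.strip()
        if not name:
            continue
        if name.startswith("/"):
            name = os.path.basename(name)
        counts[name] += 1
    return counts


def read_ps() -> str:
    """Run ps; -e selects every process, `comm=` prints only the command
    column with its header suppressed."""
    result = subprocess.run(["ps", "-e", "-o", "comm="],
                            capture_output=True, text=True, check=True)
    return result.stdout


def classify(value: float, warn: float, crit: float) -> str:
    """Map a reading onto OK / WARNING / CRITICAL using upper thresholds."""
    if value >= crit:
        return "CRITICAL"
    if value >= warn:
        return "WARNING"
    return "OK"
```

A POP server check could then be `classify(count_processes(read_ps())["ipop3d"], warn=200, crit=400)`, where the process name and the limits are placeholders to adjust for your own daemons.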

Re:Big Sister (1)

teemu.s (677447) | about 11 years ago | (#7227678)

According to this page [graeff.com] , the author of Big Sister is no longer willing to maintain the Windows port without sponsorship (which IMHO is a good way to go).

Re:Big Sister (1)

leitz (641854) | about 11 years ago | (#7228102)

Might want to verify, but BB probably wouldn't cost anything for a university. My understanding is that even a commercial entity can use it for free if the servers being monitored are non-commerce, i.e. your QA and development servers.

Re:Big Sister (1)

adam872 (652411) | about 11 years ago | (#7235862)

Big Sister is pretty powerful and quite extensible too. Be aware that it takes a non-trivial amount of effort to set up, as I found out. It works on all major O/S flavours though, which is a plus. It also interfaces with other packages, such as OpenView, should you ever need it to.

We are doing a similar evaluation where I work. I think we'll end up with OpenView if the costs work out OK. There are other good commercial solutions on the market, such as Foglight, Storage Profiler, Sun Management Console, Tivoli. It really depends on how much one wants to pay.

Nagios (1)

nocomment (239368) | about 11 years ago | (#7226542)

That's easy, use nagios [nagios.org] . It's what I use and it's great. For the holes it doesn't fill, go try out MRTG. :-)

Two suggestions (1)

Fished (574624) | about 11 years ago | (#7226593)

First, try nagios, which is open source, from www.nagios.org. It takes a small commitment to set up, but works *very* well.

Second, you might try Sun netconnect since you are running all Solaris. I haven't used it myself, but some people at my nameless company have and think well of it.

nagios (1)

gyratedotorg (545872) | about 11 years ago | (#7226635)

how about nagios [nagios.org] ?

SNMP + MRTG/Cricket/... + Mon (4, Informative)

fdragon (138768) | about 11 years ago | (#7226673)

I don't know why everyone forgets the default solution. SNMP comes with almost all Unix systems and Microsoft Windows.

If your Unix system doesn't come with one, Net-SNMP [net-snmp.org] will install on many of them.

The SNMP daemon by default understands how to monitor Load Avg, Memory, Processes, and so forth. It may not be able to tell you details of the process, such as what user is logged into the POP3 daemon, but it will tell you that you have 500 of them running, and alert you (via SNMP Traps) of that fact.

All you need to do, once you have checked the documentation for your SNMP agent and configured it, is set up a single (OK, maybe 2 or 3) machine to send your traps to so you can kick off alerts. With some simple scripting in $FAVORITE_SCRIPTING_LANGUAGE you can email, page, send a text message, update a web page, or $OTHER.
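As one example of that simple scripting: Net-SNMP's snmptrapd can pipe each received trap to a program named on a `traphandle` line in snmptrapd.conf, and that program reads the sender's host name, its transport address, and then one "OID value" pair per line from standard input. The Python sketch below parses that input into a one-line alert; the script path and what you do with the message (mail, pager gateway) are assumptions left to you.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Trap:
    hostname: str
    address: str
    varbinds: List[Tuple[str, str]] = field(default_factory=list)


def parse_trap(text: str) -> Trap:
    """Parse what snmptrapd hands a traphandle program on stdin:
    hostname, transport address, then one "OID value" pair per line."""
    lines = text.splitlines()
    if len(lines) < 2:
        raise ValueError("truncated trap input")
    trap = Trap(hostname=lines[0].strip(), address=lines[1].strip())
    for line in lines[2:]:
        line = line.strip()
        if not line:
            continue
        # Split on the first space only: values may contain spaces.
        oid, _, value = line.partition(" ")
        trap.varbinds.append((oid, value.strip()))
    return trap


def format_alert(trap: Trap) -> str:
    """One-line summary suitable for an email subject or pager message."""
    details = "; ".join(f"{oid}={value}" for oid, value in trap.varbinds)
    return f"SNMP trap from {trap.hostname} ({trap.address}): {details}"
```

To wire it up you might add `traphandle default /usr/local/bin/trap_alert.py` to snmptrapd.conf (the path is hypothetical) and end the script with `print(format_alert(parse_trap(sys.stdin.read())))`, or pass the string to your mailer instead of printing it.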

Cricket [sourceforge.net] or MRTG [ee.ethz.ch] are nice utilities that will poll the servers in question (by default every 5 minutes) and produce graphs. MRTG was designed to handle network equipment and graph the bandwidth utilization, but with a change to the SNMP string it will graph anything. Cricket follows the same concept but does things a little differently: it uses a tree-structured configuration system for property inheritance, and it generates graphs on the fly instead of at poll time as MRTG does.
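The idea behind a tree configuration with property inheritance is that a setting made at a directory level applies to everything beneath it unless a deeper level overrides it. This Python sketch illustrates only that lookup rule; the path-keyed dictionary is a hypothetical layout and not Cricket's actual config file syntax.

```python
from typing import Dict

# Hypothetical layout: path -> properties defined at that level.
ConfigTree = Dict[str, Dict[str, str]]


def resolve(tree: ConfigTree, path: str) -> Dict[str, str]:
    """Merge properties from the root down to `path`; deeper levels win."""
    merged: Dict[str, str] = dict(tree.get("/", {}))
    prefix = ""
    for part in (p for p in path.strip("/").split("/") if p):
        prefix += "/" + part
        merged.update(tree.get(prefix, {}))
    return merged
```

With a root level that sets a 300-second poll interval and a default community, a single host entry only needs to state what differs, such as its own community string; everything else is inherited.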

And last but not least, Transmeta produced a very good perl script monitoring package known simply as Mon [kernel.org] . This package will do active polling of the servers including issuing a transaction to the service you are monitoring. Due to the way this software monitors, you can actually see if the remote machine is alive by actually utilizing the service to monitor instead of just the "I can ping it, it must be up" mentality some people have.
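The difference between "I can ping it" and actually using the service can be shown with a tiny POP3 probe. This Python sketch is in the spirit of Mon's monitor scripts but is not Mon code; it assumes the server speaks POP3 (port 110 by default) and only checks for the standard "+OK" greeting before sending QUIT.

```python
import socket


def check_pop3(host: str, port: int = 110, timeout: float = 5.0) -> bool:
    """Return True if the server answers with a POP3 '+OK' greeting.

    Any connection failure or timeout (both are OSError subclasses)
    counts as the service being down."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            greeting = sock.recv(512)
            if not greeting.startswith(b"+OK"):
                return False
            sock.sendall(b"QUIT\r\n")  # end the session politely
            return True
    except OSError:
        return False
```

A host can answer pings while its POP daemon is hung or refusing connections; this check fails in that case, which is the point of exercising the service itself.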

Best part about all the above-mentioned software is that they are all applications with an OSI-approved open source license. This means you don't spend anything but TIME, and possibly a few machines to do the actual monitoring with.

And you may wonder about the impact on system performance of monitoring with SNMP, MRTG/Cricket, and Mon. The short answer is that I couldn't detect a noticeable increase. Other utilities such as Argent (commercial, pay-for software) would take an 8-CPU HP-UX V-Class machine with 8GB RAM from 0% to about 20% on ALL 8 CPUs while it telneted to the machine, created about 150KB of test scripts, and then ran them.

Re:SNMP + MRTG/Cricket/... + Mon (0)

Anonymous Coward | about 11 years ago | (#7227103)

This means you don't spend anything but TIME, and possibly a few machines to do the actual monitoring with.

Whew, good thing my boss pays me $0/hour!

Seriously, the good thing about Free Software is that it gives you freedom, but it still requires an outlay of cash.

JFFNMS (3, Informative)

szysz (214137) | about 11 years ago | (#7226720)

You could use my project !

JFFNMS - Just for Fun Network Management System.

The site is JFFNMS.org [jffnms.org]
Look at the features, it has all you need, and of course the screenshots.

It will work on any Unix with PHP support. It will also monitor any standards-compliant SNMP device or TCP port, and if you have SNMP enabled it will tell you how many connections you have to the specified port, in addition to the connection delay.

It's open source and fully supported; I just made the latest release a few days ago.

You could also look at the two working demos.

I hope some of you can use it. It really shows a lot about a host, whether that's a server or a router.

Re:JFFNMS (1)

szysz (214137) | about 11 years ago | (#7226765)

Ohh.. I forgot to tell you that we are number 2 in Google for Network Management System [google.com]

And that we have really nice graphs to show the server health, and also have a good trigger/action system so you can get emails or sms messages when something happens.

If you have any question, please ask it on the JFFNMS List.

Javier

ProactiveNet (1)

austad (22163) | about 11 years ago | (#7226760)

I know you're looking for something free, but others here with some dollars to spend might like this. ProactiveNet [proactivenet.com] does standard monitoring of network devices, can grab any variable available via snmp, microsoft perfmon counters, or even using shell scripts to parse data and return values you wish to monitor. It also has very extensive monitoring capabilities for just about any kind of database (it can execute any query you wish or monitor performance tables), and many kinds of middleware.

It keeps a database that tracks the normal values throughout the day and sets high and low thresholds. So, if you have a problem, it can use this data to try to pin down where your problem actually lies. It actually works quite well, well enough that I just bought it. I evaluated several different products, including the standard HP and CA stuff, but the ProactiveNet stuff kicked the crap out of these in features, price, performance, and usability.
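Learning "normal values throughout the day" and deriving high and low thresholds can be approximated with a per-hour band of mean plus or minus k standard deviations. ProactiveNet's own algorithm is proprietary and not described here; this Python sketch shows a generic version of the idea, assuming history arrives as (hour_of_day, value) pairs and using a hypothetical default of k = 3.

```python
import statistics
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Band = Tuple[float, float]  # (low, high)


def hourly_baseline(history: Iterable[Tuple[int, float]],
                    k: float = 3.0) -> Dict[int, Band]:
    """Build a normal band per hour of day as mean +/- k standard deviations."""
    by_hour: Dict[int, List[float]] = defaultdict(list)
    for hour, value in history:
        by_hour[hour % 24].append(value)
    bands: Dict[int, Band] = {}
    for hour, values in by_hour.items():
        mean = statistics.mean(values)
        spread = statistics.pstdev(values)  # 0.0 for a single sample
        bands[hour] = (mean - k * spread, mean + k * spread)
    return bands


def is_abnormal(bands: Dict[int, Band], hour: int, value: float) -> bool:
    """True when `value` falls outside the band learned for that hour.
    Raises KeyError for an hour with no history."""
    low, high = bands[hour % 24]
    return value < low or value > high
```

Because the band is per hour, a load of 50 can be normal at 9am and alarming at 3am. In practice you would also enforce a minimum band width so that an hour with perfectly constant history does not flag every small change.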

One word.... (1)

anderiv (176875) | about 11 years ago | (#7226940)

Nagios [nagios.com]

Works great, easy to configure, and can do all of the things you are requiring (CPU load/memory/processes/etc). It has a very robust dependency mechanism, and has many levels of notifications.

I've been using it for 3 years now with zero problems. It looks like v2.0 will be out in beta form by the end of the month.

Moodss (0)

Anonymous Coward | about 11 years ago | (#7226954)

Check out moodss ( http://jfontain.free.fr/moodss/ )

It's a modular monitor framework that does incredible things.

It comes with modules to monitor machines (both local and remote), network devices, database, etc. But the best part is you can write your own modules to monitor whatever you desire.

Nagios (1)

Karora (214807) | about 11 years ago | (#7227501)

Sheesh, is Slashdot a substitute for research?

Nagios [nagios.org] - I'll say it again.

OpenNMS (1)

winchester (265873) | about 11 years ago | (#7227657)

I am at this very moment experimenting with OpenNMS (www.opennms.org) in my testlab. Perhaps that is worth some investigation.

Been there, done that... (1)

ader (1402) | about 11 years ago | (#7227675)

For a specifically Solaris solution, look at Orcallator [orcaware.com] , but read my experiences [fluff.org] with that and SARGE first.

I'd second the various Nagios recommendations. The object templating configuration is very powerful once you get your head round it.

Ade_
/

lrrd & nagios (1)

CAPSLOCK2000 (27149) | about 11 years ago | (#7227712)

Lrrd is great for graphing. You can graph anything through a simple script, and a lot of example scripts are already included.
Lrrd uses a single server that polls one or more clients for information.
Nagios is better at monitoring the network as a whole, and responding to events. If for example a router goes down, nagios knows that the servers behind it will be unreachable as well, and won't bother you with alerts for them. As nagios can also react to events, it would be possible to change the default route to route around the broken router.
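The router example describes host dependencies: a host that fails its check is reported as UNREACHABLE rather than DOWN when every parent on its path is itself not up, and only DOWN hosts generate alerts. This Python sketch models that rule in simplified form; it is not Nagios source, and the host names and dictionaries are hypothetical.

```python
from typing import Dict, List

# Parent links must form a tree or DAG (no cycles), as Nagios also requires.


def host_states(parents: Dict[str, List[str]],
                responding: Dict[str, bool]) -> Dict[str, str]:
    """Classify each host as UP, DOWN or UNREACHABLE.

    A host that fails its own check is UNREACHABLE when every one of its
    parents is itself not UP; otherwise it is DOWN. Hosts with no parents
    are assumed to sit directly next to the monitoring server.
    """
    states: Dict[str, str] = {}

    def state(host: str) -> str:
        if host not in states:
            if responding.get(host, False):
                states[host] = "UP"
            else:
                ups = parents.get(host, [])
                if ups and all(state(p) != "UP" for p in ups):
                    states[host] = "UNREACHABLE"
                else:
                    states[host] = "DOWN"
        return states[host]

    for host in set(parents) | set(responding):
        state(host)
    return states


def hosts_to_alert(states: Dict[str, str]) -> List[str]:
    """Only DOWN hosts page anyone; UNREACHABLE ones are side effects."""
    return sorted(h for h, s in states.items() if s == "DOWN")
```

If router1 stops answering, the web and database servers behind it become UNREACHABLE and only router1 pages you, which is the behavior described above.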

Again Nagios (1)

rf0 (159958) | about 11 years ago | (#7228119)

Yes, nagios is the best. I've had it running entirely on Solaris, and you can also hack in Windows support. Also, with the right plugins you can monitor load, disk space, etc.

Rus

Windows guys should check out ServersAlive (1)

Brento (26177) | about 11 years ago | (#7228182)

I've been using the very inexpensive ServersAlive from Woodstone [woodstone.nu] since 1999, and I've been very pleased with it. It's much friendlier to use than Big Brother or MRTG (and yes, I use both of those as well). The user interface is great, very easy to point-click your way through, and you can also SSH or Telnet into it to do other administrative tasks.

It can check everything from pings, snmp, databases, web pages, services, processes, port checks, and more. For whatever it doesn't check, you can design external checks, and users share their external checks for things like Lotus Notes and file counts.

The alerting is absolutely top-notch: you can set up teams and people, and each person can have their own notification settings & schedules via ICQ, MSN, email, pager, and more. I love it because I can have my alerts delivered to the right place at the right time.

The user community is very active: there's a great email list with a lot of helpful people. I've personally written lots of web templates [brentozar.com] for it, and other users have added external checks for stuff like Lotus Notes, ODBC database checking, and more. The developers are also extremely responsive, and they do beta builds every few days with new features. For example, MSN recently turned off their old protocols, but Woodstone had already made available a new version that works with the new protocol, and explained to the email list what the ramifications were.

The newest version 4 added an Enterprise version that can log to ODBC, so you can build web-based analytical reporting as well. That version goes for $179, but there's a free version limited to 10 checks and a $99 normal version. I can't say enough good stuff about this: it's outlasted four network admins at my company because the alerting from my house (using ServersAlive) has always outperformed every solution we've put in at the office, including Big Brother, WhatsUpGold, and a few others.

Loggerithim (1)

gphat (5647) | about 11 years ago | (#7228287)

My project, Loggerithim [loggerithim.org] is right up your alley.

Nagios (1)

TheTomcat (53158) | about 11 years ago | (#7228384)

We have had great success with Nagios [nagios.org] . We even wrote custom plugins to monitor certain other aspects of our custom system (in PHP, no less).
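Custom Nagios plugins can be written in any language because Nagios only looks at the plugin's exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and its first line of output, where text after a "|" is performance data. This Python sketch follows that convention for the 1-minute load average; the plugin name and the default thresholds of 4 and 8 are hypothetical.

```python
import os
from typing import Tuple

# Nagios plugin convention: the exit code carries the state and the first
# line of output is shown in the UI; text after "|" is performance data.
EXIT_CODES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def check_load(load1: float, warn: float, crit: float) -> Tuple[int, str]:
    """Return (exit_code, output_line) for a 1-minute load average."""
    if load1 >= crit:
        status = "CRITICAL"
    elif load1 >= warn:
        status = "WARNING"
    else:
        status = "OK"
    # Perfdata format: label=value;warn;crit;min
    line = f"LOAD {status} - load1={load1:.2f} | load1={load1:.2f};{warn};{crit};0"
    return EXIT_CODES[status], line


def main(warn: float = 4.0, crit: float = 8.0) -> int:
    """Read the local load average, print the status line, return the code."""
    try:
        load1 = os.getloadavg()[0]
    except (OSError, AttributeError):  # unavailable or unsupported platform
        print("LOAD UNKNOWN - load average not available")
        return EXIT_CODES["UNKNOWN"]
    code, line = check_load(load1, warn, crit)
    print(line)
    return code
```

To install it as a plugin, finish the script with `sys.exit(main())` so the state reaches Nagios through the exit status.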

S

OpenMapper (0)

Anonymous Coward | about 11 years ago | (#7228771)

If you want a GUI, you might want to check out OpenMapper [sourceforge.net] .

Nagios + Cricket + SNMP (1)

EvilOpie (534946) | about 11 years ago | (#7228917)

At work here we use a combination of two things to monitor our servers. First is Nagios [nagios.org] (previously NetSaint). Nagios is good because it can do very basic checks from just pinging a server to see if it's up (and network routers, switches, firewalls, printers, etc...) to actually checking to see if a certain service is up. Such as requesting a webpage to make sure that your HTTP server is running, or making an SMTP or FTP request to check that those services respond too. (it also does more, but there's no use in listing them all here.) We have nagios setup to send out pages whenever a server is reported as going down.

We also use a simple implementation of SNMP plus Cricket (an interface for MRTG) to graph the SNMP data over time. That tells us things like CPU load, memory and swap usage, and a number of other things. Both products work pretty well and give us a very good idea of what is going on with our servers. And on the bright side, they're free! The only cost is the hardware to run them on.

And if you really wanted to get fancy, you could always try something like Smoke Ping [ee.ethz.ch] , which tells you the latency to your servers over time. It reports the average time for a ping reply, plus a graph of how far from the norm the pings are. It works great if you want to see whether a server's network response time slows down at various points of the day, during heavy CPU load, and so on. It's a very nice product, and it sits on MRTG just like Cricket does, so you don't even need a separate box for it.
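SmokePing-style latency tracking sends a burst of pings per round and records the typical reply time, the spread between fastest and slowest replies, and packet loss. This Python sketch summarizes one such round from already-collected round-trip times; it does not send pings itself, and the 2x "slower than usual" factor is a hypothetical choice.

```python
import statistics
from typing import Dict, Optional, Sequence


def summarize_round(rtts_ms: Sequence[Optional[float]]) -> Dict[str, float]:
    """Summarize one burst of pings; None marks a reply that never came."""
    if not rtts_ms:
        raise ValueError("no pings in this round")
    replies = sorted(r for r in rtts_ms if r is not None)
    summary = {"loss_pct": 100.0 * (len(rtts_ms) - len(replies)) / len(rtts_ms)}
    if replies:
        summary["median"] = statistics.median(replies)
        summary["spread"] = replies[-1] - replies[0]  # fastest to slowest
    return summary


def slower_than_usual(summary: Dict[str, float], normal_median: float,
                      factor: float = 2.0) -> bool:
    """Flag a round whose median is more than `factor` times the norm."""
    median = summary.get("median")
    return median is not None and median > factor * normal_median
```

Using the median rather than the mean keeps one slow outlier from dominating the round, while the spread still records that the outlier happened.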

NAGIOS is the best I have seen (1)

Grizzletooth (245582) | about 11 years ago | (#7229757)

We use NAGIOS [nagios.org] to monitor our ISP network of 125+ machines and nearly 600 independent services. Completely customizable with plug-in modules to monitor anything you like.
I remember an older one called Big Brother that was a little lighter weight.

Server Monitoring Solutions? (1)

krishnaD (514548) | about 11 years ago | (#7231519)

What about spong?

description: A systems and network monitoring system -- server programs

This package includes the spong daemon, which collects and stores information from the spong client programs, and the program for sending out messages when problems occur.

Spong is a simple systems and network monitoring package. It does not compete with Tivoli, OpenView, UniCenter, or any other commercial packages. It is not SNMP based; it communicates via simple TCP-based messages. It is written in Perl and is easily modifiable.

Its features include:

* client-based monitoring (CPU, disk, processes, logs, etc.)
* monitoring of network services (smtp, http, ping, pop, dns, etc.)
* grouping of hosts (routers, servers, workstations, PCs)
* rules-based messaging when problems occur
* configurable on a host-by-host basis
* results displayed via text or web-based interface
* history of problems
* verbose information to help diagnose problems
* modular programs to make it easy to add or replace check functions or features
* Big Brother BBSERVER emulation to allow Big Brother clients to be used

Nagios implementation article (0)

Anonymous Coward | about 11 years ago | (#7231576)

Network Management with Nagios [linuxjournal.com] is an article about deploying Nagios for a large mixed Linux/Unix/Microsoft environment at John Deere.

OpenNMS is going to lead the way. (1)

iMacorIBM (708902) | about 11 years ago | (#7233869)

There was a brief mention of OpenNMS earlier. Clearly this needs some more input. Nagios is a great tool too, but it is not as geared towards enterprise use.

OpenNMS is.

OpenNMS handles all common port services and SNMP/MIB capability (as any NMS should do). It does everything all the tools mentioned above here can do (and even incorporates a few).

It has a front-end powered by Apache Tomcat 4 and uses PostgreSQL (like Nagios) for its database. It has commercial support, is easily deployed on multiple architectures including Solaris and Linux, and has packages for Debian, Red Hat, etc. (Email me for the latest in stepwise Debian deployment docs.)

The reporting capabilities approach those of Concord's tool, with availability reports emailed out from the server in PDF format. RRDtool graphs handle response-time reporting on any monitored service, with a user interface for specifying graph output intervals. SNMP graphs for mib2.system OIDs are built in.

There is a MIB compiler for integrating any SNMP event. Custom scripts can be executed on specific events.

The pollers are very advanced, checking for specific versions and responses. They have dynamic poll frequency change on outages, and built-in down-time calendars.
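Changing poll frequency during an outage and honoring downtime calendars can be expressed as two small rules: poll quickly right after a service fails, back off if the outage drags on, and suppress alerts inside scheduled maintenance windows. This Python sketch illustrates that scheme generically; it is not OpenNMS's poller configuration, and all interval values are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

Window = Tuple[datetime, datetime]  # [start, end)

# Hypothetical intervals chosen for illustration only.
NORMAL = timedelta(minutes=5)
FAST = timedelta(seconds=30)
SLOW = timedelta(minutes=10)
FAST_PERIOD = timedelta(minutes=5)


def next_interval(down_since: Optional[datetime], now: datetime) -> timedelta:
    """Poll normally while up, quickly right after an outage starts,
    then back off if the outage drags on."""
    if down_since is None:
        return NORMAL
    if now - down_since < FAST_PERIOD:
        return FAST
    return SLOW


def in_downtime(now: datetime, windows: List[Window]) -> bool:
    """True if `now` falls inside any scheduled maintenance window."""
    return any(start <= now < end for start, end in windows)


def should_alert(service_up: bool, now: datetime, windows: List[Window]) -> bool:
    """Alert on a failed poll unless it falls inside scheduled downtime."""
    return not service_up and not in_downtime(now, windows)
```

Fast polling right after a failure shortens the time to notice a recovery, while the back-off keeps a long outage from flooding the poller.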

I could go on, but I suggest instead that you join the opennms-discuss list and continue your research there.

Watch out OpenView, Tivoli and Spectrum. With experience on these tools, I believe that a large part of the future of enterprise NMS based management lies within the OpenNMS community.

Best of all, the community has great people involved that have good perspective on the connection between business processes and the monitoring tools. And everyone wants to help you.

One thing Nagios has that has not been a part of OpenNMS until recently is the GUI map. This is due in part to the OpenNMS focus on enterprise functionality rather than 'slickness'.

With nearly 0.1 terabytes of downloads a month and a 25MB binary release it is easy to see the popularity of this tool. (OpenNMS.org posts this information)

To be fair, I am going to fully deploy Nagios over here to see how it is doing, though I don't think it can scale like the OpenNMS java backend.

Help nagios with swag, bigbro is rich enough (0)

Anonymous Coward | about 11 years ago | (#7235400)

You could use Big Brother as mentioned above and help Quest Software pay off that US$6.6 million purchase (SEC Form 10-Q). I guess that works out to US$3.3 million each.

Or try http://www.nagios.org and help them by purchasing some SWAG!

Nagios (1)

macdaddy (38372) | about 11 years ago | (#7236408)

Nagios [nagios.org] . Simple as that. You won't regret it.

Better alternative (0)

Anonymous Coward | about 11 years ago | (#7237802)

There is a better alternative to Nagios. It's called Zabbix [sf.net] . Check the screenshots! The software is very simple to use and lets you see performance graphs at any resolution (down to 1 second). It also has excellent notification capabilities. We are using it here to monitor a network of more than 40 servers (HP-UX, Solaris) running all sorts of applications (Oracle, SAP, Domino). I've spoken to the author; v1.0 will be released very soon ;)

for crying out loud (1)

Stinking Pig (45860) | about 11 years ago | (#7245244)

Two days and no one's mentioned Nagios or OpenNMS? Both massively popular and useful.
