705 comments

Uptime (5, Funny)

cdoggyd (1118901) | more than 3 years ago | (#35269456)

Because you won't be able to brag about your uptime numbers.

Re:Uptime (5, Funny)

Anrego (830717) | more than 3 years ago | (#35269560)

I once had to move my router (a 486 running Slackware, with a multi-year uptime) across the room it was in. It was connected to a UPS, but the cable going from the UPS to the computer was wrapped through the leg of the table it was sitting on.

I actually _removed the table leg_ so I could haul the 486, still plugged into the UPS, across the room and quickly plug it in before it powered down!

and then we had the first real substantial power failure in years like a few months later.. and the thing had to go down :(

But yeah.. now I reboot frequently to verify that everything still comes up properly.

Re:Uptime (2)

idontgno (624372) | more than 3 years ago | (#35269736)

and then we had the first real substantial power failure in years like a few months later.. and the thing had to go down :(

Perhaps caused by minor hard drive damage caused by relocating the system while under power?

A rotary-media hard drive is fairly robust, if static. If spinning, it's more fragile than a Slashdotter's ego.

I mean, it's your server, and it's an ancient 486 and all, so respect the hardware to the limit and extent you want to, but for me, if it's mine and uses hard drives, it doesn't move 2 inches or tip 5 degrees while it's powered.

Re:Uptime (4, Funny)

Anrego (830717) | more than 3 years ago | (#35269868)

I meant mains power.. due to a hurricane actually (Hurricane Juan).

The machine came out fine (and actually still runs.. though I don't use it as a router any more). Those old drives are surprisingly robust ..

But yeah.. I was actually surprised.. and I did it more for the sake of the doing (the only reason I even left the machine going was because of the uptime). I'd never pull a stunt like that with a real machine :D

Re:Uptime (1)

Captain Centropyge (1245886) | more than 3 years ago | (#35269874)

Most notebook computers use spinning drives, and no one whines about moving those around while they're powered up. Just saying...

Re:Uptime (1)

ehrichweiss (706417) | more than 3 years ago | (#35269922)

"Perhaps caused by minor hard drive damage caused by relocating the system while under power?"

He clearly said "first real substantial POWER failure" (emphasis mine)... as in the power failed for longer than the UPS batteries could hold out.

Re:Uptime (1)

kju (327) | more than 3 years ago | (#35269834)

I once suffered from this illness myself. Thankfully I was able to overcome it.

Persistent myth? (5, Interesting)

6031769 (829845) | more than 3 years ago | (#35269470)

This is not a myth I had heard before. In fact, none of the *nix sysadmins I know would dream of rebooting the box to clear a problem except as a last resort. Where has this come from?

Re:Persistent myth? (4, Informative)

SCHecklerX (229973) | more than 3 years ago | (#35269518)

Windoze admins who are now in charge of linux boxen. I'm now cleaning up after a bunch of them at my new job, *sigh*

- root logins everywhere
- passwords stored in the clear in ldap (WTF??)
- require HTTPS instead of HTTP to devices, yet still have telnet access enabled.
- set up sudo ... to allow everyone to do everything
- iptables rulesets that allow all outbound from all systems. Allow ICMP everywhere, etc.
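
For illustration, a minimal sketch of the tighter default-deny outbound policy being implied here (the addresses and ports are hypothetical and would be replaced by whatever the host actually needs):

# default-deny outbound: drop everything, then allow only what the box needs
iptables -P OUTPUT DROP
iptables -A OUTPUT -o lo -j ACCEPT                                  # loopback
iptables -A OUTPUT -m state --state ESTABLISHED,RELATED -j ACCEPT   # replies to permitted inbound sessions
iptables -A OUTPUT -p udp --dport 53 -d 10.0.0.53 -j ACCEPT         # DNS, internal resolver only
iptables -A OUTPUT -p tcp --dport 443 -d 10.0.0.80 -j ACCEPT        # updates via the internal proxy only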

Re:Persistent myth? (4, Insightful)

arth1 (260657) | more than 3 years ago | (#35269680)

Don't forget 777 and 666 permissions all over the place, and SELinux and iptables disabled.

As for "ALL(ALL) ALL" entries in sudoers, Ubuntu, I hate you for ruining an entire generation of linux users by aping Windows privacy escalations by abusing sudo. Learn to use groups, setfattr and setuid/setgid properly, leave admin commands to administrators, and you won't need sudo.

find /home/* -user 0 -print

If this returns ANY files, you've almost certainly abused sudo and run root commands in the context of a user - a serious security blunder in itself.
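
As a hedged illustration of the difference (the group name and command paths are hypothetical; syntax per sudoers(5)):

# The blanket grant handed out by default on Ubuntu-style setups:
#   %admin ALL=(ALL) ALL
# A narrower grant, limited to specific commands for one group:
%webops ALL=(root) /usr/sbin/apachectl configtest, /usr/sbin/apachectl graceful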

Re:Persistent myth? (1)

pugugly (152978) | more than 3 years ago | (#35269896)

I tend to disagree - Ubuntu is designed (in large part) for the end user, and for that class where the admin == main user, sudo is a good idea, enforcing the separation of privileges but allowing reasonable usage.

I've noticed a few articles lately about how 'real men' log in as root at all times, but I've worked in Unix/Linux since the '90s, and this seems to be a recent phenomenon.

Pug

Re:Persistent myth? (0)

Anonymous Coward | more than 3 years ago | (#35269718)

Windoze admins who are now in charge of linux boxen. I'm now cleaning up after a bunch of them at my new job, *sigh*

"Windoze" ... "boxen" ... Your poor employer; somehow I feel like things aren't going to be improving much. Seriously, as a Unix person myself, you're embarrassing me. Makes me feel the need to throw out this disclaimer: Most Unix admins and programmers are actually not nearly as immature as this person!

Re:Persistent myth? (1)

Tordek (863609) | more than 3 years ago | (#35269866)

I'll give you "Windoze", but you're really complaining about boxen [catb.org] ?

Re:Persistent myth? (1)

hedwards (940851) | more than 3 years ago | (#35269536)

People who don't know any better. On Windows systems, sometimes a bit of corrupted memory gets the system into a state where a program won't run correctly unless the computer is completely shut off and left to sit for a few seconds before being turned back on. I was personally skeptical until I saw that work for myself. I still don't really understand why that's the case, but IIRC it had to do with some errors you could run into with AutoCAD.

I'm not familiar enough with Unix itself to comment, but with both Linux and *BSD you're able to start and restart services without a reboot, and the architecture is such that you're much less likely to end up in a situation where you can't perform whatever action you need in order to clear the error manually. I'm not sure how I would even go about looking up how to do a lot of that on a Windows box.
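
For example, on the SysV-init style Linux and BSD systems of that era, an individual service can be bounced without touching the rest of the box (sshd is just an example):

# Linux, Red Hat style
/sbin/service sshd restart
# Linux, generic init scripts
/etc/init.d/sshd restart
# FreeBSD rc.d
/etc/rc.d/sshd restart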

Re:Persistent myth? (3, Funny)

Dracos (107777) | more than 3 years ago | (#35269784)

I'm not familiar enough with Unix itself to comment, but with both Linux and *BSD...

I'm not sure how to respond to that.

Re:Persistent myth? (5, Insightful)

afabbro (33948) | more than 3 years ago | (#35269548)

This is not a myth I had heard before.

+1. This article should be held up as a perfect example of building a strawman.

"It's a persistent myth that some natural phenomena travel faster than the speed of light, but at least one physicist says it's impossible..."

"It's a persistent myth that calling free() after malloc() is unnecessary, but some software engineers disagree..."

"It's a persistent myth that only the beating of tom-toms restores the sun after an eclipse. But is that really true?"

Re:Persistent myth? (0)

Anonymous Coward | more than 3 years ago | (#35269566)

On my server the BIOS battery has been dead for the last year or two. Rebooting it means the BIOS gets fucked up and it can't boot without manual intervention. Otherwise it has been quite stable for the last 3 years. The next reboot for that server is probably never going to happen - it will run until it dies and/or is replaced. Most likely destination is the scrap heap. Heck, it only had a handful of reboots in the last 7 years of operation...

Secondly, rebooting a well written OS generally does *nothing*.

Re:Persistent myth? (1)

Captain Centropyge (1245886) | more than 3 years ago | (#35269928)

You realize they make replacement batteries, right...?

Re:Persistent myth? (1)

trybywrench (584843) | more than 3 years ago | (#35269570)

I came here to say the same thing; I've never thought to reboot a Unix box to fix a problem. In fact, in the face of a serious operating system issue I want to do everything I can to avoid the temporary purgatory that is a reboot.

Re:Persistent myth? (0)

starfishsystems (834319) | more than 3 years ago | (#35269616)

The myth - as with so many others, it seems to me - arises from relatively junior people bringing their unquestioned practices and prejudices in from the Microsoft world.

I see it all the time. Often it's a "reach for the GUI" reflex toward making something work. A Unix veteran would look for a config file, would save the original version of the file before experimentally investigating, would then restore the original file if the investigation came up empty. A Windows veteran would simply click on things in the application to see if it made any difference, and after whatever success or failure emerges from the experiment, would then walk away, having tracelessly changed that application from its original state.
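
A minimal sketch of that veteran workflow (the file is only an example):

cp -a /etc/exports /etc/exports.orig     # keep a pristine copy before experimenting
vi /etc/exports                          # make the candidate change
diff -u /etc/exports.orig /etc/exports   # review exactly what changed
cp -a /etc/exports.orig /etc/exports     # and put it back if the experiment came up empty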

Re:Persistent myth? (2)

arth1 (260657) | more than 3 years ago | (#35269732)

Unfortunately, the GUI-befuddled people cause problems even at the distro level. Perfectly serviceable text configuration files give way to humongous XML files, or even databases without a plain text front end.
This makes administration a real pain, and adds nothing except catering to the point-and-drool generation.

Re:Persistent myth? (1)

idontgno (624372) | more than 3 years ago | (#35269850)

even databases without a plain text front end.

AIX, I'm looking at you. I haven't had to admin AIX since 5.3 days, but while our team was learning AIX (coming from Solaris) we would modify system configuration by editing the /etc config files like God intended. And they'd keep reverting to the pre-edit config if we had to reboot. Which happened a lot, because we had some flaky hardware.

It took the local IBM Customer Engineer weenie telling us about "SMIT" and the AIX ODM to realize that we weren't editing the real system config... just the text files created from the ODM when AIX was booted.

Damn SMIT. It's blasphemy to alter the system's config with anything besides "vi".

Re:Persistent myth? (0)

Anonymous Coward | more than 3 years ago | (#35269802)

Most unix systems I have been associated with even had /etc/ checked into a revision control system to track who changed what and when.
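
One common way to do that, sketched with plain git (tools such as etckeeper wrap the same idea; the commit messages here are made up):

cd /etc
git init && git add . && git commit -m "baseline /etc"
# after any change:
git add -A && git commit -m "sshd_config: disable root logins"
git log --stat sshd_config               # what changed, and when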

Re:Persistent myth? (1)

Zencyde (850968) | more than 3 years ago | (#35269918)

I'm not sure if this is a Windows thing. I've seen plenty of people go to flip a switch in an attempt to turn on the lights, fail, and go to the next switch. The issue is that they don't toggle things back to their initial state. It's simply poor systems practice. In this case, you're liable to find the lights on in another part of the house and waste some electricity. In some more dire scenarios, say when a gas system is turned on, it could result in death. It's just dumb to do, and people sometimes don't think through situations as thoroughly as they should. But I don't think you could blame this behavior on Microsoft or Windows.

Re:Persistent myth? (1)

Ephemeriis (315124) | more than 3 years ago | (#35269734)

This is not a myth I had heard before. In fact, none of the *nix sysadmins I know would dream of rebooting the box to clear a problem except as a last resort. Where has this come from?

The idea that you ought to just reboot to fix things comes from the Windows world.

I've got several Windows servers that absolutely have to be rebooted nightly to keep them running happily. This isn't because I'm some crappy admin or anything like that... Rather, the software running on them just isn't stable. It's actually the vendor's suggestion that these servers be rebooted nightly. Not that particular services need to be restarted - but that the entire box should be rebooted.

I'm not entirely sure what the problem is... Corrupt data in RAM? Memory leaks? Files not closing right? Whatever. They need to be rebooted, or they become cranky.

I'm OK with that. It's what the vendor recommends. It's what we do for those boxes. It generally works.

But we've also got a few Linux boxes... And we do not reboot those when things go wrong. We've got Linux boxes that've been up and running for years. If something goes wrong on one of our Linux boxes, it's probably because somebody screwed up a config, or an update went awry, or a bit of hardware is failing.

When something breaks on a Linux box, and we call support, the answer has never been "reboot it". They always want to see what is going on in the system as-is, and they've always been able to fix the issue.

I haven't personally been bit by rebooting a Linux box and making everything worse... But I've seen enough other people get bit, and I've read enough horror stories on-line.

Re:Persistent myth? (1)

ByOhTek (1181381) | more than 3 years ago | (#35269740)

I wouldn't read too much into it. From what I can tell the author is an idiot. He knows some stuff, probably to an impressive extent even, but he's too arrogant and one-size-fits-all.

I don't know of any Unix admin who reboots early-on. Even the few I know (myself included) who came over from windows (or still admin it).

Re:Persistent myth? (1)

vux984 (928602) | more than 3 years ago | (#35269836)

In fact, none of the *nix sysadmins I know would dream of rebooting the box to clear a problem except as a last resort. Where has this come from?

I take a somewhat contrary stance, rebooting is like testing the backup recovery procedure, or the backup power system... you have to do it to know that you can do it.

If you are afraid to reboot your server when it's working fine because you don't know whether it will come back up, then you ALREADY HAVE A PROBLEM.

That said, I fully understand the desire not to reboot especially if it may take down a production server and cause downtime... but if uptime is that critical you should already have a backup system ready to go.

There are absolutely business situations and scenarios where a 'reboot as a last resort' is the right approach. But for a lot of people... probably the majority of them, rebooting from time to time especially in controlled circumstances makes some sense.

If you've got dodgy hardware that might fail on a reboot, or some other boot-sequence problem... it's generally better to find out about it under controlled circumstances, rather than in the midst of some other data corruption/service won't start/catastrophe... the last thing you need while fighting a server problem is to have to resort to a reboot and find out your drive controller is toast too... or that some twit mangled /boot.

Re:Persistent myth? (0)

Anonymous Coward | more than 3 years ago | (#35269844)

Where has this come from?

I believe it may come from buggy beta firmware for unsupported bleeding edge hardware that you are presently working with the engineers on. You know! Those times when even SysRq laughs at any command you throw at it!

These games are not for the weak of spirit...
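
For reference, the magic SysRq interface being alluded to can also be driven through /proc when userland is wedged - assuming it hasn't been disabled, and bearing in mind this is a last resort, not a clean shutdown:

echo 1 > /proc/sys/kernel/sysrq     # make sure SysRq is enabled
echo s > /proc/sysrq-trigger        # emergency sync of all filesystems
echo u > /proc/sysrq-trigger        # remount filesystems read-only
echo b > /proc/sysrq-trigger        # reboot immediately, without further syncing or unmounting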

Re:Persistent myth? (1)

amorsen (7485) | more than 3 years ago | (#35269884)

IIRC it was reasonably common advice in Unix books from the '80s (I can't provide citations because I borrowed them from the library). Reboot at least weekly with a full fsck. Supposedly file systems weren't as stable back then.

Since the myth was well and truly dead by the time I managed to touch a Unix box for the first time (1993), it seems a bit late to try to kill it.

Uh.. no (5, Informative)

Anrego (830717) | more than 3 years ago | (#35269474)

I for one believe in frequent-ish reboots.

I agree it shouldn't be relied upon as a troubleshooting step (you need to know what broke, why, and why it won't happen again). That said, if you go years without rebooting a machine... there is a good chance that if you ever do (to replace hardware for instance) it won't come back up without issue. Verifying that the system still boots correctly is imo a good idea.

Also, all that fancy high availability failover stuff... it's good to verify that it's still working as well.

The "my servers been up 3 years" e-pene days are gone folks.

Re:Uh.. no (2)

Anonymous Coward | more than 3 years ago | (#35269528)

Disagree.

Rebooting is bad. It booted the first time; why would it not boot the second?

If you don't have proper controls then you should not have anyone touching the box.

Re:Uh.. no (4, Insightful)

Anrego (830717) | more than 3 years ago | (#35269662)

Maybe true if the box is set up then never touched. If anything new has been installed on it.. or updated.. I think it's a good idea to verify that it still boots while the change is still fresh in your head. Yes you have changelogs (or should), but all the time spent reading various documentation and experimenting on your proto box (if you have one) is long gone. There's lots of stuff you can install and start using, but could easily not come up properly on boot.

And why are reboots bad? If downtime is that big a deal, you should have a redundant setup. If you have a redundant setup, rebooting should be no issue. I've seen a very common trend where people get some "out of the box" redundancy solution running... then check off "redundancy" on the "list of shit we need" and forget about it. Actually verifying from time to time that your system can handle the loss of a box without issue is important (in my view).

Re:Uh.. no (1)

OzPeter (195038) | more than 3 years ago | (#35269750)

Disagree.

Rebooting is bad. It booted the first time; why would it not boot the second?

If you don't have proper controls then you should not have anyone touching the box.

Even with controls you are assuming that anybody who touched the box between boots has performed their work flawlessly and/or the actions that they performed will do as expected. Yes you can replicate an environment and practice changing things and rebooting, but unless you have 100% replicated things then all you are testing is your assumption that the replication was complete. So it still comes down to an assumption that can only be tested by a physical reboot.

Re:Uh.. no (3, Insightful)

OzPeter (195038) | more than 3 years ago | (#35269798)

(wishing that /. would allow edits)

To add to my previous comment. The general consensus of disaster recovery best practice is that you do not test a backup strategy, you test a restore strategy. Rebooting a server is testing a system restore process.

Re:Uh.. no (1)

Isaac-1 (233099) | more than 3 years ago | (#35269924)

BIOS battery

Re:Uh.. no (2)

JonySuede (1908576) | more than 3 years ago | (#35269540)

... That said, if you go years without rebooting a machine... there is a good chance that if you ever do (to replace hardware for instance) it won't come back up without issue.

We reboot our Unix servers once a month for exactly this reason; we were bitten once, so we learned this the hard way.

Re:Uh.. no (1)

hedwards (940851) | more than 3 years ago | (#35269556)

Well, that's the thing: with cloud computing you generally don't have to waste resources on a server that's up all the time and sized to cover the full load. Depending upon the service or setup, it's definitely possible to arrange for additional capacity to come online as needed, and for the most part those other servers are pretty much identical.

Re:Uh.. no (1)

Stenchwarrior (1335051) | more than 3 years ago | (#35269608)

I agree with you. I used to build it into the cron to reboot every Sunday at 11:00pm. The medical practice management software that ran on there tended to build up temp files and not remove them automatically... this was a fault of the application. My startup script would remove them and keep the hard drive (a whopping 4GB) from filling up. Since the services that needed to run were appropriately added to the same script, there was never an issue of them not starting, which is one of the main reasons you wouldn't want to reboot.
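
For what it's worth, that sort of schedule is a one-line crontab entry (a sketch; the shutdown path varies by system):

# root's crontab: reboot every Sunday at 23:00
0 23 * * 0 /sbin/shutdown -r now "weekly scheduled reboot"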

Re:Uh.. no (2)

GreyLurk (35139) | more than 3 years ago | (#35269692)

Why reboot? Why not just kill off the process, clear the temp files, and restart the process?

Re:Uh.. no (1)

Stenchwarrior (1335051) | more than 3 years ago | (#35269828)

We also had serial terminals attached through DigiBoard and Stallion Boards, both were notorious for flaking out unless rebooted regularly, as well. Maybe this was a unique situation but every *nix machine I built after I left the medical field received the same treatment.

Re:Uh.. no (1)

Anrego (830717) | more than 3 years ago | (#35269914)

Oh man.. no word of a lie.. I actually _winced_ when I read DigiBoard!

So.. much.. pain...

Re:Uh.. no (2)

DaMattster (977781) | more than 3 years ago | (#35269614)

I for one believe in frequent-ish reboots.

I agree it shouldn't be relied upon as a troubleshooting step (you need to know what broke, why, and why it won't happen again). That said, if you go years without rebooting a machine... there is a good chance that if you ever do (to replace hardware for instance) it won't come back up without issue. Verifying that the system still boots correctly is imo a good idea.

Also, all that fancy high availability failover stuff... it's good to verify that it's still working as well.

The "my servers been up 3 years" e-pene days are gone folks.

Well, you make a point, but shouldn't a server be replaced when it gets old enough anyway? Wouldn't it be nice to have a server up for 3 years of reliability? At this point, who really cares if a reboot would cause a failure? You have backups; plan to replace the aging hardware. It doesn't pay to be miserly with server hardware, especially because its quality has gone on a downward trend as demand for cheaper pricing goes up. And how does verifying a system boot really ensure that the server is working correctly? Too often, I have seen a server boot without problem while other latent problems arise - e.g. failing network cards and failing cooling fans.

Re:Uh.. no (2)

Gaygirlie (1657131) | more than 3 years ago | (#35269624)

I do actually recommend reading TFA. He quite clearly says you shouldn't need to reboot the whole system unless you're patching the kernel itself; more-or-less everything else can be restarted or reloaded, including kernel modules, and he even backs up his argument against rash reboots with some valid logic. (Though it's something any system administrator worth anything should already know without a random person on teh internets telling him! Really, shame on you if you just reboot every time you see a problem.) He doesn't say never to reboot, either, even though the submission does make it sound like it.
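
A couple of hedged examples of that "restart or reload instead of reboot" idea (the module and daemon named here are only illustrative):

# reload a misbehaving kernel module instead of rebooting
rmmod e1000e && modprobe e1000e
# tell a daemon to re-read its config without a full restart
kill -HUP $(cat /var/run/syslogd.pid)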

Re:Uh.. no (1)

Anrego (830717) | more than 3 years ago | (#35269714)

I recommend you actually read my post ;p

I clearly said.. right there in the second paragraph.. that I agree with him on not using reboot as a troubleshooting mechanism.

Re:Uh.. no (0)

Anonymous Coward | more than 3 years ago | (#35269898)

...and he even backs up his argument against rash reboots with some valid logic.

Of course that logic boils down to "Unlike Windows, with Unix you can never tell if somebody hasn't screwed up vital files needed to boot the machine."

Re:Uh.. no (2)

jcoy42 (412359) | more than 3 years ago | (#35269666)

Well, that's your opinion.

The boot up process starts a lot of extra electrical noise in the box by spinning up all the fans, HDs, probing things, etc. That's usually when something breaks. What I have seen is that boxes which get rebooted frequently tend to burn out faster. I have had 2 otherwise equivalent machines, purchased at the same time, one used for dev and one for production, and the dev machine burned out 2 years before we retired the production machine (burned out means too many fan/disk/CPU failures to bother with). The biggest difference? The dev machine was updated and rebooted far more frequently. The production machine we took care to only muck with when we had to, and when possible, we fixed it without a reboot.

Now it could be that the frequent updates on the dev machine are what caused it to burn out faster (more random use), and sure, it could have been a fluke, but look at it this way - when does a light bulb burn out? When you turn it on, or when it's left on?

Re:Uh.. no (1)

dAzED1 (33635) | more than 3 years ago | (#35269698)

err...or, you could figure out why there was a problem. Rebooting a system removes a lot of forensic data, and you should know long before it's dying that there is a problem.
There's nothing a reboot "fixes."

Re:Uh.. no (1)

dch24 (904899) | more than 3 years ago | (#35269724)

Everything old is new again. "I can reboot an instance, it's cloud-based with HA!" That means you are not the target market for this article.

Who do you think keeps your magical cloud running with five-9's of uptime? You can't seriously think the VM host will run better after a reboot. Who do you think manages the HA load balancer? (Hint: it is managed, just like everything else.) What if they had to reboot it?

"I need to reboot every month/week/solar cycle because otherwise I have no disaster recovery!" I suppose if you really are worried you have a test deployment and a production deployment, and you're very careful to use tools to guarantee they stay perfectly synced. So... you reboot the test machine to test, right?

"I can't afford a test machine, and I can't control the service configuration, so I can't guarantee it will boot up!" You're in a world of hurt. While you're at it, why don't you run the occasional rm -rf / (then hit Ctrl-C) just so you can enjoy the pleasure of a reboot?

Rebooting destroys information (1)

mangu (126918) | more than 3 years ago | (#35269726)

I never reboot unless the system hangs up completely. In recent years I had to reboot once, when the air conditioning failed and a server had a bad memory alarm.

By keeping reboot as an extreme measure, I know when something truly bad happened. If I reboot without reason, I lose that information.

Re:Uh.. no (1)

Yaur (1069446) | more than 3 years ago | (#35269744)

Totally agree. If things fail you want it to happen when you have control of the situation, not whenever some retard decides to pull the wrong cable.

Re:Uh.. no (1)

digitalhermit (113459) | more than 3 years ago | (#35269764)

Interesting, but not true.

"Frequent-ish" reboots can work in non-enterprise environments where you have downtime windows. In international organizations that run 24-7, this is rarely the case without lots of coordination. Now, if you design a system with high availability and redundancy, you can very well take down one node in a cluster for maintenance... Or if you virtualization you could migrate the VM to another host transparently. Alas, in many enterprises there are one-off systems that exist for a particular purpose and that have gone from a skunkworks project in a business unit to a semi-critical app.

So you end up with an app sitting on a non-redundant physical machine that cannot get a 1 hour maintenance window without extensive planning. And by this I mean alerting Madrid, London, Dubai, Cancun, Alaska, etc.. and trying to schedule hundreds of users to deal with the outage.

The argument is that if that system is so "mission critical" then it needs to have redundancy. Hah, welcome to corporate politics where unless a system is involved with revenue generation or payroll then that project essentially has no budget.

For this reason I love that Unix systems can go years without a reboot. I'm one of three admins that manage close to 400 OS instances... close to 50 applications. Dozens of databases... Dozens of physical machines. Rebooting just to see if the system comes back up is a pipe dream.

Re:Uh.. no (1)

tbuskey (135499) | more than 3 years ago | (#35269770)

Reboots to fix problems should never be done.

Reboots as a matter of policy isn't a bad idea.

If your system reboots periodically, you force network disconnections, memory cleanup, etc.

Users that logged on months ago are no longer tying up resources. Maybe they don't need it but forgot to logout. Or their client died so there's a zombie on the server.

Re:Uh.. no (1)

m509272 (1286764) | more than 3 years ago | (#35269800)

Agreed. Servers should be rebooted periodically. Once every 3 months is a good number. Almost every time we've had a server up for a year or two, there were problems bringing it back up when it went down unexpectedly or for some sort of hardware maintenance. Of course, many of the people that were the sysadmins had gone elsewhere, and hours went by before they finally figured out some startup script had been copied and altered just to get it to come up the last time. Better off scheduling a shutdown and restart when it's convenient.

Re:Uh.. no (1)

RollingThunder (88952) | more than 3 years ago | (#35269852)

Agreed. A reboot isn't a panacea for troubleshooting, but they still should be performed. I view them as akin to drills in the military - they drill and practice so that flaws in the process can be identified early on.

slashdot: *world link farmers (5, Insightful)

Anonymous Coward | more than 3 years ago | (#35269476)

i'm really tired of this semi-technical stuff on slashdot that seems aimed at semi-competent manager-types.

Counter point -- pre-emptive reboot (5, Insightful)

Syncerus (213609) | more than 3 years ago | (#35269486)

One minor point of disagreement. I'm a fan of the pre-emptive reboot at specific intervals; whether the interval is 30 days, 60 days, or 90 days is up to you. In the past, I've found the pre-emptive reboot will trigger hidden system problems, but at a time when you're actually ready for them, rather than at a time when they happen spontaneously (2:30 in the morning).

Re:Counter point -- pre-emptive reboot (2)

Wovel (964431) | more than 3 years ago | (#35269552)

Interestingly, all his arguments against rebooting would bolster your argument for periodic planned reboots. One of his points was that someone may have screwed up the system; it would be better to find that out in a controlled environment.

I will stay away from periodic reboots and remain firmly entrenched in the land of if it ain't broke, don't fix it.

Re:Counter point -- pre-emptive reboot (1)

arth1 (260657) | more than 3 years ago | (#35269920)

What's the purpose of this scheduled reboot, though?
What, exactly, are you trying to pre-empt?

If it doesn't serve a purpose, or the purpose can be solved without causing downtime, just don't do it.

Of course you reboot, in controlled settings (4, Insightful)

pipatron (966506) | more than 3 years ago | (#35269498)

FTFA:

Some argued that other risks arise if you don't reboot, such as the possibility certain critical services aren't set to start at boot, which can cause problems. This is true, but it shouldn't be an issue if you're a good admin. Forgetting to set service startup parameters is a rookie mistake.

This is retarded. A good admin will test that everything works before it gets a chance to actually break. Anyone can fuck up, forget something, whatever. It doesn't matter how experienced you are. Murphy's law. The only way to test whether it will come up correctly during non-planned downtime is to actually reboot while you have everything fresh in memory and while you're still around and can fix it. Rebooting in that case is not a bad thing; it's the responsible thing to do.
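
Short of an actual reboot, you can at least sanity-check what is set to start at boot (a sketch; chkconfig is Red Hat style, and the rc?.d listing works on most SysV layouts):

# list every service and the runlevels it starts in
chkconfig --list
# or inspect the start symlinks for the default runlevel directly
ls -l /etc/rc3.d/S*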

Re:Of course you reboot, in controlled settings (1)

Syncerus (213609) | more than 3 years ago | (#35269512)

I agree with your comments completely.

Re:Of course you reboot, in controlled settings (1)

Darth_brooks (180756) | more than 3 years ago | (#35269882)

Word.

Reboots are a nice test of "Oh shit" situations, such as complete power failures. There are a lot of admins out there who don't have the luxury of giant battery backups that will cover everything until the automatic generators kick in. There's something that's just a tiny bit comforting about watching the machine go from push to POST to prompt. You know if the CMOS or RAID controller batteries are bitching about needing to be replaced (even if SNMP *might* be able to tell you this). You know that there's an NFS mount that takes a ridiculous amount of time to complete, and there are precious few ways of verifying that a BIOS upgrade went through successfully without watching.

Uptime numbers are just penis wagging.

What a load of BS (4, Insightful)

kju (327) | more than 3 years ago | (#35269508)

I RTFA (shame on me) and it is in my opinion absolutely stupid.

There is actually only one real reason given, and that is that if you reboot after some services have ceased working, you might end up with an unbootable machine.

In my opinion this outcome is absolutely great. OK, maybe not great, but it is important and rightful. It forces you to fix the problem properly instead of ignoring the known problems and missing yet-unknown problems which might bite you in the .... shortly after.

Also: when services start being flaky on my system, I usually want to run an fsck. In 16 years of Linux/Unix administration I have found quite a few times that the FS was corrupted without an apparent reason, having gone unnoticed before. So an fsck is usually a good thing to run when strange things happen, and to be able to run it, I nearly always need to reboot.

I can't grasp what kind of thinking it must be to continue running a server where some services fail or behave strangely. You could end up with more damage than would be caused by an outage when the reboot does not go through. You just might want to do the reboot at off-peak hours.
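
On Linux boxes of that vintage, the fsck can be folded into a controlled reboot rather than an emergency (a sketch; /forcefsck is honoured by sysvinit-era boot scripts, and many shutdown implementations accept -F to do the same thing):

touch /forcefsck                                   # ask the boot scripts to fsck everything on the way up
shutdown -r +5 "rebooting for filesystem check"    # warn users, then reboot in five minutes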

Re:What a load of BS (-1)

Anonymous Coward | more than 3 years ago | (#35269794)

In 16 years of Linux/Unix administration I have found quite a few times that the FS was corrupted without an apparent reason, having gone unnoticed before. So an fsck is usually a good thing to run when strange things happen, and to be able to run it, I nearly always need to reboot.

What kind of shitty filesystem are you using then? ext? Yeah, it sucks. In fact I have had fsck itself completely destroy an ext filesystem on more than one occasion.

Use a good filesystem like XFS. I do run fsck on them occasionally (maybe once every year or two) and never had any problem at all.

*NIX 101 (2)

Zero1za (325740) | more than 3 years ago | (#35269510)

This is like *NIX 101.

But then, try changing the locale on a running system...

Re:*NIX 101 (1)

corychristison (951993) | more than 3 years ago | (#35269856)

But then, try changing the locale on a running system...

This depends on your Linux distro... on Gentoo:
# init 3 (assuming you're not ssh'd in)
# ${EDITOR:-vi} /etc/env.d/02locale (edit the locale settings)
# env-update && source /etc/profile
# init 5

and you're good to go. :-)

Reboots (1)

DaMattster (977781) | more than 3 years ago | (#35269516)

By and large there is really no need to reboot a UNIX machine unless you are making a change to the kernel, i.e. an upgrade or a recompile with an added feature. Other than that, the author is correct. I have machines with uptimes of two years. It would have been more had I not had to power the machine down for a physical move.

Ummm, that's a crap article (4, Insightful)

Sycraft-fu (314770) | more than 3 years ago | (#35269520)

More or less it is "You shouldn't reboot UNIX servers because UNIX admins are tough guys, and we'd rather spend days looking for a solution than ruin our precious uptime!"

That is NOT a reason not to reboot a UNIX server. In fact it sounds like, if you've got a properly designed environment with redundant servers for things, a reboot might be just the thing. Who cares about uptime? You don't win awards for having big uptime numbers; it is all about your systems working well, providing what they need to provide, and not blowing up in a crisis.

Now, there well may be technical reasons why a reboot is a bad idea, but this article doesn't present any. If you want to claim "You shouldn't reboot," then you need to present technical reasons why not. Just having more uptime or being somehow "better" than Windows admins is not a reason, it is silly posturing.

Re:Ummm, that's a crap article (2)

pz (113803) | more than 3 years ago | (#35269754)

Please point out exactly where in the article the issue of uptime is raised. I fail to see it. Many others have also suggested that long uptime ("e-pene" as one poster put it) is the reason for avoiding reboots. There has been no such suggestion that I could find. I authored a post in the previous thread about the origins of the Unix attitude against reboots that was highly rated, and nowhere in that post, or in the follow-on replies, was uptime ever considered an issue.

The issue -- the only issue -- is interrupting service to many users. Modern machines that serve tens to thousands of users cannot be brought down willy-nilly without incurring the wrath of those users, and rightfully so. Bringing down a system because the sysadmin was too lazy to understand what the problem was is inexcusable. The sysadmin's job is to keep the service running. When there's one user, such as in QA, or a single-user desktop, reboots can happen at will. When there are many many users, such as in a production box, file server, or similar, reboots should never be used as a problem-solving tool.

So let go of the old, dead horse about uptime bragging rights. A correct, properly maintained Unix system does not need to be rebooted except under highly unusual circumstances. The reason that Windows boxes are treated differently is because Windows is a comparatively new OS that started out life as a one-seat system whereas, paraphrasing what I wrote in an earlier post, Unix and its intellectual antecedents had been running multi-seat systems for nigh on three decades before Windows started doing that. It's fact, not being better or worse, and the Unix and Windows cultures have grown around those two views.

Re:Ummm, that's a crap article (1)

GreyLurk (35139) | more than 3 years ago | (#35269816)

I don't think the article was expounding on Unix "manliness" and uptime metrics... It mostly just highlighted the mistake that a lot of junior admins make (both Windows and Unix): thinking that it doesn't matter whether you understand why the problem is happening, and that just mashing the power button until it goes away is the best route forward.

Rather than presenting technical reasons why you shouldn't reboot, it's actually probably better to ask for technical reasons why you *should* reboot. Rebooting a server to try to fix a problem is just one step above "percussive maintenance" in the hierarchy of problem solving.

Now, I will actually suggest one additional reason to reboot a Unix Server not mentioned in the article, and that is installing a new service that's intended to be included in the boot up sequence. However, that suggestion is just a Quality Assurance measure: Make sure that the service powers up as it's supposed to, in case some unexpected downtime does happen, and ensure that the service comes back up as expected. Otherwise, Hardware and Kernel upgrades should be the only reason for a reboot.

Re:Ummm, that's a crap article (2)

Jose (15075) | more than 3 years ago | (#35269832)

Now, there well may be technical reasons why a reboot is a bad idea, but this article doesn't present any.

hrm, the article states: ...If you shrug and reboot the box after looking around for a few minutes, you may have missed the fact that a junior admin inadvertently deleted /boot and some portions of /etc and /usr/lib64 due to a runaway script they were writing. That's what was causing the segfaults and the wonky behavior. But since you rebooted the server without digging into the problem, you've made it much worse, and you'll soon boot a rescue image -- with all kinds of ponderous work awaiting you -- while a production server is down.

and:
In many cases, it's extremely important not to reboot, because the key to fixing the problem is present on the system before the reboot, but will not be immediately available after. The problem will recur, and if the only known solution is to reboot, then the problem will never be fixed unless or until someone decides not to reboot and instead tries to find the root of the problem.

and while I disagree with this one slightly.. as the problem may still be present after a reboot.. I definitely agree with what the author is saying... find the actual root of the problem and fix it.. don't just cross your fingers and hope a reboot will fix the problem.

Also the author never mentions preserving uptime of the server as a goal..he does mention a few times patching in place..which will mean killing services, effectively making that particular server unavailable.

Re:Ummm, that's a crap article (1)

Gaygirlie (1657131) | more than 3 years ago | (#35269842)

If you want to claim "You shouldn't reboot," then you need to present technical reasons why not.

1) You should first find out what is broken and why. Rebooting without doing that only means that it may happen again and perhaps with catastrophic results.
2) If you have found out the reason then you can fix it even without rebooting in almost all cases.
3) Depending on the issue you might render your system unable to boot if you restart without checking first and then the system will be offline even longer than necessary.

Tbh, all of these sound like very reasonable reasons to my ear. As I said in another comment, they are all things that any admin worth his/her salt should already know, but that still doesn't make them unreasonable.

HP-UX says... (2)

RedK (112790) | more than 3 years ago | (#35269526)

You lie.

Seriously. I don't know what HP is doing, but NFS hangs/stuck processes that you can't kill -9 your way out of is just wrong.

Re:HP-UX says... (2)

inflex (123318) | more than 3 years ago | (#35269626)

NFS is designed to be like that - block/hang until the connection is restored... though I'm not sure about the resilience to the sig-9. You do now have the option on some NFS systems to have a soft-block.
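
For the record, that "soft-block" option is just the soft mount behaviour; a sketch (the server and export path are made up, and soft mounts trade hangs for possible I/O errors):

# hard mount (the default): processes block until the server comes back
mount -o hard,intr server:/export/home /mnt/home
# soft mount: give up after timeo/retrans and return an error instead of hanging
mount -o soft,intr,timeo=30,retrans=3 server:/export/home /mnt/home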

Re:HP-UX says... (2)

RedK (112790) | more than 3 years ago | (#35269694)

I've had HP-UX systems that could rpcinfo/showmount against the NFS server and yet still had hung filesystems. Soft, hard, whatever mount option - it's random. Then when you try to shut down the NFS subsystem, the rpc processes get stuck, you try kill -9 and they simply don't die. umount -f doesn't work. Nothing works.

You really have to have experience on HP-UX to understand the pain... And if only I was talking about the old 11iv1 instead of the brand spanking new 11iv3 with ONCplus up to date.

Re:HP-UX says... (4, Informative)

sribe (304414) | more than 3 years ago | (#35269768)

Seriously. I don't know what HP is doing, but NFS hangs/stuck processes that you can't kill -9 your way out of is just wrong.

Kind of a well-known, if very old, problem. From Use of NFS Considered Harmful [time-travellers.org] :

k. Unkillable Processes

When an NFS server is unavailable, the client will typically not return an error to the process attempting to use it. Rather the client will retry the operation. At some point, it will eventually give up and return an error to the process.

In Unix there are two kinds of devices, slow and fast. The semantics of I/O operations vary depending on the type of device. For example, a read on a fast device will always fill a buffer, whereas a read on a slow device will return any data ready, even if the buffer is not filled. Disks (even floppy disks or CD-ROM's) are considered fast devices.

The Unix kernel typically does not allow fast I/O operations to be interrupted. The idea is to avoid the overhead of putting a process into a suspended state until data is available, because the data is always either available or not. For disk reads, this is not a problem, because a delay of even hundreds of milliseconds waiting for I/O to be interrupted is not often harmful to system operation.

NFS mounts, since they are intended to mimic disks, are also considered fast devices. However, in the event of a server failure, an NFS disk can take minutes to eventually return success or failure to the application. A program using data on an NFS mount, however, can remain in an uninterruptable state until a final timeout occurs.

Workaround: Don't panic when a process will not terminate from repeated kill -9 commands. If ps reports the process is in state D, there is a good chance that it is waiting on an NFS mount. Wait 10 minutes, and if the process has still not terminated, then panic.
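
A quick way to spot the processes that workaround is talking about (a sketch using standard ps output fields):

# list uninterruptible-sleep (state D) processes and what they are waiting on
ps -eo pid,stat,wchan:32,args | awk '$2 ~ /^D/'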

Virtualization to the rescue (4, Interesting)

Anonymous Showered (1443719) | more than 3 years ago | (#35269538)

I run web servers for a few dozen clients, and rebooting a remote machine was always scary. There was the possibility that something might not boot up during startup (e.g. SSHd) and I would be locked out. I would then have to travel to my data center downtown (about 30 minutes away) and troubleshoot the problem. Since I don't have 24/7 access to the DC (I don't have enough business with the DC to warrant an owned security pass...) I have to wait until they open to the general clientèle in the morning.

With ESXi, however, I'm not that scared anymore. If something does go wrong, I have a console to the VM through the vCenter client (the application that manages virtual machines on the server). It happened once that a significant upgrade from FreeBSD 7.2 to 8.1 was problematic. Coincidentally, it was because I didn't upgrade the VMware tools (the open-vmware-tools port). Nonetheless, I managed to fix the problem through vCenter.

This is why I love virtualization in general. It's making managing servers easier for me.

Re:Virtualization to the rescue (1)

inflex (123318) | more than 3 years ago | (#35269642)

It's why a "good" server has a lights-out system in it that lets you gain access to the machine as it boots as if you were there with a keyboard/console.

Of course, yes, the VM-route is nice, I do that too now ( so long as you don't mess up the host :D ).

Re:Virtualization to the rescue (1)

Spad (470073) | more than 3 years ago | (#35269664)

Not to mention the joy of snapshots.

I read TFA (2, Interesting)

pak9rabid (1011935) | more than 3 years ago | (#35269542)

What a load of horse shit.

Library uprades are the tricky part (2, Informative)

Anonymous Coward | more than 3 years ago | (#35269572)

Often system upgrades (e.g. security fixes) include new versions of libraries and such. It's impossible for the package manager to know which processes are using those libraries, so it can't automatically restart everything. Consider: if you have custom processes running, the package manager wouldn't even know about them.

Therefore you have to do it manually, but then you have the same problem. It's damn hard to know which processes are using the libraries that were upgraded - really, really hard if it's a big server running hundreds or thousands of processes. Often it's easier just to reboot so you make sure everything is running the current version of all the libraries. If you don't, then you can't be sure that all the security fixes are actually in effect, since processes will still be using the old versions of the libraries cached in RAM.
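
It is tedious rather than impossible, though; a hedged sketch of the usual approach on Linux:

# open files whose on-disk copy has been deleted (i.e. replaced by an upgrade)
lsof +L1 | grep '\.so'
# or, per process, look for deleted shared-library mappings
grep -l '\.so.*(deleted)' /proc/[0-9]*/maps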

Not better than the others (2, Interesting)

cpct0 (558171) | more than 3 years ago | (#35269574)

Quotes from stupid people:
You should never reboot a Mac, it's not like Windows.
You should never reboot Unix/Linux, it's not like Windows.

Well, you shouldn't reboot Windows either. You reboot it when it goes sour. Our Windows servers seldom go sour, so we don't reboot them. Same for Mac or *nix.

Problem is when it starts to cause problems. Like our /var/spool partition deciding it has better things to do than exist... or the ever so important NFS or iSCSI mount that decides to Go West, and gives us the ??? ls we all dread ... with umounting impossible, so remounting impossible, and all these stale files and stuff. You either tweak these things for hours cleaning up all processes, or you reboot.

In fact, being a good sysadmin, all my servers are MEANT to be rebooted if something goes sour. One SVN project goes sour? Check whether it's the repository itself that has problems, or whether the system needs to save something to exit safely... and if not, reboot the server. Everything magically restarts itself, does its little sanity check, and a quick look at a remote syslog makes certain everything is all right. 2 minutes lost for everyone, not 3 hours of trying to clean up the mess left by some stray process somewhere or trying to kill the rogue 100 compression and rsync jobs that got started, eating up all RAM, CPU and network.

Since all our servers are single processes and are either VMs or single machines, it's a breeze to do this. iSCSI will diligently wait until the machine is back up before trying to reconnect. NFS will keep its locked files and will reconnect to them. No, seriously, everything simply reconnects!

Of course, the idea is to minimize these occurences, so we learn from it, and we try to repair what could've caused this problem in the first place. And there's a place to do this in a server crash postmortem. But no need to make users wait while we try to figure out wth.

Oh, this fool again (2)

Enry (630) | more than 3 years ago | (#35269588)

While it's true that these servers don't need to be restarted as often as their Windows counterparts, there are valid reasons for restarting a server:

- new kernel, new features
- new kernel, new security patches (yes, these are distinct reasons)
- ensure all services restart in the event of a real failure
- we have cases where memory fills and the system starts thrashing. It may cure itself eventually, but you can't get in via SSH or console (and no, the OOM killer doesn't kick in).

I think item #3 is important. If you have a crusty system that's been in place for a while and it reboots for some reason, you now have to spend time to make sure everything started, figure out what didn't start, and why. This doesn't mean you need to restart once a week, but every 6-12 months is certainly reasonable.

This is a myth? (4, Interesting)

pclminion (145572) | more than 3 years ago | (#35269590)

I've heard a lot of myths. I've never heard a myth stating "You need to reboot a UNIX system to fix problems." If anything I've heard the opposite myth. Who promulgates this shit?

I do remember ONE time a UNIX system needed a reboot. We (developer team) were managing our own cluster of build machines. The head System God was out of town for two weeks. We were having problems with a build host, and tried everything. Day after day. Finally, on the last day before System God was due to return, it occurred to me that the one thing we hadn't tried was to reboot the machine. The reboot fixed the problem, whatever it was.

I felt stupid. One, for not figuring out the problem in a way that could avoid a reboot. Two, for not recording enough information to determine root cause in a post-mortem analysis. Three, for configuring a system in such a way that a reboot might be required in order to fix a problem.

To this day I believe that reboot was unnecessary, although at the time it was the fastest way to resolving the immediate blocking issue.

I would have docked you a week's pay... (1, Interesting)

Anonymous Coward | more than 3 years ago | (#35269792)

...for wasting company time on non-solutions instead of doing a reboot that took 1 minute.

Unnecessary Windows reference (0)

Anonymous Coward | more than 3 years ago | (#35269592)

I just don't get why it was necessary to make reference to Windows here. Most of the legit reasons not to reboot a Unix box that he listed apply to Windows and its analogous subsystems too.

Sometimes ... (4, Funny)

PPH (736903) | more than 3 years ago | (#35269596)

... the crap I read on Slashdot is so unbelievable, I have to reboot my laptop in the hopes that it will go away.

Not just *nix (2)

Spad (470073) | more than 3 years ago | (#35269604)

The same argument can be applied to Windows servers; sometimes rebooting will only make things worse, or at least not make things any better. Unfortunately, these days the trusty reboot is often the first option instead of the last resort; at the very least, some basic troubleshooting needs to be done to identify potential causes before you erase half the evidence.

I suffer from a desktop variant of this issue at work, whereby re-imaging has become the "troubleshooting" tool of choice, to the point that all thought has now left the support process so that I've witnessed an engineer re-image a PC 3 times (at 30+ minutes each time) before someone else identified that the issue was being caused by a BIOS setting and that re-imaging was a complete waste of time.

Let's face it, if your admin/support staff are lazy and/or stupid, then it doesn't matter which approach they take because they're not going to fix the problem anyway.

All too true (-1)

gweihir (88907) | more than 3 years ago | (#35269610)

Let's face it, the Windows people have to use this crutch only because the Windows "kernel" and MS "server" software are pretty buggy. That is generally not true for UNIX, and on a UNIX system you are far more likely to have a configuration problem than a problem with the software itself. For those, reboots do indeed not help. For a kernel or server that has messed up, say, its memory management, rebooting does not fix the problem permanently, but it helps keep the system running for some additional time. And you cannot really fix the problem in the MS cultural isolation area: first, because it would be hard to do, and second, since you have no source code, it is actually impossible.

That rebooting is actually a valid strategy in the MS area just shows how many, many decades they are behind underneath the shiny surface.

Re:All too true (1)

Shados (741919) | more than 3 years ago | (#35269908)

Actually it isn't. There's virtually always a reason why something screws up, regardless of whether you're on Windows or Unix, and you won't need to reboot. The only exception is patches, where Windows requires it a bit too often for comfort.

I've worked for a few companies where rebooting a Windows Server for anything except patches/maintenance would require a full root cause analysis, and it pretty much never happened. We virtually always were able to find what was going wrong and fix it without rebooting. This isn't 1998 anymore: Windows Server absolutely can stay up for long periods of time, and there's always ways to prevent reboots.

Paul Venezia (0)

Anonymous Coward | more than 3 years ago | (#35269628)

Paul Venezia is an uneducated piece of garbage.

Broken Logic (1)

Grindalf (1089511) | more than 3 years ago | (#35269658)

Is it just me, or is the logic behind this article broken?

New rule for Slashdot (5, Insightful)

aztektum (170569) | more than 3 years ago | (#35269672)

/. editors: I propose a new rule. Submissions with links to PCWorld, InfoWorld, PCMagazine, Computerworld, CNet, or any other technology periodical you'd see in the checkout line of a Walgreens should be immediately deleted with prejudice.

They're the Oprah Magazine of the tech world. They exist to sell ads by writing articles with grabby headlines and little substance.

No True Scotsman fallacy (1)

O('_')O_Bush (1162487) | more than 3 years ago | (#35269686)

Did anyone else notice the reek of the No True Scotsman fallacy? If you agree with him, he brags about it. If you don't, he claims it's because you aren't a TRUE pro-Unix admin.

Sorta grates on my nerves a bit.

so the people who supposedly spread this myth... (1)

dAzED1 (33635) | more than 3 years ago | (#35269708)

The new crop of sysadmins are sort of funny. I wasn't aware that there was a myth, among the Unix ranks, that rebooting a server fixed anything. Of course it doesn't fix anything.
Are the people spreading this myth the same folks that log in as root because hey - they're the sysadmin, and access controls are for wimps?

Windows Too (0)

Anonymous Coward | more than 3 years ago | (#35269774)

Actually I find I take the same approach with windows as well. Most of the time if you reboot a windows box the same problem is just going to repeat it self time and time again. So rebooting isn't actually a solution here either.

Currently working as a software developer in a company I got so pissed off with the servers rebooting all the time. When the resident it guy was on holiday it fell to me (next person who had any experience of doing this sort of thing). But the time he came back the network was running perfectly fine. Managers asked what i had done. I just said i fixed a few things properly about 4-5 major things instead of just rebooting it ... Long story short. The it guy isn't there any more. I took over his role on top of my own and the network has been running find ever since ...

Oh, and I got a chunk of the other guy's pay for it too :)

Nothing to read here, move along.... (1)

macson_g (1551397) | more than 3 years ago | (#35269806)

It's the second time in a week that a barely interesting post from this guy's blog has made it to the front page. What's wrong with you, /. ?

Seperates the n00bs from the pros. (0)

Anonymous Coward | more than 3 years ago | (#35269810)

If anything would break on a reboot of a Unix system, your sysadmins aren't doing their fucking jobs and need to be crucified on the shattered remains of a cabinet.

If you can't reboot a specific, single server without production impact, your architects aren't doing their fucking jobs and need to be crucified on a whiteboard easel.

If all you care about is some asinine 'uptime' number, turn in your fucking credentials now - you have no business being anywhere near a command line.

I know this is Slashdot, and fanbois abound, but this is laying it on a bit thick. Yes, Suzy, even Unix systems need to be rebooted now and then.

Eh? (2)

ledow (319597) | more than 3 years ago | (#35269840)

- Design system
- Build system (involves inevitable reboots)
- Test system (involves inevitable reboots)
- Move system into production.

Once the services you need start up the way you want, don't play with it. Put it into production, and keep backups of the original image and of every change you make, plus a working replacement. Yes, have a working replacement - there is *nothing* better than another machine sitting next to your server that can take over its job at the flick of a switch while you repair it. It also lets you test changes safely, and whenever you're sure the system is the way you want it, you push the same image to your standby copy.
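
(Not part of the parent post, just a rough sketch of that "push the same image to your standby" step: it assumes ssh access between the two machines and rsync installed on both, and the hostname primary-box is made up.)

    # Pull the primary's configuration and site data into a staging area on
    # the standby, rather than straight over the standby's own /etc.
    mkdir -p /srv/primary-image/etc /srv/primary-image/var/www
    rsync -a --delete root@primary-box:/etc/     /srv/primary-image/etc/
    rsync -a --delete root@primary-box:/var/www/ /srv/primary-image/var/www/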

If you do it properly, that machine will then stay up until hardware failure, which can sometimes be years away. If you do it properly, you shouldn't ever be rebooting a server that's in production - you're just masking the real problem. Yeah, it'll work most of the time, but it's papering over the cracks. The server hung, the service died, or the settings got out of sync for a reason, and just rebooting ignores that reason for the sake of service continuity. If the service is that vital, you should have enough availability to cover such incidents, or the same problem will come back to bite you later.

Nobody cares about enormous uptimes, but having a server that you haven't NEEDED to touch in months is a good thing. It means that it has a well-defined function and has been performing correctly - that's your "stable" version and should be treated as such. Every time you make a change to a server, it then becomes a "current/experimental" version that you should be wary of.

At worst, when a problem appears, you turn ON a replacement server and fix the one that is showing problems. If its role is well-specified, you don't get "feature creep" where it's running a million things that it never used to and they're not in your startup properly because it's never rebooted enough for you to test them.

On Windows or Unix, you shouldn't have to reboot. If you do, it's to test something or to correctly reinitialise after fixing a problem (a post-fix reboot just to make sure everything comes up as required isn't a bad thing, but it certainly isn't required). The worry of hardware failure on boot shouldn't stop you rebooting, and likewise you shouldn't reboot just to "spot" problems. Both suggest inattention and a lack of suitable backups/replacements/high-availability solutions.
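
(One way to get that confidence without actually rebooting - a sketch, not the poster's procedure, assuming a Linux server; the first command fits a SysV-init/chkconfig system of that era, the second a systemd one.)

    # Services configured to start at boot...
    chkconfig --list | grep ':on'
    # ...or, on a systemd machine:
    systemctl list-unit-files --type=service --state=enabled

    # Compare with what is running right now; anything started by hand and
    # never added to the boot configuration shows up as the difference.
    service --status-all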

Systems can easily go 3-4 years in operation without requiring a reboot. If your hardware is good quality, you're monitoring the server as you should be, you have adequate backups/replacements and the role it performs isn't changed, there's no need to ever reboot it past initial testing. I have internal school servers that only get rebooted in the summer (i.e. once per annum) and that's only because the power goes off to upgrade the electrics each year.

If it wasn't for that, I'd just leave them running. They don't need kernel 2.6.192830921830, and they have been doing the same job reliably for a LONG time; I'm not going to kick them into a reboot "just because". And precisely because they run for so long, even the tiniest memory leak in their processes would cause problems that I would spot immediately.

As it is, 450 happy users all day long, for years. The last one I installed actually took a whack from a collapsed networking cabinet (full of fully-populated Gigabit switches) coming off the wall and dropping six feet onto it. Apart from a small dent it carried on just fine; the disks were idle at the time, and SMART / data integrity checks show no problems. I rebuilt the entire network cabling around it because switching it off wasn't necessary.

And if it did reboot and didn't come up in the expected state? There's a copy of it on another machine on the other side of the room - its predecessor, which also didn't reboot for years but wasn't fast enough to run the amount of PHP / MySQL we needed among its other functions. Having the replacement machine also lets me see *exactly* whether it's the server having problems or other associated services: slap the replacement in (just switching the Ethernet cables between the two is enough, because both machines are set up to think they are the same IP - and yet again, no reboot required!), and if the problem persists it's almost certainly NOT the server at all.

The article is almost an ad ... for Windows. (1)

Jahf (21968) | more than 3 years ago | (#35269848)

"If you shrug and reboot the box after looking around for a few minutes, you may have missed the fact that a junior admin inadvertently deleted /boot and some portions of /etc and /usr/lib64 due to a runaway script they were writing. That's what was causing the segfaults and the wonky behavior. But since you rebooted the server without digging into the problem, you've made it much worse, and you'll soon boot a rescue image -- with all kinds of ponderous work awaiting you -- while a production server is down."

That argument is somehow pro-Unix?

I mean, yeah, a Windows person can screw with boot files too. But if a Windows person were to read that paragraph, it certainly wouldn't do a thing to convince them of the solidity of *nix. It basically translates to: "if you're having a problem, don't restart - you may not be able to boot again, because your other admins may be incapable of writing proper scripts and every *nix system is different in its boot structure ... so ALWAYS do a full check that your system binaries exist before rebooting."
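
(For what it's worth, that "check your system binaries exist" advice is cheap to act on - a sketch, assuming an RPM-based distro; Debian-family boxes would use debsums instead.)

    # Make sure a kernel and its initrd are actually present before trusting a reboot.
    ls -l /boot/vmlinuz-* /boot/initr*

    # Verify installed files against the package database; deleted or altered
    # files under /etc, /usr/lib64 and friends show up here long before a reboot does.
    rpm -Va | grep -E '^missing|^..5'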

Do Windows folks reboot too easily, before examining logs, restarting services, and so on? Sure. But this article takes that point way off the deep end.
