Windows Upgrade, FAA Error Cause LAX Shutdown

michael posted more than 9 years ago | from the first-woodpecker-to-come-along dept.

Bug 862

fname writes "The recent shutdown of LAX due to an FAA radio outage was apparently caused by a Windows 2000 integration flaw, possibility related to an old Windows 95 bug. An article at the LA Times claims that the outage was caused by human error, as the system will automatically shut down after 49.7 days (related to this Windows 95 flaw?), and a technician didn't reboot the system monthly as he should have. This happened after an upgrade from Unix to Windows. I don't think blame should be assigned to the technician who missed the task; rather, it seems a gross oversight for the FAA to guarantee that such a critical system will crash after only one missed maintenance task. Who's really at fault?"

862 comments

Repent, Sinners! (5, Insightful)

mfh (56) | more than 9 years ago | (#10313316)

The recent shutdown of LAX due to an FAA radio outage was apparently caused by a Windows 2000 integration flaw, possibility related to an old Windows 95 bug.

Okay... a Win95 bug leads to the LAX shutdown because the *same* bug was later found in Win2k? Yup, closed source is the answer, Mr. Gates. I hereby repent my sins of Open Source Freedom and agree that security by obscurity is the answer! /sarcasm

a technician didn't reboot the system monthly as he should have

You have to love a system that requires downtime as part of uptime. How many Linux users have this problem? (Please press the Start button to shut down (stop) the computer.)

Re:Repent, Sinners! (3, Insightful)

LostCluster (625375) | more than 9 years ago | (#10313375)

I've seen AIX-based database systems that require an overnight downtime to do reindexing, since non-SQL formats like DBase have always been a little funky when they start having to deal with million-record tables. It's amazing how ugly legacy databases can be compared to today's tech.

Retard (-1, Flamebait)

ArchieBunker (132337) | more than 9 years ago | (#10313412)

How big of a retard are you? The win95 bug had a patch available and I really doubt it shows up in win2k. Our win2k servers at work run for months without issue. You're quoting the submitter and the slashdot janitors ooops I mean "editors" didn't bother cleaning it up. Win2k is based on NT4, hardly close to win95.

Re:Repent, Sinners! (5, Funny)

Da Twink Daddy (807110) | more than 9 years ago | (#10313420)

You have to love a system that requires downtime as part of uptime. How many Linux users have this problem? (Please press the Start button to shut down (stop) the computer.)

Sure,

init 6
doesn't sound like it should start (initialize) anything...

Re:Repent, Sinners! (2, Informative)

Phillup (317168) | more than 9 years ago | (#10313519)

doesn't sound like it should start (initialize) anything

So... it should not initialize (begin) run level 6?

Re:Repent, Sinners! (-1, Offtopic)

Anonymous Coward | more than 9 years ago | (#10313426)

You took an "apparently", and a mis-spelled "possibly" from an article summary, and turned it into a Windows-bashing comment that got +5 Insightful. Now I regret using my mod points earlier today.

Re:Repent, Sinners! (-1, Flamebait)

Anonymous Coward | more than 9 years ago | (#10313502)

Call +(352) 26 53 37, and ask to speak to the Devil's^H^H^H^H^H^H^HBill Gates' Brother...

The richest man in the world may be unreachable for us lowly peons, but his brother is only a phone-call away! [This is an international call, but well worthwhile!]

And, if you have some endless loop of black paper lying around, here's the fax number: (352) 26 53 37 30

Of course, sitting on the Xerox machine works too, especially if it's color.

Frist Prostage? (-1, Troll)

Anonymous Coward | more than 9 years ago | (#10313320)

Fristy yayah.

first post (-1, Troll)

Anonymous Coward | more than 9 years ago | (#10313323)

i win

Anyone want to clue them in to scheduled jobs? (3, Insightful)

FyRE666 (263011) | more than 9 years ago | (#10313324)

It's obviously lunacy for any company to replace a proven system, which has given years of reliable service, with some piece of trash that crashes if left running for over a month. That said, I was under the impression that a simple "at" job could be used on a Windows machine to run a script periodically (at is similar to cron, except far less capable, of course). Such a script could, if I'm not mistaken, be used to reboot the machine. One would think this would be an ideal way to hide the problem very nicely.

We use a similar system to reboot all of our NT servers every weekend to help prevent crashes during the week (doesn't work of course, but still).

Re:Anyone want to clue them in to scheduled jobs? (1)

DarkKnightRadick (268025) | more than 9 years ago | (#10313371)

We use a similar system to reboot all of our NT servers every weekend to help prevent crashes during the week (doesn't work of course, but still).

You and LAX must not have installed Windows properly. /sarcasm. (;

No (1)

temojen (678985) | more than 9 years ago | (#10313377)

It would suck if all the radios shut down in the middle of an emergency landing. Better to have it manual.

Re:Anyone want to clue them in to scheduled jobs? (3, Interesting)

TykeClone (668449) | more than 9 years ago | (#10313378)

at sucks. Very, very much.

I've got an NT server that would hang after 2 weeks. I set up an at job to restart that service nightly and do not have that problem.

I've also got several linux servers that just plain run (and some NT/2000 servers as well).

That being said, rebooting sometimes does clear up many evils. We have a speakerphone (around 10 years old - no OS) that just wouldn't work one day. After looking at it, I unplugged it and plugged it back in (I rebooted it!) and it worked. No good reason, it just helps.

Re:Anyone want to clue them in to scheduled jobs? (0)

Anonymous Coward | more than 9 years ago | (#10313388)

Yes, let's try that, and we'll schedule the system to be rebooted the very next time your plane just happens to be landing in an emergency. Oh, wait, we can't schedule emergencies in advance....

There's something to be said for 24/7 uptime. (Very little to be said about Windows achieving it, but still, in some systems it is critical. Ever have your pacemaker's software crash on you?)

Re:Anyone want to clue them in to scheduled jobs? (5, Informative)

dbottaro (302069) | more than 9 years ago | (#10313432)

Agreed. A well written AT script something like this: Each M T W Th R S Su 12:45 AM shutdown /l /r /y /c

Would do the trick... We have used that exact script for YEARS to nightly reboot a troublesome NT4 BDC at a remote location.

While we knew that this was not a great solution, no one needed to access the server at that time of night. Any right minded IT person should be able to see the flaw in the FAA's logic.

buzz words (0)

Anonymous Coward | more than 9 years ago | (#10313325)

I'm surprised we didn't get 'mission' critical there in the blurb

Why not automate it? (1)

DevilJeff (243585) | more than 9 years ago | (#10313326)

Have they never thought to just schedule an event to reboot the computer every 30 days?

Re:Why not automate it? (2, Informative)

Embedded2004 (789698) | more than 9 years ago | (#10313374)

Well, if it is running windows, and somehow someone made a mistake and decided to run it on some mission critical system, they should reghost it as often as they can.

Windows has an odd tendency to corrupt itself.

Re:Why not automate it? (1, Insightful)

Anonymous Coward | more than 9 years ago | (#10313416)

"Have they never thought to just schedule an event to reboot the computer every 30 days?"

Would it not worry you to know that the ATC were relying on a computer that reboots itself so often?

And the lesson is... (2, Insightful)

jcr (53032) | more than 9 years ago | (#10313327)

Don't use this stuff in mission-critical applications.

-jcr

Re:And the lesson is... (2)

LostCluster (625375) | more than 9 years ago | (#10313401)

"This stuff" being all of IT. HDs will fail within 5-7 years no matter what OS you put on them...

Good IT is so hard to pull off because you have to convince people that events that strike once every few years have to be prepared for; otherwise a disruption in service will occur.

WOW (-1, Troll)

Anonymous Coward | more than 9 years ago | (#10313328)

Look at all those links!

who is really at fault? (0)

Anonymous Coward | more than 9 years ago | (#10313331)

Why Microsoft of course. Unix doesn't have any flaws.

In a related story (1)

lateralus_1024 (583730) | more than 9 years ago | (#10313333)

....all in-flight movies are played on Windows Media Player.

Re:In a related story (2, Interesting)

databank (165049) | more than 9 years ago | (#10313465)

Actually there's a lot of truth to that..I once flew in an airliner overseas which had the tv screens built into the back of the seat in front of me.

In the middle of the movie, the screen did the classic "blue screen of death" and rebooted with the Windows logo. There were quite a few chuckles in the aircraft when the movie was restarted and then the jokes started flying about the plane running on Microsoft Windows....(uh..oh..we're going to crash!..no wait, that's just Microsoft Windows)

Why is the FAA using off the shelf software? (4, Informative)

Samir Gupta (623651) | more than 9 years ago | (#10313339)

This is not an attack on Microsoft.

But most off the shelf software has disclaimers expressly stating it is not to be used in mission critical situations. Eg:

"technology is not fault tolerant and is not designed, manufactured, or intended for use or resale as on-line control equipment in hazardous environments requiring fail-safe performance, such as in the operation of nuclear facilities, aircraft navigation or communication systems, air traffic control, direct life support machines, or weapons systems, in which the failure of Java technology could lead directly to death, personal injury, or severe physical or environmental damage."

Re:Why is the FAA using off the shelf software? (3, Informative)

pyro101 (564166) | more than 9 years ago | (#10313442)

I don't know about using windows 95, but here at the nuclear facility that I work at we use not only Java but also windows. Have been using windows for some time and have to use java because that is the way Oracle is going. We have more problems with hardware issues than with the off the shelf software, but no matter what problems we get from any of it we as software developers are supposed to anticipate it and prove that we can, within reason, catch the user/machine/other devices before they screw stuff up. But most of all we go through huge testing on any small addition or change to the code base; even changing a color on menus requires 10-20 signatures (never know what else could have been added by accident).

What?! (5, Funny)

ottergoose (770022) | more than 9 years ago | (#10313340)

I thought switching to Windows from *nix saved time, money, and hassle! Haven't you guys seen those banner ads here?

I Hate to Say It (2, Insightful)

DarkKnightRadick (268025) | more than 9 years ago | (#10313341)

But I'm going to.

It's M$'s fault. Why do I hate to say it? Because it'll just be seen as more anti-MS crap from another /.er.

All I have to say is if the shoe fits, wear it.

In this individual case a PHB made a decision to scrap the old, stable OS to a new, known-to-be-unstable OS. That screams PHB.

Re:I Hate to Say It (4, Funny)

multimed (189254) | more than 9 years ago | (#10313423)

No way is it Microsoft's fault. It even says so in their EULA...

I'm still amused & surprised the poster left off the quotes as in "upgrade" from Unix to Windows.

Re:I Hate to Say It (1)

DarkKnightRadick (268025) | more than 9 years ago | (#10313452)

Haha. I do agree with some other posters that the PHBs involved as well as the tech should take some of the blame. After all, they chose and installed the software.

That is indeed amusing.

A hit for the other team... (3, Interesting)

LostCluster (625375) | more than 9 years ago | (#10313344)

When a ball drops on a baseball field at the midpoint between two positions, it's scored a "hit" for the opposition rather than an "error" against either player. Still, a hit for the other side is a bad thing for the entire team.

This mess was big enough that there's a large enough supply of blame to give some to everybody involved.

- No system should require a manual reboot on a regular basis... there should at least be a script capable of accomplishing that. But somehow, one got implemented. Blame whoever bought it.
- Windows shouldn't have had a flaw that required monthly reboots. Blame Microsoft.
- Somebody should have done the reboots like they were told to. Blame that poor schmuck.

Bottom line is that everybody's at fault because had any one piece in the chain done their job properly the failure wouldn't have happened, but a cascade of mistakes led to the ball hitting the grass instead of a glove.

Re:A hit for the other team... (5, Insightful)

PPGMD (679725) | more than 9 years ago | (#10313436)

The Patriot missile system had a similar problem. Its timing broke down after a period of time without a reboot (it was a much shorter cycle, either one day or one week).

Microsoft isn't the only one to have issues like that. But it has been patched and there should have been more than enough time for the FAA to test and deploy the patch on the few legacy machines running Windows 95.

I simply blame the FAA for wasting money away every year, billions are sunk into the system, but rarely does anything come out of it, Lockheed can deploy a complete new system to every airport for the amount of money that is being dumped into the old TRACONs and towers for MX.

Re:A hit for the other team... (2, Insightful)

oGMo (379) | more than 9 years ago | (#10313449)

Bottom line is that everybody's at fault because had any one piece in the chain done their job properly the failure wouldn't have happened, but a cascade of mistakes led to the ball hitting the grass instead of a glove.

An error is scored against a player if the player is determined to have been negligent in their position according to the rules. If someone hits a line drive right past the first baseman, it's still a hit. If the first baseman catches it, then drops it instead of making a tag, it's an error.

If multiple players are negligent, then multiple errors are scored. We've all seen "blooper" videos where there are cascading errors; one guy drops a catch, throws it to the next guy who drops it in turn, etc.

This is what happened here; it's not a hit, it's a cascade of errors. Everyone is to blame, because they all did something stupid. That doesn't make it "OK," it doesn't make any particular party less at fault.

I don't think this contradicts what you're saying here, I just wanted to emphasize the point. ;-)

Re:A hit for the other team... (2, Insightful)

LostCluster (625375) | more than 9 years ago | (#10313508)

If multiple players are negligent, then multiple errors are scored. We've all seen "blooper" videos where there are cascading errors; one guy drops a catch, throws it to the next guy who drops it in turn, etc.

Only one error can be scored per base advanced by the runner, and if the runner took first by a "hit" before the errant throw, then there is only one "error" for his advancement to second. If two players crash into each other and the ball drops, it's usually a hit because it's hard to say either would have been able to make the catch "with normal effort" which is the real standard for an error.

Migration (1)

OxygenPenguin (785248) | more than 9 years ago | (#10313347)

Why did they move from Unix to Windows in the first place? And why should a bug from Win95 crash a migrated Win2K?

How sad that such a sprawling metropolis of commerce and travel can be brought to its knees by the magic that is Windows.

Color me surprised.

Re:Migration (0)

Anonymous Coward | more than 9 years ago | (#10313422)

And why should a bug from Win95 crash a migrated Win2K?

You're taking the LA Times at their word that is what actually caused it. Since when are newspapers 100% reliable (or 50% reliable) sources of information?

How sad that such a sprawling metropolis of commerce and travel can be brought to its knees by the magic that is Windows.

You could also say how sad that such a sprawling metropolis of commerce and travel can be brought to its knees by one person failing to do their job (reboot the system).

Re:Migration (5, Funny)

legirons (809082) | more than 9 years ago | (#10313464)

"Why did they move from Unix to Windows in the first place?"

Maybe they didn't want to have to reboot on January 19, 2038

duh... (0)

Anonymous Coward | more than 9 years ago | (#10313348)

crontab -e
59 23 15 * * shutdown now >/dev/null :P

Heh (3, Insightful)

GypC (7592) | more than 9 years ago | (#10313350)

upgrade from Unix to Windows

AKA, "The PHB Special"

Of course, the guy who was supposed to reboot the box will get all the blame. Shit rolls downhill.

Re:Heh (-1, Flamebait)

moop (140175) | more than 9 years ago | (#10313489)

I think he should get the blame. He had a job, and failed to perform it, knowing that it will crash after X amount of days only means he knew what would happen.

This is like blaming the gun makers when someone gets murdered. Microsoft made a gun, the FAA decided to purchase it, knowing it was dangerous, and then they gave it to a lunatic without a trigger lock. I'm not surprised the crazy guy fired it.

If it's in the job description... (1)

DaftShadow (548731) | more than 9 years ago | (#10313354)

... I can think of no one else to fault *BUT* the technician. The IT guys know full well that this "quirk" exists, and in fact, part of their planning and maintenance involved resetting the machine in order to get around this potential problem. These guys did not complete their job duties, and as such, the system went down.

How can you intimate blaming the software company here?

- DaftShadow

Re:If it's in the job description... (1)

Cyb3r (224792) | more than 9 years ago | (#10313435)

Is it normal for a software made by such a company to need to be rebooted monthly?

Come on, let's be serious here...

this is a joke... (0)

Anonymous Coward | more than 9 years ago | (#10313355)

slashdot post this FUD and calls it news, implying that MS is at fault, this is pure trolling....

grow up slashdot editors, this has been old for a long time, grow up and stop the senseless bashing!!!

Ouch... (1)

hypermike (680396) | more than 9 years ago | (#10313359)

The newspaper said that a Microsoft-based replacement for an older Unix system needed to be reset every thirty days 'to prevent data overload', as a result of problems found when the system was first rolled out. However, a technician failed to perform the reset at the right time and an internal clock within the system subsequently shut it down. A back-up system also failed.

Guess there was a backup, I feel for that guy.

Unemployed. (-1, Offtopic)

Anonymous Coward | more than 9 years ago | (#10313360)

I wish my unemployed butt could get paid to make decisions like this.

Upgrade from UNIX to Windows.. (4, Funny)

Anonymous Coward | more than 9 years ago | (#10313364)

"This happened after an upgrade from Unix to Windows."

That's the funniest thing I heard all day. Windows is an upgrade from unix. I almost choked on my coffee.

humans rule (3, Insightful)

Doc Ruby (173196) | more than 9 years ago | (#10313367)

It is human error: those bugs didn't write themselves. Nor did the operations protocol that required "rebooting LAX" every 49.69(!) days. Nor did the upgrade procedure that ignored that bottleneck. Nor did the upgrade decision that moved from Unix to Windows. Those were all human errors, as was the decision to keep a job at LAX that would face blame for shutting down the airport (or risking lives) if the reboot was missed, or unsuccessful.

"Not I," says the referee,
"Don't point your finger at me.
I could've stopped it in the eighth
An' maybe kept him from his fate,
But the crowd would've booed, I'm sure,
At not gettin' their money's worth.
It's too bad he had to go,
But there was a pressure on me too, you know.
It wasn't me that made him fall.
No, you can't blame me at all."
- Bob Dylan, "Who Killed Davey Moore?" [bobdylan.com]

integration flaw exposed: (3, Funny)

overbom (461949) | more than 9 years ago | (#10313368)

sleep 4294080
shutdown /s

Re:integration flaw exposed: (1)

LostCluster (625375) | more than 9 years ago | (#10313469)

Oh... there it is. Unit conversion flaw. They passed a value in seconds to something expecting minutes... and ended up booting once every 10 years because of the factor of 60 mistake.
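For the curious, the factor-of-60 joke can be checked with a quick Python sketch. It assumes the 4294080 in the parent post is meant as seconds and that a year is 365.25 days; the variable names are made up here. Read as minutes, the interval works out closer to eight years than ten.

```python
# Value from the parent post's "sleep" line, taken here to be in seconds (an assumption).
SLEEP_VALUE = 4294080

SECONDS_PER_DAY = 24 * 60 * 60

# Interpreted as seconds: the familiar 49.7 days.
days_if_seconds = SLEEP_VALUE / SECONDS_PER_DAY

# Interpreted as minutes: sixty times longer.
days_if_minutes = SLEEP_VALUE * 60 / SECONDS_PER_DAY
years_if_minutes = days_if_minutes / 365.25

if __name__ == "__main__":
    print(f"as seconds: {days_if_seconds:.1f} days")
    print(f"as minutes: {days_if_minutes:.0f} days, roughly {years_if_minutes:.1f} years")
```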

Ahh yes... (1, Flamebait)

WD_40 (156877) | more than 9 years ago | (#10313370)

I remember when the 49.7 day bug was discovered. That was right after I had just hit the 49.7 day freeze in an attempt to keep my personal machine alive as long as possible.

When it froze, I didn't know why until I read the story, just figured it finally gave up the ghost for no real reason. It was time for a reboot anyway, that system was hurtin' bad.

Why the hell they have a critical system running on an OS that can't stay up for at least 50 days, I do not know.

Re:Ahh yes... (2, Insightful)

Qeyser (6788) | more than 9 years ago | (#10313483)

Moreover: why do you have a critical system that hasn't been patched in over five years?

Check the date on that news.com article linked in the main story -- it's from March of 1999. The bug is that old, and as I recall the fix didn't take that long to get out.

If LAX was trying to upgrade to/integrate win2k with ancient, unpatched Win95 systems, it's no wonder that they're having problems . . .

-Q

Mandate Open Source for Government work (1)

nightsweat (604367) | more than 9 years ago | (#10313372)

There's no conceivable reason not to. How do you justify your money going to a company that keeps the source to itself?

You paid for it with your taxes - you own it. Demand open source at ALL government levels.

Microsoft does provide source to govt (0)

Anonymous Coward | more than 9 years ago | (#10313409)

They often do provide the source code to governments. It does not mean that it is free, or that the government can submit fixes.

But yer Honour! (1)

Skiron (735617) | more than 9 years ago | (#10313376)

MS lawyer: "It all worked in the flight2000 simulator? We always rebooted after every crash and every time it was OK afterwards?"

Simple Politics (1)

Cobblepop (738291) | more than 9 years ago | (#10313385)

Of course the technician was blamed - if not, some CIO-type in charge would have had to take it, and he wouldn't allow that to happen. It always runs downhill...

a little off topic... (-1, Troll)

Anonymous Coward | more than 9 years ago | (#10313387)

Who's really at fault?"

well duh... Bush's!

Why 49.7 days? (4, Informative)

FirstTimeCaller (521493) | more than 9 years ago | (#10313392)

Because there are 4294080000 milliseconds in that time period. Just enough to cause a roll-over when using a 32 bit counter (and yes, 49.7 is an approximate value).

Very few Win95 systems ever made it that long without a reboot... but you would've thought that it would've been fixed by Windows 2000.
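For anyone who wants to check the arithmetic above, here is a minimal Python sketch. It assumes the counter is an unsigned 32-bit value that ticks once per millisecond, as the comment describes; the helper name is invented for illustration. The exact rollover point is 2**32 ms, a little past the 4294080000 ms that make up 49.7 days.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000  # 86,400,000 milliseconds in a day

def rollover_days(bits: int) -> float:
    """Days until an unsigned counter of `bits` bits, ticking once per ms, wraps to zero."""
    return (2 ** bits) / MS_PER_DAY

if __name__ == "__main__":
    print(f"a 32-bit counter wraps after {2 ** 32} ms")
    print(f"that is about {rollover_days(32):.2f} days")
    # The figure quoted in the comment is exactly 49.7 days expressed in ms.
    print(f"49.7 days = {round(49.7 * MS_PER_DAY)} ms")
```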

Re:Why 49.7 days? (4, Informative)

Holi (250190) | more than 9 years ago | (#10313481)

It was. This issue has nothing to do with the Win95 bug; it was just the submitter's opinion (which happens to be very wrong).

Before the torrent of "windows sucks" posts... (3, Insightful)

rasafras (637995) | more than 9 years ago | (#10313393)

...keep in mind that we have established numerous times that windows is not suitable for systems that need reliability and stability. It is not the operating system's fault that this happened, it is the FAA's for choosing to use it instead of considering the better alternatives. If you get run over on a bicycle while riding on the highway, don't blame the bike.
Quick addition: it seems that the fault does not belong entirely to windows, but rather a combination of the software running on it and the system architecture.

With that said, Windows could stand to improve a lot. It has too many bugs, too many flaws, and so on. And it definitely does not have a stable, secure, reliable base. So don't expect it to.

Would you trust your life to windows 95? (0)

Anonymous Coward | more than 9 years ago | (#10313410)

Flight BA 91429 on final approach.

Tower: what does "svga.dll performed an illegal instruction" mean?

Pilot: Oh sh...

Now even the submitters aren't reading the article (2, Insightful)

Holi (250190) | more than 9 years ago | (#10313415)

From the submission
possibility related to an old Windows 95 bug

From the Article.
The shutdown is intended to keep the system from becoming overloaded with data and potentially giving controllers wrong information about flights, according to a software analyst cited by the LA Times.

The shutdown is not a crash but a scheduled event to bring the servers down to flush data.
So it does not seem to be a problem with Windows (Ok now I get marked as troll) but with the FAA's own software.

32 bit timer (5, Interesting)

charnov (183495) | more than 9 years ago | (#10313419)

This old error was from the use of a 32 bit 1 ms increment timer (comes out to 49.7 days until rollover). AFAIK, this was fixed in Win2k and above when the timer got bumped to 64 bit. Maybe whoever set up LAX was using some ancient legacy middleware that used the old timer. This is just bizarre. In both locations that I have worked the last three years, none of the Win2k or Win2k3 servers went down ever. Sounds like bad consultants.

It'll still crash... (1)

Kippesoep (712796) | more than 9 years ago | (#10313520)

after 584542046 years. Okay, I admit... when you reach that time, you'll probably have other problems than a Win2K crash.
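That number checks out under a couple of assumptions the thread does not spell out: a 64-bit counter that still ticks once per millisecond, and a year of 365.25 days. A rough Python sketch (the function name is hypothetical):

```python
# 365.25 days expressed in milliseconds, kept as an integer (36525 hundredths of a day).
MS_PER_YEAR = 36525 * 24 * 60 * 60 * 10

def rollover_years(bits: int) -> int:
    """Whole years until an unsigned counter of `bits` bits, ticking once per ms, wraps."""
    return (2 ** bits) // MS_PER_YEAR

if __name__ == "__main__":
    print(f"64-bit ms counter wraps after about {rollover_years(64)} years")
```

With a 365-day year instead, the result comes out somewhat larger, so the exact figure depends on which year length you pick.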

...eh-heh-heh. (1)

rincebrain (776480) | more than 9 years ago | (#10313421)

Silly IT departments.

If you "upgrade" a piece of software, then discover it requires a complete manual system restart to remain stable, the prudent thing to do in any other circumstance would be a rollback.

Unfortunately, since this is an IT department, it must run Windows; after all, where [tux.org] could [gentoo.org] you [tldp.org] ever [google.com] find [linuxhelp.net] support [linuxforums.org] for Linux [linux.org]?

Whos fault? (0)

laurent420 (711504) | more than 9 years ago | (#10313428)

the blame rests with microsoft. windows is the only "operating system" i know of that *needs* to be rebooted to maintain regular operation. i'm sure this wouldn't have happened if they stuck with a rebootless unix machine.

Check out this little pile of bullshit (5, Interesting)

Trailer Trash (60756) | more than 9 years ago | (#10313431)

The system offers unprecedented voice quality, touch-screen technology, dynamic reconfiguration capabilities to meet changing needs, and an operational availability of 0.9999999

Okay, bullshit. If I have to reboot a server every month, .0000001 of a month is- oh, let's be generous and only count months with 31 days- about .26 seconds. That's a damned fast boot time for Win2K.

Maybe they left off a percent sign?
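The back-of-the-envelope figure above is easy to reproduce. A minimal Python sketch, assuming "operational availability" simply means the fraction of wall-clock time the system is up (the function name is invented here, not taken from the vendor's material):

```python
def allowed_downtime_seconds(availability: float, period_days: float) -> float:
    """Seconds of downtime permitted in a period at the given availability fraction."""
    period_seconds = period_days * 24 * 60 * 60
    return (1.0 - availability) * period_seconds

if __name__ == "__main__":
    budget = allowed_downtime_seconds(0.9999999, 31)
    print(f"{budget:.3f} seconds of downtime per 31-day month")
```

If the figure were read as a percentage instead (99.99999%), the budget would be the same; only a value like 0.9999999% would change the picture, and not in the vendor's favor.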

Re:Check out this little pile of bullshit (2, Insightful)

k4_pacific (736911) | more than 9 years ago | (#10313479)

"Maybe they left off a percent sign?"

Or maybe there's some kind of failover to a backup system (Which they also forgot to reboot)?

Simplest Fix (And real concern) (1)

FalconZero (607567) | more than 9 years ago | (#10313433)

Surely the simplest 'fudge' to fix this problem is
to write a script that beeps loudly every 10 mins
or some other (read: more sensible) notification
after the system uptime exceeds 30 or so days?

But seriously, if it's running windows it's not the
monthly reboots they need to worry about, it's the
quarterly format/reinstall procedure that's required
for stable operation.

I don't think I've had a stable (home) windows install for
more than 6 months without reinstall, but maybe I'm
pushing my luck by actually USING the computer.

stop passing the buck!!! (0)

Anonymous Coward | more than 9 years ago | (#10313434)

Who's really at fault?"

...it's the technician's fault for not doing his job

...it's MS's fault for not patching such a blatant bug

...it's the fault of the guy who decided to move from the stable unix to shite Windows 2000

I could fix that problem (1)

randomErr (172078) | more than 9 years ago | (#10313439)

I wrote a VB program years ago for Win95 to solve this problem. I just had the scheduler run my program that rebooted the system for me.

Umm.... Duh

We used to joke (3, Interesting)

multiplexo (27356) | more than 9 years ago | (#10313440)

that no one would ever run into the 49.7 day bug on a Windows system because the chances of having that much uptime were slim to none. Having a system where you know that things are broken and you have to reboot it every 30 days to keep it from breaking down is a bad thing, deploying such a system into a production environment is even worse (but it's been done, I don't know how many times I wrote cron jobs to kill bad pieces of software and restart them) but deploying such a system in an environment where lives are at stake is completely inexcusable, regardless of whether or not it is closed or open source. This is similar to having a circuit in your house that overheats because occasionally too much load is placed on it. The idiot solution is to reset the breaker when it trips, the correct solution is to put in a bigger circuit that can handle the peak load. This vendor provided the idiot solution to this problem and should be punished for it, this never should have been deployed, I can only hope that they won't blame the technician for failing to do something that he wouldn't have had to do if the system had been designed properly.

I also love the statement that the system was upgraded from UNIX to Windows. Isn't this kind of like upgrading from being in very good health but not being good looking to being somewhat good looking but suffering from cancer, AIDS and heart disease?

Re:We used to joke (1, Funny)

Anonymous Coward | more than 9 years ago | (#10313503)

UNIX -> Windows is like Clinton -> Bush.

Sorry, but someone had to say it.

49.7 days (5, Funny)

k4_pacific (736911) | more than 9 years ago | (#10313443)

I remember back when that bug was announced. Seems it was at least a couple of years after Windows 95 had been out. I guess they had to work through a lot of other bugs to get Windows 95 to make it long enough for this bug to occur.

Re:49.7 days (1)

ArchieBunker (132337) | more than 9 years ago | (#10313487)

The story is a troll, all the win95 code died with Windows ME. In case you haven't realized, win2k is based on NT4, which never had the 49 day uptime bug.

Maintenance (1)

apoplectic (711437) | more than 9 years ago | (#10313451)

The employee missed the maintenance window. If you forget to do something that is a part of your job, I would have to suggest that you are responsible for the consequences. Now, does placing the employee in such a situation apply some burden of responsibility upon higher-ups? Certainly. But, the employee should be held responsible...ESPECIALLY if the importance of the maintenance was made clear.

Flaw left unfixed for too long? (2, Interesting)

Astro-pilot (765980) | more than 9 years ago | (#10313463)

Was the flaw left unfixed for too long because they did not have access to the source code? Or was it because it was too expensive? If this is such a critical system that it can cause loss of life (on a massive scale, no less), the root cause should have been fixed, rather than the workaround. I remember reading somewhere that this flaw has now been fixed. Smells like a managerial issue within the FAA, not just a technician problem. Remember NASA and the space shuttles?

Can't be a common problem... (0, Redundant)

mrgreen4242 (759594) | more than 9 years ago | (#10313474)

I can honestly say I have never encountered a situation where this would be a problem, as I have never had a Windows box stay up for more than a week without either crashing, getting so bogged down it needed a reboot just to open Word, or requiring me to reboot after I installed some ridiculously little program.

If I recall (1)

sweetshot97 (815470) | more than 9 years ago | (#10313475)

If I recall, doesn't MS have something towards the end of the license agreement that absolves them of any liability? Something along the lines of, "Do not use in mission critical places." Or was it more like "do not install in missile silos or nuclear facilities", something like that, right? Someone correct me. If I am right about the license agreement, it was stupid of LAX to have been suckered into switching from UNIX to M$. Oh wait, I forgot, everything works better on MS products, right? That's why we have so many security/virus/worm/bug/whatever flaws. What a great product, Bill!

Uhm, THE TECHNICIAN (1)

mekkab (133181) | more than 9 years ago | (#10313477)

I don't think blame should be assigned to the technician who missed the task; rather, it seems a gross oversight for the FAA to guarantee that such a critical system will crash after only one missed maintenance task. Who's really at fault?

Actually, I do. You've got a job; you've got deadlines. Do the work.

Don't be so hasty to blame the OS... (5, Insightful)

Ann Elk (668880) | more than 9 years ago | (#10313493)

OK, I know it's a violation of /. policy to actually read a referenced article. My bad. But, according to the software.silicon.com article:

Richard Riggs, an advisor to the technicians union, said the FAA - the American aviation regulator - had been planning to fix the program for some time. "They should have done it before they fielded the system," he said.

This sounds to me like more of a problem with the application, not the OS. The "system" crashed after 49.7 days, which is about 4.3 million seconds, which is about 4.3 billion milliseconds, which is (obviously) the largest value a 32-bit ULONG can hold (2^32 - 1). I suspect the application is using a ULONG to store a timeout value and got pissed-off when it rolled over.
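To make the arithmetic concrete, here is a minimal sketch of how a 32-bit millisecond counter rolls over and why naive timestamp comparisons break when it does. It is not the FAA application's code, which nobody here has seen; the names MASK32, days_until_rollover, tick_after and elapsed_ms are made up for illustration, and the masking just imitates 32-bit unsigned arithmetic in Python.

```python
MASK32 = 0xFFFFFFFF  # largest value a 32-bit unsigned counter can hold


def days_until_rollover(bits=32):
    """Days before a millisecond counter of the given width wraps to zero."""
    return (2 ** bits) / 1000 / 86400


def tick_after(start, delta_ms):
    """Hypothetical tick reading delta_ms later, wrapping like a 32-bit counter."""
    return (start + delta_ms) & MASK32


def elapsed_ms(start, now):
    """Elapsed time computed modulo 2^32, which survives a single rollover."""
    return (now - start) & MASK32


# Take a reading half a second before the counter wraps, then another 1.5 s later.
start = MASK32 - 500
now = tick_after(start, 1500)

print(round(days_until_rollover(), 2))  # length of one full counter cycle in days
print(now < start)                      # True: a naive "is now later?" check fails
print(elapsed_ms(start, now))           # modular subtraction still gives 1500
```

The point of the sketch is that the rollover by itself is harmless if the code only ever looks at differences taken modulo 2^32; trouble starts when code compares raw tick values or treats the counter as if it grows forever.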

Blame the human (1)

Lead Butthead (321013) | more than 9 years ago | (#10313495)

By the very nature of the system, the blame can fall on no one other than the maintenance personnel. Otherwise the PHB that authorized the "upgrade," and the system that put the said PHB in the position to authorize said "upgrade" would look incompetent and foolish, and we can't very well have that.

hum, guess they selected the wrong word... (1, Troll)

sxpert (139117) | more than 9 years ago | (#10313497)

This happened after an upgrade from Unix to Windows.

Shouldn't that read downgrade instead ???

First a Navy ship, now this.... (0, Flamebait)

HikeFanatic (809939) | more than 9 years ago | (#10313500)

Nice going, Microsoft!

First you crash a US Navy ship a couple of years back thanks to a NT4 divide by zero bug, now this.

Today's lesson kids: Don't use Windows for any mission-critical apps!!

Its a mind set sort of thing...... (1)

zootman (682820) | more than 9 years ago | (#10313514)

Ah, the old "windows maintenance reboot" problem. It always amazes me how IT managers (hell, even some techos) accept the need to reboot their Windows systems every week. At my work, the Windows guys accept it as normal maintenance. If I had to reboot my AIX and z/OS systems every week there would be hell to pay. But because it's Windows, it's accepted. I dunno, mediocrity is the new standard these days...........

"upgrade from Unix to Windows." (1)

captainclever (568610) | more than 9 years ago | (#10313515)

" upgrade from Unix to Windows "

Ahahahahahahahahahahahahahahahahahahahah that's the funniest thing i've read in ages :)

49.7 Days - A New Record for Windows 95! (3, Funny)

akiy (56302) | more than 9 years ago | (#10313516)

I believe the 49.7 days of uptime for a Windows 95 box is a new record, shattering the previous record in Norway of 27.9 days back on January through February of 2001. Congratulations!