Software Update Shuts Down Nuclear Power Plant

Soulskill posted more than 6 years ago | from the we-have-safety-systems-because-we-are-very-stupid dept.

Software 355

Garabito writes "Hatch Nuclear Power Plant near Baxley, Georgia was forced into a 48-hour emergency shutdown when a computer on the plant's business network was rebooted after an engineer installed a software update. The Washington Post reports, 'The computer in question was used to monitor chemical and diagnostic data from one of the facility's primary control systems, and the software update was designed to synchronize data on both systems. According to a report filed with the Nuclear Regulatory Commission, when the updated computer rebooted, it reset the data on the control system, causing safety systems to errantly interpret the lack of data as a drop in water reservoirs that cool the plant's radioactive nuclear fuel rods. As a result, automated safety systems at the plant triggered a shutdown.' Personally, I don't think letting devices on a critical control system accept data values from the business network is a good idea."

355 comments

Wow that is so funny (-1, Troll)

Anonymous Coward | more than 6 years ago | (#23689349)

Maybe nuclear power isn't a better choice.

Re:Wow that is so funny (4, Insightful)

Anonymous Coward | more than 6 years ago | (#23689683)

Correct. It is not the better choice. In the foreseeable future, it is the only choice.

Re:Wow that is so funny (5, Insightful)

Anonymous Coward | more than 6 years ago | (#23689759)

And a shutdown, while inconvenient, is not a catastrophe. In fact, it speaks well for the plant's safety that it did automatically shut down when faced with bad data.

Re:Wow that is so funny (3, Insightful)

Anonymous Coward | more than 6 years ago | (#23689901)

Agreed. That was good software design to assume a worst-case scenario when the sensors stopped sending in data. The alternative (sending pager alerts or something) would be far worse.

analogy and reality. (0, Troll)

twitter (104583) | more than 6 years ago | (#23689875)

Computers are not a good choice because Windows sucks and M$ won't die. Hmmm, looks like that was what was wrong with the plant too.

Re:analogy and reality. (-1, Troll)

westbake (1275576) | more than 6 years ago | (#23689909)

It's amazing how operators tolerated crappy DOS-based SCADA systems in some plants. These things should be hunted down with spears and removed. No one should be able to reset a 4GW thermal reactor, at $1 million/day, through someone's spammed-out W2K desktop or a telephone modem. Un-Fucking-Believable.

Re:analogy and reality. (0, Flamebait)

willyhill (965620) | more than 6 years ago | (#23689955)

It's amazing that you keep replying to yourself and thinking no one has figured out that you have ten different accounts. That is amazing.

Why don't you just say what you have to say with a single post and stop trying to insult everyone's intelligence? For an account with a grand total of 48 posts, a distressing number seem to be dedicated [slashdot.org] to the cult of twitter.

Install Complete... (5, Funny)

Anonymous Coward | more than 6 years ago | (#23689357)

Must restart reactor to complete software installation.

[Yes] [No] [OMFG!]

Oblig Simpsons reference (5, Funny)

J'ai Friedpork (1293672) | more than 6 years ago | (#23689451)

"Vent radioactive gas? Venting gas prevents explosion. [Yes / No]"

Re:Oblig Simpsons reference (2, Funny)

Phanatic1a (413374) | more than 6 years ago | (#23689501)

I'm impressed that for once dad's butt prevented the release of toxic g-

Re:Oblig Simpsons reference (1, Redundant)

truthsearch (249536) | more than 6 years ago | (#23689669)

Hey! All I have to type is Y. (To Marge) Hey, Miss Doesn't-find-me-attractive-sexually-anymore: I just tripled my productivity!

Re:Oblig Simpsons reference (1)

j79zlr (930600) | more than 6 years ago | (#23689911)

I wash myself with a rag on a stick.

Re:Install Complete... (0, Offtopic)

ProfessionalCookie (673314) | more than 6 years ago | (#23689487)

UAC jokes ensue...

Re:Install Complete... (2, Insightful)

sharkey (16670) | more than 6 years ago | (#23689527)

What, did they change the phone number in Dial-Up Networking?

Obama is the Manchurian candidate (-1, Offtopic)

Anonymous Coward | more than 6 years ago | (#23689595)

Get ready to enter a new era of failed policies cribbed straight from the failed state of the USSR.

Re:Obama is the Manchurian candidate (-1, Troll)

Anonymous Coward | more than 6 years ago | (#23689945)

Wouldn't that silly scenario more fit McCain? He was tortured over a period of 5 1/2 years by the commies and is now ascending the ranks.

I mean how does Obama even fit into your stupid scenario? That he's black?

Re:Install Complete... (3, Funny)

Anonymous Coward | more than 6 years ago | (#23689637)

Looks like the plug and play device 'Nuclear Reactor' is not fully SP3 compatible...

Re:Install Complete... (2, Funny)

SlashWombat (1227578) | more than 6 years ago | (#23689687)

CTRL-ALT-DEL -> Kaboom.

Or perhaps just another variation on the BSOD (Blue Screen Of Death).

Re:Install Complete... (1, Funny)

Anonymous Coward | more than 6 years ago | (#23689961)

DRIVER_IRQL_NOT_LESS_THAN_OR_EQUAL
driver: cooling_system.sys

Operator: "Ohcrapohcrapohcrap! runningrunningRUNNING!"

Re:Install Complete... (1)

arbiter1 (1204146) | more than 6 years ago | (#23689775)

I think the box would say [No] [OMFG I am gonna get fired]

:O (-1, Troll)

Trukutu (1222874) | more than 6 years ago | (#23689361)

Scary!

Re::O (5, Insightful)

Lurker2288 (995635) | more than 6 years ago | (#23689679)

What exactly do you find frightening about an automatic safety system doing exactly what it's supposed to in response to unusual input?

Re::O (0)

Alex Belits (437) | more than 6 years ago | (#23689767)

Accepting a change in sensors data from something that is not a sensor?

Re::O (2, Funny)

3vi1 (544505) | more than 6 years ago | (#23689937)

What exactly do you find frightening about an automatic safety system doing exactly what it's supposed to in response to unusual input?
The part where a reboot was required. That makes me worried that they were using Windows.

The chemical company I work for has VAX/Unix systems that haven't been rebooted in over four years... and only then because of power outages.

Hmmm, threw an exception (5, Insightful)

Anonymous Coward | more than 6 years ago | (#23689371)

I'd rather it shut itself down then suffer major failure.

No! (-1, Redundant)

EmbeddedJanitor (597831) | more than 6 years ago | (#23689645)

The business computers should not be connected to the control network. What a crap design. It's as bad as me updating my laptop and having to ask Google to reboot their servers.

Re:No! (1, Insightful)

Anonymous Coward | more than 6 years ago | (#23689697)

Wow, way to parrot the summary.

Re:No! (1)

maxume (22995) | more than 6 years ago | (#23689751)

It's since been disconnected.

Business Network? (4, Interesting)

camperdave (969942) | more than 6 years ago | (#23689993)

The business computers should not be connected to the control network.

From the summary:

The computer in question was used to monitor chemical and diagnostic data from one of the facility's primary control systems...
... when the updated computer rebooted, it reset the data on the control system...
If it's monitoring the primary control system then it seems to me like the machine would have to be on the control network. The real issue is why did the primary control system accept a reset from a monitoring system. It sounds like there's more than one bug to track down.

Re:Hmmm, threw an exception (5, Funny)

xlv (125699) | more than 6 years ago | (#23689711)

I'd rather it shut itself down then suffer major failure.
Personally, I'd rather it doesn't suffer a major failure at all, whether it's after a shutdown or not. Oh you meant than and not then, never mind...

Critical Update (5, Funny)

Enderandrew (866215) | more than 6 years ago | (#23689375)

Adds a whole new meaning to "Critical Update".

Lesson learned: (1)

Aaron32 (891463) | more than 6 years ago | (#23689385)

When updating the computer that controls the entire facility, HAVE AN UNDO PLAN!

Re:Lesson learned: (2, Funny)

J'ai Friedpork (1293672) | more than 6 years ago | (#23689415)

Actually, I think that the lesson learned here was "when dicking around with the boss's computer, make sure it's not plugged into anything important first."

Re:Lesson learned: (3, Insightful)

Anonymous Coward | more than 6 years ago | (#23689497)

However useful a tip that may be, it has nothing to do with this incident. You clearly never even made it to the article summary, let alone the actual article.

"... when the updated computer rebooted, it reset the data on the control system, causing safety systems to errantly interpret the lack of data as a drop in water reservoirs that cool the plant's radioactive nuclear fuel rods. As a result, automated safety systems at the plant triggered a shutdown."

From that snippet alone, it stands to reason that _any_ reboot of the computer would have caused this reset in the control system. Nor is this at all surprising; go reset any data collection system connected to controller software for any sort of industrial process and see if the controller doesn't receive spurious data.

To me this is an example of the automated system doing its job. "Hark! I am a coolant reservoir monitor and I have reason to believe there may be a loss of coolant inventory. Time to trip the system."

Re:Lesson learned: (4, Informative)

bluefoxlucid (723572) | more than 6 years ago | (#23689913)

No, it has no reason to believe the coolant system has water. It's called FAIL SAFE; if I'm not quite sure, then fuck it, back off and shut the grid down and go MAKE SURE everything looks right.

The proper response of a nuclear cooling system to not knowing whether or not it's working correctly is not "let's keep running hot and see if more sample data comes across."
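
For what it's worth, the "not quite sure, so back off" rule can be sketched as a simple watchdog on the sensor feed. This is only an illustration in Python: the `Watchdog` class and the 5-second `MAX_SILENCE_S` limit are made up for the example and have nothing to do with the plant's actual software.

```python
from dataclasses import dataclass

# Hypothetical limit: how long we tolerate silence from the sensor feed.
MAX_SILENCE_S = 5.0


@dataclass
class Watchdog:
    last_sample_at: float  # time of the most recent sample, in seconds

    def feed(self, now: float) -> None:
        """Record that a fresh sample arrived at time `now`."""
        self.last_sample_at = now

    def should_trip(self, now: float) -> bool:
        """Trip when the feed has been silent longer than MAX_SILENCE_S."""
        return (now - self.last_sample_at) > MAX_SILENCE_S
```

The point of the design is that silence is treated as a reason to stop, never as a reason to assume the last good value still holds.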

One begs the question (0, Troll)

jo42 (227475) | more than 6 years ago | (#23689403)

Was it running a Microsoft by-product or not?

Re:One begs the question (0)

Anonymous Coward | more than 6 years ago | (#23689445)

NO GODDAMNIT
http://begthequestion.info/

Re:One begs the question (0)

Anonymous Coward | more than 6 years ago | (#23689801)

How about www.dont-be-so-fucking-pedantic.com?

Re:One begs the question (1)

fyoder (857358) | more than 6 years ago | (#23689505)

Was it running a Microsoft by-product or not?
The article doesn't say. I suppose it could have been Ubuntu, they've had a couple of kernel updates recently, but somehow I doubt it.

Re:One begs the question (1)

CaptainTux (658655) | more than 6 years ago | (#23689755)

Remember when the Slammer worm hit the net a few years ago? There was an article in some defense newspaper I saw that mentioned that they were concerned about power generation and management facilities being hit by the worm. So, from that, I would say it's a reasonable assumption that the facility was running some version of Microsoft Windows (probably NT4 or 2000).

Re:One begs the question (5, Funny)

badboy_tw2002 (524611) | more than 6 years ago | (#23689939)

Good enough evidence for me! Microsoft caused a nuclear meltdown! Quickly, to the Blogo-Sphere!

Re:One begs the question (1)

cjb658 (1235986) | more than 6 years ago | (#23689967)

Nuclear power plants run Windows NT?

No wonder we have such a N.I.M.B.Y. problem with them.

MOD PARENT UP! (4, Funny)

Lux (49200) | more than 6 years ago | (#23689835)

He's trying to find an opportunity to bash Microsoft!

Re:One begs the question (3, Informative)

Viper Daimao (911947) | more than 6 years ago | (#23689891)

one begs the question...
No one doesn't [wikipedia.org]

Re:One begs the question (-1)

Tango42 (662363) | more than 6 years ago | (#23689941)

Language changes - keep up, or get out of the way.

Fail-Safe (4, Insightful)

lobiusmoop (305328) | more than 6 years ago | (#23689405)

Personally, I am reassured that these reactors are designed to shut down at the drop of a hat. This is not a situation where fuck-ups should be masked; any discontinuity, however minor, really needs to be highlighted and dealt with immediately.

Re:Fail-Safe (2, Funny)

Sitnalta (1051230) | more than 6 years ago | (#23689503)

Yeah, but you don't want the reactor shutting down because the computer system is shit. That is most definitely not reassuring to me.

Re:Fail-Safe (4, Insightful)

snkline (542610) | more than 6 years ago | (#23689545)

Umm, yes you do. If something in the system is shit, you don't want the reactor ON!

Re:Fail-Safe (3, Insightful)

NMerriam (15122) | more than 6 years ago | (#23689569)

Yeah, but you don't want the reactor shutting down because the computer system is shit. That is most definitely not reassuring to me.


On the contrary, shutting down because the system is shit sounds like a much better option than continuing to run despite the shittiness of the computer monitoring everything.

Of course, the ideal situation would be to have good computers that only get updated in scheduled, planned ways so that you don't have the issue at all. But shutting everything down when something is amiss is the only sensible response.

This was not a "fail-safe" incident (5, Insightful)

Drenaran (1073150) | more than 6 years ago | (#23689745)

The problem here is that the system didn't shut down because it detected an error in the data collection system; instead, it incorrectly detected a problem that did not in fact exist and then proceeded to take action. While the engineer in me is fairly certain that the system is designed to always fail to a safe state (as in, any automatic emergency response couldn't accidentally make things worse, at least not without raising all sorts of alarms), it is still concerning that internal control systems can be so affected by external servers.

In the article they mention that the system wasn't designed for security (since it was meant to be internal), but this isn't a security issue at all! Any sort of system that relies upon other systems should be designed to assume failure can and will occur in other systems. That is not to say that it needs to verify/evaluate incoming data to make sure it is "good", but rather that it can tell the difference between receiving data (such as current water levels) and receiving no data at all (system failure). Once it has that, it can ideally automatically switch to a backup system, or do what it did here and enter a fail-safe state (the difference being that it does so while pointing out the actual problem and not an incorrectly perceived problem in a different part of the system).
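
A rough sketch of what I mean, in Python. Everything here is hypothetical (the `decide` function, the `Action` names, the 2.0 m threshold); the only point is that "no reading" is modeled as `None` rather than being silently treated as a level of zero.

```python
from enum import Enum
from typing import Optional

LOW_LEVEL_M = 2.0  # hypothetical trip threshold, metres of coolant


class Action(Enum):
    RUN = "run"
    USE_BACKUP = "switch to backup sensor"
    TRIP_LOW_LEVEL = "trip: coolant level low"
    TRIP_NO_DATA = "trip: no level data from any source"


def decide(primary: Optional[float], backup: Optional[float]) -> Action:
    """None means 'no reading arrived', which is not the same as 0.0."""
    if primary is None and backup is None:
        return Action.TRIP_NO_DATA
    # Prefer the primary reading; fall back to the backup if it is missing.
    level = primary if primary is not None else backup
    if level < LOW_LEVEL_M:
        return Action.TRIP_LOW_LEVEL
    return Action.RUN if primary is not None else Action.USE_BACKUP
```

Both trip cases still end in a safe state, but the operator is told which problem actually occurred: `decide(None, None)` reports missing data, while `decide(0.0, 5.0)` reports a low level.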

D'oh! (1)

file_reaper (1290016) | more than 6 years ago | (#23689411)

Surely this computer thingy must be the same as my home computer thingy....it always works when I turn it off and on again.

Sure glad the safety systems kicked in as per normal.

Just looking to avoid a script kiddie attack (0, Flamebait)

zazenation (1060442) | more than 6 years ago | (#23689421)

It was probably just a Microsoft Windows Update, I don't see how that could cause any problems....

How could NRC even allow this in the first place? (1)

McNihil (612243) | more than 6 years ago | (#23689433)

As a regulator, wouldn't they have some checks and balances to keep critical systems on their own separate ring and not directly interdependent?

This is beyond incompetence... it is gross negligence.

Re:How could NRC even allow this in the first plac (3, Funny)

Lurker2288 (995635) | more than 6 years ago | (#23689713)

"GROSS NEGLIGENCE - Failure to use even the slightest amount of care in a way that shows Recklessness or willful disregard for the safety of others." - 'Lectric Law Library.

Yeah, those bastards, the way they used THE SLIGHTEST AMOUNT OF CARE in designing a system that shuts down in response to unexpected data so as to avoid RECKLESSNESS with the SAFETY OF OTHERS.

Critical Updates (0, Redundant)

Zekasu (1059298) | more than 6 years ago | (#23689447)

Critical Updates are ready to be installed on your nuclear reactor. You must restart to complete them.

That's what you get for using Microsoft.

More like bad system design (1)

aliquis (678370) | more than 6 years ago | (#23689453)

To me it sounds much more like they have a bad system design if it's impossible to reboot one of the machines / it can't run with one of them offline. That's not something to blame on the software update (shouldn't such things be expected anyway?)

I guess "software update" could have been used to bash Microsoft a little or something, not that it says Windows Update, or maybe the poster hates all kinds of software updates?

Re:More like bad system design (4, Insightful)

RiotingPacifist (1228016) | more than 6 years ago | (#23689515)

The only safe way to update a system is a reboot. Sure, you CAN do some stuff on Linux, BSD, etc. to avoid having to reboot (hell, this was probably running some Unix derivative, so it was probably possible to do the update without rebooting), but you wouldn't want to run the risk of introducing an unchecked bug by doing a live update. When your choices are:
a) high chance of accidentally shutting down a reactor harmlessly
b) small chance of fucking up a nuclear reactor
you'll always go for (a), if you're sane.

Misreading of the Article (5, Interesting)

Anonymous Coward | more than 6 years ago | (#23689461)

"Personally, I don't think letting devices on a critical control system accept data values from the business network is a good idea."
The article did not say that the data values were being read from the machine that was rebooted. It actually said that the rebooting triggered a problem in which values could not be read.

I wonder if they were using something like EPICS. I worked on a large experiment which used EPICS to control the system. Rebooting a machine would sometimes expose a problem with resources not being freed, eventually leading to a situation where data channels would read the 'INVALID/MISSING' value. The solution, as anyone who has worked on this sort of experiment will know, was to reboot more machines until the thing worked. ;-)

(I don't mean to complain about EPICS. It is very powerful and flexible... it's just that the version we used had these occasional hiccups.)

Re:Misreading of the Article (1)

SoapBox17 (1020345) | more than 6 years ago | (#23689919)

The article did not say that the data values were being read from the machine that was rebooted. It actually said that the rebooting triggered a problem in which values could not be read.
No, actually, the summary says "when the updated computer rebooted, it reset the data on the control system, causing safety systems to errantly interpret the lack of data as a drop in water reservoirs"... That doesn't really have much to do with the reboot itself (causing the computer to be unreachable or whatever) but that the data wasn't persistent. Completely different.

Re:Misreading of the Article (0)

Anonymous Coward | more than 6 years ago | (#23689977)

You're all wrong. The synchronization software was bugged, so that when the target CPU (on the business network) was unreachable, the source channel reset itself.

Re:Misreading of the Article (1, Interesting)

Anonymous Coward | more than 6 years ago | (#23689927)

It actually said that the rebooting triggered a problem in which values could not be read.

I feel so fucking vindicated [slashdot.org] :

Long uptimes are a bad thing! How do you know a configuration change hasn't rendered one of your startup scripts ineffective? If you have to reboot for some unexpected reason, you could be stuck debugging unrelated problems at very inopportune moments.

You need to schedule regular reboots so that you can test that your servers can start up fine at a moment's notice. Long uptimes are a sign a sysadmin hasn't been doing his job.

You're right. While you're on the phone with hazmat explaining that you have an issue with green goo, how about I test the reboots of my PBX before you give your address?

Yeah, I run mission critical systems. Yes, I have proper redundancy and resiliency systems. Think I'm going to disrupt operations to test my reboots? Hell no. When it comes to public safety, 5 nines is the *only* option.

Looks like necrogram or somebody with his attitude is responsible for this.

Terminal Error (2, Interesting)

Anubis_Ascended (937960) | more than 6 years ago | (#23689465)

Reminds me of Terminal Error [yahoo.com] .

the slashdot crowd is dying to know... (4, Funny)

mathfeel (937008) | more than 6 years ago | (#23689467)

did it run Windows?

Re:the slashdot crowd is dying to know... (5, Funny)

Anonymous Coward | more than 6 years ago | (#23689857)

If it was running Windows the OS is at fault.
If it was running something else then the application was at fault.

Obligatory (0, Redundant)

Enderandrew (866215) | more than 6 years ago | (#23689479)

I for one welcome our new radioactive overlords.

Press hot grits to continue.

In Soviet Russia, reactor reboots you.

Yes, but does the reactor run Linux?

1) Break crucial system on reactor with update
2) Sell real update
3) Profit!

Re:Obligatory (4, Funny)

Kamokazi (1080091) | more than 6 years ago | (#23689583)

Don't forget about the now mutated sharks living in the coolant water growing frickin' laser beams on their heads.

Re:Obligatory (1)

turbidostato (878842) | more than 6 years ago | (#23689635)

"Don't forget about the now mutated sharks living in the coolant water growing frickin' laser beams on their heads."

Wow! Imagine a beowulf cluster of these!

Re:Obligatory (0)

Anonymous Coward | more than 6 years ago | (#23689781)

Everyone knows that coolant water only grows ill tempered sea bass.

Re:Obligatory (1)

Alex Belits (437) | more than 6 years ago | (#23689795)

No, but I have heard that frogs occasionally live there...

Redundancy! (-1, Redundant)

Drenaran (1073150) | more than 6 years ago | (#23689483)

Does this actually mean that every system that affects operations in the plant _doesn't_ have a duplicate system running identical software acting as a shadow/backup? This would seem like a very basic level of system protection to have in a Nuclear Power Plant... If they had maintained such a system, they would have loaded the new update onto the backup servers (which would be identical in every possible way to the mains), the system would have "broken" as it did here, and they would have been able to keep operating while they figured out the problem.

Also, before you make the argument "but what if the update is critical?" - it's a Nuclear Power Plant! If any sort of update can be classified as so very urgent they couldn't put it off a couple days then I'd say we have bigger problems.

big increases in your power bill! (3, Insightful)

Quadraginta (902985) | more than 6 years ago | (#23689549)

Think about the cost associated with having and maintaining a completely hot-pluggable second control system. How much do you want your power bills to go up to pay for that? And what would be the point?

They have a perfectly adequate safety system that did exactly what it's supposed to do. It read confusing data and decided to shut the reactor down until a human came along and explained things satisfactorily. What's wrong with that? Aside from having the reactor offline for 48 hours, there was no other cost.

EULA! (5, Funny)

bluephone (200451) | more than 6 years ago | (#23689519)

It says right in the EULA that it's not to be used in a nuclear power plant!

Re:EULA! (1)

ConceptJunkie (24823) | more than 6 years ago | (#23689603)

Yeah, but everyone knows those EULAs are unenforceable.

This was Good (3, Insightful)

snkline (542610) | more than 6 years ago | (#23689531)

While perhaps the system should be designed to behave differently, what happened here was a good thing. When things went wrong, rather than the reactor systems freaking out and doing random crap, they were properly designed to shift to a known safe state (i.e. Shut the hell down).

Re:This was Good -Bull (0)

Anonymous Coward | more than 6 years ago | (#23689903)

Obviously you have never worked around a Nuclear Power Reactor when it did an emergency shutdown.

The problem is the update - not business network (5, Interesting)

markdj (691222) | more than 6 years ago | (#23689537)

I write this type of software for a living, so I know that having a computer on the business network connected to the control computers is a risk, but that risk can be managed. The problem here is that the software update wiped out the nuclear control system data. This exposes two bad problems. First, customers are always asking why they can't update their system while it is still running. We liken that to changing your tire while driving down the road. Second, the software update did not respect the data in the nuclear control system and synchronized it to new initial data in the update on the other system! Not a good idea. In critical safety systems, you always practice an update before actually doing one.
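
To make "respect the data" and "practice first" a bit more concrete, here is a minimal dry-run sketch in Python. The `plan_sync` function and its rule (values already on the control side win) are assumptions for illustration only; the article doesn't say how the real synchronization works.

```python
def plan_sync(control: dict, incoming: dict):
    """Return (to_add, conflicts) without modifying either dict.

    Hypothetical rule: values already present on the control side win;
    the incoming side may only supply keys the control side lacks.
    """
    # Keys the control side has never seen are safe to add.
    to_add = {k: v for k, v in incoming.items() if k not in control}
    # Keys where the two sides disagree are reported, not overwritten.
    conflicts = {k: (control[k], v) for k, v in incoming.items()
                 if k in control and control[k] != v}
    return to_add, conflicts
```

Running something like this against a copy of the live data before the real update would flag a freshly initialized value about to replace a live one as a conflict for a human to review.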

Re:The problem is the update - not business networ (4, Funny)

dissy (172727) | more than 6 years ago | (#23689617)

First customers are always asking why they can't update their system while it is still running. We liken that to changing your tire while driving down the road.
Oh sure, NOW you think of a debian slogan ;}

Re:The problem is the update - not business networ (1)

Ungrounded Lightning (62228) | more than 6 years ago | (#23689761)

We liken that to changing your tire while driving down the road.

Oh sure, NOW you think of a debian slogan ;}


Good thing it wasn't written in Smalltalk. The slogan there is building the rest of the boat while underway.

Only the biz machine was updated. Why trouble? (5, Insightful)

Ungrounded Lightning (62228) | more than 6 years ago | (#23689725)

Secondly the software update did not respect the data in the nuclear control system and synchronized it to new initial data in the update on the other system! Not a good idea. In critical safety systems, you always practice an update before actually doing one.

I have no problem with a computer on the process control subnet reporting information to a computer on the business subnet.

I have a BIG problem with a computer on the business subnet being able to modify and corrupt data in a computer on the process control subnet.

"I can't dump data to the business side" is a reason to make a log entry and maybe sound a minor alarm. It's not a reason to shut down the reactor (unless the data is needed for regulatory compliance and the process control side isn't able to buffer it until the business side is working correctly.)

But if a business subnet computer can tamper with something as critical as a process control machine's idea of the level of coolant in a reservoir, it rings my "design flaw" alarms.

Is it ONLY able to reset it to "empty" as a poorly designed part of a communication restart sequence? Or could it also make the process control machine think the level was nominal when it WAS empty?

IMHO this should be examined more closely. It may have exposed a dangerous flaw in the software design.

Security flaws don't care if they're exercised by mischance or malice. If nothing else, this is a way to DoS a nuclear plant through a break-in on the business side of the net.
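
Here's roughly the shape I have in mind, as a Python sketch. `ControlSidePublisher` and the `send` callback are hypothetical names; the idea is just that data flows outward as read-only copies, and a dead business side results in a log entry and buffering, not a changed reading.

```python
from collections import deque
from types import MappingProxyType


class ControlSidePublisher:
    """Hypothetical control-side exporter: data flows out, never back in."""

    def __init__(self, max_buffered: int = 1000):
        self._buffer = deque(maxlen=max_buffered)  # oldest entries drop first
        self.log = []

    def publish(self, sample: dict, send) -> None:
        # Freeze a copy so the receiver cannot mutate control-side state.
        self._buffer.append(MappingProxyType(dict(sample)))
        try:
            while self._buffer:
                send(self._buffer[0])   # may raise if the business side is down
                self._buffer.popleft()  # only discard after a successful send
        except ConnectionError:
            # Not a reason to trip anything: note it and keep buffering.
            self.log.append(
                f"business side unreachable, {len(self._buffer)} buffered")
```

Nothing the receiver does can write back into the publisher's state, which is the property the incident suggests was missing.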

Where is the redundancy? (1, Redundant)

JSBiff (87824) | more than 6 years ago | (#23689771)

The thing I'm a bit puzzled about. . . if this system has data which is so important that the whole plant must be SHUT DOWN for two days if it fails, then why aren't there *at least* TWO of them (I'd say there's a good argument for 3 or 4, but. . .)? That way, you can take one out of the loop for updates, verify the update didn't hose your data, sync the data from the 'live' system, then put it online, take the other one offline, and complete the update on it.

If I were the power co owning this plant, I'd be ticked if the plant was dark for 2 days. With the price of energy these days, and the amount of energy a single Nuclear plant can generate, you're talking some real serious cash when the thing is down for 2 days. Especially if I have to look forward to the same thing happening again, potentially every time our systems need updating (not that it necessarily would happen every time, I would sure hope it wouldn't, but with only one system, every update is a potential for the whole plant to go down for some period of time).
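
The procedure I'm describing is basically a rolling update. A minimal Python sketch, where `units`, `apply_update`, and `healthy` are hypothetical stand-ins for whatever the plant would actually use:

```python
def rolling_update(units, apply_update, healthy):
    """Update redundant units one at a time; stop at the first bad result.

    Returns (updated_units, failed_unit_or_None).
    """
    if len(units) < 2:
        raise ValueError("need at least two units so one stays live")
    updated = []
    for unit in units:
        apply_update(unit)  # this unit is out of the loop while it updates
        if not healthy(unit):
            # Leave the remaining units untouched and still in service.
            return updated, unit
        updated.append(unit)
    return updated, None
```

If the update hoses the first unit, the others never see it, and the plant keeps running on them while someone figures out what went wrong.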

Re:The problem is the update - not business networ (0)

Anonymous Coward | more than 6 years ago | (#23689849)

I'll admit I don't know the first thing about nuclear power plants, and even less about their control systems. With that in mind, I would like to know what great benefit is to be had by connecting these systems to the business network. Are these benefits worth the risk, even if it is a managed risk?

"King-size Homer" season 7 episode 7, Nov 5, 1995 (4, Funny)

layer3switch (783864) | more than 6 years ago | (#23689575)

"... The move to SCADA systems boosts efficiency at utilities because it allows workers to operate equipment remotely."

Another proof that Homer Simpson was truly ahead of his time. [wikipedia.org]

Are you mad, woman? You never know when an old calendar might come in handy. Sure, it's not 1985 now, but who knows what tomorrow will bring? -Homer

Working as intended (2, Insightful)

BlueParrot (965239) | more than 6 years ago | (#23689581)

The chemical diagnostic data is damn important because it may determine things like corrosion rates and the amount of impurities circulating in the water, potentials for clogs, etc. As with all other software, occasionally errors occur, and the appropriate way to respond when they do is to shut down and blow some whistles so as to ensure that the reactor is brought into a safe state before something else goes wrong. This is one of those cases where "Better safe than sorry" is a really rather good motto.

Maybe this was a case of... (0, Redundant)

arhhook (995275) | more than 6 years ago | (#23689591)

Patch Tuesday?

--
[Insert signature here]

well duh (1)

ILuvRamen (1026668) | more than 6 years ago | (#23689599)

I'm gonna have to agree with that last statement in the summary. Basically under these circumstances, you take out the switch and you take out the plant and I doubt they guard the network closet as well as the reactor core. Plus the whole hacking thing. You really don't need to watch youtube videos and check your e-mail from a control computer and you can bring any actually needed updates and files to it manually via USB drive.

Here's the real story... (1)

ConceptJunkie (24823) | more than 6 years ago | (#23689615)

The summary said: when a computer on the plant's business network was rebooted after an engineer installed a software update

We all know what really happened. Dude rebooted the computer so that Windows automatic update reminder to reboot wouldn't interrupt his Solitaire game every 10 minutes.

This is why... (3, Interesting)

rat7307 (218353) | more than 6 years ago | (#23689729)

This is why you keep the IT nerds away from the process network.

I've had a whole plant lose view of its system because some well meaning retard in IT decided to push updates onto a SCADA system without qualifying the updates... never had it KILL the control side of things though... well done whoever you were, you've done well.

Re:This is why... (1)

rat7307 (218353) | more than 6 years ago | (#23689765)

...although, after re-reading the story, it's a little vague... was he updating some random PC or was he actually updating the SCADA/process control software/firmware?

If it's the latter, I feel for him :-) , but you have to do your homework before going all patch crazy!

Huh? (1)

DerekLyons (302214) | more than 6 years ago | (#23689749)

From TFA

In June 1999, a steel gas pipeline ruptured near Bellingham, Wash., killing two children and an 18-year-old, and injuring eight others. A subsequent investigation found that a computer failure just prior to the accident locked out the central control room operating the pipeline, preventing technicians from relieving pressure in the pipeline.

Huh? I've read the NTSB report on that accident - and nowhere in it (IIRC) are computers implicated. The accident occurred due to damage to the pipes from construction equipment.
 
Rereading the report [ntsb.gov] [PDF file] pretty much confirms my recollection: the SCADA system was not implicated as a primary or contributory cause of the accident. The SCADA system was malfunctioning at the time of the accident, but did not cause the overpressure, and 'may' have allowed the operators to relieve pressure had it been functioning and had they observed the pressure spike. The rupture was caused by construction damage to the pipeline and a faulty relief valve.

perhaps they should have used java. (1)

goffster (1104287) | more than 6 years ago | (#23689815)

maybe ;-)

This was NOT a failure! (2, Insightful)

Anonymous Coward | more than 6 years ago | (#23689823)

Before there are too many retarded "OMG why was it on the business network!!!?LOL!??!" comments, I'll cover that right here:

It says the software is supposed to sync data between the control system and the business network. Obviously it has to be connected to both sides somehow. I'm not a power plant designer, but there's probably a good reason why people might need access to that data from the control system, and thus some kind of system acting as a safe bridge between the two rather than allowing unrestricted access from the business network.

The update f'd up and the control network went "Holy crap where did the cooling water go? Abort!" Everything worked like it was supposed to. The failure was caused by not testing the update in a lab environment before applying it to a live system.
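A minimal sketch of the fail-safe behaviour described above, where a missing reading is handled the same way as a low one. The function name, the threshold and the units are all hypothetical, chosen only for illustration; nothing here is taken from the plant's actual control logic.

```python
from typing import Optional

# Hypothetical trip threshold in arbitrary units (an assumption for this sketch).
LOW_LEVEL_LIMIT = 50.0


def safety_action(level: Optional[float], low_limit: float = LOW_LEVEL_LIMIT) -> str:
    """Return "TRIP" or "RUN" for a single coolant-level reading.

    A missing reading (None) is treated like a low reading: the logic
    fails safe instead of assuming the water is still there.
    """
    if level is None:
        return "TRIP"  # no data, so assume the worst
    if level < low_limit:
        return "TRIP"  # genuinely low level
    return "RUN"


if __name__ == "__main__":
    for reading in (80.0, 10.0, None):
        print(reading, "->", safety_action(reading))
```

The point of the sketch is that the trip on "no data" is a deliberate design choice, so a reset of the data on the monitored side looks the same to the safety logic as an empty reservoir.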

Re:This was NOT a failure! (1)

datajack (17285) | more than 6 years ago | (#23689999)

Hmm .. not sure why the ops network would have to rely on such data sent from the business network. Monitoring of levels of important stuff is an ops function to my mind.

I'll admit that I'm too drunk to read TFA at the mo, so may have missed some detail :)

It kinda worked then... (3, Insightful)

dindi (78034) | more than 6 years ago | (#23689825)

At least it did not turn into a meltdown, so the safety features in the software worked.

That is definitely a glass half full, as opposed to empty.

This was probably illegal (0)

Anonymous Coward | more than 6 years ago | (#23689837)

Every system in a nuclear power plant has to be completely backed up. There should be no single point of failure. In fact, many/most of the backup systems are backed up.

The reactor should be shut down until this design fault is rectified.

Btw, compare the cost of a redundant computer system with that of a spare coolant pump. This is a pretty cheap problem to fix.

just to shortcircuit the nuclear hysteria (4, Informative)

circletimessquare (444983) | more than 6 years ago | (#23689865)

most freakouts surrounding nuclear power are based on 1960s technology. modern reactor designs, such as pebble bed reactors [wikipedia.org] , are designed to be passively safe. that is, you can just walk away from them, doing nothing, and they will not release gas, go china syndrome, or anything else unsafe. older nuke tech requires active safety management: someone must always be on the job, making sure nothing f***s up. designing safety into nuclear reactor design from the philosophical ground up is the way of the future

Re:just to shortcircuit the nuclear hysteria (5, Insightful)

dbIII (701233) | more than 6 years ago | (#23690001)

While that may be true, the first full-scale pebble bed prototypes are yet to go online, although construction of several in China is at an advanced stage. As Superphoenix showed with fast breeders, you really need a full-scale prototype to identify all of the problems (it was economic ones that killed fast breeders, not safety issues).

India's accelerated thorium idea is also very promising.

The major problem I see with US nuclear power is the assumption that it is a solved problem, and almost zero has been spent on R&D for decades. The "new generation" of reactors from Westinghouse and others is little more than 1960s white elephants painted green.

Reboot on the business network? (1)

datajack (17285) | more than 6 years ago | (#23689969)

At the nuc site I worked at, there were two networks: the business network and the ops network. Data flowed from the ops network to the business network for statistics gathering only. The single thing that the business network did that affected operations and safety (regardless of my boss' attempt to justify budget) was the generation of work-orders. A total failure of the business network would - at worst - result in a routine observation job being missed, which would cause the systems on the ops network to detect a 'fault' and bring the reactor away from criticality.

Yes, a simple software fault can 'shut down' a nuclear plant. These things are designed to 'trip' and shut down automatically at the slightest thing going wrong. The most advanced and safest nuc plant in the UK (SXB) does - or at least did - trip once a month or more.

Get a voltmeter that is sensitive to a thousandth of a volt, and allow it to shut down your house when its input is not ideal. Give yourself three thousandths of a volt either way off 'normal' and you are maybe experiencing the ridiculous measures a modern nuc plant puts itself under.
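The voltmeter analogy can be sketched as a simple tolerance-band check. This is an illustration only: the function name and the 230 V nominal value in the usage lines are assumptions, and the 0.003 V band is just the "three thousandths of a volt" from the analogy, not a real plant setpoint.

```python
def should_trip(measured_volts: float,
                nominal_volts: float,
                tolerance_volts: float = 0.003) -> bool:
    """Return True when the reading is outside the allowed band.

    The band is nominal_volts plus or minus tolerance_volts; anything
    outside it counts as "not ideal" and shuts the house down.
    """
    return abs(measured_volts - nominal_volts) > tolerance_volts


if __name__ == "__main__":
    nominal = 230.0  # hypothetical household supply voltage
    for reading in (230.002, 230.004, 229.995):
        state = "TRIP" if should_trip(reading, nominal) else "ok"
        print(f"{reading:.3f} V -> {state}")
```

With a band that tight, ordinary fluctuations are enough to trip, which is the point being made about how conservatively the plant is set up.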

where was the operator (0)

Anonymous Coward | more than 6 years ago | (#23689973)

As a SCADA MMI guy, I just have to ask, where the hell was the intelligent human oversight? Luckily, it failed gracefully. From what I've seen, this is just typical grossly negligent corporate effort. IMHO, these affluent corporations are way overpaid and far too irresponsible.

Fred? What's wrong with your keyboard? (1)

1310nm (687270) | more than 6 years ago | (#23689995)

^c^c^c

Scuse me, what? (0)

Anonymous Coward | more than 6 years ago | (#23690005)

Wait, so they let their computer networked systems overwrite what the hard wired transmitters were telling them? The refuelling water tank, emergency charging tank and condensate storage tank all have bog standard level transmitters on them which report the level of borated water inside.

How can a computer system that takes its values from these systems suddenly overwrite those values? All the plant's control should be from those transmitters, not from the reported data which goes via countless computers...

This sounds a little fishy to me (not that I work on a nuclear plant or anything...)...