
Why Computers Suck At Math

Soulskill posted more than 4 years ago | from the must-be-lit-majors dept.

Math 626

antdude writes "This TechRadar article explains why computers suck at math, and how simple calculations can be a matter of life and death, like in the case of a Patriot defense system failing to take down a Scud missile attack: 'The calculation of where to look for confirmation of an incoming missile requires knowledge of the system time, which is stored as the number of 0.1-second ticks since the system was started up. Unfortunately, 0.1 seconds cannot be expressed accurately as a binary number, so when it's shoehorned into a 24-bit register — as used in the Patriot system — it's out by a tiny amount. But all these tiny amounts add up. At the time of the missile attack, the system had been running for about 100 hours, or 3,600,000 ticks to be more specific. Multiplying this count by the tiny error led to a total error of 0.3433 seconds, during which time the Scud missile would cover 687m. The radar looked in the wrong place to receive a confirmation and saw no target. Accordingly no missile was launched to intercept the incoming Scud — and 28 people paid with their lives.'"
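The arithmetic quoted in the summary can be checked with a short Python sketch. It assumes the 24-bit register kept 23 bits after the binary point and chopped the value rather than rounding it; the summary does not state that layout, but it is the assumption that reproduces the quoted 0.3433 seconds. All names below are illustrative.

```python
from fractions import Fraction

# Assumption (not stated in the summary): 23 fractional bits, truncated.
FRACTION_BITS = 23

tenth = Fraction(1, 10)
scale = 2 ** FRACTION_BITS

# 0.1 as the register would hold it under the assumption above.
stored_tenth = Fraction(int(tenth * scale), scale)

# Error contributed by every 0.1-second tick.
error_per_tick = tenth - stored_tenth

# 100 hours of uptime expressed in 0.1-second ticks.
ticks = 100 * 60 * 60 * 10

drift_seconds = float(error_per_tick * ticks)
print(ticks)                    # 3600000
print(round(drift_seconds, 4))  # 0.3433
```

The per-tick error is under a ten-millionth of a second; it only matters because it is multiplied by 3.6 million ticks.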


626 comments


Poor QA (5, Insightful)

slifox (605302) | more than 4 years ago | (#29933547)

It's pretty pathetic and negligent that software that controls explosive missiles was not tested for over 100 hours of operation. That's a standard Quality Assurance procedure for even the simplest low-budget hardware...

It's also pretty pathetic that the system designers implemented a broken design and did not foresee this problem. High-resolution timekeeping has been accomplished pretty successfully already...

I wonder how much time and money was spent in research and development for this thing
It doesn't seem like we're getting a quality product for the likely huge sum that was paid for it...

Re:Poor QA (5, Informative)

Anonymous Coward | more than 4 years ago | (#29933561)

Re:Poor QA (5, Insightful)

OeLeWaPpErKe (412765) | more than 4 years ago | (#29933861)

Mod parent up! This idiotic article blames computers for programmers using numerical approximation algorithms ill-advisedly.

which is stored as the number of 0.1-second ticks since the system was started up. Unfortunately, 0.1 seconds cannot be expressed accurately as a binary number, so when it's shoehorned into a 24-bit register — as used in the Patriot system — it's out by a tiny amount. But all these tiny amounts add up. At the time of the missile attack, the system had been running for about 100 hours, or 3,600,000 ticks to be more specific. Multiplying this count by the tiny error led to a total error of 0.3433 seconds, during which time the Scud missile would cover 687m. The radar looked in the wrong place to receive a confirmation and saw no target. Accordingly no missile was launched to intercept the incoming Scud — and 28 people paid with their lives.'"

So in a system that should have clocks synchronized to less than a microsecond, nobody bothered to run "ntpdate" even once in a hundred hours? And surely the military has better clock sync than a stupid home PC? This is stupidity, also known as "human error", causing those deaths. It's a case of "the correct answer to the wrong question".

What is always brought up as a "computer problem" is the crash in Paris of a jet due to infighting between the human pilot and the autopilot. Of course, there the ultimate mistake was the pilot's: he had forgotten to turn off the autopilot to land. It was set for cruising altitude (3km), and the pilot was trying to land. This resulted in ever more desperate attempts by the autopilot to get the plane to gain height, which eventually resulted in a total loss of lift for the plane, which naturally resulted in the plane hitting the ground nose-down and a big fireball. The computer did exactly as instructed; it's just that the pilot's (unintentionally given) instructions were stupid, and it took the pilot over 3 minutes to realize just how stupid he had been.

Re:Poor QA (0, Troll)

Anonymous Coward | more than 4 years ago | (#29933615)

I'm sure had *you* been on the team, this never would have happened eh? All the other problems associated with making a missile do what no other missile had done before, what many, many people said could not be done, you would have solved all those problems too eh?

What's pathetic are Monday Morning Quarterbacks who get winded just getting up for a beer.

Re:Poor QA (5, Informative)

Anonymous Coward | more than 4 years ago | (#29933625)

This particular story took place in 1991, and most of the code for Patriot was written in the 70s - needless to say, software QA was a little more lax back then. The fix for this problem was out a couple days after the incident.

Re:Poor QA (-1, Troll)

Anonymous Coward | more than 4 years ago | (#29933813)

That's the biggest load of bullshit I've ever heard of. Go back to sleep.

Re:Poor QA (4, Insightful)

dbIII (701233) | more than 4 years ago | (#29933847)

Oh really? The problem with these systems is that they have never worked in anything other than rigged tests and are just silicon snake oil.
I remember having this same discussion where there was a story here about some sort of Israeli space lasers that could apparently even shoot down artillery shells. Only a few months after that a very large number of thirty year old rockets dumped at discount price by Iran for being obsolete came flying over the border from Lebanon. Since then a lot of even slower rockets came out of Gaza. The success rate of this amazing new space toy matches that of the Patriot - zero.

Re:Poor QA (0)

Anonymous Coward | more than 4 years ago | (#29933877)

success rate of this amazing new space toy matches that of the Patriot - zero.

You're claiming the Patriot's success rate is zero, really? Were you alive during Desert Shield? Take a minute to learn what the fuck you are talking about [google.com] .

"User error"? (4, Informative)

wisebabo (638845) | more than 4 years ago | (#29933635)

I actually read about this specific incident once; I seem to remember (though honestly I'm not sure) that the design flaw was known and the user manual indicated that the computer needed to be reset every 36 hours. However, in wartime, under attack (there were frequent Scud intercepts), the crew controlling the missile battery opted against shutting it down, even for a short time. Maybe even though the manual said it SHOULD be rebooted, it did not explain WHY or what the consequences would be.

Re:"User error"? (5, Insightful)

betterunixthanunix (980855) | more than 4 years ago | (#29933673)

So they designed a system that accumulated rounding errors over time, and their solution was to ask the system's users to reboot the system every so often? Somehow, that does not add to my sympathy for these programmers...

Re:"User error"? (1, Troll)

Hurricane78 (562437) | more than 4 years ago | (#29933917)

Yep. That was an epic fail.

The rule is: If a user *can* do something wrong, he *will*!

How can they not know that?

Re:"User error"? (1)

dbIII (701233) | more than 4 years ago | (#29933883)

there were frequent Scud intercepts

Hang on, do you have a link to a documented case anywhere of a successful intercept in the field by such a system?

Re:"User error"? (4, Insightful)

Joce640k (829181) | more than 4 years ago | (#29933935)

I'm calling "Horsepoo" on the whole story.

a) If they knew enough about it to put "reboot every 36 hours" in the manual they knew enough to fix it.

b) According to the summary, 36 hours would still be a complete miss (a third of 687 meters is still 229)

c) A fixed point integer (32 bits) can mark tenths of seconds with complete accuracy for over 13 years.

d) Leaving aside a,b and c, the story still doesn't make any sense. The system would start the calculation the moment it saw the missile, not 100 hours before it appeared on the radar.

Now ... at the speed of a scud missile (mach 5 if google serves me), it may be that an accuracy of 1/10th second isn't enough to compute the trajectory accurately enough to intercept it. At that speed you might need 10,000th second resolution or whatever. *That* would be believable (but unlikely - the designers would have to be complete idiots).

The rest of the article? Yawn. It's the same old recycled story we've been seeing since the 1970s (those of us who are old enough).

Re:Poor QA (2, Informative)

betterunixthanunix (980855) | more than 4 years ago | (#29933655)

I want to know who programmed a system that allowed floating point errors to accumulate over time in a critical calculation. I hope they did not receive a degree in computer science, or that if they did, it was not from my alma mater.

Seriously, what programmer has not heard of floating point errors? That has to be one of the most common phrases I have ever heard in relation to programming; even the EEs and MEs I have met are familiar with the concept.

Re:Poor QA (3, Insightful)

Rising Ape (1620461) | more than 4 years ago | (#29933755)

Seriously, what programmer has not heard of floating point errors?

I had a similar issue with some code of mine for physics analysis. While I had heard of floating point errors, they're a lot more subtle than they first appear, and I ended up falling victim to one. Fortunately I discovered it before it actually led to any serious problems; it just resulted in wasted time.

Not everyone with a need for programming has a CS background and enough experience to be aware of all the potential problems. You'd hope that someone working on a missile system would have though.

Re:Poor QA (4, Informative)

TheRaven64 (641858) | more than 4 years ago | (#29933891)

Everybody knows that they exist, fewer people know how to avoid them. Lots of early multimedia frameworks, for example, were written using floating point timestamps and developed this exact problem (add some fraction repeatedly for each audio and each video frame, and after an hour the two tracks are noticeably out of sync). Now, they use a numerator-and-denominator form which is simple to add without rounding errors and so you only get them when you convert to floating point for comparison.

Even fewer people realise how compiler and hardware dependent they can be. For example, if you do a sequence of floating point operations on x86 then the values will stay in 80-bit registers until they are stored out to a variable. If you compile the same code for a newer machine with SSE or for another architecture then you will get 32-bit operations on your 32-bit floats and so you'll have less precision. A lot of compilers will even generate different precision between debug and release builds.
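The timestamp drift described in the first paragraph above can be sketched in Python. This is only an illustration under stated assumptions: the 30 fps frame rate is hypothetical, and because Python floats are 64-bit, the helper `to_float32` (a name made up here) rounds through `struct` to imitate a 32-bit float timestamp. The numerator-and-denominator form is played by `fractions.Fraction`.

```python
import struct
from fractions import Fraction

def to_float32(x):
    """Round a 64-bit Python float to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]

FPS = 30                   # hypothetical frame rate
FRAMES = FPS * 60 * 60     # one hour of video

frame_f32 = to_float32(1 / FPS)
clock_f32 = 0.0            # float timestamp, rounded to 32 bits on every add
clock_exact = Fraction(0)  # numerator/denominator timestamp

for _ in range(FRAMES):
    clock_f32 = to_float32(clock_f32 + frame_f32)
    clock_exact += Fraction(1, FPS)

drift = abs(clock_f32 - float(clock_exact))
print(clock_exact)  # 3600
print(drift)        # drift of the 32-bit clock, in seconds
```

The rational clock lands on exactly 3600 seconds, while the simulated 32-bit clock picks up a rounding error on nearly every addition once the running total is large compared with a single frame.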

Re:Poor QA (4, Informative)

commodore64_love (1445365) | more than 4 years ago | (#29933665)

>>>It's also pretty pathetic that the system designers implemented a broken design and did not foresee this problem. High-resolution timekeeping has been accomplished pretty successfully already...

I sorry.

j/k.

We had a similar problem with an Aegis design, and it was a major headache for us Hardware engineers to try to convince the Systems Engineers that counting in Binary time was more logical than counting in 0.1 second increments. The SEs kept insisting that their computers at home accurately count in seconds and we hardware engineers should be able to as well. The HE manager and the SE manager were butting heads for about a month over this issue, until finally an upper-level manager handed down a decision in favor of the HE manager and binary-based counting/requirements documentation.

I guess in the Patriot situation, the decision went in the opposite direction. Hence errors we introduced.

Re:Poor QA (4, Funny)

Stele (9443) | more than 4 years ago | (#29933721)

I guess in the Patriot situation, the decision went in the opposite direction. Hence errors we introduced.

Ah, so you're the one responsible!

Re:Poor QA (5, Interesting)

Anonymous Coward | more than 4 years ago | (#29933753)

Hindsight is almost 20/20. Except that the original purpose of the Patriot was to shoot down much slower aircraft, flying parallel to the earth, not ballistic missiles. This new use for Patriot was essentially experimental and had been rushed to war - and in war you run into a lot of unexpected circumstances. For example, conventional doctrine in the 1980s required Patriots to move constantly on the battlefield to avoid air attack. The clock would then reset when repositioned. No one expected a Patriot in air defense mode to stay stationary for 10 hours, let alone 100. But in a missile defense role they did. There is a good GAO report on this.

Curse of binary floating point (5, Insightful)

Carewolf (581105) | more than 4 years ago | (#29933553)

Use decimal floating point or simply switch to fixed point. Fixed point is not used as often as it should be, and many developers don't know how difficult ordinary floating point really is.

Re:Curse of binary floating point (2, Insightful)

RichardJenkins (1362463) | more than 4 years ago | (#29933607)

Indeed, this seems more like naive design decisions than computers sucking at math.

Alternate Headlines (1)

bipbop (1144919) | more than 4 years ago | (#29933885)

Slashdot: Why Programmers Suck at Math

Okay, that'd be misleading too--I suppose it'd be more accurate to write "How a few incompetent programmers who built a weapon got people killed because they suck at math". Not very headliney? Okay, how about "Military Moron Makes Murderous Machine, But Beginner's Bug Betrays Billions"? I rounded up from 28 to billions, so it should still be inaccurate enough for Slashdot. As a bonus, you still can't tell what the hell the article is about from the headline :-)

Re:Curse of binary floating point (0, Troll)

stonefoz (901011) | more than 4 years ago | (#29933639)

Fixed point? What would that accomplish? Rounding creep happens, irregardless of data type, every time it rounds the last digit, fixed or float.
Fixed point is never a good idea, bad idea or not, it does speed up things on limited hardware. A missile isn't "budget" though.
Yes, floats are difficult: every operation moves the value farther from the original guess, since it's just guessing the last digit. The only solutions are to not do fractional math at all, or to reload and adjust values periodically. Timekeeping, however, is a subject that has already been well researched. Any embedded platform I've seen has at least a dozen app-notes and a dozen different ways to keep accurate time.

Re:Curse of binary floating point (-1)

Anonymous Coward | more than 4 years ago | (#29933719)

Wrong, wrong, wrong. Also, "irregardless" is not a word.
ACK!~

Re:Curse of binary floating point (5, Informative)

Carewolf (581105) | more than 4 years ago | (#29933725)

Fixed point never rounds when operating in the range and precision for which it is designed. In this case they needed a precision of .1, using INT/10 would be 100% accurate and never give them any rounding errors for this use case.

So, in other words: You are wrong, and should probably consider using fixed point more.

Computers don't suck at math, some programmers do (1)

YA_Python_dev (885173) | more than 4 years ago | (#29933641)

The problem is the programmer: they should simply have maintained a count of the ticks in an integer and then multiplied it by 0.1 when necessary. Even better, use a proper data type, not a suckish 24-bit float in a freaking weapon, unless they understand very well what they are doing.

Python 2.6.2 (release26-maint, Apr 19 2009, 01:56:41) [GCC 4.3.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from decimal import Decimal, getcontext
>>> n = 0
>>> tick = Decimal('0.1')
>>> for i in range(3600000): n += tick
...
>>> n
Decimal('360000.0')
>>> Decimal(1) / Decimal(7)
Decimal('0.1428571428571428571428571429')
>>> getcontext().prec = 50
>>> Decimal(1) / Decimal(7)
Decimal('0.14285714285714285714285714285714285714285714285714')

And, yes, I know that Decimal in Python 2.6/3.1 is slow. Will be faster in 2.7/3.2. And there are similar libraries in Java and other languages.

Re:Curse of binary floating point (0)

Anonymous Coward | more than 4 years ago | (#29933677)

How does fixed point help you on this? The only reason to use fixed point is to speed up calculations on slow embedded systems that are making lots of realtime calculations and using floating point calculations would be too slow. It definitely does not get you out of rounding errors, in fact in many cases it would be far worse than doing floating point in terms of accuracy.

Re:Curse of binary floating point (3, Informative)

Carewolf (581105) | more than 4 years ago | (#29933763)

With fixed point you can choose the base of the fraction part. A binary fixed point would not help them, but a decimal fixed point of /10 or /100 would. The algebra of fixed point is the same no matter what base you choose. This means it is the fastest way to get decimal-based fractions instead of binary fractions (decimal floating point is best with hardware support).

Re:Curse of binary floating point (1)

Beale (676138) | more than 4 years ago | (#29933929)

Using this case as an example: If you use an integer variable for your tick, it's never rounded. Whenever you use it to calculate a time, then you can multiply it by 0.1 to get a much more accurate number than one obtained with the cumulative error of adding on a rounded floating point 0.1 to a rounded floating point sum every tick.

Re:Curse of binary floating point (1)

NeoStrider_BZK (1485751) | more than 4 years ago | (#29933759)

Difficult and imprecise. Lots of developers have bugs in their code that they don't even imagine are due to floating point errors. I'm maintaining an ARM codebase with lots of floats and now I can see this in action (beforehand I maintained mostly fixed-point apps).

There are lots of simple practices with floats that can improve accuracy, which most people are unaware of and which could have saved lives.
(OK, the same goes for fixed-point.)

Re:Curse of binary floating point (4, Insightful)

noidentity (188756) | more than 4 years ago | (#29933777)

Unfortunately, 0.1 seconds cannot be expressed accurately as a binary number, so when it's shoehorned into a 24-bit register -- as used in the Patriot system -- it's out by a tiny amount.

Sorry, 0.1 seconds can be represented EXACTLY in such a system. It doesn't even need floating-point. Here is how such a system could represent the durations of 0.1 seconds, 25.7 seconds, and 123.4 seconds: 1, 257, and 1234. So like you say, fixed-point works here. No need for anything beyond integers in this case.
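A minimal Python sketch of the integer representation described above, where one unit stands for 0.1 seconds. The helper names `to_ticks` and `format_ticks` are made up for illustration and are not from any real system.

```python
TICKS_PER_SECOND = 10  # one tick = 0.1 s

def to_ticks(whole_seconds, tenths=0):
    """Store a duration as an integer count of 0.1-second ticks."""
    return whole_seconds * TICKS_PER_SECOND + tenths

def format_ticks(ticks):
    """Render a tick count as seconds using integer arithmetic only."""
    seconds, tenths = divmod(ticks, TICKS_PER_SECOND)
    return f"{seconds}.{tenths} s"

durations = [to_ticks(0, 1), to_ticks(25, 7), to_ticks(123, 4)]
print(durations)                     # [1, 257, 1234]
print(format_ticks(sum(durations)))  # 149.2 s
```

Because every value is a whole number of ticks, adding and comparing durations never rounds.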

Re:Curse of binary floating point (4, Informative)

PhilHibbs (4537) | more than 4 years ago | (#29933889)

Well, in this specific instance a decimal system would have been ok, but it isn't a general answer. The general answer is "make sure your increments are divisible into your number base", if they had used 1/8th or 1/16ths of a second, or even 3/32 of a second, as their timer increment then they would not have had this problem. There's no reason why 1/10th of a second has any magic properties.

In general terms, all number bases have other number bases with which they are incompatible. The inability of binary to represent 1/10 accurately is just the same as the inability of decimal to represent 1/3 accurately. It's only because we use decimal all the time that we overlook decimal's shortcomings (or instinctively compensate for or avoid them) and then blame computers for binary's incompatibility with decimal.
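The point about number bases can be checked directly in Python. This sketch uses `fractions.Fraction` to recover the exact value a binary float holds, and `decimal.Decimal` at its default precision to show the matching weakness of decimal; the function name `exact_in_binary` is made up for illustration.

```python
from decimal import Decimal
from fractions import Fraction

def exact_in_binary(numerator, denominator):
    """True if numerator/denominator is stored exactly as a binary float."""
    return Fraction(numerator / denominator) == Fraction(numerator, denominator)

print(exact_in_binary(1, 8))    # True
print(exact_in_binary(1, 16))   # True
print(exact_in_binary(3, 32))   # True
print(exact_in_binary(1, 10))   # False

# Decimal has the same problem with thirds.
third = Decimal(1) / Decimal(3)
print(third * 3 == 1)           # False
```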

Computers can do math (0)

Anonymous Coward | more than 4 years ago | (#29933557)

Mathematica. 'nuff said.

The IEEE specification stuff is literally designed knowing there will be calculation errors. Don't use this; create your own number system like Mathematica does for 100% accuracy always.

Why bad programmers suck at math (0)

Anonymous Coward | more than 4 years ago | (#29933583)

So, in other words, the programmers for this piece of "mission-critical" software were not aware of floating point arithmetic and error propagation? What does that have to do with "computers" in general?

Fixed point numbers? (5, Insightful)

Big_Mamma (663104) | more than 4 years ago | (#29933591)

Use fixed point numbers? You know, in financial apps, you never store things as floating points, use cents or 1/1000th dollars instead!

Computers don't suck at math, those programmers do. You can get any precision mathematics on even 8 bit processors, most of the time compilers will figure out everything for you just fine. If you really have to use 24 bits counters with 0.1s precision, you *know* that your timer will wrap around every 466 hours, just issue a warning to reboot every 10 days or auto reboot when it overflows.
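The wrap-around figure above is easy to verify. The sketch below assumes an unsigned counter of 0.1-second ticks; the function name is illustrative, and the 32-bit line is included only for comparison.

```python
TICKS_PER_SECOND = 10  # one tick = 0.1 s

def hours_until_wrap(counter_bits):
    """Hours before an unsigned counter of 0.1-second ticks overflows."""
    return 2 ** counter_bits / TICKS_PER_SECOND / 3600

print(round(hours_until_wrap(24)))                   # 466 (hours)
print(round(hours_until_wrap(24) / 24, 1))           # 19.4 (days)
print(round(hours_until_wrap(32) / 24 / 365.25, 1))  # 13.6 (years)
```

Under these assumptions, a reboot every 10 days leaves roughly a factor of two of margin on a 24-bit counter.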

Re:Fixed point numbers? (2, Insightful)

DarkOx (621550) | more than 4 years ago | (#29933621)

yea because the missile counter measures failed to fire because the system was doing its scheduled reboot is so much better than the missile counter measures failed to fire because of timer precision

Re:Fixed point numbers? (2, Interesting)

dr_wheel (671305) | more than 4 years ago | (#29933685)

yea because the missile counter measures failed to fire because the system was doing its scheduled reboot is so much better than the missile counter measures failed to fire because of timer precision

The OP's suggestion for scheduled reboots could be solved by having redundant systems, no? System X comes up at 0 hour mark, System Y comes up at 233 hour mark. System X switches to System Y and reboots at 466 hour mark; System Y only has 233 hours uptime.

They do exactly that! (2, Interesting)

gbutler69 (910166) | more than 4 years ago | (#29933835)

Each battery has overlapping coverage with its nearest neighbors. A proper deployment has overlapping fields of fire in both depth and breadth. Surface-to-Air missile defense involves multiple layers of different systems, each specializing in different ranges: Short Range - things like Stingers, Medium Range - things like HAWK, Long Range - things like Patriot. A proper tactical deployment never relies upon a single battery to provide the sole coverage. The problem here was primarily one of tactical deployment. The technical issues can be argued, but the real failure was a failure to deploy in a tactically correct fashion. They sent a battery or two as a "Show of Force", probably overriding the tactical expertise of the officers involved for political expediency. You have jack-asses like Rumsfeld and Cheney (and their ilk) making military tactical decisions when they are not qualified to do so. The REAL failure here is one of politics.

Re:Fixed point numbers? (1)

hitmark (640295) | more than 4 years ago | (#29933729)

How about this then: have two systems, set up so that for an amount of time every 36 hours they process the tasks in parallel, so that the most recently rebooted one can take over control while the other reboots.

Re:Fixed point numbers? (1)

RobertLTux (260313) | more than 4 years ago | (#29933785)

There is the concept of "Fail safe" if the system is down (for reset reloading or whatever) then the folks up in C&C know that its down and can do other things but if its "working perfectly" but is in fact not then the folks in C&C don't know this (and land up going boom).

besides a properly built EMBEDDED MILITARY GRADE system should not take more than a couple minutes to "reboot" so you have a couple F16s in the air patrolling to watch for incoming "stuff"

Old news on an ancient design (1, Informative)

Anonymous Coward | more than 4 years ago | (#29933593)

1) This problem was covered in Risks Digest years ago.
2) Design and production phase was completed in 1980.

http://catless.ncl.ac.uk/Risks/10.82.html#subj1

is a good start for "Why the hell are we using this weapons system the way we are?"

As memory serves the fix is to restart the system periodically.
As memory also serves that's been part of the operating procedure for a very long time.

Why the author sucks at math... (1)

allcaps (1617499) | more than 4 years ago | (#29933595)

Shouldn't we focus on the fact that without computers, even MORE people would die? This article seems to make the conjecture that somehow these instruments are worthless, but it appears the writer of it sucks at math as well.

# ppl who would die without computers -MINUS- # ppl who die with computers = # of lives SAVED by computers.

That second # isn't bad, it was already there before computers came along!

Re:Why the author sucks at math... (1)

betterunixthanunix (980855) | more than 4 years ago | (#29933629)

Except that people tend to rely on computers, and take risks they would not have otherwise taken. I am not saying that the number of deaths resulting from computer errors is going to be higher than other deaths, but that it is not as simple as "every death caused by a computer error is a death that would have happened before computers." If you knew your enemy was launching missiles at you, and you had no missile defense, what would you do to protect yourself? What would you do if you did have missile defense?

Stupid article, too (5, Insightful)

hellfire (86129) | more than 4 years ago | (#29933599)

Translation: computers are only as smart as the people programming them... and there's plenty of stupid people out there.

We knew this. This is no great revelation. So why is this news?

Re:Stupid article, too (0)

Anonymous Coward | more than 4 years ago | (#29933647)

It's news because this particular brand of stupid got some people killed.

The obvious solution is to stop bickering over dirt and oil and all the other silly shit we do. But that will never happen. Because of another particular brand of stupid.

Welcome to the future!

Re:Stupid article, too (0)

Anonymous Coward | more than 4 years ago | (#29933757)

because "interval arithmetic" as a part of math still needs to be discovered by some - after 40 odd years of its existence...

tech failure vs people failure (1)

doug141 (863552) | more than 4 years ago | (#29933857)

It was written up as a tech failure (and not a people failure) because newsmen who call their sources stupid lose their sources. As others have pointed out, the answer to your question of why this is news is that the system failure resulted in death.

Re:Stupid article, too (1)

h00manist (800926) | more than 4 years ago | (#29933865)

I had the same reaction - stupid news. Old technical problem with old solutions, but someone thought they had a "catchy headline". And conclusion. It allows for radicalizing - involves missiles, war, deaths, a lot of money, national pride, terrorism, religion, politics, corruption... sex lies and videotape. No big news, all the same, some engineers screw up, some people do wars and weapons, and some people die. If the same error was in a videogame, or in an Intel CPU [wikipedia.org] or Excel spreadsheet calculation error [microsoft.com] , it would be boring. A thousand monkeys typing, last anyone checked, will not produce any decent code. Or politics.

What?! (5, Insightful)

jointm1k (591234) | more than 4 years ago | (#29933601)

of 0.1-second ticks since the system was started up. Unfortunately, 0.1 seconds cannot be expressed accurately as a binary number, so when it's shoehorned into a 24-bit register

All they had to do is use integers, where a value of 1 represents 0.1 s.

Re:What?! (1)

slonik (108174) | more than 4 years ago | (#29933681)

Mod the parent up! It is the best common sense solution every software engineer must know.

Re:What?! (1)

91degrees (207121) | more than 4 years ago | (#29933799)

Except it's never that simple. Other components rely on the data being in seconds. There are all sorts of hardcoded values in seconds. The specification states 0.1 seconds so it's impossible to change it to something more convenient without restarting at the specification stage. You'll end up looking at the code and asking "who the hell wrote this!?"

Re:What?! (1)

UltraAyla (828879) | more than 4 years ago | (#29933913)

GP was saying they should have done this to start with, not that they should go back and do this. But this software should go back anyway - it's broken.

Re:What?! (1)

TheRaven64 (641858) | more than 4 years ago | (#29933931)

Not a problem. You keep the tick in tenths of a second. You convert to seconds by dividing by ten in a floating point value when it's needed. This introduces a small rounding error, and the next time you do it then it will also introduce a small rounding error. These errors, however, are independent of each other. You can then document the error range of the second counter.
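The difference between accumulating a fraction on every tick and converting an integer tick count on demand can be sketched as follows. This uses ordinary 64-bit Python floats, so the accumulated error is far smaller than it would be in a 24-bit register; the point is only that one approach builds up error over time and the other does not.

```python
TICKS = 3_600_000  # about 100 hours of 0.1-second ticks

# Approach 1: add 0.1 to a floating point clock on every tick.
accumulated = 0.0
for _ in range(TICKS):
    accumulated += 0.1

# Approach 2: keep an integer tick count and divide only when needed.
on_demand = TICKS / 10

print(on_demand)                    # 360000.0
print(abs(accumulated - 360000.0))  # rounding error built up by approach 1
```

With the second approach each conversion is a single rounded operation, so its error does not depend on how long the system has been running.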

retrospective technological excuses (0)

Anonymous Coward | more than 4 years ago | (#29933605)

'The calculation of where to look for confirmation of an incoming missile requires knowledge of the system time, which is stored as the number of 0.1-second ticks since the system was started up. Unfortunately, 0.1 seconds [techradar.com] cannot be expressed accurately as a binary number, so when it's shoehorned into a 24-bit register -- as used in the Patriot system -- it's out by a tiny amount.

But all these tiny amounts add up. At the time of the missile attack, the system had been running for about 100 hours, or 3,600,000 ticks to be more specific. Multiplying this count by the tiny error led to a total error of 0.3433 seconds, during which time the Scud missile would cover 687m
'

Nonsense, it's perfectly possible to design a computer that can accurately tell the time. What caused Patriot to fail was that over an extended period, the clocks went out of sync between the various dispersed sub-systems, as Patriot wasn't designed to be switched on for so long.

Regardless, what isn't possible is to design a system that can accurately track and shoot down missiles in flight, as the Patriot defence system so patently demonstrated. As I recall, it succeeded less than 50% of the time. Which calls into question the veracity of the Star Wars SDI project. Just another excuse to spend billions on the defence budget.

Re:retrospective technological excuses (5, Insightful)

david duncan scott (206421) | more than 4 years ago | (#29933749)

Regardless, what isn't possible is to design a system that can accurately track and shoot down missiles in flight, as the Patriot defence system so patently demonstrated.

You're right. Just as the failure of Samuel Langley's aircraft demonstrated that man would never fly, the failure of an anti-aircraft missile to destroy only half of the ballistic missiles (targets moving at what, twice the speed of the targets it was designed to destroy?) demonstrates that ABM's will never work.

Didn't read TFA but... (0)

Anonymous Coward | more than 4 years ago | (#29933609)

Found this article [nytimes.com] matching the criteria only dated February 28, 1991.

It just didn't seem plausible did it... How this correlates to modern computer FP calculations is beyond me.

Re:Didn't read TFA but... (3, Interesting)

peragrin (659227) | more than 4 years ago | (#29933707)

because military computers are 20 years out of date to start with. Heck even the awesome modern land warrior hardware, is 10 years out of tech date. Heck they could probably shave 5 pounds off of the hardware by using modern chips, and displays.

Military Spec is only good at rugged. up to date with the best is far behind.

Re:Didn't read TFA but... (1)

Wonko the Sane (25252) | more than 4 years ago | (#29933875)

Military Spec is only good at rugged. up to date with the best is far behind.

In the year 2000 the US Navy still had a submarine that used a vacuum tube based system to monitor and control the nuclear reactor.

Sorry, but no. (1)

golodh (893453) | more than 4 years ago | (#29933925)

The "advancedness" of military hardware has absolutely nothing to do with the problems sketched in the article.

As long as the hardware has basic floating point support it's possible to design software that will get the right answer, and usually fast enough. It's all down to the software.

Practical Analysis (2, Informative)

mseeger (40923) | more than 4 years ago | (#29933613)

The problem seems to be right out of the textbook for "Practical Analysis" (not sure if this is the correct translation for the German "Praktische Analysis"). This was a mandatory course for every computer science degree during my university time (20 years ago). Don't know if this is still the case. It was an eye opener to see how correct formulas and a perfectly working computer could yield absurd results. Several times I was asked for help by people claiming their Excel was broken due to such mistakes.

CU, Martin

Because they are programmed by morons (0)

Anonymous Coward | more than 4 years ago | (#29933623)

The computer is not at fault here. The problem is the moron who thought floating point representation is a good choice for a fixed point value.

These problems are just too common. Like some game company discovered weird display in their game. Found out that floating point numbers are not very precise when far away from 0, like in a huge seamless world.

That is the programmer sucking (1)

BoneFlower (107640) | more than 4 years ago | (#29933627)

Any first year compsci student should know that this happens, and should know to choose data types that can represent the data to the needed degree of accuracy.

A simple struct { int integral_part; int decimal_part; }; would do the job for this. Or since you care exactly about .1 second increments, you could even use integral values in the first place. With 24 bits, you can cover 19 days before it overflows, and almost half a day on top of that to provide a buffer if bad guys show up right as the scheduled reset comes up.
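As a rough check of the 24-bit figure, here is a minimal Python sketch. It assumes an unsigned 24-bit counter that is incremented once per 0.1-second tick; the names are illustrative and not taken from the Patriot software.

```python
# Illustrative only: how long an unsigned 24-bit counter of 0.1 s ticks lasts.
TICKS_PER_SECOND = 10
COUNTER_BITS = 24          # assumption: unsigned, all 24 bits hold the count

max_ticks = 2 ** COUNTER_BITS              # 16,777,216 distinct values
seconds = max_ticks / TICKS_PER_SECOND
days = seconds / 86_400                    # 86,400 seconds per day

# Integer ticks are exact: 100 hours works out to a whole number of ticks.
ticks_in_100_hours = 100 * 3600 * TICKS_PER_SECOND

print(f"{days:.2f} days before overflow")
print(ticks_in_100_hours)
```

Because the counter holds whole ticks, nothing is rounded until the count is converted to seconds, and that conversion happens once rather than on every tick.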

100 hours = 3,600,000 ticks? Wait, summary math is wrong. One hour = 60 minutes. Each of those 60 minutes is 60 seconds. 60 sets of 60 seconds is 60 * 60 = 3,600 seconds per hour. 100 hours means 100*3,600 = 360,000. Either they missed a digit and the system was online for 1,000 hours straight or they added one to the final result.

Re:That is the programmer sucking (0)

Anonymous Coward | more than 4 years ago | (#29933695)

multiply that number by ten as there are 10 ticks per second.

Re:That is the programmer sucking (1)

kmsigel (306018) | more than 4 years ago | (#29933697)

There are 10 ticks per second.

Re:That is the programmer sucking (1)

p_millipede (714918) | more than 4 years ago | (#29933705)

3,600,000 ticks, not seconds. You forgot to times by 10 ticks per second.

Re:That is the programmer sucking (0)

Anonymous Coward | more than 4 years ago | (#29933769)

Forgot to "times by"??? Are you six years old?

Re:That is the programmer sucking (1)

T-Bone-T (1048702) | more than 4 years ago | (#29933735)

The summary math isn't wrong. Try reading the part about ticks per second again.

Worst article ever. (0)

Anonymous Coward | more than 4 years ago | (#29933651)

Talk about a misleading headline. It was all the programmers' fault. The computer did no "bad math".

Stupid humanist journalists should not be writing technical articles.

Computers are great... when used correctly. (1)

thesandbender (911391) | more than 4 years ago | (#29933657)

The author seems to imply that computers can't do simple base 10 math without errors. That's not entirely true if you have a fixed precision. You use an integer and shift it so there is no decimal portion; in this case you would make your base 1/10th of a second instead of 1 second. Addition, subtraction and multiplication will be error free. You'll still have a problem with division and other operations but in this case that doesn't sound like their primary issue. It wasn't the computer's fault that the designers did not account for the fact that 2.0/2.0 != 1 on almost all FPU's today. It usually just equals a really good approximation of 1 that's "close enough".

Re:Computers are great... when used correctly. (2, Insightful)

noidentity (188756) | more than 4 years ago | (#29933837)

2.0/2.0 != 1 on almost all FPU's today.

Say what? Citations please. Methinks one of those 2.0 values isn't really 2.0. Hint: printing a value isn't a good way to get its actual value, because the printing function most likely rounds it to fewer digits than it's actually stored as.

Re:Computers are great... when used correctly. (0)

Anonymous Coward | more than 4 years ago | (#29933921)

Well, that is not the best example. With binary FP formats 2.0 is 1.0B+01 so it has a perfectly accurate representation, and 2.0/2.0 will be done as (1.0/1.0)B(1-1) and so will be a perfect 1.0 (1.0B+00). A better example would be 1.0/10.0.
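The distinction is easy to check. This is a minimal Python sketch, assuming IEEE 754 double precision (which is what CPython floats are on common hardware); the variable names are illustrative.

```python
from fractions import Fraction

# 2.0 is a power of two, so it is stored exactly and the division is exact.
print(2.0 / 2.0 == 1.0)

# 0.1 has no finite binary expansion; the stored double is a nearby value.
stored = Fraction(0.1)            # exact value of the double nearest to 0.1
print(stored == Fraction(1, 10))
print(float(abs(stored - Fraction(1, 10))))   # size of the representation error

# The error becomes visible after a few plain additions.
total = 0.0
for _ in range(10):
    total += 0.1
print(total == 1.0)
```

The loop is written out by hand on purpose: recent Python versions use compensated summation inside the built-in sum(), which would hide the effect.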

don't blame the computer for bad programming (5, Insightful)

frovingslosh (582462) | more than 4 years ago | (#29933679)

It is absurd to blame the computer (or worse, all computers) for what is bad programming. Computers can store a 1/10 of a second perfectly accurately, as long as it is stored in a variable that counts tenths of seconds rather than seconds. It can easily be stored as an integer that way, avoiding any floating point rounding errors.

There certainly are cases of bad math in computers, particularly Intel computers. But this isn't such an example. This is just a lazy and stupid programmer who didn't understand what he was really doing who should take the blame for the failure that killed people, not the computer.

This problem has been solved since the 1960s (3, Informative)

tjstork (137384) | more than 4 years ago | (#29933689)

I remember this from a numerical methods class in the 1980s. To deal with situations like this, you can do one of three things :

a) Have a function that you sample as a function of t, so you don't get accumulated error.
b) Have enough bits so that error won't be an issue. This is actually hard to do because floating point errors do stack up pretty quick if you are not careful.
c) Or, you can have an error term which you can use to make adjustments along the way to account for a lack of precision. Bresenham's line algorithm does more or less exactly that when drawing lines. That's why you had "stair stepping" as the algorithm corrected itself along the way.
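Options (a) and (c) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses double-precision floats rather than the Patriot's 24-bit fixed-point format, the tick count is arbitrary, and Kahan summation stands in for "an error term" in general (it is not Bresenham's algorithm).

```python
DT = 0.1            # tick length in seconds (not exactly representable)
N = 1_000_000       # number of ticks; arbitrary illustrative value

# Naive accumulation: one rounding error per addition, and they pile up.
naive = 0.0
for _ in range(N):
    naive += DT

# Option (a): keep an exact integer tick count and compute t from it once.
sampled = N * DT    # a single multiplication, so a single rounding

# Option (c): Kahan summation carries an explicit error term along.
def kahan_accumulate(dt, n):
    total = 0.0
    comp = 0.0                      # running compensation (the error term)
    for _ in range(n):
        y = dt - comp               # apply the correction from the last step
        t = total + y
        comp = (t - total) - y      # what was lost when adding y
        total = t
    return total

kahan = kahan_accumulate(DT, N)
exact = N / 10      # the true elapsed time in seconds

print(abs(naive - exact), abs(sampled - exact), abs(kahan - exact))
```

Both alternatives stay far closer to the true elapsed time than the naive running total, which is the point of (a) and (c).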

If the OP was correct, then PATRIOT failed because it did none of them. My bet is in reality, they simply underestimated the actual error term, but did everything else correct. This could be because of discrepancies in flight control instrumentation or some sensor, or, they were simply trying to save money on bits and didn't really do the calculation as to how far the missile could be off in an error term length seconds of flight at a particular phase in its flight profile.

Bottom line is, the engineering discipline exists to solve this problem and is really no different than error handling in any guidance system. Putting a man on the moon, launching an ICBM at a target, shooting down a missile, are all essentially the same computer science problem from an error management perspective. The PhDs already nailed this decades ago. There's not a fundamental limitation to computing, in this case, merely a failure or inability of engineers on this project to apply the correct known answer to this problem.

Re:This problem has been solved since the 1960s (0)

Anonymous Coward | more than 4 years ago | (#29933761)

No, it's not because of some bug, or some software guy being stupid. It's because the procurement specification said something like "shall operate for 12 hours". So, among other countless design decisions, there was one that said, "let's do this, because it meets the requirement, and it would take significantly more effort to do better AND TEST THAT IT WORKS". Remember, if your design is perfect to say, 100 hours, then you need to test to 100 hours. If your 100 hour design is only tested for 12 hours, then it's really no better than the original 12 hour design. In today's dollars, that extra 80 hours of test is going to be $10-20k, just for one person to sit there. Now multiply that by 100s of little decisions of the same scope and magnitude. This is how you get massive overruns and missed schedules.

So it's really a requirements issue. And, of course, the fact that the Patriot is designed to shoot down AIRPLANES not missiles. It was being used way out of its design space in the first place.

Time really is money, and on every product, like it or not, there is an explicit budget for both dollars and calendar time that you have to meet. Would you rather have your aircraft defense system need a reboot every 12 hours OR would you like your defense system delivered 2 years later, while the programmers put in all sorts of enhancements to make it just a little bit better?

It's a weapon. (1)

tjstork (137384) | more than 4 years ago | (#29933831)

I don't disagree with what you wrote. One thing is, though, that requirements are very fluid and you have to ask if perhaps the problem is that 10 hours and reboot is a ridiculous requirement from the get go. Soldiers aren't going to sit in a middle of a war zone and turn off the shields.

Arguably, when speccing out systems like this, the solution is probably not to build them because they are really too complex to test for battlefield conditions. But that's crazy. So.. what was the outcome? You put a system out there, make it as good as you can, and the outcome, in this case, was that the system did intercept some missiles, did save some lives and did pioneer missile interception in a war.

28 people died because the system isn't perfect, to be true, but how many people lived because the system worked at all?

Terrible programming (0)

Anonymous Coward | more than 4 years ago | (#29933691)

That is just an example of a terrible programmer(s)...if you ever programmed in assembly before floating point processors, especially on 8 bit machines, you'd be very comfortable extending your number of bits using fixed point math. It's work, but not hard...terrible that people died because of a lazy or uneducated programming team.

Designed by who? (1)

miffo.swe (547642) | more than 4 years ago | (#29933699)

Why on earth didn't they have a clock source other than the standard one? There are numerous sources of correct time like GPS, radio, NTP, clock servers, atomic clocks or add-in cards. The worst possible clock source is a standard PC. This system was probably faulty by design since the simple clock hardware in a normal server isn't made for keeping exact time.

Are there any symbolic math Libraries? (1)

wisebabo (638845) | more than 4 years ago | (#29933701)

I'm not a serious developer and certainly not one that works on mission critical systems but I have a question:

Are there any symbolic math libraries that allow a program to compute and store its interim values symbolically until the final result is needed? (Like, as an AC mentioned earlier, Mathematica?). Of course there would be a memory overhead (but surely the entire Mathematica kernel wouldn't be needed) and performance might be much MUCH slower than current "binary math" libraries but surely in a day of gigabyte RAM chips and gigaflop CPUs (and Teraflop GPUs) the added precision would be worth it?

So does anything like this exist? Would it be hard to develop (that's a challenge for you out there!)

GMP (1)

tepples (727027) | more than 4 years ago | (#29933827)

The mpz module in the LGPL library GMP [gmplib.org] (not to be confused with a bitmap image editor) does arithmetic on large integers, and its mpq module represents rational numbers exactly as ratios of mpz integers. For example, 3.14 would become "157/50".
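The same idea can be tried without installing GMP. This is a small sketch using Python's standard-library fractions module, which is a different implementation from mpq but also stores rationals as exact integer ratios; the variable names are illustrative.

```python
from fractions import Fraction

# A decimal literal parsed from a string is stored as an exact ratio.
pi_ish = Fraction("3.14")
print(pi_ish)                       # 157/50

# One tenth of a second, held exactly, times 3,600,000 ticks.
tick = Fraction(1, 10)
elapsed = tick * 3_600_000
print(elapsed)                      # 360000
print(elapsed == 360_000)           # True
```

The price is speed and memory: numerators and denominators can grow without bound over long computations, which is why this is usually reserved for cases where exactness matters more than throughput.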

Re:Are there any symbolic math Libraries? (0)

Anonymous Coward | more than 4 years ago | (#29933843)

Python

Your tax dollars at work (2, Insightful)

Herger (48454) | more than 4 years ago | (#29933703)

This is not an example of computers sucking at math.

This is an example of engineers and developers failing to draw up valid requirements, failing to develop to specification, and failing to test against real-world use cases.

Management undoubtedly shares an equal if not greater portion of the blame here. This is typical military-industrial complex, lowest-bidder contractor mentality at work, just another form of corporate welfare if the government doesn't turn around and punish shortfalls like this.

Why Computers Suck At Keeping Time (1)

rocketPack (1255456) | more than 4 years ago | (#29933737)

Sounds like the computer did the math just fine, but with a flawed clock.... That's classic GIGO!

May I suggest that it's not the computers... (0)

Anonymous Coward | more than 4 years ago | (#29933775)

...that suck at maths, but some programmers?
OK, that was an easy shot, but really, don't you agree that today's academic courses in science at large are becoming so specialized so soon that good sense stemming from scientific culture cannot be expected any more?

And this is why... (1)

elnyka (803306) | more than 4 years ago | (#29933779)

And this is why it is a good idea to take a Numerical Analysis course or an Assembly course that lets you play with floating-point arithmetic as part of your CS electives. As much as I'd like to blame today's Java/.NET-oriented CS curricula (which seem to be fashionable now in many universities), it's been quite a while that many universities barely pay any attention (if any) to the details of floating point arithmetic.

Kind of old news isn't it? (2, Insightful)

Interoperable (1651953) | more than 4 years ago | (#29933783)

The article contains some interesting examples but all of which have been in programming texts and courses for years. I'm not really sure why it's on /.

WHY MAGAZINE EDITORS SUCK AT MATH: (0)

Anonymous Coward | more than 4 years ago | (#29933787)

Let's take the double precision floating point representation as an example. It uses 64 bits to store each number and permits values from about -10308 to 10308 (minus and plus 1 followed by 308 zeros, respectively) to be stored.

Flamebait (0)

Anonymous Coward | more than 4 years ago | (#29933801)

The whole story and most of the posts are one giant Flamebait.

Nice how all the Slashdot geniuses seem to think they could have done a better job had they *only* been there 20-30 years ago, before most of these would-be heroes were even born.

Then there are morons who get on their high horse about corporate welfare bullshit. Sure, no one at Raytheon gave a shit about our soldiers, they just wanted to make a buck.

What a disgusting way to start a Saturday.

Ridiculous. Patriots always win. (3, Funny)

writermike (57327) | more than 4 years ago | (#29933809)

Look, you guys can talk trash all you want, but when you say this:

>>Patriot defense system failing to take down a Scud missile attack

You're just lying to yourself. The Patriots defense is awesome this year. I mean, was there really ANY point for the Titans offense to show up a couple of weeks back?

And the Scuds? C'mon man. They let go their best man two seasons ago. The QB can't hit the broadside of a barn and their entire wide-receiver corps has Jello hands anyway. The missile attack is a gadget play, pure and simple. Belichick sees right through that and you know it.

Haters need to stop all the hatin' and get on the Pats bus!!!!! GO PATRIOTS!

Re:Ridiculous. Patriots always win. (1)

gclef (96311) | more than 4 years ago | (#29933933)

Football? Dude, we didn't leave you...you left us. Your card, turn it in.

The old Patriot-stories, again.... (0)

Anonymous Coward | more than 4 years ago | (#29933819)

Screw this. During Gulf-war I, or whatever it's called, Patriot did not 'miss' any target that they fired at - period. The system was never designed to destroy missiles or get direct hits, however. The original missiles were to destroy planes by going off in close proximity to the target - which they do, very successfully. Missiles like Scuds, however, are not always destroyed by this. They tend to just break up, sending the intact warhead off track, slightly. From what I know, this happened in the case mentioned.
I happen to have worked with this system in the mid-nineties and this was a hot topic, back then. Why the total uptime of the system would mess up tracking is beyond me. The system will track what it either sees or is told to look for. This has nothing to do with rounding errors in time. Our system back then had been online for many days without impaired ability to track anything.

Error in the author's math (1)

ronaldg (718122) | more than 4 years ago | (#29933825)

Quote: total error of 0.3433 seconds, during which time the Scud missile would cover 687m.............. This would mean the SCUD would be traveling at almost 71 million miles per hour! I don't think so............

"must-be-lit-majors" (1)

John Hasler (414242) | more than 4 years ago | (#29933863)

The authors of the article? So it would seem.

Bad design anyway (1)

rnelsonee (98732) | more than 4 years ago | (#29933881)

Programmers' errors/naivete aside, if an error of 0.3433 seconds can mean the target aperture is 687m off, then a resolution of 0.1 seconds - even when working properly - could still be 200m off.
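The 200m estimate follows directly from the two figures quoted in the summary. A quick Python check, using only those figures (the variable names are illustrative):

```python
# Figures quoted in the summary.
clock_error_s = 0.3433      # accumulated timing error, seconds
position_error_m = 687.0    # distance the Scud covers in that time, metres

implied_speed = position_error_m / clock_error_s     # metres per second
per_tick_m = implied_speed * 0.1                     # distance per 0.1 s tick

print(round(implied_speed), round(per_tick_m))
```

So the summary's own numbers imply a closing speed of roughly 2 km/s, and one full 0.1-second tick at that speed corresponds to about 200m.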

And I see other comments about using fixed-point. I wonder why couldn't they just use an integer and use deciseconds as their base time?

Seriously flawed reporting (3, Interesting)

SpinyNorman (33776) | more than 4 years ago | (#29933893)

There's no way a real-time missile tracking system is going to be dealing with time at an accuracy of 0.1 sec.

A Patriot missile travels at about Mach 3 (~1000 m/sec) so a rounding error of 0.05, even without any error accumulation, means you'd be off by 50m in position.

Who knows what the real story is vs the garbage that was reported, but even if there was a cumulative error that's the fault of the programmer rather than a lack of a computer's ability to do math. You do your error analysis and use whatever accuracy needed to keep the errors in a tolerable range.

The part about the system running for 100 hours was pure gibberish. Yes, we can all divide that by 0.1 sec, but what on earth does that have to do with a real-time tracking system tracking a target it acquired a few minutes ago?!

A better title for the story rather than "computers can't do math" would be "we can't do tech reporting".

Car analogy (1)

ledow (319597) | more than 4 years ago | (#29933895)

This TechRadar article also explains why cars suck at math, too.

The timing belt was manufactured to be a few mm too short. But over the course of several thousand revolutions, those mm add up to a massive error, which causes the pistons to strike metal. Thus the car was a write-off.

It's no fairer to blame the computer than it is the car - some ABSOLUTE PILLOCK didn't design, implement or test their system properly. And *they* caused the 28 deaths, not the computer (and it can't be overstated just how elementary a mistake this is, especially in a military system, and should have been caught by basic code review and testing at every stage).

I hate stories like this because then you get deep mistrust of computerised systems where they *can* be incredibly useful, and without an adequate substitute. Every time a car won't start because the electronic ignition wasn't designed properly, every time a home computer crashes because someone didn't bother to isolate the apps from the OS well enough, every time something like this happens, people distrust "computers" more and more when what they should be distrusting is damn crappy programming.

A computer is as close as you can practically get to being perfect. Short of hardware failure (Intel FDIV bugs, bad RAM, corrupt drives etc.), computers do not make mistakes. If they crash, it's because they've been *told* to crash (the fact that you even *see* a blue screen or kernel panic means that the computer is still just blindly following orders).

There's no excuse for this - it's basic, elementary mathematics and binary manipulation. Some pillock threw a cheap CPU clock and a standard library at a time-critical, life-dependent military problem without even thinking. The programmers should be sacked, the testing teams should be sacked and ANYTHING they've ever created or reviewed should be overhauled to make sure they haven't made even worse mistakes.

Re:Car analogy (1)

John Hasler (414242) | more than 4 years ago | (#29933937)

You assume that the article is not a "complete pillock".

I'm not impressed ... (1)

golodh (893453) | more than 4 years ago | (#29933899)

I remain very much unimpressed by the article, due mainly to its rather sensationalist focus on missile systems and Ariane but also to its apparent ignorance of a now 50-year old branch of applied Mathematics: Numerical Analysis (see e.g. http://en.wikipedia.org/wiki/Numerical_analysis [wikipedia.org] ) and its failure to distinguish between the root causes of both system failures. The Ariane failure (see http://en.wikipedia.org/wiki/Ariane_5_Flight_501 [wikipedia.org] ) was interesting in that the software itself was Numerically sound, but it only failed to watch for overflow:

Efficiency considerations had led to the disabling of the software handler (in Ada code) for this error trap, although other conversions of comparable variables in the code remained protected. This led to a cascade of problems, culminating in destruction of the entire flight.

The Patriot case was simply unsound from a numerical point of view because it used an approximation which accumulated errors to the point where they seriously compromised the end result, which is a whole different thing altogether (and mathematically speaking much simpler and more fundamental).
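The accumulation is easy to reproduce with exact rational arithmetic. The Python sketch below assumes the 0.1-second tick length was chopped (truncated) to 23 binary places; that bit count is an assumption, chosen because it reproduces the 0.3433-second figure quoted in the summary, and the real register layout is not described in this thread.

```python
from fractions import Fraction

FRACTION_BITS = 23          # assumption, see the note above
TICKS = 3_600_000           # 100 hours of 0.1 s ticks, from the summary

exact_tick = Fraction(1, 10)
# Chop (truncate) 0.1 to FRACTION_BITS binary places.
stored_tick = Fraction(int(exact_tick * 2**FRACTION_BITS), 2**FRACTION_BITS)

per_tick_error = exact_tick - stored_tick       # error in each stored tick, seconds
total_error = per_tick_error * TICKS            # drift after 100 hours, seconds

print(float(per_tick_error))
print(float(total_error))
```

Under that assumption the per-tick error is under a ten-millionth of a second, yet multiplying it by 3,600,000 ticks gives a drift of about a third of a second.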

Numerical analysis is basically about "How can we make sure that a computer algorithm on such-and-such hardware will always produce an answer to this-and-this mathematical problem with such-and-such error bounds.". This really isn't something like "coding well", but it can require complicated and careful mathematics to get right, which is something programmers usually haven't a clue about. Instead, and provided the effort is warranted by the application, one needs to have a competent Numerical Analyst (a fancy title for a Mathematician specialized in this particular field) check (if not actually design) the software. Coders can then do the rest, provided there is sufficient communication between the architect (the numerical analyst) and the builders (the coders) about all the quirks of the hardware and how they are accounted for and dealt with.

Every CS graduate is supposed to know that advanced numerical work with computers (like those in the Patriot system, where the 0.3 second error is a fine example of negligence) falls under the domain of Numerical Analysis and requires specialist attention. This is why some jobs should be undertaken by software engineers, not coders.

In one sentence? (1)

Hurricane78 (562437) | more than 4 years ago | (#29933901)

As Paul Lockhart said: Math is about creativity!

There. I saved you a hell of a lot of time! ^^

Fixed, but a day late (0)

Anonymous Coward | more than 4 years ago | (#29933905)

Fixed, but a day late. A 2 week turnaround time from when this was fault-isolated to when a fix was fielded in SW Asia is fast for government work. Sadly, 28 American soldiers died. The computer found a possible ABT. When it verified the track, it wasn't where its programming told it to be. Track was dropped.

The Real Problem (0)

Anonymous Coward | more than 4 years ago | (#29933909)

The real problem is all the math teachers around the world that teach students that all decimals after tenths or hundredths of a digit are, and I quote, "insignificant". In this case, what you basically have is a bunch of programmers who grew up learning that anything after the tenths digit is "insignificant". It's as simple as that.
