Programming Error Doomed Russian Mars Probe

Soulskill posted more than 2 years ago | from the to-infinite-loops-and-beyond dept.

astroengine writes "So it turns out U.S. radars weren't to blame for the unfortunate demise of Russia's Phobos-Grunt Mars sample return mission — it was a computer programming error that doomed the probe, a government board investigating the accident has determined." According to the Planetary Society Blog's unofficial translation and paraphrasing of the incident report, "The spacecraft computer failed when two of the chips in the electronics suffered radiation damage. (The Russians say that radiation damage is the most likely cause, but the spacecraft was still in low Earth orbit beneath the radiation belts.) Whatever triggered the chip failure, the ultimate cause was the use of non-space-qualified electronic components. When the chips failed, the on-board computer program crashed."

276 comments

Excuse me... not a programmer's fault. (5, Insightful)

LostCluster (625375) | more than 2 years ago | (#38958015)

We've got a contradictory summary here. Chip failure isn't a programming fault, it's a hardware problem. Stop confusing hardware and software, you insensitive clod.

Re:Excuse me... not a programmer's fault. (3, Insightful)

Anonymous Coward | more than 2 years ago | (#38958097)

Obviously the error handling routine was poorly written.

Re:Excuse me... not a programmer's fault. (5, Funny)

Anonymous Coward | more than 2 years ago | (#38958143)

sure, it missed:

if(cpu_melted)
      abort();

Re:Excuse me... not a programmer's fault. (5, Funny)

MSesow (1256108) | more than 2 years ago | (#38958267)

That could throw a ProcessorNotFoundException, be sure to code accordingly.

Re:Excuse me... not a programmer's fault. (3, Funny)

Anonymous Coward | more than 2 years ago | (#38958605)

The Linux kernel throws an error about unsupported CPUs; how that code should execute in the first place is a mystery.

Re:Excuse me... not a programmer's fault. (4, Funny)

tripleevenfall (1990004) | more than 2 years ago | (#38958725)

In Soviet Russia, code executes you!

Re:Excuse me... not a programmer's fault. (3, Informative)

Anonymous Coward | more than 2 years ago | (#38959025)

In that case, the primary CPU is already up and running; it's booting additional processors.

Re:Excuse me... not a programmer's fault. (0)

Anonymous Coward | more than 2 years ago | (#38958731)

So something like:

try { // anything
} catch (ProcessorNotFoundException e) {
        System.out.println(e.printJTAGTrace());
}

That was easy... lazy Russian programmers.

Re:Excuse me... not a programmer's fault. (4, Funny)

wjsteele (255130) | more than 2 years ago | (#38958825)

Actually, that code worked perfectly!!!

Bill

Re:Excuse me... not a programmer's fault. (1)

Anonymous Coward | more than 2 years ago | (#38958361)

You wouldn't need those chips if the probe's software was processed by The Cloud(tm)!

Re:Excuse me... not a programmer's fault. (-1, Redundant)

tripleevenfall (1990004) | more than 2 years ago | (#38958691)

In Soviet Russia, chip crashes you!

Re:Excuse me... not a programmer's fault. (0)

Anonymous Coward | more than 2 years ago | (#38958929)

correct.
The most likely cause of the restart of both half-sets of the TSVM22 (digital computer) of the IOO (onboard computer system) was the local impact of heavy charged particles (TZCH) from space, which caused a failure of the RAM (random access memory) in the TSVM22 computational modules during the flight of the spacecraft “Phobos-Grunt” on its second orbit. The RAM failure could have been caused by a short-term malfunction of the ERI (elektroradioizdely, electronic components) due to heavy charged particles hitting the cells of the TSVM22 computational modules, which contain two chips of the same type, WS512K32V20G24M, located in a single package in parallel with each other. The exposure led to a corruption of the code, which caused the “restart” of both half-sets of the TSVM22.
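Memory corrupted by a particle strike is what error-detection-and-correction (EDAC) codes are meant to catch. As a minimal illustration (not the TSVM22's actual design, which the report quoted above doesn't describe), here is a Hamming(7,4) code that corrects any single flipped bit in a 7-bit codeword. Real flight memory usually applies a wider single-error-correct, double-error-detect code to whole words in hardware; the function names below are made up for the sketch.

```python
def hamming74_encode(data):
    """Encode 4 data bits (list of 0/1) as a 7-bit Hamming codeword.

    Bit order follows codeword positions 1..7: p1 p2 d1 p4 d2 d3 d4,
    where p1, p2, p4 are parity bits.
    """
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # parity over positions 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(code):
    """Correct at most one flipped bit.

    Returns (data_bits, syndrome); syndrome is the 1-based position of the
    corrected bit, or 0 if the codeword was clean.
    """
    c = list(code)  # copy so the caller's word is not modified
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]  # recheck positions 1, 3, 5, 7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]  # recheck positions 2, 3, 6, 7
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]  # recheck positions 4, 5, 6, 7
    syndrome = s1 + 2 * s2 + 4 * s4
    if syndrome:
        c[syndrome - 1] ^= 1  # flip the upset bit back
    return [c[2], c[4], c[5], c[6]], syndrome
```

For example, encoding `[1, 0, 1, 1]` gives `[0, 1, 1, 0, 0, 1, 1]`; if a hit flips position 5, `hamming74_decode([0, 1, 1, 0, 1, 1, 1])` returns `([1, 0, 1, 1], 5)`. Two flipped bits in the same codeword would be miscorrected, which is one reason designers often spread each word's bits across separate chips.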

TFS - obviously written by a hardware guy (2)

Thud457 (234763) | more than 2 years ago | (#38958509)

"Cosmic rays?"
"That's a software problem..."

They're lucky those chips they bought from China weren't made of lead and didn't contain deadly melamine!!!

Re:TFS - obviously written by a hardware guy (4, Interesting)

sconeu (64226) | more than 2 years ago | (#38958847)

You laugh, but how many of you low-level guys have had to work around buggy hardware?

I once sent a memo to my boss that I was doing the equivalent of "working around a burnt out lightbulb in software".

E.g.: How many hardware guys does it take to change a lightbulb? None, we'll just have the software work around it.

Re:TFS - obviously written by a hardware guy (1)

jd2112 (1535857) | more than 2 years ago | (#38959069)

"Cosmic rays?" "That's a software problem..."

They're lucky those chips they bought from China weren't made of lead and didn't contain deadly melamine!!!

If they were made of lead they might have blocked enough radiation to prevent them from crashing.

Re:Excuse me... not a programmer's fault. (5, Interesting)

icebike (68054) | more than 2 years ago | (#38958521)

Obviously the error handling routine was poorly written.

I'll assume your tongue was firmly planted in your cheek, and suggest a +1 Funny mod.

But on the chance you were serious, depending on where that chip was, it may have been beyond something manageable by software.

A chip in a power controller could take down any or all of the processor components, or render access to control circuits impossible.

The linked article also states

Everything was working well with the spacecraft immediately after launch, including deployment of the solar panels, until the command to start the engines was issued. When that did not happen, the spacecraft went into a safe mode, keeping the solar panels pointed to the Sun to maintain power.

How many times do you suppose they actually tested engine start IN THE SPACE CRAFT? I'm guessing ZERO.

non-space qualified parts being used in some of the electronics circuits. This is a design failure by the spacecraft engineers that might have been caught had they performed adequate component and system testing prior to flight. But they did not.

So design failure, due to radiation, prior to the craft getting near the strongest radiation belts. Unbelievable. Occam would be skeptical.

This sounds to me like some on-board internal source of radiation, or induction, or simple overload, fried a chip somewhere in some unspecified circuitry, most probably in the engine controls. This seems far more likely than an external radiation source given the shielding the physical design would provide.

I doubt space qualification made any difference at all. The window for space radiation in the brief time it was operational was small.
Rather I suspect under-spec parts, over voltage or high current draw, or internal shielding oversights.

Re:Excuse me... not a programmer's fault. (2)

Rakishi (759894) | more than 2 years ago | (#38959091)

How many times do you suppose they actually tested engine start IN THE SPACE CRAFT? I'm guessing ZERO.

I'm sure they tested the engine multiple times. I'd figure the stress of the launch (vibrations, etc, etc.) causes something to fail either due to shoddy construction or small debris falling onto something.

I doubt space qualification made any difference at all. The window for space radiation in the brief time it was operational was small.

Exactly. I doubt all those laptops on the ISS are radiation hardened, but they last quite a while anyway.

Re:Excuse me... not a programmer's fault. (0)

Tsingi (870990) | more than 2 years ago | (#38958159)

I concur. I didn't RTFA, TFS is contradictory.

Re:Excuse me... not a programmer's fault. (5, Informative)

Cochonou (576531) | more than 2 years ago | (#38958221)

Well... if you read TFA (or actually the first TFA linked), it is clearly written:
In a report to be presented to Russian Deputy Prime Minister Dmitry Rogozin on Tuesday, investigators concluded that the primary cause of the failure was "a programming error which led to a simultaneous reboot of two working channels of an onboard computer [...] Likewise, cosmic rays and/or defective electronics are not the leading suspects behind Phobos-Grunt’s demise.
The summary is clearly bolting together two contradicting reports.

Re:Excuse me... not a programmer's fault. (0)

Anonymous Coward | more than 2 years ago | (#38958417)

Except that the OP said the SUMMARY was contradictory. And it is. This has nothing to do with reading TFA. It has everything to do with the summary.

Re:Excuse me... not a programmer's fault. (5, Funny)

Anonymous Coward | more than 2 years ago | (#38958485)

This has nothing to do with reading TFA. It has everything to do with the summary

You just defined all of slashdot. What was your point again?

Re:Excuse me... not a programmer's fault. (1)

Rary (566291) | more than 2 years ago | (#38958283)

The second link makes the following claim:

In a report to be presented to Russian Deputy Prime Minister Dmitry Rogozin on Tuesday, investigators concluded that the primary cause of the failure was "a programming error which led to a simultaneous reboot of two working channels of an onboard computer," the Russian state-owned news agency RIA Novosti reported.

However, the third link says nothing of the sort. It sounds like TFS is just a mishmash of conflicting theories from different articles.

Re:Excuse me... not a programmer's fault. (3, Interesting)

Rary (566291) | more than 2 years ago | (#38958359)

To follow up, the article saying that it was a chip failure is dated yesterday, while the article claiming it was a programming failure is dated today. Presumably, this is new information to shoot down the previous claims, but TFS (in typical Slashdot "editorial" style) fails to actually make that distinction, and puts both claims together as part of a single summary.

Re:Excuse me... not a programmer's fault. (1)

MindStalker (22827) | more than 2 years ago | (#38958463)

Chip failure, but it was a software error that led to not handling the chip failure gracefully. Space-qualified stuff has to be much more redundant and capable of handling failures of multiple components.

Re:Excuse me... not a programmer's fault. (2)

0123456 (636235) | more than 2 years ago | (#38958727)

A while back I read some interesting discussions between satellite engineers about the tradeoffs between space qualified and not space qualified chips. From what I remember you gain resistance to radiation, but lose in other areas such as resistance to physical damage (e.g. a solder joint coming loose due to launch vibrations) because they're so far behind the state of the art that you may have to put a lot more chips on the same circuit board.

So it doesn't seem a clear-cut choice... rebooting the computer when it crashes is typically easier than fixing a solder joint when it's fifty million miles from Earth.
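On flight computers that reboot is usually automatic: a watchdog timer resets the processor when the software stops checking in. Here is a toy simulation; the `Watchdog` class and the tick-based timing are assumptions for illustration, since a real watchdog is an independent hardware counter rather than an object in the same program.

```python
class Watchdog:
    """Toy watchdog: counts a reset if not kicked within `timeout` ticks."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.counter = 0
        self.resets = 0

    def kick(self):
        # Healthy software periodically "kicks" (services) the watchdog.
        self.counter = 0

    def tick(self):
        # Runs once per clock tick, independently of the main software.
        self.counter += 1
        if self.counter >= self.timeout:
            self.resets += 1  # stand-in for asserting the CPU reset line
            self.counter = 0


def run(watchdog, hung_from, total_ticks):
    """Software kicks every tick until it hangs for good at tick `hung_from`."""
    for t in range(total_ticks):
        if t < hung_from:
            watchdog.kick()
        watchdog.tick()
    return watchdog.resets
```

With `run(Watchdog(5), hung_from=10, total_ticks=20)` the software stops kicking after 10 ticks, so the 5-tick watchdog fires twice in the remaining 10. A reset only helps if the fault is transient; a part that stays dead just keeps tripping it.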

Re:Excuse me... not a programmer's fault. (1)

Grishnakh (216268) | more than 2 years ago | (#38958851)

I'm not a satellite engineer, but wouldn't it be easy enough to just install a lead shield around the PCB to protect from most radiation? As long as the shield's not too thick, it shouldn't add too much weight, especially compared to using older-technology chips that'll take up more board space.

Re:Excuse me... not a programmer's fault. (0)

Anonymous Coward | more than 2 years ago | (#38959021)

Surface-mounted discrete parts are less vibration-damage-prone, but are more prone to radiation because of their smaller size. The opposite is true for through-hole parts. As for integrated circuits, I don't know what transistors they use in space vehicles, but the same radiation limitations probably apply to newer and smaller FET transistors.

Re:Excuse me... not a programmer's fault. (0)

Anonymous Coward | more than 2 years ago | (#38958649)

This one time, I almost agree, but why didn't you software retards code in redundancy?

Re:Excuse me... not a programmer's fault. (2)

icebike (68054) | more than 2 years ago | (#38958681)

The second link in the summary leads to an article that is internally contradictory. That page from Discovery News is all over the place.
Which is not surprising given the bio of the author [discovery.com]:

Klotz came to Brevard County, Fla. (aka The Space Coast) as a copy editor for the local paper 24 years ago. She switched to writing because it was obvious the reporters were having way more fun than the editors for the same money. After a year or so of writing for the business section,

Journalism major trying to wear the big girl shoes.

The link to the Planetary Society page seems much more reliable.

Re:Excuse me... not a programmer's fault. (3, Funny)

smcdow (114828) | more than 2 years ago | (#38958699)

You can't possibly call yourself a programmer if your code can't recover from a hardware fault.

Re:Excuse me... not a programmer's fault. (1)

geekoid (135745) | more than 2 years ago | (#38958777)

If a software failover fails and the chip currently in use fails, then it's both.

Please do some low-level software/hardware work before opening your mouth.
This isn't one of your slapped-together VB3 front ends.

Yeah, YOU Herd me.

Re:Excuse me... not a programmer's fault. (1)

gatkinso (15975) | more than 2 years ago | (#38958961)

Space rated hardware, software, and (more relevantly) firmware is designed to handle this type of problem (to the fullest extent possible).

Re:Excuse me... not a programmer's fault. (3, Insightful)

alienzed (732782) | more than 2 years ago | (#38959005)

On the other hand, this demonstrates so aptly why they failed in the first place. "Yep, it's a software problem, because the hardware failed to run any of it after it was damaged."

Re:Excuse me... not a programmer's fault. (2)

jamstar7 (694492) | more than 2 years ago | (#38959159)

At least they didn't fuck up a meters-to-feet conversion.

Re:Excuse me... not a programmer's fault. (3, Interesting)

crutchy (1949900) | more than 2 years ago | (#38959169)

To my knowledge, only the Apollo Guidance Computer has ever truly achieved hardware failure tolerance. On Apollo 11 the LM radar fault overloaded the computer, but it was able to continue thanks to restart logic built into the AGC that could pick up critical tasks from where they were when the computer restarted and drop non-critical tasks, all with a very small fraction of the capabilities of current technology (although, from memory, I think they were able to fit 2 transistors on a single chip!). The AGC is really a marvel of (past) engineering and computer science. The reliability problem alone would be insurmountable with today's garbage. That's probably part of the reason why we haven't been back there since.
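The restart idea described here (resume critical jobs from a saved point, shed everything else) can be sketched roughly as follows. This is a loose illustration with hypothetical names (`Task`, `checkpoint`, `restart`), assuming progress is saved somewhere that survives the restart; it is not how the AGC's actual restart tables were implemented.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    critical: bool
    checkpoint: int = 0  # last completed step, assumed kept in restart-safe memory


def run_step(task):
    """Do one unit of work, then record progress so a restart can resume here."""
    task.checkpoint += 1


def restart(tasks):
    """After a restart, keep critical tasks (checkpoints intact) and shed the rest."""
    return [t for t in tasks if t.critical]
```

For instance, a critical `guidance` task that had completed three steps comes back from `restart` with `checkpoint == 3`, while a non-critical display task is simply dropped.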

Programming error? (5, Funny)

mehrotra.akash (1539473) | more than 2 years ago | (#38958075)

the ultimate cause was the use of non-space-qualified electronic components

Programming error?
Perhaps in the software used to order the parts

Re:Programming error? (0)

Anonymous Coward | more than 2 years ago | (#38959013)

they probably made the mistake of ordering parts from America

Typical (0)

Anonymous Coward | more than 2 years ago | (#38958081)

The electronic engineers here are always trying to blame programmers for their design faults too.

headline fail (3, Informative)

jamessnell (857336) | more than 2 years ago | (#38958095)

"the ultimate cause was the use of non-space-qualified electronic components" != "programming error". Hardware fail.

Re:headline fail (1)

X0563511 (793323) | more than 2 years ago | (#38958109)

Even better... a design fail! The hardware worked (or not) as per its specifications. It's not the hardware's fault you put it where it wasn't meant to go!

Re:headline fail (1)

jamessnell (857336) | more than 2 years ago | (#38958189)

Pretty much. Though in a sense, it wasn't necessarily a design fail. They probably just had someone ordering parts who didn't know to order mil spec (I'm assuming mil spec is fine for space stuff). Seems to me like most ICs are available in mil spec packages, so the part seems the same in basically every way, but it costs a lot more and resists environmental crap better. It's a sad story, really.

Re:headline fail (2, Informative)

Anonymous Coward | more than 2 years ago | (#38958259)

They probably just had someone ordering parts that didn't know to order mil spec (I'm assuming mil spec is fine for space stuff)

No, not even close. "Mil spec" is basically industrial grade with a slightly extended temperature range. Radiation-hardened stuff is a completely different ballpark.

Re:headline fail (1)

sjames (1099) | more than 2 years ago | (#38958627)

Sometimes mil spec isn't even extended at all, but just has more rigorous testing to make sure it's within the standard specs.

Re:headline fail (2)

Tastecicles (1153671) | more than 2 years ago | (#38958295)

Mil spec isn't proofed against hard radiation; it handles some soft radiation and EM, though not quite up to an airburst-strength pulse. Space spec has to withstand high-energy radiation such as cosmic rays, X-rays and gamma rays, way beyond what you'd encounter 5 miles below a thermonuclear burst; otherwise it'll get outside the Van Allen belts and simply die.

Re:headline fail (4, Funny)

smitty97 (995791) | more than 2 years ago | (#38958431)

(I'm assuming mil spec is fine for space stuff)

You don't happen to work at the Russian Space Agency purchasing department, do you?

Re:headline fail (1)

jamessnell (857336) | more than 2 years ago | (#38958587)

Ha ha ha ha, no. Though I am a junior in R&D for aeronautically deployed survey equipment, so I'm a little familiar with hardening systems. Do you happen to know from practice (or some substantial experience) whether mil spec is insufficient for that application?

Re:headline fail (2)

geekoid (135745) | more than 2 years ago | (#38958905)

A) Some hardware has software embedded into it, yeah shocking.

B) Parts fail in spacecraft. If the software failed to detect a failed piece and roll over to the backup, the software has its role in the incident as well.

C) If it jumps to the wrong mode after the error, that's also a software error.

I'm not saying one way or another about this specific incident. The idea that there is a hard line between all software and hardware is false, and technical people should know better.

Re:headline fail (1)

jamessnell (857336) | more than 2 years ago | (#38959065)

Obviously software and hardware are more conceptual partitions used to help divide and conquer the overall challenge at hand. That said, picking a part that was the right part from the digital logic level up, and that merely failed due to improper shielding (etc.), sounds fairly deep in the hardware camp to me. I think your "B" point is pretty solid. But jumping to the wrong point in the code due to a physical error is getting to be kind of unreasonable to demand of software, since that case means the software isn't executing correctly in the first place. You can probably do some neat things in software to mitigate a little of that, but for the most part it seems unreasonable to me. I suppose the key to a meaningful conversation here is understanding what actually caused the failure, as we're all kind of spinning off into speculation, which means we're basically trolling ourselves.

Re:headline fail (1)

jd2112 (1535857) | more than 2 years ago | (#38959175)

Programmer didn't include if($component.SpaceRating != TRUE) {throw "INITIALIZATION ERROR: NON SPACE RATED COMPONENT!"}

Gamma rays (1)

Anonymous Coward | more than 2 years ago | (#38958113)

Gamma rays, X-rays and the products of their collisions are attenuated by the upper atmosphere, not the Van Allen belts. This is why you get more exposure at altitude in an airplane.

Translation Fail? (1)

DemonicMember (1557097) | more than 2 years ago | (#38958121)

Maybe this is all just a translation error? It could have been either, or both.

So how much? (2)

cvtan (752695) | more than 2 years ago | (#38958145)

How much did they save by using Radio Shack parts in a Mars probe? $5.00 even?

Re:So how much? (4, Funny)

Spykk (823586) | more than 2 years ago | (#38958447)

Not even the government could save money by buying something at Radio Shack.

Re:So how much? (2)

Dasuraga (1147871) | more than 2 years ago | (#38958455)

Space-qualified microchips can cost something around 5000 euros. Equivalent chips that are "only" rated for automotive use (for example) cost 10 cents.

Re:So how much? (3, Informative)

stewbee (1019450) | more than 2 years ago | (#38958499)

If only. The reason ICs cost so little is that the cost is spread out over millions of parts. As my analog circuits prof would say: "Your very first IC off the line is going to cost a million dollars. Everything else after that is free." So buying one or two radiation-hardened ICs is probably going to cost that much, since they will most likely be custom. That's not to say they can't reuse some of the masks for an existing IC to make it cheaper, but it won't be that much cheaper. My guess is that they would want to redesign the part anyway if it is going to be in a radiation-intense environment. The radiation could cause some weird quantum effects in the IC that might mean they want the transistors to be larger for reliability purposes. But that last part is just a guess, since I am not an IC designer and thought my electronic materials class was nothing short of voodoo.

Long story short, they probably saved more than $5 for using a COTS part, but they probably lost the probe by the part not being radiation hardened.
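The professor's point is plain amortization: the one-time design, mask and qualification cost (NRE) is divided over the production run, so per-part cost depends heavily on volume. A minimal sketch; the $1M NRE comes from the quote above, while the 10-cent marginal cost and the volumes are hypothetical.

```python
def unit_cost(nre, marginal, volume):
    """Per-part cost when a one-time NRE is spread over `volume` parts,
    each costing `marginal` to manufacture."""
    return nre / volume + marginal
```

With those numbers, `unit_cost(1_000_000, 0.10, 10_000_000)` comes to about 20 cents a part, while a two-part run, `unit_cost(1_000_000, 0.10, 2)`, is about $500,000 each.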

Re:So how much? (1)

John Bresnahan (638668) | more than 2 years ago | (#38958513)

How much did they save by using Radio Shack parts in a Mars probe? $5.00 even?

Based on my last visit to Radio Shack, I don't think their parts are any cheaper than the special-purpose, radiation-hardened parts they should have used.

But when you can't wait until tomorrow for a part for your space probe, Radio Shack is convenient.

Re:So how much? (2)

systemeng (998953) | more than 2 years ago | (#38958969)

When I worked in the test equipment industry, we had a term for the lowest grade of parts that still worked when binning components: the Radio Shack bin. I once built part of an emergency prototype for a test equipment cooling system with Radio Shack parts. The prototype was sent to Taiwan, where it failed prematurely due to the marginal components. Never again!

Re:So how much? (2)

K. S. Kyosuke (729550) | more than 2 years ago | (#38959137)

How much did they save by using Radio Shack parts in a Mars probe? $5.00 even?

This is not the first time something like this happened to the Russians. In the 1970s, the Soviet Mars 4 [wikipedia.org] probe failed in flight. The reason? Due to cost savings, the transistors used had had their gold parts replaced with aluminium ones, which were prone to chemical degradation (a.k.a. corrosion). The Soviets then realized that they had manufactured three more probes of the same series using the same (unfit) transistors. Now what did they do? Of course they launched them! Guess what happened? Mars 5 failed two weeks after reaching the target orbit. Mars 6 first stopped sending its telemetry, but it operated autonomously just fine and launched a transmitting lander... which stopped working before touching the surface. Mars 7 failed again in flight and launched a lander onto an interplanetary trajectory instead of the surface of Mars.

See, when you're Russian and know that a probe as designed might fail, you just build more of them until one succeeds. :D

Huh? Its hardware failure (0)

Anonymous Coward | more than 2 years ago | (#38958149)

Doesn't sound like programming. If the part had not failed, the mission should have continued as planned.

Always Blame Software (4, Insightful)

invid (163714) | more than 2 years ago | (#38958163)

Is it just me, or is it the responsibility of all software engineers to find the hardware problem in order to prove to people that the cause isn't software?

Re:Always Blame Software (1)

Hognoxious (631665) | more than 2 years ago | (#38958303)

is it the responsibility of all software engineers to find the hardware problem in order to prove to people that the cause isn't software?

Find someone else, I'm busy.

In any case, it's usually orders of magnitude easier to blame the spec. It's written by management/users, after all...

Re:Always Blame Software (0)

Anonymous Coward | more than 2 years ago | (#38958413)

Then, the software engineer is expected to code around the HW problem, since fixing the hardware is too expensive.

This is followed by everyone blaming the SW. A new SW release fixed the problem so it must have been a SW problem right?

Re:Always Blame Software (2)

rwv (1636355) | more than 2 years ago | (#38958419)

In my experience... hardware problems are acceptable if there's a software work-around. Special acknowledgement isn't given to software for fixing hardware bugs... it's just expected since hardware is arguably more expensive to change.

How is "chip failure" a "programming error"? (1)

vleo (7933) | more than 2 years ago | (#38958179)

I'm not the first to ask... but I still wonder how that's possible on Slashdot, which is *supposed* to be technologically literate.

Re:How is "chip failure" a "programming error"? (5, Funny)

Hognoxious (631665) | more than 2 years ago | (#38958377)

A 4 digit ID and never heard of microcode.

Seriously Gramps, the distinction between hardware and software isn't as clear cut as it was when shit was all powered by steam.

Re:How is "chip failure" a "programming error"? (0)

Anonymous Coward | more than 2 years ago | (#38958441)

Context. You fail it.

Re:How is "chip failure" a "programming error"? (2)

Capt.DrumkenBum (1173011) | more than 2 years ago | (#38958507)

Stop dissing Steam, it is the power source of the future. :)
Also, get off my lawn.

Re:How is "chip failure" a "programming error"? (1)

geekoid (135745) | more than 2 years ago | (#38958933)

The problem came on board during the first SW:EP1 discussion, not any of the technical ones. Not that there were any real technical ones at the time.

Re:How is "chip failure" a "programming error"? (0)

Anonymous Coward | more than 2 years ago | (#38958769)

In pseudocode: let's say you had two redundant hardware devices with interfaces DevA and DevB. You might code this:

Try
    DevA.Write(InputWord)
Catch
    Try
        DevB.Write(InputWord)
    Catch
        WarningFlag.Raise
        Continue

Now, suppose you coded this instead:

        DevA.Write(InputWord)
        DevB.Write(InputWord)

That can be a case where a chip failure (in DevA) becomes a programming error.
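A runnable version of the failover pseudocode above, as a sketch. `Device`, `DeviceError` and `write_with_failover` are hypothetical stand-ins for real driver interfaces; here a failed device simply raises an exception, and a double failure is recorded as warnings instead of crashing the program.

```python
class DeviceError(Exception):
    """Raised by a (hypothetical) device driver when a write fails."""


class Device:
    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.written = []

    def write(self, word):
        if not self.healthy:
            raise DeviceError(f"{self.name} failed")
        self.written.append(word)


def write_with_failover(primary, backup, word, warnings):
    """Try the primary, fall back to the backup; if both fail, log and go on.

    Returns the name of the device that took the write, or None.
    """
    for dev in (primary, backup):
        try:
            dev.write(word)
            return dev.name
        except DeviceError as err:
            warnings.append(str(err))  # the "WarningFlag.Raise" step
    return None
```

The naive two-line version behaves very differently: if `DevA.Write` raises, `DevB.Write` never runs and the exception propagates upward, which is exactly the kind of unhandled path that can take down a flight program.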

Obligatory Armageddon quote (1)

Kinthelt (96845) | more than 2 years ago | (#38958185)

Components. American components, Russian Components, ALL MADE IN TAIWAN!

http://www.imdb.com/title/tt0120591/quotes?qt=qt0459113 [imdb.com]

Re:Obligatory Armageddon quote (1)

Tastecicles (1153671) | more than 2 years ago | (#38958323)

Ob. Clancy (mis?)quote:

"See? We have the best technology in our missiles, Tovarisch."
"What does it say?"
"Texas Instruments."

Hardware != Software (1)

jmorris42 (1458) | more than 2 years ago | (#38958193)

Eh? Headline says "Programming error" while the summary says it was some doofus trying to get away with buying off the shelf instead of paying extra for radiation hardened space rated parts and losing. The only good thing in this story was it was an unmanned probe.

Description Fail (0)

Anonymous Coward | more than 2 years ago | (#38958199)

The OP is quoting from two different sources: one says it was a programming error, while the other, seemingly earlier, report blames defective or off-spec components.

Re:Description Fail (4, Interesting)

expatriot (903070) | more than 2 years ago | (#38958391)

The Planetary Society entry says that two modules failed and then the main computer crashed. It's probably irrelevant whether the computer crashed or not if there were significant failures in the electronics. Perhaps if the computer had kept going there would have been some communication of what had gone wrong.

One of the commenters wrote "It is rather unlikely radiation caused the failure. Russians said the failure was due to an SRAM WS512K32V20G24M from White Electronics. This part is a module containing 4 CY7C1049 chips from Cypress and is actually screened. While the Cypress part is very susceptible to Latchup," No idea if this is true or not.

It wasn't the programming... (0, Offtopic)

afabbro (33948) | more than 2 years ago | (#38958205)

...it was the name. Phobos Grunt sounds like a porn star.

Male, female, or transgendered, I'm not sure.

Re:It wasn't the programming... (1)

vlm (69642) | more than 2 years ago | (#38958563)

Mythologically, which is where the moon got its name, Phobos is a dude. He's got a twin brother Deimos. Given that datapoint, guess the name of another Martian satellite...

Re:It wasn't the programming... (1)

crawling_chaos (23007) | more than 2 years ago | (#38958767)

Steve?

Re:It wasn't the programming... (1)

geekoid (135745) | more than 2 years ago | (#38958953)

What we do know for sure: Bottom.

Contradictions (5, Informative)

Aladrin (926209) | more than 2 years ago | (#38958265)

The summary is so contradictory because it quotes from 2 articles, and each of them is completely different. One says that the parts were space-tested and fine, and the other says they were never space-certified and were definitely bad. The first one says instead that a software bug caused parts of the system to reboot. The second doesn't know what happened and just blames faulty hardware.

Re:Contradictions (1)

mbone (558574) | more than 2 years ago | (#38958613)

The summary is so contradictory because it quotes from 2 articles, and each of them is completely different.

"A foolish consistency is the hobgoblin of little minds..." (Emerson)

Sounds like a editor failure to me (5, Funny)

kbob88 (951258) | more than 2 years ago | (#38958327)

In other news, U.S. radars were not responsible for the highly confusing and contradictory summary posted this morning to a Slashdot story about Russia's Phobos-Grunt probe. A thorough investigation has determined that the story's chips should have been able to withstand the radiation received when the story was transmitted through the intertubes and routed over northern Alaska. Instead, investigators blamed a typing failure on the story editors. "A series of tests showed that the editing was lousy and sloppy, and disciplinary action will be taken on those responsible," a spokesman said.

Re:Sounds like a editor failure to me (0)

Anonymous Coward | more than 2 years ago | (#38958655)

Those responsible have been sacked. Lamas!!!!

In Soviet Russia (1)

Anonymous Coward | more than 2 years ago | (#38958339)

The chips program you.

Blame it on software? (0)

Anonymous Coward | more than 2 years ago | (#38958355)

Wow, us software folks get blamed for everything....

So they picked the wrong components, had a hardware failure, and it's software's fault for not anticipating the failure? I know they always say "we'll fix it in software" but this is ridiculous.

I can say that at NASA, when we needed two-fault tolerance, we had 3 CPUs...
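With three CPUs, one common arrangement is majority voting on their outputs. A sketch with a hypothetical `vote` helper: it masks one channel that returns a wrong value and reports which channel disagreed. Tolerating a further failure generally depends on the surrounding system switching that dissenting channel out, which is an assumption here, not something the voter does by itself.

```python
from collections import Counter


def vote(outputs):
    """Majority-vote the outputs of redundant channels.

    Returns (agreed_value, indices_of_dissenting_channels). Raises if no
    value has a strict majority, since the fault can't be isolated then.
    """
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        raise RuntimeError("no majority: cannot isolate the faulty channel")
    dissenters = [i for i, out in enumerate(outputs) if out != value]
    return value, dissenters
```

For example, `vote([7, 7, 9])` returns `(7, [2])`, flagging the third CPU, while three different answers raise an error instead of guessing.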

Obligatory... (1)

Cruciform (42896) | more than 2 years ago | (#38958365)

In Soviet Russia probe causes programming bug!

They have very strict security measures. It can be traumatic.

What is it with Mars and probes? (1)

g0bshiTe (596213) | more than 2 years ago | (#38958451)

What's with Mars and probes? Seriously, how many have been lost either going to it or coming back from it?

how long does it take YOU to walk a mile? (2)

Thud457 (234763) | more than 2 years ago | (#38958671)

Mars is 60,000,000 miles away.
Phobos Grunt would have taken three years to get there.
If it didn't die of dysentery on the journey there.

Re:What is it with Mars and probes? (1)

Squidlips (1206004) | more than 2 years ago | (#38958859)

There seems to be a Mars curse for Russian probes. They have sent 4-5 probes, and they have all failed (two are at the bottom of the Pacific right now). However, the Ruskies have done very well with other probes; it is just Mars. It is like the Patriots versus the Giants... NASA (actually JPL) has done better. Off the top of my head, I would say that only two out of the last 5-6 have failed. [The failures spelled the doom of NASA's new mantra "Better, Cheaper, Faster", although one lander did make it using better-cheaper-faster (using off-the-shelf electronics).] The last 3 JPL Mars probes have been spectacular successes (MRO and the two MER rovers).

Re:What is it with Mars and probes? (1)

geekoid (135745) | more than 2 years ago | (#38958977)

It's HARD.
I mean, we have pretty much mapped every spot on the planet, yet airplanes still crash.

Considering the cost of a launch (0)

Anonymous Coward | more than 2 years ago | (#38958457)

I'm really surprised to hear that it's a programming error, but considering what was done with SCADA I wonder if there isn't something else afoot here.

Staffing Error Doomed American Tech News Site (4, Insightful)

billcopc (196330) | more than 2 years ago | (#38958477)

Okay, we still have a respectable though dwindling community of commenters, so can we please get rid of these editors who can't even be bothered to read four lines of summary text before posting?

The headline and summary do not make sense. Come on, we're supposed to be nerds, aka intelligent, focused, attentive knowledge aggregators.

The fuck is wrong with this goddamned site?! These failures are starting to make Digg look good!

Re:Staffing Error Doomed American Tech News Site (0)

Anonymous Coward | more than 2 years ago | (#38958733)

I think the more important point is that it's hard to put together the usual aggregate news summary about the lost mission investigation, because the official Russian response has been chaotic. The hardware people are probably pointing fingers at the software engineers, and vice versa; and procurement is blaming the testers, etc. Meanwhile, some of their bosses may still suspect American sabotage.

Good luck to those guys trying to move the program forward.

Re:Staffing Error Doomed American Tech News Site (1)

geekoid (135745) | more than 2 years ago | (#38959009)

No. You are welcome to go to the Times and pay for a subscription that uses actual editors.

Fun to read the comments (5, Insightful)

vlm (69642) | more than 2 years ago | (#38958479)

Fun to read the comments here. I've done embedded stuff, and you need to be defensive. You can see at a glance who here has never done defensive programming before, or embedded or safety-critical programming: they're the ones all blaming the hardware. There are 3 states, so you've got 2 bits of input, and a disallowed state comes in. Deal with it; don't just curl up and die and blame the hardware designer. There's a 12-bit A/D conversion result stored in two bytes, and there's a 14-bit number found there. Deal with it; don't just curl up and die and blame the ... . There's a cycle start button and an emergency stop button, and both are simultaneously on. Deal with it. You reboot a mission-critical (or safety-critical!) CPU and a minor auxiliary input A/D doesn't initialize. Do you burn the plant down in a woe-is-me pity party because one out of 237 sensors isn't coming online, or do you deal with it?
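The three "deal with it" cases above can be sketched in C. This is a minimal illustration, not code from any real controller. The mode encoding, the `ADC_MAX` limit, and the function names `decode_mode`, `read_adc12` and `may_start_cycle` are all hypothetical. The common thread is that an impossible input gets mapped to a safe, explicitly chosen behaviour instead of being trusted or crashing the program.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mode field: two input bits encode three legal states;
 * the fourth bit pattern (0x3) is not supposed to happen. */
typedef enum { MODE_IDLE = 0, MODE_ARMED = 1, MODE_RUN = 2 } op_mode;

/* Decode the 2-bit mode field. An illegal pattern falls back to a safe
 * state instead of being trusted or crashing the program. */
op_mode decode_mode(uint8_t raw_bits)
{
    switch (raw_bits & 0x3u) {
    case 0: return MODE_IDLE;
    case 1: return MODE_ARMED;
    case 2: return MODE_RUN;
    default: return MODE_IDLE; /* disallowed state: pick the safe one */
    }
}

#define ADC_MAX 0x0FFFu /* 12-bit converter: legal range 0..4095 */

/* Read a 12-bit A/D result stored in two bytes. Returns false (and leaves
 * *out untouched) if bits above bit 11 are set, so the caller can keep a
 * last-known-good value or mark the sensor failed and carry on. */
bool read_adc12(uint8_t hi, uint8_t lo, uint16_t *out)
{
    uint16_t raw = (uint16_t)(((uint16_t)hi << 8) | lo);
    if (raw > ADC_MAX) {
        return false;
    }
    *out = raw;
    return true;
}

/* Cycle start and emergency stop pressed together: stop always wins. */
bool may_start_cycle(bool start_pressed, bool estop_pressed)
{
    return start_pressed && !estop_pressed;
}
```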

Finally, radiation is a statistical phenomenon. There is no such thing as radiation-free. If they used non-rad-hardened parts, it's gonna crash maybe 10,000 times more often. That's OK; you program around that, assuming you know what you're doing. Radiation-hardened does not equal radiation-proof. If there was a single-bit error, or a latchup on a rad-hardened unit, a poorly programmed control system would have failed just the same; it's just that a rad-hardened chip would have made it a couple of orders of magnitude less likely. A shitty design that has a 1 in 20,000 failure rate due to better hardware, instead of 1 in 2, is still a shitty programming design, even if the odds are "good enough" that it makes it most of the time with the better hardware.
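One common way to "program around" single-bit upsets is to keep critical variables in triplicate and vote bitwise on every read. Here is a minimal sketch in C, assuming an upset corrupts at most one of the three copies between reads. The type `tmr_u32` and the functions `tmr_write`/`tmr_read` are hypothetical names for illustration. This only masks data corruption in RAM. It does nothing about a latchup, which generally needs a power cycle to clear.

```c
#include <stdint.h>

/* Hypothetical triplicated storage for one critical value. Assumes a
 * radiation-induced single event upset flips bits in at most one copy
 * between reads. */
typedef struct {
    uint32_t copy[3];
} tmr_u32;

void tmr_write(tmr_u32 *v, uint32_t value)
{
    v->copy[0] = value;
    v->copy[1] = value;
    v->copy[2] = value;
}

/* Bitwise majority vote: each result bit takes the value held by at least
 * two of the three copies. The voted value is written back ("scrubbing")
 * so a corrupted copy does not linger until a second upset hits. */
uint32_t tmr_read(tmr_u32 *v)
{
    uint32_t a = v->copy[0], b = v->copy[1], c = v->copy[2];
    uint32_t voted = (a & b) | (a & c) | (b & c);
    tmr_write(v, voted);
    return voted;
}
```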

Re:Fun to read the comments (0)

Anonymous Coward | more than 2 years ago | (#38959007)

Most comments on Slashdot are from software-oriented people.

Baloney (4, Interesting)

mbone (558574) | more than 2 years ago | (#38958573)

What are the chances chips would fail in a 20-30 minute period just after launch but before Mars transfer orbit insertion?

No, I bet this was a programming error, coupled with a near total failure to test the software.

 

Oh come on. (1)

JustAnotherIdiot (1980292) | more than 2 years ago | (#38958625)

I read the title and was going to make a joke about forgetting a ;, or something like that.
But this wasn't a programming error, it was a hardware failure |:
Did the editor even read what he wrote?

Re:Oh come on. (1)

Fnord666 (889225) | more than 2 years ago | (#38959105)

Did the editor even read what he wrote?

The editors no longer write or read anything. They just cut and paste. Submitters no longer write anything; they just copy the first paragraph or two of an article. I swear that some days all of the articles are probably just submitted by a very short Perl script.
