Patching Software on Another Planet

Soulskill posted about a year ago | from the no-do-overs dept.

Mars 96

An anonymous reader writes "Sixteen years ago, the Mars Pathfinder lander touched down on Mars and began collecting data about the atmosphere and geology of the Red Planet. Its original mission was planned to last somewhere between a week and a month, but it only took a few days for software problems to crop up. The engineers responsible for the system were forced to diagnose the problem and issue a patch for a device that was millions of miles away. From the article: 'The Pathfinder's applications were scheduled by the VxWorks RTOS. Since VxWorks provides pre-emptive priority scheduling of threads, tasks were executed as threads with priorities determined by their relative urgency. The meteorological data gathering task ran as an infrequent, low priority thread, and used the information bus synchronized with mutual exclusion locks (mutexes). Other higher priority threads took precedence when necessary, including a very high priority bus management task, which also accessed the bus with mutexes. Unfortunately in this case, a long-running communications task, having higher priority than the meteorological task, but lower than the bus management task, prevented it from running. Soon, a watchdog timer noticed that the bus management task had not been executed for some time, concluded that something had gone wrong, and ordered a total system reset.'"
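To make the scheduling problem in the summary concrete, here is a minimal sketch of a fixed-priority preemptive scheduler in Python. It is not the Pathfinder code or the VxWorks API: the task names, tick counts, and arrival times are all invented for illustration. It runs the same three-task workload twice, once with a plain mutex and once with priority inheritance (the standard remedy for priority inversion), and reports when each task finishes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    name: str
    priority: int          # larger number = more urgent
    arrival: int           # tick at which the task becomes ready
    work: int              # ticks of CPU time the task needs
    needs_bus: bool        # True if all of its work is done holding the bus mutex
    done: int = 0
    finished_at: Optional[int] = None


def simulate(inheritance: bool, max_ticks: int = 50) -> dict:
    """Run a toy workload and return {task name: tick at which it finished}."""
    # Hypothetical workload loosely modelled on the summary above.
    tasks = [
        Task("meteo", priority=1, arrival=0, work=3, needs_bus=True),
        Task("comms", priority=2, arrival=2, work=10, needs_bus=False),
        Task("bus_mgmt", priority=3, arrival=1, work=1, needs_bus=True),
    ]
    owner = None  # task currently holding the bus mutex, if any

    for tick in range(max_ticks):
        if all(t.finished_at is not None for t in tasks):
            break
        ready = [t for t in tasks if t.arrival <= tick and t.finished_at is None]
        if not ready:
            continue

        # A task that needs the bus is blocked while another task owns the mutex.
        blocked = [t for t in ready
                   if t.needs_bus and owner is not None and owner is not t]
        runnable = [t for t in ready if t not in blocked]

        def effective_priority(t: Task) -> int:
            # With inheritance, the mutex owner temporarily runs at the
            # priority of the most urgent task waiting on the mutex.
            if inheritance and t is owner and blocked:
                return max(t.priority, max(b.priority for b in blocked))
            return t.priority

        current = max(runnable, key=effective_priority)
        if current.needs_bus:
            owner = current            # take (or keep) the mutex
        current.done += 1
        if current.done == current.work:
            current.finished_at = tick + 1
            if owner is current:
                owner = None           # release the mutex

    return {t.name: t.finished_at for t in tasks}


if __name__ == "__main__":
    print("plain mutex:         ", simulate(inheritance=False))
    print("priority inheritance:", simulate(inheritance=True))
```

With the plain mutex, the medium-priority task preempts the low-priority mutex holder, so the high-priority task waits behind both of them; a watchdog with a short deadline on the high-priority task would fire. With inheritance, the mutex holder is boosted just long enough to release the lock, and the high-priority task runs almost immediately.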

96 comments

old? (1)

Anonymous Coward | about a year ago | (#44203385)

I didn't even read the full summary. But hasn't the occurrence of this priority inversion issue been reported about ... many years ago?

Re:old? (1)

Cenan (1892902) | about a year ago | (#44203747)

Reading TFA to the rescue:

L. Sha, R. Rajkumar, and J. P. Lehoczky. Priority Inheritance Protocols: An Approach to Real-Time Synchronization. In IEEE Transactions on Computers, vol. 39, pp. 1175-1185, Sep. 1990.

Re:old? (1)

Man On Pink Corner (1089867) | about a year ago | (#44204221)

Well, the more important stories were hogging the lock.

Sounds like this was noticed earlier ... (4, Interesting)

xmas2003 (739875) | about a year ago | (#44203389)

From TFA: "Engineers later confessed that system resets had occurred during pre-flight tests. They put these down to a hardware glitch and returned to focusing on the mission-critical landing software"

Very surprised by this ... even if a hardware glitch, wouldn't you want to track that down before launch? Especially since in the harsh space environment (bit flips even with hardened RAM/CPU), you want your hardware to be as reliable as possible.

Re:Sounds like this was noticed earlier ... (4, Interesting)

mlts (1038732) | about a year ago | (#44203475)

Devil's advocate here:

If it were my guess, there are so many priorities of glitches, and with a limited budget, if it isn't something that actively shuts down operations, resources are spent on other things.

The one good thing in this equation is the watchdog circuits. Without these in place, it can mean the hardware goes down and never comes to life again.

It is extremely hard to get working operating systems and patch management here on Earth [1]... much less having systems that are made to work where there is no way to walk up to the machine, and re-flash a new OS via the JTAG ports.

[1]: Patch management had issues for every OS I've used. AIX gets issues via lppchk which means force-installing LPPs, RedHat gets RPM glitches possibly forcing a rebuild of the DB, Windows sometimes will just not install, or permit to be installed an update from WU, and so on. Now, with this in mind, trying to patch a machine millions of miles away is very daunting for even the best of the best.

Re:Sounds like this was noticed earlier ... (0)

Anonymous Coward | about a year ago | (#44203737)

much less having systems that are made to work where there is no way to walk up to the machine, and re-flash a new OS via the JTAG ports.

No, but you can run the patch on the simulator here on Earth and if it doesn't work at least here you can walk up to the machine and re-flash it.

Unless management tells you there's no budget to run it in the simulator... Which is why the antenna on one of the Viking landers is pointing at the ground.

Re:Sounds like this was noticed earlier ... (4, Interesting)

girlintraining (1395911) | about a year ago | (#44203739)

If it were my guess, there are so many priorities of glitches, and with a limited budget, if it isn't something that actively shuts down operations, resources are spent on other things.

Devil here: This isn't a budget problem, this is a management problem. Going all the way back to the Challenger disaster, NASA has shown a pattern of disregard for proper engineering practice. Richard Feynman chewed their ass out in Appendix F [nasa.gov] of the Challenger report to congress, and it was so scathing that both Congress and NASA tried to kick him off the board and discard his results... prompting the entire senior engineering staff of all branches of the Shuttle project to sign a petition saying: Either publish this, or face our wrath.

This isn't a technical problem -- this is management having shitty project management skills. If the budget is insufficient, then the project scope has to be reduced. It's just that simple. This is not the engineers' fault, nor is it the fault of the technology... this is management trying to do too much with too little.

Re:Sounds like this was noticed earlier ... (0)

Anonymous Coward | about a year ago | (#44204007)

> prompting the entire senior engineering staff of all branches of the Shuttle project to sign a petition saying: Either publish this, or face our wrath.

Citation please. I really do care. I only have the story from Feynman's perspective.

Re:Sounds like this was noticed earlier ... (1)

0123456 (636235) | about a year ago | (#44204177)

Bugs get prioritised, and when you have to launch this year or wait three years for the next launch window, non-critical bugs aren't going to delay a launch. A bug that causes the computer to reset and return to operation is not a critical bug for a system that's rolling over the surface of Mars at a few feet per minute.

I remember reading that the Apollo Guidance Computer developers would randomly press the reset button while testing their software just to ensure that, if it did reset, that wouldn't cause problems during a lunar landing.

Re:Sounds like this was noticed earlier ... (1, Insightful)

DerekLyons (302214) | about a year ago | (#44204949)

This isn't a technical problem -- this is management having shitty project management skills. If the budget is insufficient, then the project scope has to be reduced. It's just that simple. This is not the engineers' fault, nor is it the fault of the technology... this is management trying to do too much with too little.

It must be nice to live in your black and white world, but the rest of us live in the real world where engineering, budget, and schedule tradeoffs are a reality.

Re:Sounds like this was noticed earlier ... (1)

gl4ss (559668) | about a year ago | (#44205533)

This isn't a technical problem -- this is management having shitty project management skills. If the budget is insufficient, then the project scope has to be reduced. It's just that simple. This is not the engineers' fault, nor is it the fault of the technology... this is management trying to do too much with too little.

It must be nice to live in your black and white world, but the rest of us live in the real world where engineering, budget, and schedule tradeoffs are a reality.

well yeah.. unless you're building something to go to space. I suppose someone knew that the problem wouldn't make patching impossible.

of course if benefits from the project are totally immeasurable then it doesn't matter that much that the thing might waste the entire budget if it doesn't work.

Re:Sounds like this was noticed earlier ... (0)

Anonymous Coward | about a year ago | (#44205831)

even things that go into space have to balance budget, time, and performance.

One might even contend, with good theoretical and empirical backup, that it is impossible to have a perfect system, so you're going to launch with "some" problems, regardless. As long as none of them is a "mission killer" then you're good to go.

Re:Sounds like this was noticed earlier ... (1)

DerekLyons (302214) | about a year ago | (#44207341)

It must be nice to live in your black and white world, but the rest of us live in the real world where engineering, budget, and schedule tradeoffs are a reality.

well yeah.. unless you're building something to go to space.

Well, no. Even things that are being built to go into space suffer from the same limitations as anything else.

Re:Sounds like this was noticed earlier ... (1)

arth1 (260657) | about a year ago | (#44207439)

Richard Feynman chewed their ass out in Appendix F of the Challenger report to congress, and it was so scathing that both Congress and NASA tried to kick him off the board and discard his results... prompting the entire senior engineering staff of all branches of the Shuttle project to sign a petition saying: Either publish this, or face our wrath.

I have a hard time believing that Feynman wrote it, or that it wasn't re-written by someone else before it was published. Read this (emphasis mine):

"A more reasonable figure for the mature rockets might be 1 in 50. With special care in the selection of parts and in inspection, a figure of below 1 in 100 might be achieved but 1 in 1,000 is probably not attainable with today's technology. (Since there are two rockets on the Shuttle, these rocket failure rates must be doubled to get Shuttle failure rates from Solid Rocket Booster failure.)"

I just can't believe that Feynman of all people would make an elementary mistake like this. You don't get the risk of failure with two risks by doubling. You multiply the inverses. For the reasonable figure of 2% failure rate, the risk with two engines would be 3.96%, not 4%.

Feynman of all people would know this, and if simplifying it for the unwashed masses, I have a hard time believing he would claim that you get the result by doubling, but more likely use words like "in this case, approximately double".

Re:Sounds like this was noticed earlier ... (0)

Anonymous Coward | about a year ago | (#44207557)

I wouldn't be so sure. If x << 1, then (1-x)^2 ~= 1 - 2x. It would be Feynman's style to simplify his message, even if it meant the loss of a bit of precision.
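Spelled out, with x as the per-rocket failure probability and assuming the two boosters fail independently (an assumption made here for the sketch, not something taken from the report), the step looks like this:

```latex
\begin{align}
P(\text{at least one of two rockets fails}) &= 1 - (1 - x)^2 \\
  &= 2x - x^2 \\
  &\approx 2x \qquad \text{for } x \ll 1
\end{align}
```

For x = 0.02 the exact value is 2(0.02) - (0.02)^2 = 0.0396, against 0.04 from simple doubling; the dropped x^2 term is the entire difference.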

Re:Sounds like this was noticed earlier ... (1)

arth1 (260657) | about a year ago | (#44207631)

I wouldn't be so sure. If x << 1, then (1-x)^2 ~= 1 - 2x. It would be Feynman's style to simplify his message, even if it meant the loss of a bit of precision.

Not to the point of being factually incorrect, especially in a context where the difference between understanding it correctly or not is statistically significant.
The "since there are two rockets, this must be doubled" text implies that adding is the correct approach. That would mean that if launching shuttles 25 times with an 1:50 risk, there's a cumulative 100% risk of failure. That's obviously not the case, and the error is significant (the risk of failure would be around 63.6%, not 100%)

I find it much more likely that a proofreading well-meaning middle management guy struck out a word like "approximately".

Re:Sounds like this was noticed earlier ... (0)

Anonymous Coward | about a year ago | (#44210913)

I wouldn't be so sure. If x << 1, then (1-x)^2 ~= 1 - 2x. It would be Feynman's style to simplify his message, even if it meant the loss of a bit of precision.

What kinda math is that? Call me stupid but that alone looks complicated. (I'm just a lowly Hospital Corpsman)

Re:Sounds like this was noticed earlier ... (0)

Anonymous Coward | about a year ago | (#44209855)

Have you worked around physicists or even taken any upper level physics courses? For a vast number of physics fields, applying theory and equations to real world situations and getting solutions often requires figuring out what can be linearized and approximated or what value is small (or very large) and can be expanded about. Feynman was familiar with this, considering his work in QED which is a massive pile of perturbative approximations. Sometimes it gets used as an advanced level Fermi problem type approach, to get rough scaling of a problem, other times it gets used for very high precision work. In this case, the difference between a 3.96% failure and a 4% failure rate is going to be pretty insignificant until you start having thousands of trials, while the difference between 2% and 4% could make a difference fewer than a hundred. The doubling won't be off by more than 10% until you get to failure rates above 20%. Wasting too much time fighting over the difference between 3.96% and 4% kind of misses the frequent lack of precision in such numbers and is part of the mindset that caused some of the problems with estimating failure rates in the first place.
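The claim about how far off the doubling gets can be checked directly. The sketch below assumes two independent rockets with the same failure probability p; the function names are made up for this example.

```python
def exact_two_rocket_failure(p: float) -> float:
    """Probability that at least one of two independent rockets fails."""
    return 1.0 - (1.0 - p) ** 2


def relative_error_of_doubling(p: float) -> float:
    """How far 2*p overshoots the exact value, as a fraction of the exact value."""
    exact = exact_two_rocket_failure(p)
    return (2.0 * p - exact) / exact


if __name__ == "__main__":
    for p in (0.02, 0.10, 0.20):
        print(f"p={p:.2f}  exact={exact_two_rocket_failure(p):.4f}  "
              f"doubled={2 * p:.4f}  "
              f"relative error={relative_error_of_doubling(p):.1%}")
```

Algebraically the relative error is p / (2 - p), which reaches 10% at p = 2/11, or roughly 18%, so the "above 20%" figure is in the right neighborhood for a single launch.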

Re:Sounds like this was noticed earlier ... (1)

arth1 (260657) | about a year ago | (#44212649)

In this case, the difference between a 3.96% failure and a 4% failure rate is going to be pretty insignificant until you start having thousands of trials, while the difference between 2% and 4% could make a difference fewer than a hundred. The doubling won't be off by more than 10% until you get to failure rates above 20%. Wasting too much time fighting over the difference between 3.96% and 4% kind of misses the frequent lack of precision in such numbers and is part of the mindset that caused some of the problems with estimating failure rates in the first place.

Bzzt, wrong. The difference between multiplying the failure rate for one device with the number of devices and multiplying the inverses very quickly becomes significant. Using the "reasonable" failure rate of one in fifty, we get:
1 launch (2 rockets): 4% vs 3.96%
10 launches: 40% vs 33.2%
25 launches: 100% vs 63.6%
That is a significant difference, for a moderate number of launches.
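The comparison above can be reproduced with a few lines of Python. This is only a sketch: it assumes every rocket fails independently with the same one-in-fifty probability and that each launch uses two rockets, and the function name is invented for the example.

```python
def cumulative_failure(p_rocket: float, launches: int, rockets_per_launch: int = 2):
    """Return (linear estimate, exact value) for at least one rocket failure."""
    n = launches * rockets_per_launch
    linear = n * p_rocket                  # "just add them up" estimate
    exact = 1.0 - (1.0 - p_rocket) ** n    # complement of "every rocket works"
    return linear, exact


if __name__ == "__main__":
    for launches in (1, 10, 25):
        linear, exact = cumulative_failure(0.02, launches)
        print(f"{launches:>2} launches: linear {linear:.1%} vs exact {exact:.1%}")
```

Under these assumptions the exact figures come out to about 3.96%, 33.2%, and 63.6%. The linear estimate is not capped, so past 25 launches it exceeds 100%, which is the clearest sign that adding probabilities stops working once the total is no longer small.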

Again, I cannot believe that Feynman would have written something that would sound plausible to those who don't know statistics, but is plain misleading, and that someone must have edited it, simplifying without understanding the implication.

Re:Sounds like this was noticed earlier ... (1)

sjames (1099) | about a year ago | (#44203775)

I have to wonder these days if including a BMC that CAN re-flash through JTAG remotely might have become practical. While it is extra weight and we'd need a hardened BMC, it's not like it has to have much performance as long as it runs at all. Given that it could save a mission it could be a good trade-off.

Re:Sounds like this was noticed earlier ... (1)

Darinbob (1142669) | about a year ago | (#44206417)

Most things I've worked on have watchdog circuits or software. They're almost essential, since you will never be able to find every possible bug or account for every exceptional condition. Watchdogs are the system equivalent of customer support telling you to reboot and see if that fixes it. Here on earth it can sometimes be amazingly difficult just to patch something ten miles away.
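For anyone who hasn't worked with one, the idea of a watchdog can be sketched in a few lines. This is a toy model in Python with simulated ticks, not any real flight or RTOS interface; the class and function names are hypothetical.

```python
class Watchdog:
    """Toy watchdog: expires if it is not kicked within `timeout` time units."""

    def __init__(self, timeout: float):
        self.timeout = timeout
        self.last_kick = 0.0

    def kick(self, now: float) -> None:
        # Called by the monitored task to signal "I am still running".
        self.last_kick = now

    def expired(self, now: float) -> bool:
        return (now - self.last_kick) > self.timeout


def first_reset_tick(timeout: float, hang_at: int, max_ticks: int = 100):
    """Simulate a task that kicks every tick until it hangs at `hang_at`.

    Returns the tick at which the watchdog would order a reset,
    or None if it never fires within `max_ticks`.
    """
    wd = Watchdog(timeout)
    for tick in range(max_ticks):
        if tick < hang_at:
            wd.kick(tick)          # healthy: the task checks in
        if wd.expired(tick):
            return tick            # a real system would reboot here
    return None


if __name__ == "__main__":
    print(first_reset_tick(timeout=3, hang_at=5))
    print(first_reset_tick(timeout=3, hang_at=1000))
```

The watchdog knows nothing about why the task stopped checking in, which is both its strength (it catches bugs nobody anticipated) and its weakness (a reset hides the root cause unless someone goes looking).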

Re:Sounds like this was noticed earlier ... (1)

Google Fanboys (2974975) | about a year ago | (#44203485)

There's just so much you can do. Eventually you just have to leave it and hope it works.

Re:Sounds like this was noticed earlier ... (0)

Anonymous Coward | about a year ago | (#44203497)

You would, and such a glitch would have been handed to the hardware team to look into, who would have said "no glitch found, possibly software related".
Software would not acknowledge it and would stay convinced it was hardware.

The good old battle of hardware vs software in a mixed team...

Re:Sounds like this was noticed earlier ... (1)

Cammi (1956130) | about a year ago | (#44203503)

Sounds like a management decision.

Re:Sounds like this was noticed earlier ... (0)

Anonymous Coward | about a year ago | (#44203517)

From TFA: "Engineers later confessed that system resets had occurred during pre-flight tests. They put these down to a hardware glitch and returned to focusing on the mission-critical landing software"

This is actually a perfect excuse used by software developers. "It's not me! My software is perfect! Must be your faulty hardware". I've seen it again and again and again. Just look at the Linux mailing list for some easy examples. (this is certainly not limited to Linux or kernel, or databases, etc.) Heck, look at the problem with some cars and "sudden acceleration syndrome" - it took them *years* to figure out that "brake should override the gas pedal!".

There should be 2 possible solutions to these comments,

    1. the person saying it is hardware, should show it is hardware and then workaround the problem, or
    2. stop commenting because all they want is easy way out

Re:Sounds like this was noticed earlier ... (0)

Anonymous Coward | about a year ago | (#44203761)

Actually, the software group probably submitted a bug report which was assigned to the hardware department. They resolved the bug report as "not reproducible" and closed it out. And the management reviewing the corrective action reports decided not to follow up (which happens a lot because they want to keep their numbers down). This is a classical corrective action program mistake where the problem isn't fixed but the bug report is closed because the assigned group couldn't find the problem.

Re:Sounds like this was noticed earlier ... (1)

dragonsomnolent (978815) | about a year ago | (#44203817)

Sudden acceleration syndrome was chalked up to operator error. Even revving the engines to full, the brakes supplied more than enough stopping force, every single time. So even from the get-go, brakes were overpowering the engine (brakes work so well, in fact, that they had to invent a technology to make them not work so well; it's called anti-lock brakes).

Re:Sounds like this was noticed earlier ... (0)

Anonymous Coward | about a year ago | (#44203919)

ABS was made for ice, not because someone would hypothetically hit the gas and brakes at the same time. And even if they did, what do you think ABS would do? If the brakes were stronger than the engine, then it would release the brakes when the wheels locked up, which is the opposite of what you want in that case.

Re:Sounds like this was noticed earlier ... (1)

dragonsomnolent (978815) | about a year ago | (#44204081)

ABS was to stop the brakes from locking up in the first place, regardless of road condition (in other words, brakes work so well they had to invent a way to interrupt them). If you hold your brake pedal down and stomp the gas, your car will stay stationary. If you do it long enough (assuming you have an automatic transmission), you'll blow out your torque converter, your transmission, or your engine (depending on the weak link); if you do it in a manual transmission, you'll kill the engine. You missed the point of my post entirely: the brakes always overpower the engine, and they are so capable of stopping those wheels from spinning that they had to invent a way to make them not do the job so well (and still be safe enough to drive). ABS isn't as simple as "wheels not spinning, release brakes".

Re:Sounds like this was noticed earlier ... (1)

Anonymous Coward | about a year ago | (#44204597)

This is the biggest load of bullshit I have ever read on Slashdot.

Go ahead and provide a citation on how ABS is supposed to protect engines from braking.

Re:Sounds like this was noticed earlier ... (0)

Anonymous Coward | about a year ago | (#44209971)

The other AC is correct, this is complete BS. Check it out with an actual car, or if lazy, look at a service manual. Most ABS systems won't kick in below some minimum speed. ABS will do nothing if you are standing still, and using the brakes to hold the car still against the engine.

Re: Sounds like this was noticed earlier ... (0)

Anonymous Coward | about a year ago | (#44204515)

Erm no, not always.

Worked on a production system where our inexplicable resets DID turn out to be a fault with the motherboard's power management hardware. The bug would be triggered around a certain amount of system CPU load (not too much, not too little). It was blanket attitudes like yours that meant we had to fight hard to get this acknowledged and resolved.

On a team working on a product, this kind of turf war attitude is stupid.

good choice. (0)

Anonymous Coward | about a year ago | (#44203961)

I too would prioritise the landing software above a rare and recoverable reset after it's standing safe and sound on the surface of Mars.

Re:Sounds like this was noticed earlier ... (1)

FatdogHaiku (978357) | about a year ago | (#44204695)

From TFA: "Engineers later confessed that system resets had occurred during pre-flight tests. They put these down to a hardware glitch and returned to focusing on the mission-critical landing software"

Very surprised by this ... even if a hardware glitch, wouldn't you want to track that down before launch? Especially since in the harsh space environment (bit flips even with hardened RAM/CPU), you want your hardware to be as reliable as possible.

Perhaps they were thinking about that sweet sweet mileage charge for a service call?

Re:Sounds like this was noticed earlier ... (0)

Anonymous Coward | about a year ago | (#44205653)

Launch windows and all that.

Re:Sounds like this was noticed earlier ... (1)

v1 (525388) | about a year ago | (#44209891)

Inter-planetary travel imposes inflexible deadlines. Planetary alignment dictates when your launch windows are, and they are frequently several years apart. Compare it with the space shuttle for example, where you can get a launch window every day or two and have lives at risk. Project planning on an inter-planetary launch spreads out over years. If your part of the project starts getting behind, and it's not something you can fix by simply throwing more resources at it, you have to prioritize so you don't miss your launch window.

A non-fatal glitch that only showed up once during testing, that you have nothing to go on, quickly takes a back seat to engineering making sure it doesn't "pull a Beagle" when it arrives. One-off anomalies in testing are only tackled (and not all of them are necessarily resolved) when all critical systems are verified bulletproof.

There is one other angle often overlooked here that I'm surprised wasn't addressed. On a launch like this, where it typically takes over a year for the mission to reach the destination, this is when those weird glitches should be getting identified and patched on the twin unit in the lab, when you can invest all your ground resources into software testing and improvements. Then software updates are sent to the rover en-route, or shortly after landing. I don't know why this didn't happen. From the sounds of it though, the testers may not have properly documented the glitch. It kinda sounds like they had to talk to them when the problem surfaced and get someone to fess up to having seen a problem like this during testing that didn't get documented. One person covering up a glitch in their work to save face can cost millions of dollars and years of time. Someone ought to get canned for that.

Why are our tax dollars going for this crap? (-1)

Anonymous Coward | about a year ago | (#44203391)

Space exploration should be the domain of the private sector. The US government needs to stop doing multi-million boondoggles when it can't even afford to pay its interest payments to China, the #1 creditor.

Fix the economy, offer inducements for banks to lend to Americans rather than the banks feeling safer stashing their dough overseas, build prisons to handle overcrowding, get the guns off the streets (Venezuela did, and their crime rate is 1/100 of what it used to be just two years ago), and maybe after we get some commerce back, perhaps we can waste money on tossing crap to other planets for no significant economic return value.

Re:Why are our tax dollars going for this crap? (1)

MrBandersnatch (544818) | about a year ago | (#44203499)

I seem to recall reading that the return on investment for the US economy from the 1960s space program was something like 100-1. Today government investment in a space program acts as investment for private companies to develop new technologies - and I would be unsurprised to discover that the return is still above 10-1 from an economic perspective.

If you really want to attack waste of money spending there are FAR better targets.

Re:Why are our tax dollars going for this crap? (1)

Impy the Impiuos Imp (442658) | about a year ago | (#44203509)

I remain convinced there should be a mod option of +1 Troll.

Re:Why are our tax dollars going for this crap? (1)

NoNonAlphaCharsHere (2201864) | about a year ago | (#44203547)

Moderation isn't near enough. That one was worthy of the Nobel Prize for Trolling.

Re:Why are our tax dollars going for this crap? (1)

xmundt (415364) | about a year ago | (#44203553)

Our tax dollars are going to these projects because private enterprise is unwilling to take up projects that will not produce a guaranteed return for their investors. It is notably unwilling to take on risky projects, or projects that do not have that clear return. Only an organization that has no profit motive (i.e. the federal government) is willing to invest the large sums in a project that might blow up during the boost phase of a launch. The fact is that the space program is quite profitable - the early years returned upwards of $14 for every $1 invested... yet, in spite of that, SpaceX is the ONLY private company that is interested in taking up the task. Even they are focusing on being truckers - providing transportation for other things into space; they have no interest in putting an exploration robot onto another planet, or sending a probe out into deep space to see what we can see.
As for the current debt... this article does a good analysis of it: http://useconomy.about.com/od/monetarypolicy/f/Who-Owns-US-National-Debt.htm and it shows that in terms of the overall debt, China holds about 10% of the total. Less than half the debt is held by foreign countries and investors. Not that it means it is good that the country owes that much cash, by any means... but it is not like China could come in and put a lien on the country....

Re:Why are our tax dollars going for this crap? (1)

meerling (1487879) | about a year ago | (#44204203)

Not to mention that the developments and data that is made available to the public and private industries by NASA and their space exploration and technology developments are responsible for a not insignificant chunk of the GNP. If it were private industry that were doing that, and they won't due to risk and unquantifiable short term return estimates, they would charge through the nose or hoard all that good stuff for themselves. The net result to the economy and human life would be negligible at best, and compared to our current status quo, a definite negative.

No, it's far better to have this stuff done by the government.

Re:Why are our tax dollars going for this crap? (0)

Anonymous Coward | about a year ago | (#44203629)

Your ignorance is showing. Interest rates are very low and the US has no trouble paying its interest. Take a look at the breakdown of federal spending. NASA is just noise in the federal budget. If you want to "fix the problem" you need to go after the big items, social security, Medicare/Medicaid, and the military. After that is unemployment and interest as you pointed out. Anyone who whines constantly about the discretionary part of the budget isn't thinking straight.

Ingeniously cryptic point! (1)

SinisterRainbow (2572075) | about a year ago | (#44203657)

Is this (another) failed sarcastic statement, or did I really just read that? I'm taking away the point that the world needs fewer true believers, and people need to stop writing sarcasm online.

Re:Why are our tax dollars going for this crap? (0)

Anonymous Coward | about a year ago | (#44204299)

Can you describe how "exploration" of a vacuum would motivate private interests?

This is how you get humans to other star systems.. (0)

Anonymous Coward | about a year ago | (#44203393)

Patching software systems is how you get humans to other star systems.

You send a ship, and if it takes 1000 years to get to the destination, that is ok because once it is there you can upload a software upgrade to it that allows it to start growing humans from scratch on the remote planet.

Re:This is how you get humans to other star system (0)

Anonymous Coward | about a year ago | (#44204405)

1000 years? The fastest spacecraft we've ever launched won't reach Alpha Centauri for 40000 years, and that's only because it's not stopping.

Re:This is how you get humans to other star system (1)

kasperd (592156) | about a year ago | (#44205505)

The fastest spacecraft we've ever launched won't reach Alpha Centauri for 40000 years, and that's only because it's not stopping.

Leaving now isn't the fastest way to get there. You'd get there faster by waiting back on Earth for more efficient propulsion technology to be developed. So when is the right time to leave in order to get there as soon as possible? That is a question which can only be answered in retrospect. One day people can look and say, hey we could have been there already if only we had left in the year x.

In reality there are other goals as well, so getting humans there as soon as possible isn't desirable as that would fail at some other goals of such a mission. The first mission to reach another star is going to be an unmanned probe. Serious research into such a possibility is actually already happening. The distance unmanned probes have travelled from the Earth is several orders of magnitudes beyond the furthest any living being has ever travelled from the Earth. I don't imagine manned missions will catch up with that in a thousand years.

But on a longer time scale I actually do imagine manned missions will travel beyond autonomous probes. That of course depends on which happens first, mankind successfully colonizing another star system or mankind extinguishing itself. If the successful colonization happens first, I don't see what would be stopping this from continuing throughout the galaxy. Reaching the other side of the galaxy takes such a long time that evolution will be at play, and whoever reaches the other side first will have evolved into a mindset where it seems entirely natural that once you have colonized a planet you breed for long enough to produce enough humans for the mission to the next star system. At that time the fatality rate of such missions would be high, but evolution would reward sending out a thousand missions even if only two survive rather than safely staying put until the first mission can be done safely.

If we do reach the other side of the galaxy, it will be with a manned mission intending to colonize yet another star system. Autonomous probes from the Earth will never get that far. What will mankind do, once there are no more habitable star systems left in this galaxy? I guess some crazy attempts at reaching other galaxies. But none of this will happen in my lifetime anyway, so we'll never know if my guesses are even remotely correct.

Re:This is how you get humans to other star system (1)

arth1 (260657) | about a year ago | (#44207501)

What will mankind do, once there are no more habitable star systems left in this galaxy? I guess some crazy attempts at reaching other galaxies.

One thing is certain - by that time, it would not be mankind, any more than what we are today can be called fishkind.

But I highly doubt that we'll get there. Evolution does not favour long term strategies unless those picking short term strategies die off.

Re:This is how you get humans to other star system (1)

kasperd (592156) | about a year ago | (#44208247)

One thing is certain - by that time, it would not be mankind, any more than what we are today can be called fishkind.

In terms of time passed it could be a shorter period than the time it took to evolve from monkeys into today's humans. Whether we will use the term human about all descendants of humans is a matter of definition. The changes in culture and technology are likely to be greater than the changes in genome. But all of the changes would be subject to evolutionary selection.

But I highly doubt that we'll get there. Evolution does not favour long term strategies unless those picking short term strategies die off.

The two are not mutually exclusive. Earth would still be populated by humans favouring strategies suitable for life on Earth. Compared to the total size of the human population it only takes a small number of individuals to seed colonization across the galaxy. And other star systems would be dominated by those who are ready to take the risks involved in interstellar travel.

Re:This is how you get humans to other star system (1)

arth1 (260657) | about a year ago | (#44209763)

In terms of time passed it could be a shorter period than the time it took to evolve from monkeys into today's humans.

Humans did not evolve from monkeys. Monkeys and humans evolved from a common haplorhini ancestor which was neither monkey nor human, around 40 million years ago. Your typical random monkey has undergone as much evolution since then as the typical human has.

Journeys to other solar systems would take an enormous number of years. So much so that it probably won't happen with live crews. Sending DNA records and reconstructing life at the destination might be the best bet. But even if we used the same blueprint for seeding millions of stars, evolution would occur on all of them, and by the time any of the descendants could meet (but why would they?), they would have evolved so much in different directions that meeting a cousin from the stars might be like meeting a trout.

The two are not mutually exclusive. Earth would still be populated by humans favouring strategies suitable for life on Earth. Compared to the total size of the human population it only takes a small number of individuals to seed colonization across the galaxy. And other star systems would be dominated by those who are ready to take the risks involved in interstellar travel.

But what's the short term reward that would prevent the migration-friendly from being selected against, as they undoubtedly will have to sink resources into their long term goals before even leaving? What makes us think they'd be favoured enough to be allowed to leave before going extinct?

Re:This is how you get humans to other star system (1)

kasperd (592156) | about a year ago | (#44211989)

Monkeys and humans evolved from a common haplorhini ancestor which was neither monkey nor human

If you could reconstruct pictures of what they looked like, I bet the majority of people would classify it as a monkey if you showed them the picture. And chances are nobody would classify it as a fish.

Journeys to other solar systems would take an enormous number of years. So much so that it probably won't happen with live crews.

If you could build propulsion capable of delivering 1G of acceleration continuously for the entire duration of the flight, then I think you could do it in a lifetime. As a nice side effect that would provide artificial gravity during the flight. Achieving that would require a much higher specific impulse than any current propulsion technology could deliver. You'd actually need a powerful particle accelerator onboard the craft.
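For the travel-time half of that claim, here is a rough sketch using the textbook special-relativity formulas for constant proper acceleration. It assumes a flip-and-burn profile (accelerate at 1 g to the midpoint, then decelerate at 1 g to arrive at rest) and a target 4.37 light years away, roughly Alpha Centauri. The distance, the profile, and the function name `flip_and_burn_years` are assumptions for illustration, not something stated in the comment above, and the sketch says nothing about the propulsion needed.

```python
import math

C = 299_792_458.0            # speed of light, m/s
G = 9.80665                  # standard gravity, m/s^2
YEAR = 365.25 * 24 * 3600    # Julian year, s
LIGHT_YEAR = C * YEAR        # m


def flip_and_burn_years(distance_ly, accel=G):
    """Return (ship_years, earth_years) for a trip that accelerates at
    `accel` to the midpoint and decelerates at `accel` to arrive at rest."""
    half = distance_ly * LIGHT_YEAR / 2.0
    # Hyperbolic motion under constant proper acceleration a:
    #   x = (c^2 / a) * (cosh(a * tau / c) - 1)
    #   t = (c / a) * sinh(a * tau / c)
    # Solve the first for tau (ship time), then plug into the second.
    tau_half = (C / accel) * math.acosh(1.0 + accel * half / C**2)
    t_half = (C / accel) * math.sinh(accel * tau_half / C)
    return 2 * tau_half / YEAR, 2 * t_half / YEAR


ship, earth = flip_and_burn_years(4.37)
print(f"ship clock: {ship:.1f} years, Earth clock: {earth:.1f} years")
```

Under these assumptions the ship clock reads about 3.6 years and the Earth clock about 6 years, so the "in a lifetime" part holds up on paper. The hard part is the energy budget, which this sketch ignores entirely.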

I haven't done all of the math, but it would be interesting to see a comparison of the specific impulse needed with the speeds delivered by current accelerator rings.

Even if you couldn't do the trip in a lifetime, that just means you need sustainable recycling on board capable of supporting life for enough generations to make the trip there. Which is probably easier than growing and raising a child without any humans around.

Amazing... (0)

Anonymous Coward | about a year ago | (#44203405)

We can patch hardware on another planet...

Yet we can't get commercial stuff we paid money for updated and working correctly down here on earth half the time.

I've dealt with a related problem (1)

Anonymous Coward | about a year ago | (#44203425)

Fixing code written by someone from a different planet.

Re:I've dealt with a related problem (1)

Anonymous Coward | about a year ago | (#44203493)

At my job we refer to them as Indian contractors.

Usually subcontractors working for IBM.

Priority inversion (0)

Anonymous Coward | about a year ago | (#44203441)

Who the hell writes a modern, threaded RTOS and doesn't account for priority inversion?

Re:Priority inversion (1)

Alex Belits (437) | about a year ago | (#44207511)

VxWorks is hardly "modern".

Re:Priority inversion (0)

Anonymous Coward | about a year ago | (#44210145)

You're going to be sporting that "-1" default moderation for a long time if you don't stop trolling. It's a simple lesson, should be easy for anyone to learn, even a child, but it seems to be difficult for you.

Re:Priority inversion (1)

IWannaBeAnAC (653701) | about a year ago | (#44207707)

The OS included priority inheritance mutexes, but IIRC the developers decided not to use them for reasons of 'efficiency'. Presumably it takes an extra cycle or two to lock a priority-inheriting mutex...

Re:Priority inversion (0)

Anonymous Coward | about a year ago | (#44209387)

Given the failure to use a priority inheritance mutex caused an actual deadlock, I'd say the authors of VxWorks screwed up.

Let me guess... (0)

Anonymous Coward | about a year ago | (#44203461)

The communications task priority was expressed in decimal, while the data gathering priority was coded in hex.

Priority inversion bug (4, Interesting)

BitZtream (692029) | about a year ago | (#44203481)

This problem is known as priority inversion. It's a common concern in schedulers when critical functions run in their own threads. It's something that they should have known about and tested against. Or they could have used more traditional IO approaches and let the VxWorks IO system, which already has protection against priority inversion by design, do its job.
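To make the mechanism concrete, here is a toy tick-based simulation of a fixed-priority preemptive scheduler with one shared mutex. The three task names follow the story summary (meteorological, communications, bus management), but the priorities, release times, and durations are made up, and none of this is VxWorks code. The `inherit` flag switches on a simple form of priority inheritance, where the lock holder temporarily runs at the priority of the task it is blocking.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    priority: int        # larger number = more urgent
    release: int         # tick at which the task becomes ready
    ops: list            # sequence of "lock", "work" (one tick), "unlock"
    pc: int = 0          # index of the next op
    boosted: int = 0     # priority inherited from a blocked waiter
    finished_at: int = -1

    def effective(self):
        return max(self.priority, self.boosted)

    def done(self):
        return self.pc >= len(self.ops)


def simulate(inherit, max_ticks=100):
    """Return {task name: tick at which it finished} for the toy workload."""
    tasks = [
        Task("meteo", 1, 0, ["lock", "work", "work", "unlock"]),
        Task("comms", 2, 1, ["work"] * 10),
        Task("bus", 3, 1, ["lock", "work", "unlock"]),
    ]
    owner = None  # task currently holding the single shared mutex
    for tick in range(max_ticks):
        ready = [t for t in tasks if t.release <= tick and not t.done()]
        if not ready:
            break
        blocked = []  # names of tasks found waiting on the mutex this tick
        while len(blocked) < len(ready):
            t = max((r for r in ready if r.name not in blocked),
                    key=Task.effective)
            if t.ops[t.pc] == "lock" and owner is not None and owner is not t:
                blocked.append(t.name)
                if inherit:
                    owner.boosted = max(owner.boosted, t.effective())
                continue  # pick again; the holder may now outrank the rest
            if t.ops[t.pc] == "lock":
                owner = t          # taking a free mutex costs no time here
                t.pc += 1
            t.pc += 1              # the "work" op uses up this tick
            while not t.done() and t.ops[t.pc] == "unlock":
                owner = None
                t.boosted = 0      # drop any inherited priority on release
                t.pc += 1
            if t.done():
                t.finished_at = tick + 1
            break
    return {t.name: t.finished_at for t in tasks}


print("plain mutex:      ", simulate(inherit=False))
print("with inheritance: ", simulate(inherit=True))
```

With the plain mutex the bus task is released at tick 1 but finishes at tick 13, because the medium-priority comms task keeps the low-priority lock holder off the CPU. With inheritance the bus task finishes at tick 3 and comms simply runs afterwards.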

Re:Priority inversion bug (0)

Anonymous Coward | about a year ago | (#44203825)

I'm not familiar with VxWorks capabilities from back then (or even now, really), but perhaps JPL engineers needed to use a more efficient locking protocol, such as Ted Baker's Stack Resource Policy (SRP)? It's the best real-time locking protocol for uniprocessor fixed-priority systems, but it's not always supported by an RTOS.

Re:Priority inversion bug (1)

k8to (9046) | about a year ago | (#44208991)

The issue was simple.

In VxWorks, when you create pipes for IPC, you get to choose what kind of semaphore you want, because people want the shortest deadlines possible (at least classically).

JPL selected simple mutexes, which led to the priority inversion.

Pipes were, generally speaking, far less well exercised in the codebase at the time, and Wind River engineers explicitly advised the use of message queues which would offer the necessary functionality and would not have had the problem encountered.

Red Planet? (1)

Cammi (1956130) | about a year ago | (#44203501)

Do people still call this the red planet? lol

Re:Red Planet? (1)

Smivs (1197859) | about a year ago | (#44203887)

On soviet Mars.......oh, never mind!

Re: Red Planet? (0)

Anonymous Coward | about a year ago | (#44203923)

Yeah those in the civilian space program do .. Or else hahaha

Re:Red Planet? (1)

maxwell demon (590494) | about a year ago | (#44204523)

Did its colour change?

Re:Red Planet? (1)

gl4ss (559668) | about a year ago | (#44205545)

Do people still call this the red planet? lol

well, yeah. it looks red, sort of. at least from here.

remake of total recall totally copped out though..

It's Very Simple! (0)

Anonymous Coward | about a year ago | (#44203557)

"Scissors cuts paper, paper covers rock, rock crushes lizard, lizard poisons Spock, Spock smashes scissors, scissors decapitates lizard, lizard eats paper, paper disproves Spock, Spock vaporizes rock, and - as it always has - rock crushes scissors."

Boring story (2)

Hentes (2461350) | about a year ago | (#44203617)

Seriously? This reads like a morality tale for beginner programmers. "Remember kids, always check the settings of your mutexes!"
Will we also have articles about NASA engineers mistyping == for = ? Everyone makes mistakes, just because it happened in a rover doesn't make it interesting.

Re:Boring story (1)

wvmarle (1070040) | about a year ago | (#44203659)

The interesting part is not so much that people make mistakes, it is how they are solved.

Re:Boring story (1)

Hentes (2461350) | about a year ago | (#44203733)

Not really, most bugs are easy to fix once found. Especially trivial ones like this.

Re:Boring story (1)

wvmarle (1070040) | about a year ago | (#44207215)

Of course, it's the process of finding the bug, and later updating the remote device, that's interesting. I know fixing bugs is often just a few keystrokes - after spending hours or days searching for the cause.

Re:Boring story (1)

Hentes (2461350) | about a year ago | (#44211049)

But there's nothing in the article about how it was found.

Re:Boring story (5, Interesting)

Antique Geekmeister (740220) | about a year ago | (#44203683)

Actually, it's very interesting. It shows that even with the very extensive testing and layers of planning and managerial processes to prevent such errors, they can still creep in. And it shows that very expensive, one-off projects remain vulnerable to subtle design errors, so the tools to do field updates are _critical_.

Note that designing for spacecraft can be a real artform: they have extremely limited computational resources, due to the inherent risks of bit errors in increasingly small modern silicon exposed to radiation and temperature changes, and you cannot simply shield the electronics: the shielding adds weight and itself becomes radioactive over time. So you often wind up using quite old but far more stable technologies. That means tools that may be considered quite obsolete by the time your design phase is complete and the device is ready for launch. And by the time it arrives _on Mars_, the technology is very obsolete indeed.

My respect for the programmers and designers of interplanetary spacecraft is enormous: systems like Voyager and the Mars Rover, Spirit, that exceed their lifespans by years fill me with pride as an engineer that we could build so well. And the obligatory XKCD on the subject:

        http://www.xkcd.com/695/ [xkcd.com]

Re:Boring story (1)

Hentes (2461350) | about a year ago | (#44203805)

Actually, it's very interesting. It shows that even with the very extensive testing and layers of planning and managerial processes to prevent such errors, they can still creep in. And it shows that very expensive, one-off projects remain vulnerable to subtle design errors, so the tools to do field updates are _critical_.

That's true, but has been known for a while.

Note that designing for spacecraft can be a real artform: they have extremely limited computational resources, due to the inherent risks of bit errors in increasingly small modern silicon exposed to radiation and temperature changes, and you cannot simply shield the electronics: the shielding adds weight and itself becomes radioactive over time. So you often wind up using quite old but far more stable technologies. That means tools that may be considered quite obsolete by the time your design phase is complete and the device is ready for launch. And by the time it arrives _on Mars_, the technology is very obsolete indeed.

That is indeed an interesting topic, and I wouldn't have complained if the article talked about that. But it was just a generic description of a common error with almost no details about the actual system. I didn't say that I don't respect the engineers working on the project. Even the best minds make simple errors. It just doesn't make for a good story.

Re:Boring story (1)

MatthiasF (1853064) | about a year ago | (#44204019)

All the more reason that public, non-military projects like this should have everything open sourced.

Had the hardware and software platforms both been open sourced and available to the public, they would have had a lot more hands and eyes helping to correct these issues.

The only way we're going to get off this planet is with mutual cooperation, and I think that should start between the public sector and..well..the public.

open source? (0)

Anonymous Coward | about a year ago | (#44205875)

Interestingly, I've been involved in open source development for spaceflight software, and I don't see legions of users coming forward with the defects uncovered by their many eyes, and fixes produced by their busy fingers, notwithstanding that the software is available open source to any U.S. Person who asks for it.

There is also the open source operating system RTEMS, which is also at Mars, as well as the closed source VxWorks.

Re:Boring story (0)

Anonymous Coward | about a year ago | (#44208219)

You're prepared to guarantee that, are you? This isn't an audio driver for a brand X wintel system. An open-source approach can't guarantee success any more than the current system, and I can't see any government, let alone the US congress assigning funds to space exploration where the software is supplied by l33tcoder@whoop.de.do. Would you be prepared to strap yourself into the nosecone of an open-sourced saturn-V equivalent?

Old story, lots of details published years ago (0)

Doug Jensen (691112) | about a year ago | (#44203641)

This was documented in great detail by a number of people years ago.

Mars Code (3, Interesting)

Anonymous Coward | about a year ago | (#44203663)

At the USENIX "Hot Topics in System Dependability 2012" conference Gerard Holzmann of JPL gave a fantastic talk [usenix.org] about how they developed the software for the Curiosity rover. (Spoiler: having to display a Bieber poster in your cubicle if you break the nightly build is a great motivator.)

NASA tradition (0)

Anonymous Coward | about a year ago | (#44204127)

"(Engineers later confessed that system resets had occurred during pre-flight tests. They put these down to a hardware glitch and returned to focusing on the mission-critical landing software.)"

Engineering malpractice, but unfortunately common. It comes from shifting the burden of proof from proving that it will work, to proving that it will fail. It also cost a couple space shuttles.

Watchdog timer? (1)

PPH (736903) | about a year ago | (#44204361)

Are they certain it wasn't just the person on the tech support line who suggested rebooting it?

The problem was well known when the story was new (5, Interesting)

Cryptosmith (692059) | about a year ago | (#44204581)

This is a rambling bit of history. Move on if that's not your thing. I love reading about problems like the Pathfinder problems. Trust me - such things often happen on Earth-bound systems, too.

Back in '79, I was working on a multiprocessing router for the ancient ARPANET. At the time the net had over sixty routers distributed across the continent. Actually we called them "imps" - well, "IMPS" but I'll use the modern term "router." We had a lot of the same problems as Pathfinder without ever leaving the atmosphere.

By then all ARPANET routers were remotely maintained. They all ran continuously and we did all software maintenance in Cambridge, MA. By then the basic software was really reliable. They rarely crashed on their own, and we mostly sent updates to tweak performance or to add new protocol features. Once in a while we'd have to use a "magic modem" message to restart a dead machine and to reload things. The software rarely broke so badly that we'd have to have someone on-site load up a paper tape. So remote maintenance was well established by then.

The multiprocessor didn't run "threads"; it ran "strips." Each was a non-preemptive task designed to execute quickly enough not to monopolize the processor. If you wrote software for a Mac before OS X, you know how this works. A multi-step process might involve a sequence of strips executed one after the other.
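For readers who never worked this way, a minimal sketch of the general idea: short run-to-completion pieces of work pulled off a queue, with a multi-step process expressed as one strip queueing the next. This is only an illustration of cooperative scheduling in general. The names `post`, `dispatch`, and the three packet steps are hypothetical and not taken from the actual router code.

```python
from collections import deque

run_queue = deque()
log = []


def post(strip, *args):
    """Queue a strip; it runs later, to completion, with no preemption."""
    run_queue.append((strip, args))


# A hypothetical three-step packet path, split into short strips.
def receive(packet):
    log.append(f"receive {packet}")
    post(route, packet)          # hand off to the next step and return


def route(packet):
    log.append(f"route {packet}")
    post(transmit, packet)


def transmit(packet):
    log.append(f"transmit {packet}")


def dispatch():
    """Run queued strips one at a time until none are left."""
    while run_queue:
        strip, args = run_queue.popleft()
        strip(*args)


post(receive, "A")
post(receive, "B")
dispatch()
print(log)
```

Because each strip returns quickly, the two packets interleave step by step even though nothing is ever preempted. The catch is the one described above: a strip that runs long, or a queue that always has higher-priority work in it, starves everything else.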

Debugging the multiprocessor code was a bit of a challenge because we could lock out multi-step processes in several different ways. While we could put our test router on the network for live testing, this didn't guarantee that we'd get the same traffic the software would get at other sites. For example, we had software to connect computer terminals directly to hosts through the router (the original "terminal access controllers"). This software ran at a lower priority than router-to-router packet handling. It was possible for a busy router to give all the bandwidth to the packets and essentially lock out the host traffic. Such problems might not show up until updated software was loaded into a busy site.

Uploading a patch involved assembly language. We'd generally add new code virus style. First you load the new code into some spare RAM. Once the code is loaded, we patch the working program so that it jumps to the patch the next time it executes. The patch jumps back to an appropriate spot in the program once the new code has executed. We sent the patches in a series of data packets with special addressing to talk to a "packet core" program that loaded them.
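The load-then-redirect order is the important part, and it can be shown with a toy interpreter. What follows is a made-up four-instruction machine in Python, not the router's real instruction set: the replacement code is written into unused memory first, and only then is a single instruction in the live program overwritten with a jump to it. All addresses and opcodes here are assumptions for illustration.

```python
def run(memory, acc=0):
    """Interpret a toy instruction list until "halt"; return the accumulator."""
    pc = 0
    while True:
        op, arg = memory[pc]
        if op == "halt":
            return acc
        if op == "jmp":
            pc = arg
            continue
        if op == "add":
            acc += arg
        elif op == "mul":
            acc *= arg
        pc += 1


NOP = ("add", 0)
memory = [NOP] * 16
# The working program lives at addresses 0..3; 8..15 is spare memory.
memory[0:4] = [("add", 2), ("mul", 3), ("add", 1), ("halt", None)]
before = run(memory)             # (0 + 2) * 3 + 1

# Step 1: load the new code into spare memory. The running program never
# reaches these addresses yet, so a half-loaded patch is harmless.
memory[8:11] = [("mul", 5), ("add", 4), ("jmp", 2)]   # ends by jumping back

# Step 2: once the patch is complete, overwrite one instruction with a jump.
memory[1] = ("jmp", 8)
after = run(memory)              # (0 + 2) * 5 + 4, then the original add 1

print(before, after)
```

The single overwritten instruction is the only moment the live program changes, which is why the patch has to be fully loaded before that write happens.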

The bottom line: it's the sort of challenge that kept a lot of us working as programmers for a long time. And they pop up again every time someone starts another system from scratch.

Re:The problem was well known when the story was n (0)

Anonymous Coward | about a year ago | (#44209509)

Ugh. I can only infer that the 'strips' you ran, if they needed to do something more complex than a single strip in aggregate, sometimes had to be backed by a state machine. What a headache. Arg. ;-P

Re: The problem was well known when the story was (1)

Cryptosmith (692059) | about a year ago | (#44212893)

At the time it seemed virtuous to implement state machines. One guy did his PhD by building a mechanism that did coroutining - the programmer could write out the whole procedure and stick in the strip breaks after the fact. I suppose someone did something like that for the Mac, tho I stopped writing Mac code before seeing such a thing.
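Python generators give a feel for that coroutine approach: the procedure is written straight through, and each bare `yield` is a strip break added after the fact. This is only an analogy under that assumption. The names `handle` and `run_all` are hypothetical, and the PhD mechanism mentioned above surely worked differently.

```python
from collections import deque


def handle(packet, log):
    """One multi-step procedure written straight through; each bare
    `yield` marks a strip break where other work may run."""
    log.append(f"receive {packet}")
    yield
    log.append(f"route {packet}")
    yield
    log.append(f"transmit {packet}")


def run_all(procedures):
    """Round-robin the procedures, resuming each for one strip at a time."""
    queue = deque(procedures)
    while queue:
        proc = queue.popleft()
        try:
            next(proc)          # run until the next strip break
        except StopIteration:
            continue            # procedure finished, drop it
        queue.append(proc)


log = []
run_all([handle("A", log), handle("B", log)])
print(log)
```

The local state that would otherwise live in a hand-written state machine is kept by the generator between strips, which is the headache the parent comment was complaining about.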

News? (1)

wonkey_monkey (2592601) | about a year ago | (#44204819)

More like "olds," am I right? Huh? Ahhh.

Re:News? (1)

SwedishCoward (1838398) | about a year ago | (#44208067)

I remember a lecturer telling exactly this story when I took a real-time systems course circa 2001...

Re:News? (1)

matfud (464184) | about a year ago | (#44221043)

Far far older than that. It is not a new problem but it is a very persistent one. There are many ways to try and avoid the problem. Most do not work in practice. Priority inversion is quite tricky to deal with.

The archetypical priority inversion example (0)

Anonymous Coward | about a year ago | (#44205399)

This is the go-to example of recent memory that I've always trotted out when asked about possible issues with task priority in embedded systems. I always note that while a more elegant solution might be to raise the holding task's priority to that of the highest priority task pending on the lock, the watchdog performed its job exactly as it was supposed to do, and returned the system to a known recoverable state.
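The watchdog side of this is simple enough to sketch. The version below is a generic software model, assuming a monitored task that "kicks" the watchdog whenever it runs and a timeout counted in scheduler ticks. The class, the timeout value, and the schedule are invented for illustration and are not the Pathfinder implementation.

```python
class Watchdog:
    """Counts ticks since the last kick and asks for a reset on timeout."""

    def __init__(self, timeout_ticks):
        self.timeout = timeout_ticks
        self.since_kick = 0
        self.resets = 0

    def kick(self):
        """Called by the monitored task each time it gets to run."""
        self.since_kick = 0

    def tick(self):
        """Called once per tick; returns True when a reset is requested."""
        self.since_kick += 1
        if self.since_kick > self.timeout:
            self.resets += 1
            self.since_kick = 0   # the system restarts from a known state
            return True
        return False


dog = Watchdog(timeout_ticks=5)
# Hypothetical schedule: True means the monitored task ran during that tick.
ran = [True, True, False, False, False, False, False, False, True, True]
reset_ticks = []
for tick, task_ran in enumerate(ran):
    if task_ran:
        dog.kick()
    if dog.tick():
        reset_ticks.append(tick)
print(reset_ticks)
```

The watchdog knows nothing about mutexes or priorities. It only notices that the monitored task stopped running, which is exactly why it still works when the cause is something nobody anticipated.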

Even in this challenging scenario, there are still levels of 'good enough' we are prepared to accept.

Threads (1)

Old Wolf (56093) | about a year ago | (#44206751)

This is why you don't use threads for important stuff...

That wasn't Windows (0)

aglider (2435074) | about a year ago | (#44208663)

or any other Microsoft-related OS, thank God.