Writing Code for Spacecraft

michael posted more than 9 years ago | from the carried-the-bits-uphill-one-by-one-in-the-snow dept.

Programming 204

CowboyRobot writes "In an article subtitled 'And you think *your* operating system needs to be reliable,' Queue has an interview with the developer of the OS that runs on the Mars Rovers. Mike Deliman, chief engineer of operating systems at Wind River Systems, offers quotes like, 'Writing the code for spacecraft is no harder than for any other realtime life- or mission-critical application. The thing that is hard is debugging a problem from another planet.' and, 'The operating system and kernel fit in less than 2 megabytes; the rest of the code, plus data space, eventually exceeded 30 megabytes.'"

204 comments

hmm... (2, Interesting)

opqdonut (768567) | more than 9 years ago | (#10875588)

I wonder will they be releasing the source. It could be an interesting read.

Re:hmm... (2, Informative)

Anonymous Coward | more than 9 years ago | (#10875638)

WindRiver RTOS (real time operating system) is painful to work with. Their debugging environment is a nightmare and the cost of development and deployment is almost 3x that of an embedded linux. My little company just finished doing a trade study of the various RTOS kernels available and yes, theirs might be more reliable, but at a huge cost. Furthermore, performance wise, it just isn't up to snuff vs say MercuryOS on a single CPU, let alone a multi CPU system.
As to releasing of their source code? From Wind River? ROTFL!
1. 2. 3. 4. Profit???? (For a quick mod up)

Re:hmm... (2, Funny)

grub (11606) | more than 9 years ago | (#10875695)


and the cost of development and deployment is almost 3x that of an embedded linux

When a spacecraft millions of kilometers from Earth packs it in I'm sure a project leader at NASA would be happy they saved 2/3 of the price on a relatively small ticket item.

Re:hmm... (3, Insightful)

Richthofen80 (412488) | more than 9 years ago | (#10875837)

thiers might be more reliable, but at a huge cost.

Probably not as big a cost as losing a Mars rover because your OS wasn't reliable enough.

Chinese Threat: Keep the Source Code Secret! (-1, Flamebait)

Anonymous Coward | more than 9 years ago | (#10876109)

If NASA were to open source the code, then the Chinese would have access to the latest technology for remote-controlled drones. The Chinese are morally bankrupt [tibet.org] and would use the software to create remote-controlled war vehicles.

Sometimes national security requires that we keep certain things secret. At least, be glad that we in the West have a lock on the technology thus far.

Re:Chinese Threat: Keep the Source Code Secret! (1)

Performaman (735106) | more than 9 years ago | (#10876201)

Explain to me how the source code for a computer designed to operate a slow-moving, 4 or 6 wheeled vehicle used to take pictures and to sample temperature, radiation and other scientific data could be adapted for use on an aircraft with a cruising speed of about 84 miles per hour.
Also, China already has its own UAV. "China's armed forces have operated the Chang Hong (CH-1) long-range, air-launched autonomous reconnaissance drone since the 1980s. China developed the CH-1 by reverse-engineering US Firebee reconnaissance drones recovered during the Vietnam War. An upgraded version of the system was displayed at the 2000 Zhuhai air show and is being offered for export. A PRC aviation periodical reported that the CH-1 can carry a TV, daylight still, or infrared camera." (from http://www.globalsecurity.org/military/world/china/uav.htm [globalsecurity.org])

Re:Chinese Threat: Keep the Source Code Secret! (0)

Anonymous Coward | more than 9 years ago | (#10876461)

YHBT ;)

Re:Chinese Threat: Keep the Source Code Secret! (-1, Troll)

Anonymous Coward | more than 9 years ago | (#10876283)

Heh, the US (and definitely more than 51% of the country) is morally bankrupt! Get off your high horse! Finding examples of corruption is left to the reader; if you can't find any, congratulations, you're an idiot.

Re:Chinese Threat: Keep the Source Code Secret! (1)

Sj0 (472011) | more than 9 years ago | (#10876639)

I'd point out how stupid your argument is, but I don't think I really have to. It speaks for itself.

Re:hmm... (3, Interesting)

The Vulture (248871) | more than 9 years ago | (#10876485)

Yes, and seeing as I'm currently working with embedded Linux, I can honestly say that it's a pain. (Note: I must preface this by saying that I am using Linux 2.4.18 for MIPS and my company is not using any sort of real-time extensions, just the bare 2.4.18 tree).

You get what you pay for... I've used VxWorks for a few years now, and while it does have its share of problems, and while they are sometimes difficult to deal with, it is a great platform for development. You get much better control of the system than with Linux. The main problem with using Linux in an embedded environment is the user-to-kernel relationship; VxWorks solves it neatly by getting rid of it (everything is in kernel space). This works out very nicely for MIPS processors, which I deal with most of the time. Threading (or tasks, as VxWorks calls them) is much better than in Linux: you can at least somewhat guarantee when your tasks run, unlike with the default Linux scheduler.

I am very interested in trying QNX out, to see how it compares to vxWorks, one of these days.

-- Joe

Re:hmm... (5, Funny)

Infinityis (807294) | more than 9 years ago | (#10875745)

Not gonna happen, for one big reason. I could just see the Slashdot headline:

Mars Rover HaX0r3d and OS replaced with Linux.

Shortly thereafter, Micro$oft claims that they can enforce patent infringement on Mars...

Chinese Threat: Keep the NASA Code Secret! (-1, Troll)

Anonymous Coward | more than 9 years ago | (#10875948)

If NASA were to open source the code, then the Chinese would have access to the latest technology for remote-controlled drones. The Chinese are morally bankrupt and would use the software to create remote-controlled war vehicles.

Sometimes national security requires that we keep certain things secret. At least, be glad that we in the West have a lock on the technology thus far.

hard to imagine.. (2, Interesting)

Chuck Bucket (142633) | more than 9 years ago | (#10875598)

all software has bugs, what happens when 1/2 thru the trip they have an update? who installs remotely, and I guess having a sysop reboot is out of the question...

CBB

Re:hard to imagine.. (3, Informative)

brilinux (255400) | more than 9 years ago | (#10875612)

Actually, if I remember correctly, there was a problem with one of the rovers, and they had to re-flash it from millions of KM away. I am not sure whether they had a backup copy of the OS on the rover that would facilitate the re-flashing, or whether there was some patch that was transmitted, but I remember them talking about it on the news.

Re:hard to imagine.. (2, Interesting)

Cylix (55374) | more than 9 years ago | (#10875652)

They had a section of the flash memory go bad... so they patched in a workaround for those sectors, if I remember correctly.

Re:hard to imagine.. (5, Interesting)

Vardamir (266484) | more than 9 years ago | (#10875692)

Yes, here is an email my OS prof sent our class on the subject:

Subject: What really happened on Mars Rover Pathfinder

The Mars Pathfinder mission was widely proclaimed as "flawless" in the early
days after its July 4th, 1997 landing on the Martian surface. Successes
included its unconventional "landing" -- bouncing onto the Martian surface
surrounded by airbags, deploying the Sojourner rover, and gathering and
transmitting voluminous data back to Earth, including the panoramic pictures
that were such a hit on the Web. But a few days into the mission, not long
after Pathfinder started gathering meteorological data, the spacecraft began
experiencing total system resets, each resulting in losses of data. The
press reported these failures in terms such as "software glitches" and "the
computer was trying to do too many things at once".

This week at the IEEE Real-Time Systems Symposium I heard a fascinating
keynote address by David Wilner, Chief Technical Officer of Wind River
Systems. Wind River makes VxWorks, the real-time embedded systems kernel
that was used in the Mars Pathfinder mission. In his talk, he explained in
detail the actual software problems that caused the total system resets of
the Pathfinder spacecraft, how they were diagnosed, and how they were
solved. I wanted to share his story with each of you.

VxWorks provides preemptive priority scheduling of threads. Tasks on the
Pathfinder spacecraft were executed as threads with priorities that were
assigned in the usual manner reflecting the relative urgency of these tasks.

Pathfinder contained an "information bus", which you can think of as a
shared memory area used for passing information between different components
of the spacecraft. A bus management task ran frequently with high priority
to move certain kinds of data in and out of the information bus. Access to
the bus was synchronized with mutual exclusion locks (mutexes).

The meteorological data gathering task ran as an infrequent, low priority
thread, and used the information bus to publish its data. When publishing
its data, it would acquire a mutex, do writes to the bus, and release the
mutex. If an interrupt caused the information bus thread to be scheduled
while this mutex was held, and if the information bus thread then attempted
to acquire this same mutex in order to retrieve published data, this would
cause it to block on the mutex, waiting until the meteorological thread
released the mutex before it could continue. The spacecraft also contained
a communications task that ran with medium priority.

Most of the time this combination worked fine. However, very infrequently
it was possible for an interrupt to occur that caused the (medium priority)
communications task to be scheduled during the short interval while the
(high priority) information bus thread was blocked waiting for the (low
priority) meteorological data thread. In this case, the long-running
communications task, having higher priority than the meteorological task,
would prevent it from running, consequently preventing the blocked
information bus task from running. After some time had passed, a watchdog
timer would go off, notice that the data bus task had not been executed for
some time, conclude that something had gone drastically wrong, and initiate
a total system reset.

This scenario is a classic case of priority inversion.
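
To make the sequence above concrete, here is a minimal sketch of it as a
tick-by-tick simulation in Python. It is not VxWorks code and not the flight
software: the task names, tick counts, release times and watchdog limit are
all made-up assumptions, chosen only so that the three priorities interact
the way the text describes. The inherit flag stands in for the mutex's
priority inheritance option.

```python
class Task:
    def __init__(self, name, priority, release, work, needs_mutex):
        self.name = name
        self.priority = priority      # base priority; larger = more urgent
        self.effective = priority     # may be raised by priority inheritance
        self.release = release        # tick at which the task becomes ready
        self.work = work              # ticks of CPU time still needed
        self.needs_mutex = needs_mutex
        self.done_at = None


def simulate(inherit, watchdog_limit=10, max_ticks=50):
    """Run one scenario; all numbers here are illustrative assumptions."""
    met = Task("met", 1, release=0, work=3, needs_mutex=True)       # low
    bus = Task("bus", 3, release=1, work=2, needs_mutex=True)       # high
    comms = Task("comms", 2, release=2, work=20, needs_mutex=False) # medium
    tasks = [met, bus, comms]
    owner = None                      # current holder of the bus mutex

    for tick in range(max_ticks):
        ready = [t for t in tasks if t.release <= tick and t.work > 0]
        if not ready:
            break
        # Tasks that need the mutex while someone else holds it are blocked.
        blocked = [t for t in ready
                   if t.needs_mutex and owner is not None and owner is not t]
        if inherit and blocked:
            # The owner borrows the highest priority among its waiters.
            owner.effective = max(owner.effective,
                                  max(t.effective for t in blocked))
        runnable = [t for t in ready if t not in blocked]
        running = max(runnable, key=lambda t: t.effective)
        if running.needs_mutex and owner is None:
            owner = running           # take the mutex for the whole job
        running.work -= 1
        if running.work == 0:
            running.done_at = tick + 1
            if owner is running:
                owner = None          # release the mutex
                running.effective = running.priority
        # Watchdog: the bus task must finish soon after it becomes ready.
        if bus.done_at is None and tick + 1 - bus.release >= watchdog_limit:
            return ("watchdog reset", None)
    return ("ok", bus.done_at)


without_inheritance = simulate(inherit=False)
with_inheritance = simulate(inherit=True)
```

Under these assumed numbers the run without inheritance ends in a watchdog
reset, because the medium-priority task keeps the low-priority mutex holder
off the CPU, while the run with inheritance lets the bus task finish.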

HOW WAS THIS DEBUGGED?

VxWorks can be run in a mode where it records a total trace of all
interesting system events, including context switches, uses of
synchronization objects, and interrupts. After the failure, JPL engineers
spent hours and hours running the system on the exact spacecraft replica in
their lab with tracing turned on, attempting to replicate the precise
conditions under which they believed that the reset occurred. Early in the
morning, after all but one engineer had gone home, the engineer finally
reproduced a system reset on the replica. Analysis of the trace revealed
the priority inversion.

HOW WAS THE PROBLEM CORRECTED?

When created, a VxWorks mutex object accepts a boolean parameter that
indicates whether priority inheritance should be performed by the mutex.
The mutex in question had been initialized with the parameter off; had it
been on, the low-priority meteorological thread would have inherited the
priority of the high-priority data bus thread blocked on it while it held
the mutex, causing it to be scheduled with higher priority than the
medium-priority communications task, thus preventing the priority inversion.
Once diagnosed, it was clear to the JPL engineers that using priority
inheritance would prevent the resets they were seeing.

VxWorks contains a C language interpreter intended to allow developers to
type in C expressions and functions to be executed on the fly during system
debugging. The JPL engineers fortuitously decided to launch the spacecraft
with this feature still enabled. By coding convention, the initialization
parameter for the mutex in question (and those for two others which could
have caused the same problem) were stored in global variables, whose
addresses were in symbol tables also included in the launch software, and
available to the C interpreter. A short C program was uploaded to the
spacecraft, which when interpreted, changed the values of these variables
from FALSE to TRUE. No more system resets occurred.

ANALYSIS AND LESSONS

First and foremost, diagnosing this problem as a black box would have been
impossible. Only detailed traces of actual system behavior enabled the
faulty execution sequence to be captured and identified.

Secondly, leaving the "debugging" facilities in the system saved the day.
Without the ability to modify the system in the field, the problem could not
have been corrected.

Finally, the engineer's initial analysis that "the data bus task executes
very frequently and is time-critical -- we shouldn't spend the extra time in
it to perform priority inheritance" was exactly wrong. It is precisely in
such time critical and important situations where correctness is essential,
even at some additional performance cost.

HUMAN NATURE, DEADLINE PRESSURES

David told us that the JPL engineers later confessed that one or two system
resets had occurred in their months of pre-flight testing. They had never
been reproducible or explainable, and so the engineers, in a very
human-nature response of denial, decided that they probably weren't
important, using the rationale "it was probably caused by a hardware
glitch".

Part of it too was the engineers' focus. They were extremely focused on
ensuring the quality and flawless operation of the landing software. Should
it have failed, the mission would have been lost. It is entirely
understandable for the engineers to discount occasional glitches in the
less-critical land-mission software, particularly given that a spacecraft
reset was a viable recovery strategy at that phase of the mission.

THE IMPORTANCE OF GOOD THEORY/ALGORITHMS

David also said that some of the real heroes of the situation were some
people from CMU who had published a paper he'd heard presented many years
ago who first identified the priority inversion problem and proposed the
solution. He apologized for not remembering the precise details of the
paper or who wrote it. Bringing things full circle, it turns out that the
three authors of this result were all in the room, and at the end of the
talk were encouraged by the program chair to stand and be acknowledged.
They were Lui Sha, John Lehoczky, and Raj Rajkumar. When was the last time
you saw a room of people cheer a group of computer science theorists for
their significant practical contribution to advancing human knowledge?
It was quite a moment.

POSTLUDE

For the record, the paper was:

L. Sha, R. Rajkumar, and J. P. Lehoczky. Priority Inheritance Protocols: An
Approach to Real-Time Synchronization. In IEEE Transactions on Computers,
vol. 39, pp. 1175-1185, Sep. 1990.

mutex's always cause trouble (3, Interesting)

hey (83763) | more than 9 years ago | (#10875984)

In my experience mutexes, semaphores, etc. always cause trouble. There is nearly always another way to write things.

And you'll never ever see me coding an infinite wait for a mutex. That's just asking for trouble.
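
For what it's worth, a bounded wait looks something like this. It's only a sketch in Python using the standard threading module; the lock, the list and the publish() helper are made-up names for illustration, and the timeout value is arbitrary.

```python
import threading

bus_lock = threading.Lock()   # hypothetical lock guarding some shared data
published = []                # hypothetical shared data


def publish(sample, timeout_s=0.5):
    """Append sample under the lock, giving up after timeout_s seconds."""
    if not bus_lock.acquire(timeout=timeout_s):
        return False          # caller decides: retry, log it, or degrade
    try:
        published.append(sample)
        return True
    finally:
        bus_lock.release()


first = publish(1)                      # lock is free, so this succeeds
bus_lock.acquire()                      # simulate someone else holding the lock
second = publish(2, timeout_s=0.05)     # times out instead of hanging forever
bus_lock.release()
```

The point is that the failure path is explicit: the caller gets False back and has to decide what to do, instead of sitting blocked until a watchdog notices.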

Bad: in Windows, FindNextChangeNotification()
requires those IPC operations and it always gives me grief.

Good: The Linux File Activity Monitor (FAM). Lets you open and read a pipe of actions. Nice!

Re:hard to imagine.. (0)

Anonymous Coward | more than 9 years ago | (#10876164)

Fortunately they were lucky enough to reproduce the glitch on Earth; unfortunately the problem was a design mistake (not enabling priority inheritance).
It's so 'real', so 'understandable', the image of software engineers dismissing a spurious error as a 'hardware glitch'! Let's talk about software QA!



AC by slashdot still blocking my ISP's Inktomy server.

Re:hard to imagine.. (4, Interesting)

AaronW (33736) | more than 9 years ago | (#10876217)

As someone who's worked with VxWorks for the last several years I'm surprised they didn't turn on priority inheritance to begin with for the semaphore. As a rule, we usually turn on priority inheritance for our mutex semaphores.

Other problems in the Mars Pathfinder were related to using the VxWorks filesystem. VxWorks basically only supports FAT on top of flash. For flash, FAT is a poor choice since some areas of the disk like the root directory and FAT tables will quickly wear out. Also, I don't think VxWorks has much support for working around bad sections of flash.

As far as VxWorks memory allocation support goes, in an ideal world one would statically allocate all memory, but oftentimes things are not ideal. In the product I work on, we have to have dynamic memory allocation, since depending on how the product is being used at the time, different data structures are required with no way of knowing beforehand how many of a particular type are needed, and this changes dynamically. Statically allocating everything is easy for a simple device, or when you have enough memory to allocate for the worst case.

In our case, we statically allocate memory where we can, but in many cases we cannot. For example, I have to maintain a data structure keeping track of all of the network gateways connected to an output interface. We can have many thousands of gateways and thousands of output interfaces. There could be anything between one and thousands of gateways on an interface. In this case, I use static arrays for information on each gateway and each output interface, but must use dynamic data structures to list all the gateways connected to an output interface. It would be prohibitive to allocate storage for 30,000 gateways with 30,000 interfaces! I also can't use a linked list of gateways per interface since it doesn't scale, a linked list having access time O(n).

Also, we use third party libraries that perform dynamic memory allocation and it would be prohibitive to change that.

By replacing Wind River's malloc code with Doug Lea's code we eliminated fragmentation problems and saw our startup time drop from 50 minutes to 3 minutes. Doug Lea's malloc code is the basis of malloc in glibc and is very efficient. We also added support for tracing heap memory allocations to keep track of which task allocated a block and where it was allocated. This alone helped tremendously in tracking down a number of memory leaks since we can just walk the heap and see exactly where all the memory is being allocated. This is a sorely missing feature in VxWorks.
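
The heap tracing idea is simple enough to sketch. This is a toy model in Python, not our actual code and not anything from VxWorks or Doug Lea's malloc: the class, the task names and the file:line strings are all invented for illustration. In C you'd stash the task ID and call site in a header in front of each block; here a dictionary plays that role.

```python
from collections import Counter
from itertools import count


class TracingHeap:
    """Toy allocator that remembers who allocated each live block and where."""

    def __init__(self):
        self._live = {}              # handle -> (task, site, size)
        self._handles = count(1)     # stand-in for returned pointers

    def alloc(self, size, task, site):
        handle = next(self._handles)
        self._live[handle] = (task, site, size)
        return handle

    def free(self, handle):
        del self._live[handle]       # raises KeyError on a double free

    def walk(self):
        """Bytes still allocated, grouped by (task, call site)."""
        totals = Counter()
        for task, site, size in self._live.values():
            totals[(task, site)] += size
        return totals


heap = TracingHeap()
a = heap.alloc(64, task="tNet", site="gateway.c:120")    # hypothetical names
b = heap.alloc(128, task="tNet", site="gateway.c:120")
c = heap.alloc(32, task="tShell", site="cli.c:45")
heap.free(a)
heap.free(c)
outstanding = dict(heap.walk())      # whatever is left is a leak candidate
```

Walking the heap after a test run and grouping by call site is what makes leaks stand out: the sites whose totals keep growing are the ones to look at.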

The lack of memory protection is another major problem for complex tasks. We have a bug where random memory locations get corrupted, and we've spent weeks trying to track down the cause without any luck.

Needless to say, all new projects where I work will not run on VxWorks. All of the chip vendors we're looking at are either dropping support for it or have already dropped it and are focusing on Linux.

BTW, priority inheritance is one feature I would *REALLY* love to see added to Linux. The company I'm working for is looking at writing our next generation platform on top of an embedded Linux. We have not yet decided which one to use, but want something 2.6 based.

With priority inheritance, if a mutex is held by a low priority task and a high priority task tries to grab it, the low priority task is automatically boosted to the highest priority task that has attempted to acquire the semaphore. When the semaphore is released, the low priority task's priority is restored.
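
Roughly, the bookkeeping looks like this. It's a sketch in Python with invented class names, not the VxWorks or Linux implementation, and it only tracks priorities; there is no real blocking or scheduling. It also assumes a task holds at most one such mutex, so restoring the base priority on release is enough.

```python
class Task:
    def __init__(self, name, priority):
        self.name = name
        self.base_priority = priority   # larger number = more urgent (assumed)
        self.priority = priority        # current, possibly boosted, priority


class InheritingMutex:
    """Priority bookkeeping only; a real kernel would also block the waiters."""

    def __init__(self):
        self.owner = None
        self.waiters = []

    def acquire(self, task):
        if self.owner is None:
            self.owner = task
            return True
        # Contended: queue the task and boost the owner if the waiter is higher.
        self.waiters.append(task)
        self.owner.priority = max(self.owner.priority, task.priority)
        return False

    def release(self):
        # Drop the boost, then hand the mutex to the most urgent waiter.
        self.owner.priority = self.owner.base_priority
        if self.waiters:
            nxt = max(self.waiters, key=lambda t: t.priority)
            self.waiters.remove(nxt)
            self.owner = nxt
        else:
            self.owner = None


low = Task("low", 1)
high = Task("high", 10)
m = InheritingMutex()
got_low = m.acquire(low)          # low-priority task takes the mutex
got_high = m.acquire(high)        # high-priority task has to wait
boosted = low.priority            # owner now runs at the waiter's priority
m.release()                       # boost removed, mutex handed to "high"
```

After the release, the low-priority task is back at its own priority and the high-priority task owns the mutex, which is the behaviour described above.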

Some other nice features are interrupt scheduling and better priority based message passing support (which may already be present, I'm still looking into this).

Finally, one very useful feature would be the ability to guarantee a real-time thread a certain percentage of the CPU, with the option of placing a hard limit if it tries to exceed that or temporarily lowering its priority to non-realtime so as to not starve non-realtime tasks. Timesys has something like that, but we don't need anything quite that fancy and it would be nice if it were in the mainline kernel.

We're using Timesys Linux for one of our current projects, but are rather disappointed in the support we received. They've been slow to upgrade their kernel beyond 2.4.18 for our particular CPU (one of the embedded PowerPC variants), which has caused a lot of problems since we've hit a lot of kernel bugs fixed in 2.4.20 and later. Also, they don't yet have realtime support for the 2.6 kernel.

Re:hard to imagine.. (1)

StefanoB (775596) | more than 9 years ago | (#10876449)

Have you looked into RTAI [rtai.org]? It's a kernel patch (available even for the 2.6 series) that makes the kernel real-time.

Greets,

Stefano

Efficiency (2, Funny)

Maxim Kovalenko (764126) | more than 9 years ago | (#10875613)

"The operating system and kernel fit in less than 2 megabytes; the rest of the code, plus data space, eventually exceeded 30 megabytes." This should be used as the example for efficient coding

Re:Efficiency (3, Interesting)

Omicron32 (646469) | more than 9 years ago | (#10875633)

That's all well and good, but don't forget that this kernel only has to interface with one set of hardware.

Things like the Linux kernel have to know about hundreds or thousands of different devices, which is why it's so big.

Re:Efficiency (2, Insightful)

CarlDenny (415322) | more than 9 years ago | (#10875940)

Just to clarify, VxWorks runs on a hell of a lot of hardware, dozens of CPUs across all the major families, thousands of device drivers.

Now, any particular instance of the kernel gets compiled for a specific processor, and only includes the drivers it needs. Which does save on some space. But a lot of that extra space comes from things like a dynamic loader, graphics packages, local shells (usually in multiple flavors), and a host of other applications that are "standard."

The thing that saves *that* space is the local WDB debugging agent. It lets you offload almost all of the bells and whistles to another machine, which does the object loading, provides your shell, does whatever debugging you need, then sends simple instructions to the agent to carry them out. That dramatically increases the interface capabilities without increasing the footprint.

Re:Efficiency (0)

Anonymous Coward | more than 9 years ago | (#10875637)

It could have been done smaller with DOS.

Re:Efficiency (4, Interesting)

Armchair Dissident (557503) | more than 9 years ago | (#10875740)

I used to write embedded applications using OS-9 (NOT MacOS 9) on 68000-based systems as a sub-contractor for Nuclear Electric (nuclear power stations company in the UK before it became BNFL). Our development system - complete with OS/Kernel and compilers - had only about a meg of memory; the final embedded systems often only had 512K if we were lucky.

Okay, so this was some 14 years ago - but it was doing a lot of work. 2 megabytes is a lot of memory! There's a phenomenal amount of code and data that can be stored in 2 meg. Maybe it's good by current standards, but - personally - I would suggest that current standards are a bad place to start from.

Re:Efficiency (1)

Rattencremesuppe (784075) | more than 9 years ago | (#10875914)

I'm currently writing an application for a MSP430 microcontroller which has 60K flash and 2K (yes, 2048 BYTES) RAM.

(it doesn't have to land on Mars, though ;)))

Re:Efficiency (1)

JamesP (688957) | more than 9 years ago | (#10876017)

Pur-lease...

I wrote stuff for the PIC microcontroller 16F84. That's 4k of code and 68 bytes of RAM (yes, 68 bytes - an SMS message can be bigger)

Re:Efficiency (1)

Infinityis (807294) | more than 9 years ago | (#10875920)

Well now, that all depends on how you define efficient. Some people would say efficient means compact code...others might say efficient code is written quickly. I mean, an efficient worker does a lot of work in a little time, would not the same standard apply for a software/OS developer?

Re:Efficiency (5, Informative)

Brett Buck (811747) | more than 9 years ago | (#10876011)

> "The operating system and kernel fit in less than 2
> megabytes; the rest of the code, plus data space,
> eventually exceeded 30 megabytes." This should be used as
> the example for efficient coding

You've GOT to be kidding, right? 2 meg of OS code? That's ULTRABLOAT compared to most spacecraft. In fact, for the vast majority of the space age, that would have exceeded the resources of the computer by several orders of magnitude.

I've done this kind of programming for a living (for 10 years, moved up to controls design) but the last system I programmed for has 372k of memory, total. That includes data, code, OS, everything. Runs at 432 KIPS. And it performs what is probably one of the most complex in-flight autonomous control operations ever.

Most are even more restrictive. For example, 8K of PROM and 1k of volatile memory (and 28 WORDS of non-volatile memory). This is more than adequate for most applications, if you do it right.

Many spacecraft OS's are more akin to this:

hardware interrupt
external electronics power up processor.
external electronics set PC = 80hex
run
{execute all the code}
halt
power down

Once every 1/4 of a second for 15 years.

The project I am currently working on uses VxWorks (and so we were quite interested in the Mars Rover problem) and it's so bloated with unnecessary features it's absurd. This is not a Windows box, it's a spacecraft processor.

I can't argue with the 30 meg of data space. Using the memory as a data recorder would be quite useful and a good picture takes a lot of space. But it's alarming to me that you could figure out how to waste maybe 4-5 meg on code. If you started with a bare home-brew OS, I would guess (and I get paid for this sort of guess) that you could do the entire flight code in 512K, with maybe 8k of data space, excluding the science data.

Only recently have space-qualified rad-hard processors with this kind of capability become available. Until then, if you said you needed 2 meg for the OS alone, you would have gotten fired on the spot and referred to mental health professionals. The availability of these processors enabled the use of high-level languages with tremendous overhead (like C++). And this was only done for employee retention purposes during the bubble. For years it was done at the assembler or even machine level. It's still not at all uncommon to do, and we've done MANY flight code patches, with only a processor handbook, an engineering paper pad, and by setting individual bits one-by-one.

Brett

Re:Efficiency (0)

Anonymous Coward | more than 9 years ago | (#10876213)

It's still not at all uncommon to do, and we've done MANY flight code patches, with only a processor handbook, an engineering paper pad, and by setting individual bits one-by-one.


Yes, that's how hardware-control-level software works, not an extra push/pop. I love this programming level!



AC by slashdot still blocking my ISP's Inktomy server IP.

Re:Efficiency (0)

Anonymous Coward | more than 9 years ago | (#10876322)

As I understand it, the rovers have image recognition software that allows them to visually navigate and autonomously avoid hazards using their cameras. That's a little more complex than the simple orbital mechanics and instrument control code that most probes run.

Implementing a functioning artificial vision and autonomous decision-making system in a small fraction of the space that a Java "Hello World" program takes is still pretty impressive in my book.

Maybe now that the hardware supports the luxury of 3 megs of code, trying to write all of the functionality in assembler would "get you fired", because it is an excessive use of expensive developer time and QA resources.

Contiki (0)

Anonymous Coward | more than 9 years ago | (#10876079)

Contiki [www.sics.se] - multitasking kernel, TCP/IP stack, GUI, themeable window system, web server, web browser, etc. Runs in 40k RAM (yes, only 40960 bytes!). That's efficient coding.

Imagine a ... (-1, Troll)

Anonymous Coward | more than 9 years ago | (#10875615)

...Beowulf cluster of Mars Rovers.

Summary of OS code (4, Funny)

boingyzain (739759) | more than 9 years ago | (#10875617)

while (1 = 1) { Dig(); Picture(); }

Re:Summary of OS code (5, Funny)

zeath (624023) | more than 9 years ago | (#10875684)

roveros.c: 1: non-lvalue in assignment
make: *** [roveros] Error 1
I'm sorry, your rover is lost in space. Insert $1 billion and press any key to try again.

George Neville-Neil (4, Informative)

cpghost (719344) | more than 9 years ago | (#10875618)

The interviewer George Neville-Neil co-authored "The Design and Implementation of the FreeBSD Operating System" with Marshall Kirk McKusick.

Too bad about their compiler/assembler line... (0)

Anonymous Coward | more than 9 years ago | (#10875619)

Too bad about their compiler/assembler line; it is not half as reliable as their Mars rover software...

In outer space... (0, Funny)

Anonymous Coward | more than 9 years ago | (#10875621)

...rover codes you!

Re:In outer space... (-1, Offtopic)

Anonymous Coward | more than 9 years ago | (#10875694)

In Russia they execute people for this stupid joke

just imagine ... (-1, Redundant)

xlyz (695304) | more than 9 years ago | (#10875628)


a beowulf cluster of rovers ...

Re:just imagine ... (0)

Anonymous Coward | more than 9 years ago | (#10875658)

a beowulf cluster of morons who think this joke is funny...

Errr... (0)

Anonymous Coward | more than 9 years ago | (#10875661)

a beowulf cluster of rovers
Don't you mean a convoy of rovers? :P

Kevin

Reinventing the wheel. (5, Funny)

Anonymous Coward | more than 9 years ago | (#10875644)

Should have just used WinCE, with a few of the productivity apps cut out. Adding a copy of pocket Auto-route, with some Martian JPEGS would have helped navigation as well.

Carmack (3, Interesting)

mfh (56) | more than 9 years ago | (#10875648)

I would like to think that this article embodies the reasons that John Carmack got into space program development to begin with.

In the beginning he got into 3d game applications for a similar reason. The cutting edge is always the very outer area of human development, and Carmack makes a good example of a programmer who has taken aim at the edge of what is known to programmers. Maybe Mr. Carmack would care to comment?

Much like how Id Software develops engines, spacecraft programming is new and innovative, although the difference is that spacecraft systems have no room for error.

Re:Carmack (1)

Infinityis (807294) | more than 9 years ago | (#10875868)

I was hoping for a less abstract reason, like an upcoming game, such as The Sims: Space Station or The Sims: Mars Rover.

Guess I'll have to scratch another one off my Christmas wish list...

Wait a minute? (3, Insightful)

Billly Gates (198444) | more than 9 years ago | (#10875669)

Was not the OS about Rover loaded with problems? Go read past news from last Febuarary here on slashdot?

VXworks does not even offer memory protection and the ram can get fragmented. Not to sound trollish but I would pick something like Qnx or NetBSD for any critical app or embedded device.

It's amazing the engineers fixed it and got it to work reliably, but a better, more mission-critical operating system would have been a better choice.

Re:Wait a minute? (2, Interesting)

cpghost (719344) | more than 9 years ago | (#10875763)

I would pick something like Qnx or NetBSD for any critical app

Okay, let's turn NetBSD into a real-time OS. Add some "hardening" features like watchdogs etc. Hmm... what should we call it? Perhaps: SpaceBSD?

Re:Wait a minute? (1)

Dominic_Mazzoni (125164) | more than 9 years ago | (#10875772)

VXworks does not even offer memory protection and the ram can get fragmented. Not to sound trollish but I would pick something like Qnx or NetBSD for any critical app or embedded device.

I think QNX is a valid alternative. But is NetBSD hard-real-time?

Re:Wait a minute? (1)

bhima (46039) | more than 9 years ago | (#10875915)

NetBSD is not Hard Real Time. But most applications don't need true Real Time behavior. I use it at work for a couple of projects and find it more satisfactory than VXworks, Linux or (god forbid) Windows XP embedded.

Also, dynamic memory allocation makes for ... "interesting" testing "opportunities". That's not to say I've never done it, only that I sort of wish I hadn't.

Re:Wait a minute? (4, Insightful)

neonstz (79215) | more than 9 years ago | (#10875822)

VXworks does not even offer memory protection and the ram can get fragmented.

Dynamically allocating memory is usually a big no-no in real time systems.

Re:Wait a minute? (3, Insightful)

RAMMS+EIN (578166) | more than 9 years ago | (#10875916)

``VXworks does not even offer memory protection and the ram can get fragmented.''

Why would you even want memory protection in a system like this? Memory protection is great to prevent crappy apps on your PC from doing too much damage, but in a system like the Rover it's pure overhead.

As for ram getting fragmented, it all depends on how you program it. Often, you don't even need memory allocation, so you won't have any problem with fragmentation.
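To make the "no dynamic allocation" point concrete, here is a minimal sketch of a fixed-size block pool in C, the kind of scheme often used in place of malloc() in real-time code. It is not taken from VxWorks or the rover software; the names (block_pool, pool_alloc, pool_free) and the sizes are made up for illustration. All storage is reserved up front and every block is the same size, so there is nothing to fragment and both operations take constant time.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_BLOCK_SIZE  64   /* bytes per block (hypothetical value) */
#define POOL_BLOCK_COUNT 32   /* blocks in the pool (hypothetical value) */

/* A block holds either user data or, while free, a link to the next free block. */
typedef union pool_block {
    union pool_block *next;
    unsigned char payload[POOL_BLOCK_SIZE];
} pool_block;

typedef struct {
    pool_block storage[POOL_BLOCK_COUNT]; /* all memory reserved up front */
    pool_block *free_list;                /* singly linked list of free blocks */
    size_t free_count;
} block_pool;

/* Thread every block onto the free list. */
void pool_init(block_pool *pool)
{
    assert(pool != NULL);
    pool->free_list = NULL;
    for (size_t i = 0; i < POOL_BLOCK_COUNT; i++) {
        pool->storage[i].next = pool->free_list;
        pool->free_list = &pool->storage[i];
    }
    pool->free_count = POOL_BLOCK_COUNT;
}

/* Constant-time allocation: pop the head of the free list, or NULL if exhausted. */
void *pool_alloc(block_pool *pool)
{
    pool_block *block = pool->free_list;
    if (block == NULL)
        return NULL;
    pool->free_list = block->next;
    pool->free_count--;
    return block;
}

/* Constant-time release: push the block back onto the free list. */
void pool_free(block_pool *pool, void *ptr)
{
    pool_block *block = ptr;
    assert(block >= pool->storage && block < pool->storage + POOL_BLOCK_COUNT);
    block->next = pool->free_list;
    pool->free_list = block;
    pool->free_count++;
}
```

The trade-off is that the worst case has to be sized at build time: when the pool runs out, pool_alloc() returns NULL and the caller has to cope.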

2MB Kernel (0)

Anonymous Coward | more than 9 years ago | (#10875675)

My linux kernel comes in at 1.7 meg and that's a fairly large kernel from what I've seen.

Re:2MB Kernel (0)

Anonymous Coward | more than 9 years ago | (#10876250)

...add to that the space your kernel modules take up.

Re:2MB Kernel (1)

lintux (125434) | more than 9 years ago | (#10876618)

But that file's probably called bzImage or vmlinuz. And do you know what the z means? Right, compressed. :-)

vmlinux files are 3-4MBytes (2.6) AFAIK. And, as the other poster pointed out, that doesn't include the modules.

I hope they had the foresight to add spam-blocking (1, Funny)

Infinityis (807294) | more than 9 years ago | (#10875710)

From Mr.Marvin
Olympus Mons Coast.

DEAR SIR/MADAM,

I AM HAPPY TO WRITE AND SEND THIS MESSAGE TO YOU.
AND I STRONGLY BELIEVE THAT THIS MESSAGE WOULD COME TO YOU AS A SURPRISE BUT I HOPE YOU WILL CONSIDER IT AS A CALL FROM A FAMILY IN DARE NEED AND GIVE IT URGENT CONSIDERATION. MY NAME IS MR marvin, A CITIZEN OF MARS AND THE SON OF LATE DR. FIDELIS GUBWANO WHO BEFORE HIS DEATH WAS THE MANAGER OF MARTIAN FINANCIAL TRUST CORPORATION (M.F.T.C). UPON HIS DEATH HE $60,000,000 (SIXTY MILLION U.S. DOLLARS) IN A THE OLYMPUS MONS BRANCH OF THE MARTIAN PLANETARY BANKING SYSTEM. I BELIEVE YOU TO BE AN HONEST AND TRUSTWORTY CITIZEN AND CAPABLE OF ASSISTING ME IN REMOVING THE MONEY FROM THIS ACCOUNT.

compilation error found (3, Funny)

circletimessquare (444983) | more than 9 years ago | (#10875721)

#include
int main() {
printf("Hello World!\n");
return 0;
}

marsrover.c: 3: You are no longer on the planet Earth.

Re:compilation error found (1)

Infinityis (807294) | more than 9 years ago | (#10875887)

I dunno, I think that program is actually best suited to the task of interplanetary exploration.

Also, you need a new compiler. The real reason for that error is because you need to follow up #include with a filename.

Re:compilation error found (0)

Anonymous Coward | more than 9 years ago | (#10876220)

you need to follow up #include with a filename.

Slashbug.

Will they quit using FAT? (4, Informative)

EqualSlash (690076) | more than 9 years ago | (#10875731)

Remember some time ago when Spirit was continuously rebooting due to a flash memory problem? The use of the FAT file system in the embedded system was partly responsible for the mess.

The problem, Denise said, was in the file system the rover used. In DOS, a directory structure is actually stored as a file. As that directory tree grows, the directory file grows, as well. The Achilles' heel, Denise said, was that deleting files from the directory tree does not reduce the size of the directory file. Instead, deleted files are represented within the directory by special characters, which tell the OS that the files can be replaced with new data.

By itself, the cancerous file might not have been an issue. Combined with a "feature" of a third-party piece of software used by the onboard Wind River embedded OS, however, the glitch proved nearly fatal.

According to Denise, the Spirit rover contains 256 Mbytes of flash memory, a nonvolatile memory that can be written and rewritten thousands of times. The rover also contains 128 Mbytes of DRAM, 96 Mbytes of which are used for data, such as buffering image files in preparation for transmitting them to Earth. The other 32 Mbytes are used for code storage. An additional 11 Mbytes of EEPROM memory are used for additional program code storage.

The undisclosed software vendor required that data stored in flash memory be mirrored in RAM. Since the rover's flash memory was twice the size of the system RAM, a crash was almost inevitable, Denise said.

Moving an actuator, for example, generates a large number of tiny data files. After the rover rebooted, the OS's heap memory would be a hair's breadth away from a crash, as the system RAM would be nearly full, Denise said. Adding another data file would generate a memory allocation command to a nonexistent memory address, prompting a fatal error.

Source: DOS Glitch Nearly Killed Mars Rover [extremetech.com]

BTW, there is another interview of Mike Deliman [pcworld.com] I read sometime ago in PCWorld.
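The directory behaviour described in the quoted article can be sketched with a toy model in C. This is not the rover's file system code; the structure and function names are hypothetical, and the only FAT detail relied on is that a deleted entry is marked by the byte 0xE5 in the first position of its name. Deleting marks the slot for reuse but never shrinks the directory, so the directory stays as large as its high-water mark.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DIR_CAPACITY  16    /* toy limit, hypothetical */
#define NAME_LEN      11    /* 8.3 name stored as 11 bytes */
#define ENTRY_DELETED 0xE5  /* FAT marker for a deleted entry */

typedef struct {
    unsigned char name[NAME_LEN];
} dir_entry;

typedef struct {
    dir_entry entries[DIR_CAPACITY];
    size_t length;  /* high-water mark: slots the directory "file" occupies */
} directory;

void dir_init(directory *dir)
{
    memset(dir, 0, sizeof *dir);
}

/* Add a file: reuse the first deleted slot, otherwise grow the directory.
 * Returns the slot index, or -1 if the toy directory is full. */
int dir_add(directory *dir, const char *name)
{
    assert(name != NULL && name[0] != '\0');
    size_t slot = dir->length;
    for (size_t i = 0; i < dir->length; i++) {
        if (dir->entries[i].name[0] == ENTRY_DELETED) {
            slot = i;
            break;
        }
    }
    if (slot == DIR_CAPACITY)
        return -1;
    size_t n = strlen(name);
    if (n > NAME_LEN)
        n = NAME_LEN;
    memset(dir->entries[slot].name, ' ', NAME_LEN);
    memcpy(dir->entries[slot].name, name, n);
    if (slot == dir->length)
        dir->length++;
    return (int)slot;
}

/* Delete a file: only the marker byte changes; length is never reduced. */
void dir_delete(directory *dir, size_t slot)
{
    assert(slot < dir->length);
    dir->entries[slot].name[0] = ENTRY_DELETED;
}

/* Count entries that still refer to live files. */
size_t dir_live_count(const directory *dir)
{
    size_t live = 0;
    for (size_t i = 0; i < dir->length; i++) {
        if (dir->entries[i].name[0] != ENTRY_DELETED)
            live++;
    }
    return live;
}
```

Creating eight files and deleting all of them leaves dir_live_count() at 0 while length is still 8, which is the "deleting files does not reduce the size of the directory file" effect in miniature.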

Other options being considered (5, Insightful)

Dominic_Mazzoni (125164) | more than 9 years ago | (#10875748)

For those who are wondering, JPL is very aware of the shortcomings of VxWorks and has seriously considered other alternatives for every mission. Keep in mind that the choice of OS has to be made years before launch, so at the time the OS for the 2004 Mars Rovers was decided on, many options that are possibilities today were not contenders. Also keep in mind that in spite of many shortcomings, VxWorks is a known quantity. JPL has been working with it for years and had a lot of in-house expertise with it.

There are a few groups at JPL that have been actively experimenting with other options, including RTLinux and a few different variants of hard-real-time Java (basically Java with explicit memory management and no garbage collection).

Huh, its easy.. (5, Funny)

adeyadey (678765) | more than 9 years ago | (#10875778)

you are in a red rocky landscape..

GO NORTH..

you are in a red rocky landscape..

DIG.

ok. you see some red sand.
it is getting dark.

GO NORTH..

you were eaten by a grue.

Article Slashdotted? (1)

emiddlec (673376) | more than 9 years ago | (#10875784)

I can't get through to acmqueue.com. Can someone post an alternate link to the article?

Re:Article Slashdotted? (1)

elFarto the 2nd (709099) | more than 9 years ago | (#10875825)

http://www.mirrordot.org/ [mirrordot.org] is your friend.

Regards
elFarto

Re:Article Slashdotted? (1)

emiddlec (673376) | more than 9 years ago | (#10875856)

Many thanks.

Other (older) articles that I found if anyone is interested:

No Life on Mars, But Many Bugs [wired.com]

Three Minutes With Mike Deliman [pcworld.com]

Out-of-memory problem caused Mars rover's glitch [computerworld.com]

MarsNews.com :: NewsWire :: Mars Exploration Rovers :: Archives [marsnews.com]

Red Rover's master coder [arnnet.com.au]

Re:Article Slashdotted? (1)

emiddlec (673376) | more than 9 years ago | (#10875947)

MirrorDot question -- is it mirroring all four pages of this article? The links for pages 2-4 lead to the acm site.

Today's /. footer quote (1)

edittard (805475) | more than 9 years ago | (#10875786)

"Examinations are formidable even to the best prepared, for even the greatest fool may ask more the the wisest man can answer. -- C.C. Colton "

Of course, a wise man knows the difference between "the" and "than".

On the other hand.. (0)

Anonymous Coward | more than 9 years ago | (#10875787)

..if only Wind River spent half the time they use on VxWorks on pSOS as well the world would be a much better place for a lot of people.

Don't hold your breath though.

my satellite debugging experience (5, Interesting)

nil5 (538942) | more than 9 years ago | (#10875847)

I worked on a satellite mission where we had some trouble. Due to an error the satellite wound up pointing 16 degrees away from the sun in a higher-than-expected orbit of 443 miles (714 kilometers) above Earth.

The misalignment meant the spacecraft was unable to look directly at the sun's center to record the amount of radiation streaming toward Earth. To accurately measure sunlight, the darn thing needed to be pointed to within a quarter of a degree of dead center.

It took about four and a half months to fix that problem, due to uplink difficulties. Ground controllers first had to slow the spacecraft's spin in order to transmit a series of software "patches" and then gradually speed it up to see how well the commands worked.

Then things were fixed.

Moral of the story: it is a tough job indeed!

Marketing crap (3, Insightful)

jeffmock (188913) | more than 9 years ago | (#10875994)

Okay, I've got to call foul on this WindRiver marketing ploy. They're trading on the last days of being able to get away with saying that something mystical and special and super-high quality is going on behind the walls of trade secret and proprietary software.

I used VxWorks on a reasonably large project several years ago. It's a fine piece of work, but nothing special; it's nowhere close to the quality of a recent Linux kernel.

About half-way through our project we developed a need for a local filesystem on our box. We bought a FAT filesystem add-on from wind river that was annoyingly poor quality, lots of bizarre little problems, memory leaks, and of course no source to look at. In the end we didn't use it, we put together our own filesystem from freely available sources.

When I read the articles about vxworks filesystem problems nearly borking the entire Mars rover mission I laughed and laughed. I'm sure that it was the same crappy code (although I don't really know for sure).

For me it's a case study on why you shouldn't use closed source software: you can't evaluate the quality of the code on the other side of the trade-secret barrier, and you wind up trusting things like glossy brochures.

jeff

Open source spaceware (5, Insightful)

relaxrelax (820738) | more than 9 years ago | (#10876018)


If that was open source, there are so many space nerds who are programmers that flaws of that magnitude would never get by the army of testers.

Many would help out simply because hey, it's the *space program* and that's good enough for them. Others would want their name listed next to some obscure bug fix on a NASA site; it's good for the ego or your CV.

Simply put, even a binary distribution of that code would allow unlimited free testing for crashes. Why wouldn't NASA do it?

Because there are still people in Washington who think code mysteriously gets damaged by being public, even if such code isn't modifiable by the public who reads it.

This is evidence of advanced cluelessness in Washington, and maybe independent anti-free-source advocates (spelled M-i-c-r-o-s-o-f-t) are the cause.

But I've learned not to bash. Never explain by Microsoft malice what could be explained by stupidity. Such as using DOS on a space thing...

Re:Open source spaceware (2, Insightful)

Gogo Dodo (129808) | more than 9 years ago | (#10876566)

Uhhh... and exactly how are you going to allow people to test "spaceware"? Last I checked, nobody owns their own satellite system. You just don't dump some satellite code onto your PC and "test" it.

Open Source is great and all, but it's hardly the answer to everything.

Re:Open source spaceware (0)

Anonymous Coward | more than 9 years ago | (#10876601)

This is evidence of advanced cluelessness in Washington, and maybe independent anti-free-source advocates (spelled M-i-c-r-o-s-o-f-t) are the cause.

But I've learned not to bash. Never explain by Microsoft malice what could be explained by stupidity. Such as using DOS on a space thing...

Sounds like you forgot what you "learned". Seeing how we're talking about VxWorks, I fail to see how this is Microsoft's fault. I'm amazed how some people take an opportunity to turn any non-open source project into a bash Microsoft thing.

In Soviet Russia... (-1)

Anonymous Coward | more than 9 years ago | (#10876050)

In Soviet Russia, spacecraft writes code for YOU.

Microsoft. Where do you want to go today? (-1, Troll)

rice_burners_suck (243660) | more than 9 years ago | (#10876064)

I don't understand why writing an OS for something like the Mars rover or some spacecraft that's going to be on the other side of the universe needs to be such a difficult thing to do. It's just software. Why can't they just save the billions of taxpayer dollars and just install an old copy of Windows 95 on it? That would be good enough.

After all, nobody ever got fired for buying Microsoft. And putting Windows on Sputnik 2 gives a new meaning to "Where do you want to go today?"

Re:Microsoft. Where do you want to go today? (1)

syynnapse (781681) | more than 9 years ago | (#10876234)

Well, it's software running on several million dollars' worth of hardware that is in no way easy to troubleshoot. Rebooting probably takes many hours considering the delay associated with transmission to Mars.

I remember running Windows 95, and if my PC cost a few million dollars I wouldn't want a copy of Win95 within 100 ft of it.
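For a rough feel of the transmission delay, here is a small C sketch that computes signal travel time from distance. The speed of light is exact; the Earth-Mars distances mentioned afterwards (roughly 55 million km at closest approach, roughly 400 million km at the far end) are approximate figures that are not from this thread. It covers signal travel time only, not communication windows or the time needed to plan and verify commands.

```c
#include <assert.h>

#define SPEED_OF_LIGHT_KM_S 299792.458  /* exact, by definition of the metre */

/* One-way signal travel time in seconds for a given distance in km. */
double one_way_delay_s(double distance_km)
{
    assert(distance_km >= 0.0);
    return distance_km / SPEED_OF_LIGHT_KM_S;
}

/* Time to send a command and receive the response, ignoring processing time. */
double round_trip_delay_s(double distance_km)
{
    return 2.0 * one_way_delay_s(distance_km);
}
```

With those assumed distances, one_way_delay_s(55.0e6) comes to about 183 seconds (about 3 minutes) and one_way_delay_s(400.0e6) to about 1334 seconds (about 22 minutes), so every command and response pair costs somewhere between 6 and 45 minutes before anyone has looked at the result.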

Re:Microsoft. Where do you want to go today? (1)

syynnapse (781681) | more than 9 years ago | (#10876263)

to further answer:

Writing the code for spacecraft is no harder than for any other realtime life- or mission-critical application. The thing that is hard is debugging a problem from another planet: you can't put your hands on the malfunctioning system to see what's going on; you must use intuition and experience.

Debugging in space: a case for dynamic systems. (4, Interesting)

voodoo1man (594237) | more than 9 years ago | (#10876098)

In 1998-2001, JPL successfully flew the Deep Space 1 [nasa.gov] spacecraft. One of the systems on board was the Remote Agent [nasa.gov], a fully autonomous spacecraft control and guidance system. The software was written entirely in Common Lisp, and parts were verified in SPIN [spinroot.com] (there is an interesting paper [psu.edu] written on the verification process, along with an informal account [google.com] by one of the designers), which yielded the detection of several unforeseen race conditions. The parts that were not verified were thought to be thread-safe, but unfortunately this proved mistaken, as a race condition occurred in-flight. With the help of the Read-Eval-Print Loop and other Lisp debugging facilities, the bug was tracked down and fixed in less than a day, and Remote Agent went on to win NASA's Software of the Year Award.

Perhaps not surprisingly for anyone who has heard about the management at NASA, C++ was selected for the successors to the Remote Agent on the grounds that it is supposed to be more reliable (this despite the fact that the Remote Agent was originally to be developed in C++, an effort that was abandoned after a year of failure). This caused more than a few people to be upset [google.com] (including a very personal account [flownet.com] by one of the aforementioned designers). Clearly the debugging facilities of Common Lisp are far superior to those of static systems like C++, something which is very useful in diagnosing unexpected error conditions in spacecraft software (read the first question on p. 3 of the interview to see what pains the JPL staff went through to adapt similar, ad-hoc methods to VxWorks). It's also clear from this interview (question: "How is application programming done for a spacecraft?" Answer: "Much the same as for anything else: software requirements are written, with specifications and test plans, then the software is written and tested, problems are fixed, and eventually it's sent off to do its job.") that NASA has in no way tried to adopt formal verification methods for its software, preferring instead to rely on the "tried and true" (at failing, maybe) poke-and-test development "methods."

Clearly, using formal verification methods to eliminate bugs before critical software is deployed, and deploying on a system with advanced debugging facilities, is a win for spacecraft software and should be adopted as the standard model of development. Unfortunately, as in many other software development enterprises, inertia keeps outdated, inadequate systems going despite a strong correlation with failure.

Re:Debugging in space: a case for dynamic systems. (3, Interesting)

GileadGreene (539584) | more than 9 years ago | (#10876399)

NASA has had an active formal methods/formal verification program for a number of years, located at NASA Langley [nasa.gov]. They mostly do research, but have worked on a few practical applications, mostly in the shuttle program. Additionally, JPL recently (2003) set up the JPL Laboratory for Reliable Software [nasa.gov], which is chartered to look into formal verification among other things. The lead technologist in the LaRS is none other than Gerard Holzmann [spinroot.com], the man behind SPIN.

Having said all of that, I'll agree that formal verification at NASA is in its infancy, and is facing an uphill battle for acceptance (witness how long the Langley group has been trying to push formal methods). It'll be interesting to see what happens with JPL's LaRS.

Microsoft Windows (1, Funny)

Anonymous Coward | more than 9 years ago | (#10876303)

Hands down for any Mission Critical application.

Out of curiousity (2, Interesting)

FunkSoulBrother (140893) | more than 9 years ago | (#10876351)

Why, in the 21st century, is it necessary to fit something like the Mars rover code in 2MB of memory? If something like a Gameboy Advance or a PDA can hold 64MB-a couple gigs, what is holding NASA back, with their gigantic budget and all?

I can't imagine it would be the cost of the memory... I mean, I know it costs much, much more to make chips to a very strict specification, but if you are already producing so few units, isn't your cost of production going to be extraordinarily high whether you are making 64KB chips or 2MB or even 64MB?

This is not to say that I don't have admiration for fitting all that code in such a small space, but is there a reason they feel the need to do so?

Re:Out of curiousity (1, Informative)

Anonymous Coward | more than 9 years ago | (#10876469)

why, in the 21st century, is it necessary to fit something like the Mars rover code in 2MB of memory? If something like a Gameboy Advance or a PDA can hold 64MB-a couple gigs, what is holding NASA back, with their gigantic budget and all?

One thing: radiation. It's cheaper to take up simpler, purpose-designed and fabricated, bulkier chips that don't get upset when a particle hits them than it is to send up the latest and smallest chips, which are oh so fast but supersensitive to radiation, and add lead shielding that serves only as dead weight.

Re:Out of curiousity (4, Informative)

The Vulture (248871) | more than 9 years ago | (#10876531)

The problem is that technology moves too quickly for it to get "NASA certified". When you send something up in space where making changes to it will be difficult, you need something that is known to be robust and reliable, that has several years of testing.

Last I read (maybe a year ago?), NASA still used 386 and 486 chips because they didn't generate a lot of heat (compared to today's machines) and could be made to withstand higher than normal forces (through extra padding on the device, I imagine). They were more resilient to the issues you might see in space than newer processors.

Simply put, if they put the latest CPU with tons of RAM in there, and it fails, how are they going to fix it?

-- Joe

Spacecraft (3, Funny)

sheetsda (230887) | more than 9 years ago | (#10876384)

Writing Code for Spacecraft

My first thought was "Spacecraft? is that a new Starcraft clone I hadn't heard about?". It was then I realized I've been hanging out on the Game Programming Wiki [gpwiki.org] too much lately.