
Ultra-Stable Software Design in C++?

Cliff posted more than 8 years ago | from the failure-minimization dept.

Programming 690

null_functor asks: "I need to create an ultra-stable, crash-free application in C++. Sadly, the programming language cannot be changed due to reasons of efficiency and availability of core libraries. The application can be naturally divided into several modules, such as GUI, core data structures, a persistent object storage mechanism, a distributed communication module and several core algorithms. Basically, it allows users to crunch a god-awful amount of data over several computing nodes. The application is meant to primarily run on Linux, but should be portable to Windows without much difficulty." While there's more to this, what strategies should a developer take to ensure that the resulting program is as crash-free as possible?

"I'm thinking of decoupling the modules physically so that, even if one crashes/becomes unstable (say, the distributed communication module encounters a segmentation fault, has a memory leak or a deadlock), the others remain alive, detect the error, and silently restart the offending 'module'. Sure, there is no guarantee that the bug won't resurface in the module's new incarnation, but (I'm guessing!) it at least reduces the number of absolute system failures.

How can I actually implement such a decoupling? What tools (System V IPC/custom socket-based message-queue system/DCE/CORBA? my knowledge of options is embarrassingly trivial :-( ) would you suggest should be used? Ideally, I'd want the function call abstraction to be available just like in, say, Java RMI.

And while we are at it, are there any software _design patterns_ that specifically tackle the stability issue?"


690 comments


You're not the first one.... (4, Insightful)

DetrimentalFiend (233753) | more than 8 years ago | (#14644208)

I'd hate to say it, but you might want to SERIOUSLY consider managed code. You could build some of the parts in C++ if need be, but doing it purely in C++ seems like a bad idea to me. You're asking for a silver bullet that just doesn't exist... but managed code is getting faster and can be pretty stable.

Re:You're not the first one.... (2, Informative)

destuxor (874523) | more than 8 years ago | (#14644247)

Although I've never had reason to do this myself, I've heard people recommend cross-compiling code for a PPC or SPARC64 platform and then fuzzing the program there to look for bugs that might not have shown up on x86.
As for your question about CORBA, look into IceC++ [zeroc.com] . I read about it somewhere and it sounded cool :)

Re:You're not the first one.... (3, Informative)

arkanes (521690) | more than 8 years ago | (#14644250)

I have to agree. You say it can't be changed due to efficiency and core library issues, and then list a bunch of components for which there are no core libraries in C++ and which are rarely if ever CPU bound. Further, if you're asking this question then you aren't a highly experienced guru in this field, and that's what you need to be to write these sorts of applications in C++. Managed Code (tm), as in .NET, is not your only option but you should look at something higher level and more robust than C++. Haskell or one of the other functional languages might be a good idea.

Test, test, test. Use test driven development if you can. Have a good test harness and use it all the time. Do the stereotypical "random input" tests. Test each and every component to destruction white-box style and then double test all your interfaces. Design by contract can help here. Use a tinderbox that runs continual builds. Maintain strict version controls. Maintain code discipline (getting rid of C++ helps here too). Realize that you probably do not (currently) have the skills to produce the kind of product you are talking about and be willing to commit the time and effort to tear up mistakes, to start over and to teach yourself. What you are attempting to do is not easy.

Question from a Newbie (0)

Anonymous Coward | more than 8 years ago | (#14644267)

Given the advice you are giving the original poster, would the D programming language be a good alternative choice for him? Programming by contract, binary compatibility with C libraries, and actually compiled as CPU instructions.

What a frightening post. (2, Insightful)

AltGrendel (175092) | more than 8 years ago | (#14644383)

Sounds like the poor soul is in over his/her eyeballs.

Re:You're not the first one.... (5, Insightful)

mr_tenor (310787) | more than 8 years ago | (#14644424)

WTF? I love Haskell as much as the next programming-language-theory fanboy, but saying "Haskell or one of the other functional languages might be a good idea." in reply to the OP strongly suggests to me that you are just making stuff up and/or are copy/pasting things that you have read elsewhere out of context.

If not, then great! Please post some references to literature which demonstrates how what you've suggested is sane and/or possible :)

Re:You're not the first one.... (3, Interesting)

merlin_jim (302773) | more than 8 years ago | (#14644304)

I was going to post pretty much the same thing - managed code approaches C++ efficiency close enough that it shouldn't matter (I've seen figures of 80-95%)

And, in visual studio .net 2005 there are built in high performance computing primitives - all the management of internode communication and logical data semaphore locking are handled by the runtime - presumably debugged and stable code...

Re:You're not the first one.... (5, Insightful)

gadzook33 (740455) | more than 8 years ago | (#14644313)

Ah, another true believer. I work heavily in both managed and unmanaged code (c/c++/c#) hybrid solutions. In my experience, a well designed C++ program is as stable as a well designed C# program. Who cares if it "crashes" if it doesn't do what you want? The worst program is one that seems to be working but is generating invalid results. Don't let anyone convince you that C# is going to provide more reliable execution. We use C# for its nice GUIs; C++ for cross-platform portability.

Re:You're not the first one.... (1)

Anonymous Coward | more than 8 years ago | (#14644329)

Ah, the first of 200 people who don't know how to do what he wants and so will answer an easier question, as happens every time any kind of question is asked about anything lower-level than Java.

Now can anyone who can answer the question offer some advice, please? The asker isn't the only one interested in hearing from a few experts.

Good call (2, Interesting)

lifeisgreat (947143) | more than 8 years ago | (#14644342)

Good call. I'm not sure why C++ is being mandated for something that has stability as a top priority. Though there are some language-independent things that should be taken into account:

Executive summary of this post: Keep it simple. As simple as it can be while getting the job done. The more buzzwords you think about implementing, the more you need to reconsider whether you really need that whiz-bang feature.

You need to abstract your design into really independent layers, such that the backend processing can be done across linux, windows and even beos slaves simultaneously, and the frontend is viewable via a web interface, fed into excel or whatever. You can't look at this as one big project, but many independent (and more easily verifiable!!) applications cooperating with each other.

My impression from the description is that you want a system like folding@home for corporate customers - they have a whole heap of data they want analyzed (parallel workload across many clients) and a small subset of results they're interested in. Don't make things any more complicated than they have to be - the data sets could simply be files that are partitioned by a master, sent out when requested to client workhorse computers, getting there by http, nfs or whatever, processed, and the results returned into an incoming directory for a simple frontend to tabulate.

The biggest mistake you could make is having one gargantuan application in charge of everything. The race conditions will drive you mad, be they in data access, allocation, retrieval, dispatch or anything else you're trying to manage that the OS could do for you.

Just look at Froogle. Their millions upon millions of store/price listings are fed by people ftp'ing a feed of tab-separated text values.

Re:You're not the first one.... (0)

Anonymous Coward | more than 8 years ago | (#14644360)

Wow, what stupid advice.

Re:You're not the first one.... (1)

burni (930725) | more than 8 years ago | (#14644367)

Yes, hm... I wanted to say the same, so I'll follow up here instead of starting a separate thread ;)

He (null_functor) can still use Java and its feature-rich API on the one side and C++ on the other, via the Java Native Interface. It works both ways:

you (null_functor) can call the VM and a Java program from inside a C++ program, and the other way around, calling a C/C++ program from a Java program inside the virtual machine.
- Sun has a good, easy-to-start tutorial at http://java.sun.com, with code samples
- google("JNI Java";"Java Native Interface")

I've tried out the code samples and everything works fine. Going on from this, I've started interfacing with the ncurses lib, for example, because Java and its Unicode-centric API functions are the easy way to create multilanguage Unicode menus from XML templates. It's just fun from my point of view ;)

Communicating across the border between the Java VM and the native world shouldn't be a big thing. Running Java on a Linux system, you have the same access as a native program: pipes, sockets, ...

RPC ;) sun-rpc :D

JDBC for database access.

Speed:
it's not 100%, but 80% is mostly enough.

Naturally, I divide programming tasks into two categories: soft/well-with-the-object-model and hard/bad-with-the-object-model.

So where OOP is needed, I use Java. I dislike object-like programming in C, so where a classic structured programming language is needed, I use C.

So you can see, Java and C are symbiotic lifeforms for me ;)

Solving the complex things with Java also has the benefit that porting the application to other operating systems just involves some JNI code alteration.

Closing words:
read this assuming that I like Java and that I'm not a neutral person, so I promote what I like and what works for me ;) Generally I use Java and C, and interface both sides when needed using JNI.

Re:You're not the first one.... (1, Insightful)

Iceberg1414 (952051) | more than 8 years ago | (#14644474)

1. I don't mean to bash your idea, but unless your "modules" are separated into separate processes, any part of your code or libraries can affect any other part. Writing a multi-process system has its own complexities and issues.

And FYI, detecting hangs is a pretty hard problem.

2. As was said, heavily consider going to a VM language (Java, .NET), unless you are willing to invest a lot of extra time in testing, code review, analysis, etc.

3. If you are going to use C++, consider using Smart Pointers, garbage collectors, etc, available for C++ at http://www.boost.org/ [boost.org]

4. Test early and often, and aim for full code coverage with code coverage tools. Code you don't run is code you haven't tested. Test also with memory tools such as Bounds Checker. I believe Linux has robust gcc-enabled tools in this area also. Throw anything and everything at your code, including strange data, bad inputs, etc.

5. Use static code analysis if possible. You will have bugs in your code, period. Historically, static analysis has generated a lot of false positives. MS VStudio 2005 Team Edition is not free, but it may be available to you through MSDN or something. It has pretty good basic analysis of common errors available through the /analyze switch. Coverity is a better ($$$) commercial static analysis tool.

6. Use plenty of assertions in your code

7. Make sure you have a good crash handler setup to generate relevant crash data and hopefully semi-automatically get it back to you.

8. Write solid unit tests WITH your code, and make sure the code is built often, with unit tests and other tests run automatically.
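
Points 3 and 6 can be sketched together. This is only an illustration, assuming a C++14 compiler; it uses `std::unique_ptr` and `std::shared_ptr` from the standard library (which grew out of the Boost smart pointers) rather than Boost itself, and `Buffer`, `make_buffer` and `mean` are made-up names.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical data holder used only for illustration.
struct Buffer {
    std::vector<double> samples;
};

// Ownership is expressed in the return type: the caller owns the buffer,
// and it is freed automatically when the unique_ptr goes out of scope.
std::unique_ptr<Buffer> make_buffer(std::size_t n) {
    assert(n > 0 && "a buffer must hold at least one sample");
    auto buf = std::make_unique<Buffer>();
    buf->samples.assign(n, 0.0);
    return buf;
}

// Shared ownership: the buffer lives until the last holder releases it.
// The assertion documents the precondition and fails loudly in debug builds.
double mean(const std::shared_ptr<const Buffer>& buf) {
    assert(buf && !buf->samples.empty());
    double sum = 0.0;
    for (double s : buf->samples) sum += s;
    return sum / static_cast<double>(buf->samples.size());
}
```

Nothing here calls `delete` by hand, so there is no place to forget it. Keep in mind that `assert` compiles away when `NDEBUG` is defined, so it complements input validation rather than replacing it.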

Get another programmer (0)

Anonymous Coward | more than 8 years ago | (#14644214)

Write with them.

Re:Get another programmer (5, Funny)

Philip K Dickhead (906971) | more than 8 years ago | (#14644229)

Make sure his name is something like "Bjarne" or "Knuth".

inline code (3, Informative)

jrockway (229604) | more than 8 years ago | (#14644217)

> Sadly, the programming language cannot be changed due to reasons of efficiency and availability of core libraries.

You can easily embed C/C++ in other languages. Take a look at Inline::CPP [cpan.org] , for example. With code like:


      use Inline CPP;

      print "9 + 16 = ", add(9, 16), "\n";
      print "9 - 16 = ", subtract(9, 16), "\n";

      __END__
      __CPP__

      int add(int x, int y) {
            return x + y;
      }

      int subtract(int x, int y) {
            return x - y;
      }


you can put the parts that need to be fast in C++, and the parts that need to be easy in Perl. (If you do the GUI in perl, you won't have to worry about portability or memory allocation. And the app will be fast, because the computation logic is written in C++.)

> The application can be naturally divided into several modules, such as GUI, core data structures, a persistent object storage mechanism, a distributed communication module and several core algorithms.

Yup. There's no need for the GUI to know how to do computations, remember. The more separate components you have, the more reliable your application (can) be. Make sure you have good specs for communication between components. Ideally, someone will be able to write one component without having the other one to "test" with. For testing, write unit tests that emulate the specs... and make sure your tests are correct!

Re:inline code (1)

stoicio (710327) | more than 8 years ago | (#14644244)

"you can put the parts that need to be fast in C++, and the parts that need to be easy in Perl" Yeah, use two languages instead of one. That will make it simple and stable...*not*! Keep it coded silly, simple instead. That will lead you to stability.

Re:inline code (0)

Anonymous Coward | more than 8 years ago | (#14644419)

Don't forget it's perl though! The language that's known for its robustness, maintainability, and has been proven to write mission critical software for YEARS. Wait... what was I talking about?

I'm gonna take a guess, but.. (5, Funny)

Anonymous Coward | more than 8 years ago | (#14644218)

try not to de-reference any NULL pointers and you should be ok..

Re:I'm gonna take a guess, but.. (0, Funny)

Anonymous Coward | more than 8 years ago | (#14644302)

dammit, dammit, dammit!

BRB

Performance? (2, Insightful)

rjstanford (69735) | more than 8 years ago | (#14644219)

If you're willing to compromise performance to the point that you can use CORBA for IPC, then you should be more than willing to write it in the language of your choice, within reason. C, C#, C++, Java, all are far faster than your CORBA transport.

If you can provide more details about the specific requirements, you might get more informed responses. As it is, though, your stated goals really don't seem to add up.

Even as stated, I would write the core in a highly tuned fashion (although C++ might not be my best choice for this), then write the GUI in the language of your choice, quite frankly. Optimise the bottlenecks (ie: your core processing) for speed, optimise everything else for maintainability and ease of development.

Re:Performance? (1)

jeff_schiller (877821) | more than 8 years ago | (#14644395)

Wouldn't that depend on how much communication happens between modules?

Engineering, the art of compromise and tradeoffs (1)

Morgaine (4316) | more than 8 years ago | (#14644473)

>> As it is, though, your stated goals really don't seem to add up.

I agree entirely with your reply here. The poster's statement (below) is frankly ludicrous:

>> Sadly, the programming language cannot be changed due to reasons of efficiency and availability of core libraries.

Well in that case, sadly, the inherent unreliability of the programming language and core libraries cannot be changed either. Efficiency is the *primary* inverse determinant of reliability.

This is ENGINEERING we're talking about here, ie. a practical discipline that's all about making tradeoffs in one area in order to reap benefits in another. He's not willing to make any key tradeoffs, so he's not going to gain what benefits he seeks either.

Development Practices (3, Insightful)

the eric conspiracy (20178) | more than 8 years ago | (#14644220)

There is no silver bullet for what you describe other than sound development practices. The best results in this area are achieved by teams who are constantly refining their processes based on lessons learned in previous software iterations.

Bulletproof code isn't cheap, but it can be done.

Re:Development Practices (1)

ScrewMaster (602015) | more than 8 years ago | (#14644287)

Yeah ... the Space Shuttle software group pretty much exemplifies what it takes to write software that is about as fault-free as it's possible to get. And they work as you say, by sound development practices, and constant, never-ending testing and refinement. It's a grueling process but it works.

Here's your best bet. (5, Interesting)

neo (4625) | more than 8 years ago | (#14644222)

1. Write the whole thing in Python.
2. Once it's bullet-proof, replace each function and object with C++ code.
3. Profit.

Re:Here's your best bet. (1)

jellomizer (103300) | more than 8 years ago | (#14644259)

Using python as a prototyping language what a concept!
Plus you are also able to include c++ libraries in python too. So after you get it to work you can replace each python module with the c++ ones and still make sure the app still works.

I tend to do most of my programming in python for proof of concept even if it takes hours to run the code at least I know the concept works or not. If it does then I go into optimizing in a higher level language.

Re:Here's your best bet. (2, Informative)

beyonddeath (592751) | more than 8 years ago | (#14644358)

Wouldn't you want something that's lower level? I'm not even sure there's much higher a language than Python! Perhaps you need to revisit csc108 (intro to comp sci).

Re:Here's your best bet. (1)

jellomizer (103300) | more than 8 years ago | (#14644394)

Sure there is. VB.NET.
I did mean lower level/higher performance; it's late and I am tired.

Re:Here's your best bet. (1)

Daxster (854610) | more than 8 years ago | (#14644373)

> If it does [work] then I go into optimizing in a higher level language.


I think you meant lower-level language ;-).

Re:Here's your best bet. (5, Informative)

YGingras (605709) | more than 8 years ago | (#14644361)

This is really good advice but it needs more details:

1) Wrap your legacy libs with SWIG
2) Code a working prototype in Python
3) Profile it (never skip this step)
4) Use SWIG to write the bottle neck parts in C++
5) Use Valgrind to ensure you are still OK memory wise
6) Profit!!

I am invoking Greenspun's 10th law (1)

Latent Heat (558884) | more than 8 years ago | (#14644379)

Just as it is assumed that a language like C or C++ translates statement-by-statement into machine code, you are assuming that a language like Python translates line-by-line into C++. Does it?

A variable in Python is a variable as in anything else, but a variable is a reference to an instance of a type that could be anything -- the referenced instance has a type as opposed to being some universal type like a string, but it can be assigned on the fly, and it can be a number, a string, an object instance, a class, or a function. I suppose assignment is just copying the reference, but as soon as you do anything with it, you have to somehow look up the dynamic type and decide to do something legal. And fail gracefully if called to do something illegal.

And as to memory allocation, every time you touch a reference you are doing something with a reference count, and I believe there is some kind of primitive mark-scan garbage collection layered on top of the reference count to break circular chains. Are you going to hand translate that into C++ as well?

Greenspun's 10th law is this inside joke among Smug Lisp Weenies (TM) that any sufficiently complicated Fortran program (back in the day, today substitute C/C++ program) implements a good chunk of Common Lisp, only slower and with a lot of bugs. I may offend people to compare meek Python to mighty Common Lisp, but Python has a sufficient dynamic behavior to make the connection.

My advice on the reliable C++ program is 1) design to the level of having a clear idea of the architecture of your app before coding: the classes, their purpose, and which other objects they contain; 2) for each object where a reference is contained in another object, do some kind of code reading/verification/check of conformance to a standard you have established as to how that object gets deleted in a safe way. It could be reference counts, auto-objects, caller deletes/callee deletes. Just decide on what you are going to do and read code to see that you are consistent about it.

Re:Here's your best bet. (2, Informative)

Chandon Seldon (43083) | more than 8 years ago | (#14644470)

I'll have to back up the start with python plan.

Two additional points:
1.) You don't need to replace all the python code.
2.) Use a garbage collector like http://www.hpl.hp.com/personal/Hans_Boehm/gc/ [hp.com] for your C++ code.

Ask Microsoft.... (-1, Troll)

KeiichiMorisato (945464) | more than 8 years ago | (#14644224)

> While there's more to this, what strategies should a developer take to insure that the resulting program is as crash-free as possible?

Ask Microsoft developers....

For starters: (1)

Bluesman (104513) | more than 8 years ago | (#14644227)

You can use the Boehm garbage collector to eliminate a huge class of typical memory errors:

http://www.hpl.hp.com/personal/Hans_Boehm/gc/ [hp.com]

This isn't necessarily something you'd have to design around, either. You can add it later.

Boehm and RAII? (1)

tepples (727027) | more than 8 years ago | (#14644351)

How well does a Boehm garbage collector work with the Resource Acquisition Is Initialization pattern [wikipedia.org] ? In Boehm's library, do destructors or finalizers get called in any sort of predictable order?
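
For reference, here is a minimal sketch of what RAII promises without any garbage collector involved: destructors run at scope exit, in reverse order of construction. It says nothing about how Boehm's finalizers are ordered; `ScopedResource` and `run_scoped_work` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical guard: records acquire/release events so the order is visible.
class ScopedResource {
public:
    ScopedResource(std::string name, std::vector<std::string>& log)
        : name_(std::move(name)), log_(log) {
        log_.push_back("acquire " + name_);
    }
    ~ScopedResource() { log_.push_back("release " + name_); }

    // Non-copyable: exactly one owner releases the resource.
    ScopedResource(const ScopedResource&) = delete;
    ScopedResource& operator=(const ScopedResource&) = delete;

private:
    std::string name_;
    std::vector<std::string>& log_;
};

// Destructors run when the inner scope ends, in reverse order of
// construction; the same happens if an exception leaves the scope early.
std::vector<std::string> run_scoped_work() {
    std::vector<std::string> log;
    {
        ScopedResource file("file", log);
        ScopedResource lock("lock", log);
        log.push_back("work");
    }
    assert(log.size() == 5);
    return log;
}
```

The returned log reads: acquire file, acquire lock, work, release lock, release file. That deterministic ordering is the property the question is asking a collector to preserve.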

You need three things (1)

hsmith (818216) | more than 8 years ago | (#14644234)

Good people
lots of time
lots of money


then you have a chance of pumping out the good product

Don't get too fancy... (4, Informative)

Pyromage (19360) | more than 8 years ago | (#14644242)

First, consider how complex you want to make the system. The decoupling is a good idea, I think. However, I don't think that having modules automatically restart one another is a good idea; it introduces a whole slew of other problems. At most I'd say use a watchdog process (principle of single responsibility).
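
A watchdog along those lines might look roughly like this. It is a POSIX-only sketch (`fork`/`waitpid`), so a Windows port would need a different mechanism; `supervise` and the attempt-number convention are assumptions made for the example, and it only notices crashes and non-zero exits, not hangs.

```cpp
#include <cassert>
#include <functional>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Runs `worker` in a child process and restarts it after an abnormal exit.
// `worker` receives the attempt number and returns a process exit code.
// Returns the number of restarts that were needed, or -1 if the worker
// never succeeded or a system call failed.
int supervise(const std::function<int(int)>& worker, int max_restarts) {
    assert(max_restarts >= 0);
    for (int attempt = 0; attempt <= max_restarts; ++attempt) {
        pid_t pid = fork();
        if (pid < 0) return -1;          // fork failed
        if (pid == 0) {
            _exit(worker(attempt));      // child: run the module, then exit
        }
        int status = 0;
        if (waitpid(pid, &status, 0) < 0) return -1;
        // A clean exit with code 0 counts as success. A non-zero code or
        // death by signal (e.g. SIGSEGV) falls through to a restart.
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0) return attempt;
    }
    return -1;
}
```

Because the module runs in its own process, a segmentation fault there cannot corrupt the watchdog's memory. A real watchdog would also want a restart limit per time window and some logging, which are left out here.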

Furthermore, you're crunching large amounts of data, so I'm guessing batch processing. If you can have the application not be a server, then you simplify things a lot. Make it a utility that takes data on standard input and runs whatever analysis you need, and duct tape it together with cron or a simple program that watches for new input files.

Also, I'd like to suggest that you consider whether other languages could be efficient for the task. For example, Java is pretty good numerically, and as far as your libraries go, see if you can use SWIG to generate JNI wrappers. Also, then you get Java RMI.

Next, get them down to one platform. It's *way* easier to develop software with tight constraints on a single platform (versus multiple platforms). Investigate QNX: a reliable operating system (though admittedly quirky) with a beautiful IPC API. In any case, make sure you get a well-tested library with message queues, etc. You don't want to be using raw sockets; you could but that's just another pain in the ass on top of everything else.

Last, figure out what the cost of a failure is. Getting that last few percent of reliability is very very expensive. Unless you're a pacemaker or respirator, the cost of failure is probably not as high as the cost of five nines of uptime.

Don't code to impress. (5, Informative)

jellomizer (103300) | more than 8 years ago | (#14644243)


When coding something that needs to be stable, you need to set your ego aside and concentrate on the task at hand. Stick with tried and true methods; don't go with any algorithm that you are not 100% comfortable with, even if it makes the code less ugly. Be sure to follow good practices: write many functions/methods and make each one as simple as possible, since simple functions are easier to check for bugs. Secondly, document it like you never want to touch the code again (in code and out of code). You want to know what is going on at all times, and the bigger the code gets, the greater the chance that you get lost in it. When you are working in a team and change someone else's code, document that you made the change.

Next, take into account what causes most crashes:
- Bad/overflowing memory allocation
- Memory leaks
- Endless loops
- Bad calls to the hardware
- Bad calls to the OS
- Deadlock

If you are going to decouple modules, keep in mind that you will need to do as much processing as possible with minimum message passing, and allow for mirrors so that if one system is down another can take its place without killing the network.

For IPC I tend to like TCP/IP Client server. But that is because it tends to offer a common platform independence and allows for expansion across the network. Or try other Server Methods such as a good SQL server Where you can put all the shared data in one spot and get it back. But not knowing the actual requirements it may just be a stupid idea.

I would suggest that you also ask in places other than Slashdot. While there are many experts on this topic here, there are also an equal if not greater number of kids who think they know what they are talking about, or who have their ego invested in a particular technology or method.

Check your code... (0)

Anonymous Coward | more than 8 years ago | (#14644249)

before you even write it. I mean, get your idea clear and then write the code.
Always check your input, and be clear about error signaling. Any module can cause an error, but the error should be signaled efficiently to the other modules so they can handle it.
Create a universal error trap that will catch any error you don't expect, process it, and allow the program to keep running.
That should do it.
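
One way to read that advice in C++ terms, as a sketch only: validate input where it enters, signal problems with typed exceptions, and put a catch-all at the module boundary. `parse_percentage` and `run_guarded` are invented names, and a trap like this only catches C++ exceptions; it does nothing for a segmentation fault.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>

// Validates input up front and signals errors with a typed exception.
int parse_percentage(const std::string& text) {
    std::size_t used = 0;
    int value = 0;
    try {
        value = std::stoi(text, &used);
    } catch (const std::exception&) {
        throw std::invalid_argument("not a number: " + text);
    }
    if (used != text.size()) {
        throw std::invalid_argument("trailing characters: " + text);
    }
    if (value < 0 || value > 100) {
        throw std::out_of_range("out of range: " + text);
    }
    return value;
}

// Catch-all boundary: converts any exception into an error string so that
// the caller can keep running. Returns an empty string on success.
std::string run_guarded(const std::function<void()>& task) {
    assert(task && "task must be callable");
    try {
        task();
        return "";
    } catch (const std::exception& e) {
        return e.what();
    } catch (...) {
        return "unknown error";
    }
}
```

Returning the message instead of rethrowing keeps the trap in one place; what the caller does with the message (log it, restart the module, tell the user) is a separate decision.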

The weakest link in the chain (1, Insightful)

karmaflux (148909) | more than 8 years ago | (#14644251)

Your program's only as stable as the "core libraries" your company wants you to use.

Uphill Battle (1, Informative)

twiddlingbits (707452) | more than 8 years ago | (#14644265)

First, there is not a silver bullet design that makes a program 100% crashproof. Even if there was there would need to be the corresponding crash proof Operating System, which there really isn't. Linux and some Unixes have very high uptime (99.997%), as do Mainframe OSes, but Windows certainly is not normally in that category.

To make your program as crash proof as YOU can control you should validate your requirements using Use Cases, minimize Design Complexity, use good C++ programming practices, and do extensive testing at every level using white box and black box testing techniques. Testing is key, and regression testing after changes is even more key. Don't assume fixing this didn't break that. Test with REAL data if you can. Test with invalid data so you will test your error handling, test at maximum usage levels to validate no memory leaks, resource contentions, deadlocks, etc.

However, at some point things get out of your control, as you don't write the C++ system calls, or the compiler code, or any OS features the code uses. So bugs in those can cause your program to crash. It wasn't your code that crashed but you'll get the blame. So to be crashproof it takes a "system" that is crashproof; your program is just one part of that.

Re:Uphill Battle (0)

Cheapy (809643) | more than 8 years ago | (#14644380)

"Linux and some Unixes have very high uptime (99.997%)"

99.997 percent of what? Context!

Use state machines (2, Informative)

Warlock48 (132391) | more than 8 years ago | (#14644268)

State machines help make sure you cover (almost) all possible cases your app may encounter.

Here's a great framework to start with:
http://www.quantum-leaps.com/products/qf.htm [quantum-leaps.com]
And the book:
http://www.quantum-leaps.com/writings/book.htm [quantum-leaps.com]
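
To show the idea without the framework, here is a hand-rolled sketch; it does not use the Quantum framework's API, and the states, events and function names are invented for the example. The point is that every (state, event) pair gets an explicit answer, including "not allowed".

```cpp
#include <cassert>

enum class State { Idle, Running, Failed, Done };
enum class Event { Start, Finish, Error, Reset };

// Writes the next state to `next` and returns true, or returns false for a
// transition the design does not allow (`next` is left untouched).
bool next_state(State current, Event event, State& next) {
    switch (current) {
        case State::Idle:
            if (event == Event::Start) { next = State::Running; return true; }
            break;
        case State::Running:
            if (event == Event::Finish) { next = State::Done; return true; }
            if (event == Event::Error) { next = State::Failed; return true; }
            break;
        case State::Failed:
        case State::Done:
            if (event == Event::Reset) { next = State::Idle; return true; }
            break;
    }
    return false;
}

// Small self-check; call it from a unit test or at startup in debug builds.
void check_transitions() {
    State s = State::Idle;
    assert(next_state(State::Idle, Event::Start, s) && s == State::Running);
    assert(!next_state(State::Done, Event::Start, s));
}
```

With the switch written over an `enum class`, most compilers can warn when a newly added state is not handled, which is a cheap way to keep the table complete.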

XP (1)

joebebel (923241) | more than 8 years ago | (#14644269)

Extreme programming is your friend on this one. Doesn't matter what language you use, test and retest at every change. Testing is the only, only, only way to get extremely stable software outside of formal verification methods.

Re:XP (3, Insightful)

Anonymous Brave Guy (457657) | more than 8 years ago | (#14644344)

Extreme programming is your worst enemy on this one. If you need a system that is truly reliable, you cannot take an approach that fundamentally bases its quality controls on a finite number of tests, unless you can test absolutely every possible set of inputs your program can ever receive (legitimately or otherwise).

Testing is good, of course, but for this sort of job, you must have a proper design, such that all components can be properly verified. (And of course, you must have a proper spec against which to verify.) The XP methodology is pretty much the antithesis of what's needed here.

Re:XP (1)

joebebel (923241) | more than 8 years ago | (#14644372)

Having a design is worthless unless you develop a series of tests against it. Even the best designs by the best software engineers have flaws, and if there aren't any, some flaws will pop up in coding. You can have a perfect design and crummy code; I've seen it over and over again.

Re:XP (1)

Anonymous Brave Guy (457657) | more than 8 years ago | (#14644384)

Of course proper unit tests are necessary; I wouldn't dream of suggesting otherwise. My point was simply that for this kind of project, they aren't sufficient. The XP approach inherently assumes that they are, and is fundamentally unsuitable for this type of work.

Re:XP (2, Informative)

Fulg (138866) | more than 8 years ago | (#14644346)

> Extreme programming is your friend on this one. Doesn't matter what language you use, test and retest at every change. Testing is the only, only, only way to get extremely stable software outside of formal verification methods.

Exactly. Three words: Test Driven Development.

Since you're tied to C++, may I suggest CppUnitLite2 1.1 [gamesfromwithin.com] ...

It's incredible how much more productive you can be writing the tests first (contrary to what you might think initially). I hardly ever need a debugger anymore, and I know that the code I wrote does the right thing, and doesn't adversely affect something else.

Put the unit tests as a post-build step (or a dummy target in a makefile) and any defect will pop up instantly. If you find a bug not covered by your test suite, add a test that reproduces the problem, ensuring that it will never bite you again.

If you're not familiar with TDD, check out Wikipedia for an explanation and some useful external links: http://en.wikipedia.org/wiki/Test_driven_development [wikipedia.org]

HR (1)

Blue Mandelbrot (951902) | more than 8 years ago | (#14644274)

I think the people you have working on this particular project will have the most influence on whether you have a stable design in the end, especially when working with C++. Put together a team of top-notch engineers, read The Mythical Man-Month, then start to think about the design. If you gave three different teams the same task, most likely in the end these three teams would produce three different, yet functional designs, with one of these designs being the most stable. The success of many large projects hinges on the skill set of the engineers, communication, project management, and process (CMM, etc.).

What's important? (1)

mr_tenor (310787) | more than 8 years ago | (#14644278)

You talk about making an "ultra stable, crash free" system and then talk about crash recovery. I'm guessing from this you don't want to protect the application from harm (e.g. full of exception catching and internal recovery from those evil buggy 3rd party libraries or whatnot), but how important is your data? Is that what you mean when you say ultra stable, that you end up getting the right results at the end? Maybe you should think about redundancy, traceability of results etc.

Fault Tolerance Vs. Stability (5, Insightful)

aeroz3 (306042) | more than 8 years ago | (#14644280)

I think perhaps what you REALLY mean here by stability is Fault Tolerance. It's impossible to write code that has zero defects, outside of any trivial examples. Real Code Has Real Defects. Now, as you talk about modular design and being able to restart modules, you're talking about, not stability, but fault tolerance: the ability of the application to recognize and recover from faults. For instance, you can't necessarily guarantee that the module on machine A running task B won't die (hell, the computer could accidentally fry), but if your application was Fault Tolerant then the application would kick off another process somewhere else on computer C to rerun job B. Stable systems aren't built necessarily by trying to write defect-free code, but by recognizing that defects will occur and architecting the system in such a way that it can recover from them. Here you need to be concerned about things like transactions, data roll-back, consistency, techniques (active vs. passive, warm vs. cold). The key thing is before you even write a LINE of this C++ code, make sure that you have a complete, comprehensive ARCHITECTURE for your application that will gracefully handle faults.
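As a rough illustration of "rerun job B somewhere else", here is a minimal in-process sketch. The names NodeRunner, JobOutcome and run_with_failover are made up for this example, and each "node" is just a callable standing in for a real remote machine; a real system would also need the transaction and roll-back handling mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for "run this job on one node".
// Returns true and fills `result` on success, false (or throws) on failure.
using NodeRunner = std::function<bool(const std::string& job, std::string& result)>;

struct JobOutcome {
    bool succeeded;        // did any node finish the job?
    std::size_t attempts;  // how many nodes were tried
    std::string result;    // meaningful only when succeeded is true
};

// Try the job on each node in turn until one of them completes it.
JobOutcome run_with_failover(const std::string& job, const std::vector<NodeRunner>& nodes) {
    JobOutcome outcome = {false, 0, ""};
    for (const NodeRunner& node : nodes) {
        ++outcome.attempts;
        std::string result;
        bool ok = false;
        try {
            ok = node(job, result);
        } catch (const std::exception&) {
            ok = false;  // a node that throws is treated like a node that died
        }
        if (ok) {
            outcome.succeeded = true;
            outcome.result = result;
            return outcome;
        }
    }
    return outcome;  // every node failed; the caller has to escalate
}
```

The caller sees either a result or an explicit "all nodes failed" outcome, so the failure of a single node never takes the whole run down silently.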

Re:Fault Tolerance Vs. Stability (1)

cjonslashdot (904508) | more than 8 years ago | (#14644349)

That is an excellent point. A stable program has to be resilient to failure.

be assertive (2, Informative)

Anonymous Coward | more than 8 years ago | (#14644282)

be assertive

Pretend you are writing code for an airplane... (2, Informative)

smackenzie (912024) | more than 8 years ago | (#14644285)

...that you are about to board.

I've spent over a decade refining how best to create stable, great software. And guess what? I still learn things every day. If you are really new to enterprise-grade software, the best thing you can do is search Amazon and choose 3 to 5 great books about writing stable, bug-free enterprise code and just start reading and scheming. Give yourself lots of time. Be neurotic, type-A, attentive to every detail; stay up at night wondering how your system could fail and what you can do to prevent it. Some immediate thoughts, however:

1. Good hardware. Obviously. Redundant everything, self-diagnosing, etc. How can things go wrong? What will go wrong? How can I know when something is going wrong? How can I fix it quickly without impacting the system? Etc.

2. Enterprise grade (n-tier) architecture: You'll definitely want to do something where you have a database running on one or two (or more) machines, at least two business servers and at least two web servers. Redundancy is good. As you suggested, a setup like this lets you isolate problems (and provides for better security in general).

3. Test, test, test. From the very start, every day to the very end. Start coding by writing test suites for your code. Learn about unit testing, black box testing, user testing, regression testing, etc. And hire developers and QA whose sole job is test, test, test using great automated testing software.

4. Profile. Stress-load-test. Know how your system responds to all scenarios. Feel comfortable knowing the limits of your system. There should be no surprises.

5. Assert. Learn the magic of assert(). If your code isn't at least 25% asserts, you are not trying hard enough.

I told myself I was only going to write the first five thoughts that came to my mind, otherwise I could spend weeks trying to answer your question!

Re:Pretend you are writing code for an airplane... (0)

Anonymous Coward | more than 8 years ago | (#14644414)

5. Assert. Learn the magic of assert(). If your code isn't at least 25% asserts, you are not trying hard enough.

Very bad advice. All assert() does if an error condition is encountered is print a message and exit the application. That is hardly ideal for a production application. Consider the subject of your reply, 'Pretend you are writing code for an airplane...'. Should the application controlling the airplane just print a stupid message and die when it encounters an error? If a condition is worth checking for with assert(), it would be better to check for it using 'if ()' and handle the condition properly. Worst case, you can exit the application after performing proper cleanup and shutdown like assert would have done. In many cases, it will be possible to recover without exiting the application. How many times do we have to put up with shitty applications printing cryptic error messages and exiting because some wanker decided to use assert() instead of handling errors like a grown up?
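To make the 'if ()' alternative concrete, here is a minimal sketch. The function names and the "last known good value" fallback are assumptions chosen for illustration, not anything from the posts above. Note also that the standard assert() macro is compiled out entirely when NDEBUG is defined, so a check written only as an assert disappears from such builds.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class ReadStatus { Ok, IndexOutOfRange };

// Hypothetical sample lookup. Instead of assert(index < samples.size()),
// the bad index is reported to the caller, which decides how to recover.
ReadStatus read_sample(const std::vector<double>& samples, std::size_t index, double& out) {
    if (index >= samples.size()) {
        return ReadStatus::IndexOutOfRange;  // recoverable: nothing was touched
    }
    out = samples[index];
    return ReadStatus::Ok;
}

// Caller-side recovery: fall back to the last known good value.
double read_or_fallback(const std::vector<double>& samples, std::size_t index, double last_good) {
    double value = 0.0;
    if (read_sample(samples, index, value) == ReadStatus::Ok) {
        return value;
    }
    return last_good;
}
```

Whether falling back is the right recovery depends entirely on the application; the point is only that the decision is made in code rather than by aborting.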

Try ACE/TAO or ICE (0)

Anonymous Coward | more than 8 years ago | (#14644286)

If you consider CORBA check out TAO (http://www.cs.wustl.edu/~schmidt/TAO.html [wustl.edu] ), it is a reliable open-source implementation that is widely used. You can get commercial support for it from OCI (http://www.theaceorb.com/ [theaceorb.com] ).

If CORBA is too heavyweight for you take a look at ACE (http://www.cs.wustl.edu/~schmidt/ACE.html [wustl.edu] ). ACE is an open-source portable framework that is used within the TAO real-time CORBA. It allows you to write portable networked applications in C++ but is a lot smaller than CORBA as it does not implement the ORB etc. Several companies use ACE as a lightweight middleware as it has a very permissive license. You can get commercial support for it from Riverace (http://www.riverace.com/ [riverace.com] ).

If you're aiming for performance you might check out ICE (http://www.zeroc.com/ [zeroc.com] ). ICE is available under the GPL and commercially and is a really fast middleware that is not CORBA but is portable between several languages.

All options provide you with a good framework to develop reliable and maintainable code.

It's simple, really (3, Funny)

eclectro (227083) | more than 8 years ago | (#14644288)


Use TPS reports [wikipedia.org] . You'll thank me later.

Small code (1)

Gyorg_Lavode (520114) | more than 8 years ago | (#14644290)

I heard once that the NSA would only certify software under 4,000 LOC. (I don't know if this is still true.) The reason was over 4,000 lines of code it became to complex to validate for highly critical systems. The person who told me this also stated that all of those old systems that have been running forever that protect are nation were coded below this requirement. Some of them significantly under it.

I don't know how complex your system has to be, but I'd strip out anything that isn't 100% necessary. No convenience code. No pretty, easy-to-use, fully featured gui. Just the basic required to get the job done. At that short a length you should be able to reliably verify a VERY low coding error/KLOC. Also, I would recommend 2-person coding if you have the say in it. Have 2 people working at the same time. 1 codes, the other checks. It will save yourself a lot when you go to testing and check out.

Re:Small code (0)

Anonymous Coward | more than 8 years ago | (#14644429)

"...it became to complex...", "...that protect are nation...", "Just the basic required to get the job done.", "At that short a length ..."

Too bad you didn't follow your own advice -- with spelling and grammar!

You're already screwed from the getgo... (0)

Anonymous Coward | more than 8 years ago | (#14644291)

It doesn't have anything to do with using C++, because you are ultimately at the mercy of how the libraries you're accessing are going to interact with whatever systems you're doing. Because you have that dependency you can make nothing rock solid without putting a strong layer of security between those libraries and you.

If I were to make this as secure and stable as possible that's where I'd start - by wrapping those libraries in some strong error handling systems. Probably even go one step further and use some managed code wrappers (JNI, COM, CORBA, whatever) so that you can interact with the libraries using managed code. That will save you any number of headaches in the long run and will be immediately more testable, as you can separate easily the libraries you're using from the code you're writing and use stubs to test both fairly trivially.

Another tool: Microreboots (1)

pkhuong (686673) | more than 8 years ago | (#14644294)

In short, have each component loosely coupled with the whole system, and make each component crash and restart (to a recent good state) on failure. When shit happens the whole system can go on working, and the component that crashed resumes work quickly.

c.f. http://crash.stanford.edu/ [stanford.edu]
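A minimal in-process sketch of the idea, assuming a component whose whole state can be captured in a single checkpoint value. The Accumulator class and run_with_microreboots are invented for this illustration; the "crash" is simulated with an exception, whereas a real microreboot design would isolate components more strongly.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Hypothetical component: sums work items and "crashes" (throws) on a negative one.
class Accumulator {
public:
    explicit Accumulator(long start = 0) : total_(start) {}
    void process(int item) {
        if (item < 0) throw std::runtime_error("component fault");
        total_ += item;
    }
    long total() const { return total_; }
private:
    long total_;
};

struct RunReport {
    long total;    // final state of the component
    int restarts;  // how many times it had to be restarted
};

RunReport run_with_microreboots(const std::vector<int>& items) {
    long checkpoint = 0;  // last known good state
    int restarts = 0;
    Accumulator component(checkpoint);
    for (int item : items) {
        try {
            component.process(item);
            checkpoint = component.total();  // commit after each successful item
        } catch (const std::exception&) {
            component = Accumulator(checkpoint);  // throw the instance away, restart from checkpoint
            ++restarts;                           // the offending item is skipped
        }
    }
    RunReport report = {component.total(), restarts};
    return report;
}
```

The rest of the loop keeps running while the faulty component is replaced, which is the property the comment above is after.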

Use formal methods (0)

Anonymous Coward | more than 8 years ago | (#14644300)

Use formal methods. If you really need all of the things you mentioned, then I'm sure your client will be happy to pay the premium. Unless, like every other frakking client I've ever seen, they want it yesterday and for peanuts :).

+5 Funny (2, Funny)

amling (732942) | more than 8 years ago | (#14644301)

I wish I could mod the article +5 Funny.

Do your own work (-1, Flamebait)

Anonymous Coward | more than 8 years ago | (#14644307)

Do your own work?

Stability (1)

nroose (738762) | more than 8 years ago | (#14644309)

At some level, the definition of stability is that it does not change often. Thorough testing is of absolute importance. Make sure every block of code is tested with a large enough variety of data. Once you have it in production, make sure you go after any problems that you have - never let any bug bite you twice. Don't add unnecessary features; keep it as simple as possible.

Some simple rules... (1)

SingleShot (952042) | more than 8 years ago | (#14644310)

I've been out of C++ programming for several years, but I do remember a couple of basic rules I followed that saved me from a lot of memory problems and invalid state problems. This may not be the kind of thing you're looking for, but here it goes...

1) Never allocate memory to a raw pointer. Never. That is, if you allocate memory to something, it better be allocated to a smart pointer like auto_ptr or a reference counting pointer (boost.org at the time had a family of these). The only exception to this rule is in the implementation of the smart pointers themselves. You should be able to find a good number of articles on this.

2) Always follow the "strong exception safety guarantee". Classes that provide this guarantee promise that they will not change their state if they throw an exception. Again, there are many articles. Here's an example of an assignment operator providing the guarantee (please forgive me if my C++ is not quite right - I'm rusty):

    Whole& Whole::operator=(const Whole& that) {
        auto_ptr<Part> tempPart1(new Part(*that.part1));
        auto_ptr<Part> tempPart2(new Part(*that.part2));
        part1 = tempPart1;
        part2 = tempPart2;
        return *this;
    }

The example is a class Whole with two dynamically allocated Parts. Instead of doing the work in two lines of code (cloning the two parts and assigning them directly to the member variables), the assignment operator uses four. It first clones the parts to temporary variables and then assigns them to the member variables. Why? Without the temporary variables, if the second "new" operation throws an exception (such as bad_alloc), the state of the class would be different and inconsistent from before the call. It would have one original part, and one part from the cloned class.

There are lots of other simple rules like this that can make code more solid, easier to read, and easier to maintain. If I remember right, the C++ FAQ from the C++ newsgroup contains a lot of them.
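For readers on a current compiler: auto_ptr was deprecated in C++11 and removed in C++17, so the same clone-then-commit idea is sketched below with std::unique_ptr. The Part and Whole definitions, and the fail_countdown hook used to simulate a failing copy, are assumptions made up so the example is self-contained.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <utility>

struct Part {
    int value;
    explicit Part(int v) : value(v) {}
    Part(const Part& other) : value(other.value) {
        // Hypothetical test hook: make the Nth upcoming copy throw.
        if (fail_countdown > 0 && --fail_countdown == 0) {
            throw std::runtime_error("copy failed");
        }
    }
    static int fail_countdown;
};
int Part::fail_countdown = 0;

class Whole {
public:
    Whole(int a, int b) : part1_(new Part(a)), part2_(new Part(b)) {}

    Whole& operator=(const Whole& that) {
        // Step 1: clone into temporaries. If either clone throws, *this is untouched
        // and any temporary already built is released automatically.
        std::unique_ptr<Part> temp1(new Part(*that.part1_));
        std::unique_ptr<Part> temp2(new Part(*that.part2_));
        // Step 2: commit. Moving a unique_ptr does not throw.
        part1_ = std::move(temp1);
        part2_ = std::move(temp2);
        return *this;
    }

    int first() const { return part1_->value; }
    int second() const { return part2_->value; }

private:
    std::unique_ptr<Part> part1_;
    std::unique_ptr<Part> part2_;
};
```

If the second clone fails, the target object still holds both of its original parts, which is exactly the strong guarantee described above.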

C++, automatics without limits will destroy you (2, Insightful)

jimmydevice (699057) | more than 8 years ago | (#14644312)

If you develop safety-critical code, or anything that requires hi-rel, you need to break down the application into functional, testable units, with test fixtures to test each module. Then an integration test framework. You can't create a "verified" correct system with ad-hoc testing. Unless you're very good and you own the whole thing, and then it's just you that knows it's right. Ya, right.
JimD.

obvious answer (0, Flamebait)

cameronpurdy (781944) | more than 8 years ago | (#14644314)

> what strategies should a developer take to insure that the resulting program is as crash-free as possible? First, avoid using C++.

A few tips off the top of my head (1)

3770 (560838) | more than 8 years ago | (#14644315)

*Get a coverage testing tool

*avoid pointer arithmetic

*declare your copy constructors private (with no body) if you don't plan to use them. With this you'll catch unintentional use of the copy constructor through parameter passing.

*Use unit testing and make sure you can regression test your system

*Get a tool such as purify to find memory leaks and use of uninitialized memory

*turn on compiler warnings to its most anal setting

*Create a system to give you a call stack in case of errors (to quickly squash bugs because you will have bugs).

*Only write multi-threaded if you have to. If you have to program multithreadedly, try to have a good and well thought out strategy to avoid race conditions.
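The copy-constructor tip can be sketched as follows. The class names are hypothetical; the first class uses the "private, no body" form described in the list, and the second uses the deleted-function syntax available since C++11, which achieves the same thing with a clearer compiler error.

```cpp
#include <cassert>
#include <type_traits>

// Older style, as described above: declared private and never defined.
class LegacyConnection {
public:
    LegacyConnection() {}
private:
    LegacyConnection(const LegacyConnection&);             // no body
    LegacyConnection& operator=(const LegacyConnection&);  // no body
};

// C++11 and later: explicitly deleted copy operations.
class Connection {
public:
    Connection() = default;
    Connection(const Connection&) = delete;
    Connection& operator=(const Connection&) = delete;
};

// Passing by reference is fine. A signature taking Connection by value
// could not be called with an existing object, which is the point of the tip.
inline bool use(const Connection&) { return true; }

static_assert(!std::is_copy_constructible<Connection>::value,
              "Connection must not be copyable");
static_assert(!std::is_copy_constructible<LegacyConnection>::value,
              "LegacyConnection must not be copyable");
```

An accidental `void f(Connection c)` call site then fails at compile time instead of silently copying the object.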

Know your client. (1)

jellomizer (103300) | more than 8 years ago | (#14644317)

You sound like you are new at this, probably just graduated from college a few months ago. But I would suggest that you know your clients and what their expectations are. Every client will say that they need the application bullet proof and fast, and it is the most important program on earth.
But what they really need is a simple solution that is better than what they currently have. This is not an excuse to write sloppy code, but a reminder to keep in mind what is needed. If you can get the job done in something simple, do it. If you make it more complex than you need, trouble will just occur in the future. If the program takes 100 minutes to run vs. 90 minutes, they will learn to deal with it. Remember: it is often cheaper to buy a computer that is twice as fast than it is for the programmer to rewrite their code to run 10% faster.

This sounds like an issue my friend from college was talking to me about when he first started working. The client wanted a high-performance method of sending messages to the company and managing the data. So he spent months working with low-level programming calls to get an almost-working solution that did what the management said they wanted. Then it turned out that what they really wanted was a mailto: link on the page, and to have Outlook filter the data.

Question your assumptions (0)

Anonymous Coward | more than 8 years ago | (#14644318)

You say "for reasons of efficiency". How do you know that some other safer language (like Java) wouldn't be efficient enough? Have you done some tests? Have you analyzed the business case? The business case would look like this: "We could write it in C++, which would be efficient enough to run on Hardware X, or we could write it in Java, which would require more expensive Hardware Y. The cost difference between the two is Z, and the programming time to get the same level of safety between C++ vs Java is Q, so clearly it's a lot cheaper to do it in C++ and save money on the hardware."

Most of the projects that worry about "efficiency" haven't done an analysis like this, and 99% of the time, if they did such an analysis, they would find out that they are blowing $50k in programmer-hours to save $5k in hardware.

Then if you do another step of the analysis and put in a term like this: "A one-day outage or security breach could cost us $500k in lost business. Java has no buffer overflows, direct memory access or other common causes of security problems. Which is the cheaper option?"

Optimize last, usually.

I know, I hate it when I ask a question, "how do I do this with this certain tool" and someone says, "you shouldn't be using that tool", but, unless what you are doing is a ray-tracing cluster or similar, it sounds like you are on the wrong track.

You are worrying about the wrong thing. (0, Offtopic)

jefp (90879) | more than 8 years ago | (#14644321)

If you want an ultra-stable crash-free system, you will need to avoid both Linux and Windows. The choice of programming language and methodology is way down in the noise compared to that.

Use FreeBSD or stay home.

Oh, come on now... (1)

Max Threshold (540114) | more than 8 years ago | (#14644433)

Linux never crashes unless you try to upgrade something. :o)

It's not the language that counts... (1)

insert_username_here (844281) | more than 8 years ago | (#14644327)

it's how you use it!

Honestly, everyone seems to believe that all C/C++ code is unstable (probably because of all those people working for companies like Sun/Microsoft, who are promoting The Next Big Thing in Software Development), but it's far more to do with how you go about using the language and its features.

However, it is honestly an improvement on C. I think Bjarne Stroustrup (I'm almost certain I spelled that wrong) said it best: "C makes it easy to shoot yourself in the foot; C++ makes it harder, but when you do it blows your whole leg off." (http://public.research.att.com/~bs/bs_faq.html [att.com] ; apart from the quote, it has a whole lot of useful tips).

So, here's my advice, from my experience working in all sorts of languages (professionally I've used everything from TCL through PHP to Java/C#, but at home I use C++ exclusively). This is just my experience, though; take this with the requisite grain of salt:

STL is your friend. I cannot stress this enough. STL allows you to leverage complex containers (automatically resizing lists, hash tables, ropes - which are mega-long strings, etc) with complete type safety: you can create a container for a particular type, and the compiler will balk if you ever try to put something incompatible in. Also, the generated code will be optimized for your particular storage type. In this respect, C++ is actually better than most other languages (only with Java 1.5 and .NET 2.0 do Java and C# implement Generics, and in the case of Java, it's only implemented in the compiler).

Pointers are not always your friend. When you allocate data structures on the stack (e.g. "string blah;"), they will automatically be taken care of by the language. Even if an exception is thrown, these objects will still have their destructor called (which in turn will call all other necessary destructors) and the memory will be deallocated freely.

Of course, this will not work everywhere (large numbers of polymorphic, dynamically allocated objects). But in these cases, you can use helper classes (such as auto_ptr in the STL, or something like shared_ptr from Boost or nsCOMPtr from Mozilla). Look around; lots of other people have already solved this problem. In fact, there are even Garbage-Collection libraries available for C++!
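A small sketch of the cleanup behaviour described in the last two paragraphs. The Tracked class and the function names are invented for the example, and std::unique_ptr (C++11) is used here in place of auto_ptr as the owning helper class.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

// Hypothetical resource that counts how many instances have been destroyed.
class Tracked {
public:
    explicit Tracked(int& released) : released_(released) {}
    ~Tracked() { ++released_; }
    Tracked(const Tracked&) = delete;
    Tracked& operator=(const Tracked&) = delete;
private:
    int& released_;
};

// Both objects are destroyed even though the function exits via an exception.
void work_then_fail(int& released) {
    Tracked on_stack(released);                               // automatic storage
    std::unique_ptr<Tracked> on_heap(new Tracked(released));  // heap, owned by a smart pointer
    throw std::runtime_error("something went wrong");
}

int released_after_failure() {
    int released = 0;
    try {
        work_then_fail(released);
    } catch (const std::exception&) {
        // Stack unwinding has already run both destructors by this point.
    }
    return released;
}
```

Had on_heap been a raw pointer, only the stack object would have been cleaned up and the heap object would have leaked.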

Use exceptions instead of value checking, dammit! Every time you call a function that "returns 0/false/-1 on error" you are exposing yourself to possible bugs. Try to avoid this wherever possible, and try to keep all your OS-specific calls in one spot.
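One way to follow this advice with an API you cannot change is to wrap it once and throw from the wrapper. The sketch below is an assumption-laden illustration: legacy_parse_port is a made-up C-style function of the "returns -1 on error" kind, and parse_port is the hypothetical wrapper.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical C-style API: returns 0 on success, -1 on error.
// Callers that forget to check the return value read an unset *port.
int legacy_parse_port(const char* text, int* port) {
    char* end = nullptr;
    errno = 0;
    long value = std::strtol(text, &end, 10);
    if (end == text || *end != '\0' || errno == ERANGE || value < 1 || value > 65535) {
        return -1;
    }
    *port = static_cast<int>(value);
    return 0;
}

// Wrapper: the error can no longer be ignored by accident.
int parse_port(const std::string& text) {
    int port = 0;
    if (legacy_parse_port(text.c_str(), &port) != 0) {
        throw std::invalid_argument("not a valid port: " + text);
    }
    return port;
}
```

Keeping all such wrappers in one place also matches the suggestion to keep the OS-specific calls in one spot.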

Check out Boost: (www.boost.org [boost.org] ) There is a LOT of useful stuff in here, and it will certainly speed up the development process.

Finally, make sure you design it properly! A couple of well-defined interfaces to separate things out will go a LONG way towards simplifying code, as it will ensure there is less coupling between different code modules (i.e. they don't depend on each other as much, so you can rewrite one without affecting others). This goes for ANY language. C++ doesn't directly support interfaces, but abstract classes will do the same thing.
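A minimal sketch of an abstract class used as an interface. The ObjectStore name is loosely borrowed from the "persistent object storage" module in the question; the methods and the in-memory implementation are assumptions for illustration only.

```cpp
#include <cassert>
#include <map>
#include <string>

// Interface: pure virtual functions plus a virtual destructor, no data.
class ObjectStore {
public:
    virtual ~ObjectStore() {}
    virtual void put(const std::string& key, const std::string& value) = 0;
    virtual bool get(const std::string& key, std::string& value) const = 0;
};

// One implementation. A disk-backed or remote store could replace it
// without touching any code that only knows about ObjectStore.
class InMemoryStore : public ObjectStore {
public:
    void put(const std::string& key, const std::string& value) override {
        data_[key] = value;
    }
    bool get(const std::string& key, std::string& value) const override {
        std::map<std::string, std::string>::const_iterator it = data_.find(key);
        if (it == data_.end()) return false;
        value = it->second;
        return true;
    }
private:
    std::map<std::string, std::string> data_;
};

// Client code depends only on the interface.
bool save_and_check(ObjectStore& store, const std::string& key, const std::string& value) {
    store.put(key, value);
    std::string read_back;
    return store.get(key, read_back) && read_back == value;
}
```

The same interface also makes it easy to substitute a stub implementation when testing the modules that use the store.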

As for separating modules out, you could try CORBA (as you mentioned): ORBit (orbitcpp.sourceforge.net [sourceforge.net] ) is a free implementation (that happens to be used by GNOME). But to be honest you probably won't need to go to this effort. If written properly, your code should be stable enough that you don't even need to separate code out into separate processes. The furthest I'd go in your situation would be to write a command-line application that does all the work, then have a graphical client (although you could write this in a different language; don't get me started on graphics library support under C++) that uses a socket or TCP/IP to communicate with the worker thread. In this case, you could just use a very simple protocol to send messages like "start processing this file" or "stop processing". Of course, for communication much more complex than that, you will need to look into some sort of IPC.
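The "very simple protocol" could be as small as one text command per line. The sketch below assumes a made-up wire format ("START <path>" and "STOP"); the socket handling itself is left out, and only the parsing of a received line is shown.

```cpp
#include <cassert>
#include <string>

enum class CommandType { Start, Stop, Invalid };

struct Command {
    CommandType type;
    std::string argument;  // file path for Start, empty otherwise
};

// Parses one line of the hypothetical protocol: "START <path>" or "STOP".
Command parse_command(const std::string& line) {
    const std::string start = "START ";
    if (line == "STOP") {
        return Command{CommandType::Stop, ""};
    }
    if (line.size() > start.size() && line.compare(0, start.size(), start) == 0) {
        return Command{CommandType::Start, line.substr(start.size())};
    }
    // Anything unrecognised is rejected rather than guessed at.
    return Command{CommandType::Invalid, ""};
}
```

Rejecting unknown input explicitly keeps a misbehaving GUI client from pushing the worker into an undefined state.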

Good luck!

Try the following (1, Interesting)

Anonymous Coward | more than 8 years ago | (#14644333)

If you really want stable programs using C++, be sure of the following basics -
1. Hire good programmers.
2. Make sure that EVERY function is defined with a specification, describing everything within the function. This allows you to debug much more easily.
3. Make sure that you've got all requirements written.
4. Try not to use fancy stuff such as function pointers, de-referencing pointers, etc. Not all programmers are geniuses.
5. One good practice: if you allocate memory in an object, make sure that the same object is responsible for de-allocating it. This is commonly practical.
6. For IPC, try not to use shared memory. Using a message queue makes your work easier because of its guaranteed nature. Try to use MQ Series or something similar. These products provide a robust mechanism for transferring and retrying data. It is money well spent. It is also compatible with both Windows and Linux.
7. Stick to ANSI C++ functions to ensure compatibility.
8. Use a portable UI toolkit such as Qt.
9. Test, test and test. Peer-review code.
10. Establish a naming convention for variables and classes.

must sterilize biological units... (1)

v1 (525388) | more than 8 years ago | (#14644335)

should be portable to Windows without much difficulty."

...

  insure that the resulting program is as crash-free as possible?

errrror.... eeeeeeeeeror... (computer explodes)

What You're Asking For Is....Difficult (1)

OmgTEHMATRICKS (836103) | more than 8 years ago | (#14644341)

But there are ways to approach this problem. Here are a couple:

1. Read absolutely everything you can find on Software Reliability.

2. Experienced Software Engineers' opinions should count for more than those of Joe Q. Random here on Slashdot.

3. The fact that you are doing this in C++ is not the problem. More to the point, the quality of your development staff is paramount. One bad programmer set loose on a good project can wreck it.

4. Learn and use the principles of Extreme Programming. This should take care of many of your reliability problems.

5. Define *exactly* what you mean by "never crash". What are the risks involved? If it crashes, do people die or go to jail? If that's the case then you need to seriously consider correspondingly large funding levels, and ....

6. Testing. Addressed somewhat by Extreme Programming. You absolutely positively need to define exhaustive tests for each module, before you even bother coding it. This means you need to encode exactly what each module should be producing for a given set of inputs and use good sense in your OO programming to isolate each object/class from side effects and cascade failure. Furthermore, test not just for inputs and outputs, but for timing and also consider security implications. What if the system doesn't normally fail unless someone is trying to push it into a failure mode (known as a morbidity state)? Make sure you test for input validity. Buffer overflows in your code can open the entire system up to hacking. If this is a web app, scan it with WebInspect. Hire a penetration team to do application vulnerability testing on it. Have the platform built from a known good system image. Lock the machine down. Test the platform for vulnerabilities.

7. Test, Test, Test. Run the test suite with each and every build. Make each programmer responsible for writing the test cases against his partner's code. If the test cases fail, make the responsible programmer fix the code (or the tests if they are broken) before the new module/class can be checked in.

8. You are using a decent version control system right? Anything is better than nothing. RCS, CVS, SVN whatever.

9. Use a proper software engineering life cycle. Make sure you never push code directly into production before Q/A testing. This can suicide a project (and a company) faster than you might imagine. If the tests are properly written, if positive test results are required before code checkin, then Q/A should be a very fast process and they will thank you.
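On the input-validity and buffer-overflow point in item 6, here is a minimal sketch of checking input before it is copied into a fixed-size buffer. The field size, the store_name function and the rule about control characters are all assumptions chosen for the example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical fixed-size field, including room for the terminating NUL.
constexpr std::size_t kNameCapacity = 16;

// Copies `input` into `dest` only if it fits and contains no control characters.
// Over-long input is rejected instead of being truncated or overflowing the buffer.
bool store_name(const std::string& input, std::array<char, kNameCapacity>& dest) {
    if (input.empty() || input.size() >= kNameCapacity) {
        return false;
    }
    for (unsigned char c : input) {
        if (c < 0x20 || c == 0x7f) {
            return false;
        }
    }
    std::memcpy(dest.data(), input.data(), input.size());
    dest[input.size()] = '\0';
    return true;
}
```

Tests for a function like this should include the boundary lengths (15 and 16 characters here) as well as obviously bad input, since off-by-one errors are where such checks usually fail.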

go!

Know what is happening (1)

SEWilco (27983) | more than 8 years ago | (#14644356)

I suggest you plan on knowing what is happening. Choose a logging system to use for reporting informational, debugging, and error conditions. Then use it generously. As reliability is important, you'll probably be testing input for sanity, and you should have messages available so people can figure out why data is rejected. Also have available informational messages about decisions being made, so it can be found that, umm... no widgets are being emitted because a gadget needs to be supplied.
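A minimal sketch of such a logging facility, assuming three severity levels and a plain text format; the Logger class and its message layout are made up for illustration, and a production system would more likely adopt an existing logging library.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

enum class Level { Debug = 0, Info = 1, Error = 2 };

// Writes messages at or above a threshold to any std::ostream.
class Logger {
public:
    Logger(std::ostream& out, Level threshold) : out_(out), threshold_(threshold) {}

    void log(Level level, const std::string& component, const std::string& message) {
        if (static_cast<int>(level) < static_cast<int>(threshold_)) {
            return;  // below the configured verbosity
        }
        out_ << '[' << name(level) << "] " << component << ": " << message << '\n';
    }

private:
    static const char* name(Level level) {
        switch (level) {
            case Level::Debug: return "DEBUG";
            case Level::Info:  return "INFO";
            case Level::Error: return "ERROR";
        }
        return "UNKNOWN";
    }

    std::ostream& out_;
    Level threshold_;
};
```

With a Logger constructed on std::cerr or a file stream, a call such as `logger.log(Level::Info, "widgets", "none emitted: gadget not supplied")` records exactly the kind of decision described above.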

test with valgrind! (4, Interesting)

graveyhead (210996) | more than 8 years ago | (#14644364)

valgrind -v ./myapp [args]

It gives you massive amounts of great information about the memory usage of your program.

The other day I spent nearly 3 hours trying to decode what was happening from walking the backtrace in gdb. Couldn't for the life of me figure out what was happening. Valgrind figured out the problem on the first run and after that, I had a solution in a few minutes.

Highly recommended software, and installed by default on several distributions, AFAIK.

Enjoy!

Erlang (0)

Anonymous Coward | more than 8 years ago | (#14644370)

I know you asked about C++, but for fault-tolerant network applications, nothing beats Erlang.

Oh come on... (1, Flamebait)

wbren (682133) | more than 8 years ago | (#14644371)

The application is meant to primarily run on Linux, but should be portable to Windows without much difficulty.
Well there's your problem.

I know: -1 Flamebait. But really, this is Slashdot. A story with such a minor reference to Windows going without a Windows-bashing comment for this long is just inexcusable.

If you have to ask this question (1)

GomezAdams (679726) | more than 8 years ago | (#14644389)

you might consider a career at RadioShack selling cell phones.
However, the answer to your question is: DESIGN (Requirements Doc), DESIGN (High Level Design doc and Test Plan), DESIGN (Detail Design doc). CODE. Then TEST, TEST, TEST, TEST and RETEST. After that's done, TEST some more.
Get customer signoff at every stage of design so as to have a stable target. Nothing screws with stability more than a customer/client who is allowed to change the requirements on the fly.
Following this pattern I've designed and built communications servers for credit card authorizations and N-tiered communication servers for claims submissions that ran error free for five or more years. But in C and UNIX or DOS - never Winblows. Or C++. But the design, code, test 'til you puke paradigm will work all the same.

Good Luck.

Have to say this... (1)

n54 (807502) | more than 8 years ago | (#14644392)

...although not strictly true, it's just too good of an Ask Slashdot motto to pass up :)

Ask Slashdot: impossible questions with impossible answers!

Ref:
"I need to create an ultra-stable, crash-free application in C++"
"...due to reasons of efficiency and availability of core libraries."
"...but should be portable to Windows without much difficulty."

Lots of posts with interesting advice though so best of luck! Would make for an interesting Slashback entry when/if you make it succeed (and possibly even if you don't).

--
this additional sig includes a portrait of Mohammed in support of freedom of expression, feel free to reproduce it

preconceptions or misconceptions? (1)

abes (82351) | more than 8 years ago | (#14644397)

First off, I'm not sure why a big deal is made that it's in c++. c++ was developed to make more stable code. Sloppy programming in any language will cause a crash. I can write python code that will come to a halt; it's not that hard.

It sounds like part of the problem is you don't *know* the c++ language. I suggest your first move is to get a book. Bjarne Stroustrup's book is pretty decent, and he goes into design issues.

Here are a couple features that can help your code be stable:

* Object-oriented design allows you to protect your variables by providing a protective layer. Provide access functions to change these variables. This makes debugging a cinch too, because there will be very few places you need to look that directly change variable 'foo'. Also, the constructors allow making sure that your variables are properly initialized. Humans err, so it's not perfect, but what is? You can easily come up with a system to make sure every new variable introduced will be initialized properly.

* Templates allow you to write code once, and use many times. The less code you have, in theory the less errors you will introduce.

* Smart pointers are your friend.

* STL - standard template library, provides almost all the standard container classes one would ever wish for. Less coding on your part, less errors.

* You can actually find garbage collectors for c++ (I am assuming this is why you might think other languages might be better than c++). The advantage, you don't have to worry about memory allocation. The disadvantage? You will lose some of your precious speed.

* Exceptions. The biggest reason for a program crashing, besides just plainly bad code (i.e. overwriting memory locations etc.) is not handling error conditions correctly. Exceptions are a huge leap from standard C in that they allow you to manage errors in a much more sane way. It will make your code a little more ugly, but if crashing is a major concern, use them.

* RTTI - run time type identification, yes that's right, c++ can do run time inspection if you want. You can use this to make sure that functions are receiving the correct types.

* '#ifdef __DEBUG__ #endif's make code a little ugly but is a great way to have production and a testing code. Put code in to check the sanity of things.
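Two of the points in this list, access functions that guard a variable and debug-only sanity checks, can be sketched together. The WorkerPool class and its limits are invented for the example. The macro is called MYAPP_DEBUG here rather than __DEBUG__ because identifiers containing a double underscore are reserved for the implementation in C++.

```cpp
#include <cassert>
#include <stdexcept>

const int kMinWorkers = 1;
const int kMaxWorkers = 64;

class WorkerPool {
public:
    // The constructor goes through the same checked setter,
    // so the object can never start out in an invalid state.
    explicit WorkerPool(int workers) : worker_count_(kMinWorkers) {
        set_worker_count(workers);
    }

    // The only place that changes worker_count_.
    void set_worker_count(int workers) {
        if (workers < kMinWorkers || workers > kMaxWorkers) {
            throw std::out_of_range("worker count outside limits");
        }
        worker_count_ = workers;
        check_invariant();
    }

    int worker_count() const { return worker_count_; }

private:
    void check_invariant() const {
#ifdef MYAPP_DEBUG
        // Extra sanity check, compiled in only for testing builds.
        assert(worker_count_ >= kMinWorkers && worker_count_ <= kMaxWorkers);
#endif
    }

    int worker_count_;
};
```

Because every write goes through set_worker_count, a debugger breakpoint or log line in that one function is enough to find whoever set a bad value.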

Other posters have suggested that you avoid pointer arithmetic. Generally, it's not a bad practice, but just like fire, pointers aren't bad. Sometimes you *need* fire to do certain things. No getting around it. Just remember, you are playing with fire.

As for design, you should read books. There's a couple good ones out there. You should also read the Linux guidelines. One of the best ones:

* Keep all your functions below 20 lines of code. I adhere strongly to this, and it has kept my code relatively bug free (although coding practices cannot save you from your own stupidity).

In general, keep your code small, and modular. This allows you to test out portions and check the sanity of things.

It is possible to use c++ with other languages. For example, there is a library called boost::python, which allows you to very easily create python modules.

On a final note, I've written tons of simulation code in c++. The only time I really encountered crashing code was when I had code that sent out data to be crunched over TCP/IP and then received back the results. The crashing was simply because I didn't have enough time to write all the error checking code. The more distributed or complex the design, the more errors can arise, and you have to think of what they are and be able to catch them all.

Code generators, managers, frameworks, developers. (1)

radtea (464814) | more than 8 years ago | (#14644398)

Heavy use of code generators is always a good place to start--the less code you write, the fewer bugs you will create.

Distributed applications are very, very hard. They have all the joy of multi-threaded code with latency and communications issues added in. Stability of the overall system can only be achieved by a layered design: I've never seen the design pattern described, but there is a "Manager Pattern" in which one process takes responsibility for controlling another process or set of processes. Autonomous restart is not a good idea because the single node that has experienced a crash does not have all the information required to make a good judgement about what to do. An external manager process that has an overview of the whole system status will do better.

Also, restarting a process and hoping the crash does not happen again is not in general the right thing to do, as students of the Ariane V disaster will realize. In that case there were multiple redundant processors that all had the same bug (relative to the inputs they were getting from the new vehicle). In most cases restarting after a crash will just result in another crash. Realistically, you need to be able to inform the user that something bad has happened and ideally give the user the opportunity to intervene (change parameters, for example) before restarting the process. This may require that the whole data analysis run be restarted, again indicating the need for an external manager process to co-ordinate everything.
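One way to picture the manager idea is a small supervisor that runs a worker in a child process, restarts it only a bounded number of times, and then hands the decision back to its caller instead of looping forever. This is a rough, POSIX-only sketch built on fork() and waitpid(); the names supervise and SuperviseResult are invented for illustration, and a real manager would also watch for hangs and pass the failure on to the user.

```cpp
#include <cassert>
#include <cstdlib>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Outcome reported to whoever owns the overall run (user, scheduler, ...).
struct SuperviseResult {
    bool succeeded;  // true if some attempt exited with status 0
    int attempts;    // number of times the worker was started
};

// Runs `worker` in a child process and starts it at most `max_attempts`
// times. The worker returns its exit status. Hypothetical helper, POSIX only.
template <typename Worker>
SuperviseResult supervise(Worker worker, int max_attempts) {
    SuperviseResult result = {false, 0};
    while (result.attempts < max_attempts) {
        ++result.attempts;
        pid_t pid = fork();
        if (pid < 0) {
            break;  // could not even start the worker; report failure
        }
        if (pid == 0) {
            _exit(worker());  // child: run the module, never return to caller
        }
        int status = 0;
        if (waitpid(pid, &status, 0) != pid) {
            break;  // lost track of the child; do not guess
        }
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0) {
            result.succeeded = true;
            break;
        }
        // Crash (signal) or non-zero exit: try again, up to the cap.
    }
    return result;
}
```

When succeeded comes back false, the caller is the one with the overview: it can ask the user to change parameters, restart the whole analysis run, or give up.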

For IPC, if you are using a common language on all platforms I strongly favour XML serialization and sockets. Any good code generator will generate serialization code to dump your classes to an XML string, and you can then send the string through a socket. It is relatively easy to do this, and avoids the huge overheads that CORBA involves (the only large project I've used CORBA on has since stripped it all out as being too heavy-weight, a decision I think is quite reasonable.)
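As a rough idea of what generated serialization code could look like, the sketch below dumps a small struct to an XML string that can then be written to a socket. The JobRequest type and its element names are hypothetical, and the escaping covers only the five predefined XML entities.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Escapes the five characters that are special in XML text and attributes.
std::string xml_escape(const std::string& in) {
    std::string out;
    for (char c : in) {
        switch (c) {
            case '&':  out += "&amp;";  break;
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '"':  out += "&quot;"; break;
            case '\'': out += "&apos;"; break;
            default:   out += c;
        }
    }
    return out;
}

// Hypothetical message type; the field and element names are assumptions.
struct JobRequest {
    int id;
    std::string dataset;
};

// The kind of function a code generator might emit for JobRequest.
std::string to_xml(const JobRequest& job) {
    std::ostringstream xml;
    xml << "<job id=\"" << job.id << "\">"
        << "<dataset>" << xml_escape(job.dataset) << "</dataset>"
        << "</job>";
    return xml.str();
}
```

The receiving side still needs a framing rule (a length prefix or a terminator, for example) to know where one message ends, because a stream socket does not preserve message boundaries.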

Using a solid framework like Qt or wxWidgets (which I've honestly found to be superior to Qt in many respects) will help reduce the amount of code you write. For crash-free code you must use open-source frameworks as much as possible, because every set of libs has bugs, and the only way you can track them down and fix them is if you have the source.

Finally, you should think about hiring someone who's done it before [sidurisystems.com] :-)

Why stop there? (0, Flamebait)

davebo (11873) | more than 8 years ago | (#14644401)

Here's a better request: "I want an ultra-stable, crash-free application in C++ and a pony."

Anyone that would think it'd be a good idea to Ask Slashdot(tm) for advice on how to write the program you described isn't smart enough to write said program. Seriously. Call your boss/manager/lab supervisor/cult leader and tell them to find somebody else for the job, because you will fuck it up just as sure as the sun will rise.

And for all of you folks suggesting this guy/gal writes it in Python/Perl/.Net/Whatever instead of C++, give it a rest. Please. Does the questioner sound like the kind of person that would bother to write exception handlers? That would even bother to buy a frickin' book already to find out what an exception was? No, they do not.

Christ. I'm sick of this sea of idiots.

What's the application? (1)

Animats (122034) | more than 8 years ago | (#14644403)

What are you trying to do? You haven't said.

If you really want the reliability you say you want, you probably need something like QNX with the High Availability Toolkit. That's what drives the newer Cisco routers. Or a Tandem system from HP. Or some kind of fault-tolerant cluster architecture.

But you probably don't, or you would have mentioned MTBF requirements and allowed restart times.

minimize your own code when possible (1)

Ktulu_03 (668300) | more than 8 years ago | (#14644405)

A lot of already built software is available to help you:
  • The Boost library offers a number of useful tools to aid in C++ development. shared_ptr will offer a reliable reference counted pointer mechanism, that will help with eliminating memory leaks. scoped_ptr, scoped_array offer automatic memory cleanup.
  • Use STL containers, instead of home-grown linked lists, maps, etc. Get to know which container is the right one to use, under the circumstances.
  • Use STL algorithms. When using STL containers, use the STL algorithms to perform actions on the containers. The built-in loop mechanisms know how to enumerate through a container in a more efficient way than writing your own loop.
  • Combine the use of shared_ptr and STL containers, to store pointers in a list (for efficiency), but be able to have them automatically deleted when removed from the list.
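The last two points can be sketched together. The example assumes a C++11 compiler and uses std::shared_ptr, the standardized descendant of boost::shared_ptr; the Task type and its live-instance counter are made up so the automatic cleanup can be observed.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical work item that counts live instances so cleanup is observable.
struct Task {
    static int live;
    int priority;
    explicit Task(int p) : priority(p) { ++live; }
    ~Task() { --live; }
};
int Task::live = 0;

typedef std::vector<std::shared_ptr<Task> > TaskList;

// Erase-remove idiom: drops every task below `min_priority`. Tasks that
// nothing else references are destroyed as their shared_ptr leaves the vector.
void drop_low_priority(TaskList& tasks, int min_priority) {
    tasks.erase(
        std::remove_if(tasks.begin(), tasks.end(),
                       [min_priority](const std::shared_ptr<Task>& t) {
                           return t->priority < min_priority;
                       }),
        tasks.end());
}
```

There is no delete anywhere: removing an element from the container is what frees the object, provided nothing else still holds a reference to it.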

If you have to write multithreaded code, plan out the design before implementing it, and always put the locks in as part of the design rather than trying to shoehorn them in afterwards.
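One way to read "locks as part of the design" is to make the lock a private member of the class that owns the data, so no caller can reach the data without taking it. A minimal sketch, assuming C++11's std::mutex and a made-up SharedCounter class:

```cpp
#include <cassert>
#include <mutex>

// Hypothetical shared counter: the mutex lives inside the class, so callers
// cannot touch the data without going through a locked member function.
class SharedCounter {
public:
    SharedCounter() : value_(0) {}

    void add(long delta) {
        std::lock_guard<std::mutex> lock(mutex_);  // released on scope exit
        value_ += delta;
    }

    long get() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return value_;
    }

private:
    mutable std::mutex mutex_;  // mutable so const readers can lock too
    long value_;
};
```

Because lock_guard releases the mutex in its destructor, an exception or early return inside a member function cannot leave the lock held.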

Forget it. (4, Funny)

Pig Hogger (10379) | more than 8 years ago | (#14644407)

Forget it, with C and C++.

Those are low-level programming-jock languages disguised as high-level languages. As long as the punks who program them will have pissing contests in code obfuscation, you can count on having buffer overflows and memory leaks.

Unit Testing and Smart Pointers (4, Insightful)

pjkundert (597719) | more than 8 years ago | (#14644415)

60,000+ lines of communications protocol and remote industrial control and telemetry code. No memory leaks, and less than 5 defects installed into production.

The reasons? A unit test suite that implements several million test cases (mostly pseudo-random probes -- the actual test code is about 1/3 the size of the functional code). In fact, the "defects" that hit production were more "oversights"; stuff that didn't get accounted for and hence didn't get implemented.
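For readers who have not seen pseudo-random probing, the idea can be sketched as follows: generate many inputs from a seeded generator and check a property that must always hold. The run-length codec here is only a stand-in for real code under test, and every name in the example is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <string>
#include <utility>
#include <vector>

typedef std::vector<std::pair<char, unsigned> > Runs;

// Code under test (hypothetical): a run-length encoder and its inverse.
Runs rle_encode(const std::string& in) {
    Runs runs;
    for (char c : in) {
        if (!runs.empty() && runs.back().first == c) {
            ++runs.back().second;
        } else {
            runs.push_back(std::make_pair(c, 1u));
        }
    }
    return runs;
}

std::string rle_decode(const Runs& runs) {
    std::string out;
    for (const auto& run : runs) out.append(run.second, run.first);
    return out;
}

// Pseudo-random probe: generates `cases` inputs from a fixed seed and checks
// the round-trip property. Returns the number of failing cases.
int probe_round_trip(unsigned seed, int cases) {
    std::mt19937 rng(seed);  // fixed seed keeps any failure reproducible
    std::uniform_int_distribution<int> length(0, 64);
    std::uniform_int_distribution<int> letter(0, 2);  // small alphabet, long runs
    int failures = 0;
    for (int i = 0; i < cases; ++i) {
        std::string input(static_cast<std::size_t>(length(rng)), 'a');
        for (char& c : input) c = static_cast<char>('a' + letter(rng));
        if (rle_decode(rle_encode(input)) != input) ++failures;
    }
    return failures;
}
```

Logging the seed alongside any failure lets the exact failing input be regenerated later.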

Just as importantly; every dynamically allocated object just got assigned to a "smart pointer" (see Boost's boost::shared_ptr implementation).

Quite frankly, compared to any Java implementation I've seen, I can't say that "Garbage Collection" would give me anything I didn't get from smart pointers -- and I had sub-millisecond determinism, and objects that destructed precisely when the last reference to them was discarded. The only drawback: loops of self-referencing objects, which are very simple to avoid, and dead trivial if you use Boost's Weak Pointer implementation.
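The loop problem and its fix can be shown in a few lines. This sketch uses std::shared_ptr and std::weak_ptr, the standardized counterparts of the Boost classes, and assumes a C++11 compiler; the Owner and Node types and their live counters are invented for the example.

```cpp
#include <cassert>
#include <memory>

struct Owner;

struct Node {
    static int live;
    std::weak_ptr<Owner> parent;  // non-owning back reference breaks the loop
    Node() { ++live; }
    ~Node() { --live; }
};

struct Owner {
    static int live;
    std::shared_ptr<Node> child;  // owning reference
    Owner() { ++live; }
    ~Owner() { --live; }
};

int Node::live = 0;
int Owner::live = 0;

// Builds the two-object loop and returns the only external handle to it.
std::shared_ptr<Owner> make_linked_pair() {
    std::shared_ptr<Owner> owner = std::make_shared<Owner>();
    owner->child = std::make_shared<Node>();
    owner->child->parent = owner;  // weak: does not keep the owner alive
    return owner;
}
```

Had parent been a shared_ptr, the two objects would keep each other alive after the last outside reference was dropped. With the weak back reference, both destructors run at exactly that point.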

We didn't have access to Boost (which I Highly Recommend using, instead of our reference counted pointer) when we first started the project, so we implemented our own Smart Pointers and Unit Testing frameworks [2y.net] .

I've since worked on "Traditional" C++ applications, and it is literally "night and day" different; trying to do raw dynamic memory allocation without reference counting smart pointers is just insane (for anything beyond the most trivial algorithm). And developing with Unit Testing feels like being beaten with a bat, with a sack tied around your head...

Re:Unit Testing and Smart Pointers (1)

pjkundert (597719) | more than 8 years ago | (#14644420)

Sorry; that's "developing without Unit Testing feels like being beaten with a bat, with a sack tied around your head..."

(Freudian slip? Nah... ;)

Simple (1)

Pao|o (92817) | more than 8 years ago | (#14644448)

Don't use Windows. :)

Resources / Techniques (1)

dennisne (947879) | more than 8 years ago | (#14644449)

First, you want to get some very experienced engineers who have done this type of thing before. Try ones with a background in either avionics or medical devices, since both are life-critical / mission-critical arenas. Second, you may want to look at companies which make fail-safe systems, as these usually require special-purpose hardware. HP has a computer line called NonStop which may be worth looking into (no, I don't own any HP stock :)). In terms of techniques:

1. NEVER, NEVER, NEVER, NEVER -- NEVER execute a loop waiting for some event to happen that does not have a bailout mechanism, even if it's just counting a variable up to (or down from) a few million or so (however long you've determined would be the maximum wait interval). If a piece of hardware breaks or a sibling thread crashes, you'll be out to lunch.
2. Try to use a real-time system that is used on fail-safe systems commercially.
3. Don't use Windows. No matter how defect-free / error-free you make your system, it won't matter, because Windows will have more than enough defects and flaws to make your system fail in weird and mysterious ways.
4. Use a journalling file system like ext3 or reiserfs.
5. Keep a recent copy of your operational state / data somewhere safe, like in non-volatile memory. If your system has to restart itself, this data will help you become operational again much faster.
6. Use a watchdog timer. Basically, this is a piece of hardware that your code has to "feed" on a periodic, repeated basis. If your code gets hung up in an infinite loop somewhere, the watchdog timer will assert the reset line and start things up again. That's where your "warm" data comes into play.
7. As many here have mentioned, try to partition your system in such a way that you can stay away from C++ as much as possible.
8. As some here have mentioned, real-time Java or a commercial garbage collector library could help a lot in avoiding pesky memory leaks.
9. Assume you will mess up the first time. It's a much more realistic assumption than assuming you'll get it right the first time. Hey, most of us didn't even get our first KISS right the first time, and what you are looking at is a lot more complicated than that :)). So, schedule enough time to do so (call the first one an R&D program), collect enough information about your design decisions and rationale that they will help you to understand where you went wrong, and help you to do better the second time around. Good luck.
10. You've gotten a lot of good comments from a lot of very intelligent and experienced people on this list. Read them over carefully.

Good luck,
dennis
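Technique 1 can be sketched as a polling loop with a deadline. This is only an illustration, assuming C++11's <chrono>; the name wait_until_ready and the one-millisecond default poll interval are invented for the example.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Polls `ready` until it returns true or `timeout` elapses.
// Returns false on timeout so the caller can treat the peer as dead.
template <typename Predicate>
bool wait_until_ready(Predicate ready,
                      std::chrono::milliseconds timeout,
                      std::chrono::milliseconds poll_interval =
                          std::chrono::milliseconds(1)) {
    const std::chrono::steady_clock::time_point deadline =
        std::chrono::steady_clock::now() + timeout;
    while (!ready()) {
        if (std::chrono::steady_clock::now() >= deadline) {
            return false;  // bail out instead of spinning forever
        }
        std::this_thread::sleep_for(poll_interval);
    }
    return true;
}
```

steady_clock is used rather than system_clock so that an adjustment of the wall clock cannot stretch or shorten the wait.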

Congratulations! Nice Work! (5, Funny)

aendeuryu (844048) | more than 8 years ago | (#14644452)

"I need to create an ultra-stable, crash-free application in C++. Sadly, the programming language cannot be changed...

From zero to flame war in under 20 words. Well done!

Be realistic (1)

The Clockwork Troll (655321) | more than 8 years ago | (#14644459)

Do your best to code correctly, write ample unit and integration tests for each component.

But at the end of the day, you have to assume your program is going to crash, either because of intrinsic uncaught bugs, or more likely, unexpected system problems (power outages, network wires accidentally unplugged, etc.).

What to do? Concentrate as much on recovery mechanisms as you do on code correctness, in case your program (or an entire node) does crash.

This has less to do with C++ (or any language) and more to do with thinking through how to journal your program's state (perhaps you are running on top of a file system or database that has transactional semantics and can help you here) and how to have the nodes coordinate after a failure.
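A very small version of the journaling idea is a checkpoint written to a temporary file and then renamed over the old one, so that a crash in the middle of the write leaves the previous checkpoint intact. The sketch below assumes POSIX rename semantics (the replacement is atomic there; on Windows std::rename may fail if the target already exists), and the function names are made up. It does not call fsync, so it protects against a process crash but not necessarily against power loss.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <string>

// Writes `state` to `path` via a temporary file and a rename.
// Hypothetical helper; returns false if any step fails.
bool save_checkpoint(const std::string& path, const std::string& state) {
    const std::string tmp = path + ".tmp";
    bool written = false;
    {
        std::ofstream out(tmp.c_str(), std::ios::binary | std::ios::trunc);
        if (out) {
            out << state;
            out.flush();
            written = out.good();  // false on disk full, I/O error, ...
        }
    }
    // On POSIX, rename() replaces the target atomically.
    if (!written || std::rename(tmp.c_str(), path.c_str()) != 0) {
        std::remove(tmp.c_str());
        return false;
    }
    return true;
}

// Reads the last complete checkpoint; returns false if none exists.
bool load_checkpoint(const std::string& path, std::string& state) {
    std::ifstream in(path.c_str(), std::ios::binary);
    if (!in) return false;
    state.assign(std::istreambuf_iterator<char>(in),
                 std::istreambuf_iterator<char>());
    return true;
}
```

After a restart, a node would call load_checkpoint first and resume from whatever state it finds there.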

Memory leak tracing (1)

Gravis Zero (934156) | more than 8 years ago | (#14644462)

There are many leak tracing tools, both libraries and external applications. A quick search with the keywords memory, leak and tracing will turn up lots of tools. Chances are the reason your application is going to crap out is because of memory issues. Like others have said, test the hell out of everything: not just as a whole, but each class individually. Multithreading and the odd modularization scheme you are thinking of are most likely going to make it near impossible to find bugs.

Well... (1)

blair1q (305137) | more than 8 years ago | (#14644463)


I could tell you, but then I'd have to bill you.

Stable program will crash someday... (1)

zeekiorage (545864) | more than 8 years ago | (#14644464)

I don't know how to make a program crash free. What I do know is this: if you have enough logging in your application, then when your program does crash, you can quickly look at the logs, find the exact class/method that caused the fault, and fix it.

Break down the big project into small components. Have your programmers write unit tests for each component and also add instrumentation/logging code, with an option to turn the logging on and off.
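A bare-bones version of such switchable logging might look like the sketch below. The Logger class and its "[Class::method] message" line format are assumptions made for illustration; a production logger would add timestamps, severity levels and thread safety.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical logger that tags each line with the class and method it came
// from and can be switched off at run time.
class Logger {
public:
    explicit Logger(std::ostream& sink) : sink_(sink), enabled_(true) {}

    void set_enabled(bool on) { enabled_ = on; }

    void log(const char* component, const char* function,
             const std::string& message) {
        if (!enabled_) return;  // logging turned off: do nothing
        sink_ << '[' << component << "::" << function << "] "
              << message << '\n';
    }

private:
    std::ostream& sink_;  // any stream: a file, std::cerr, a string buffer
    bool enabled_;
};
```

Passing the sink in as a std::ostream keeps the logger itself easy to unit test, since a test can hand it a std::ostringstream and inspect what was written.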

Many people are suggesting that you move to a managed language, even though the OP has stated that it is not really an option. I think moving to a managed language can help with stability, but it can't eliminate all the crashes or memory leaks. You can still have things like null pointer exceptions or an ever-growing array hogging lots of memory.