Debugging

timothy posted more than 10 years ago | from the unlousy dept.

Programming 290

dwheeler writes "It's not often you find a classic, but I think I've found a new classic for software and computer hardware developers. It's David J. Agans' Debugging: The 9 Indispensable Rules for Finding Even the Most Elusive Software and Hardware Problems." Read on for the rest.

Debugging explains the fundamentals of finding and fixing bugs (once a bug has been detected), rather than any particular technology. It's best for developers who are novices or who are only moderately experienced, but even old pros will find helpful reminders of things they know they should do but forget in the rush of the moment. This book will help you fix those inevitable bugs, particularly if you're not a pro at debugging. It's hard to bottle experience; this book does a good job of it. This is a book I expect to find useful many, many years from now.

The entire book revolves around the "nine rules." After the typical introduction and list of the rules, there's one chapter for each rule. Each of these chapters describes the rule, explains why it's a rule, and includes several "sub-rules" that explain how to apply the rule. Most importantly, there are lots of "war stories" that are both fun to read and good illustrations of how to put the rule into practice.

Since the whole book revolves around the nine rules, it might help to understand the book by skimming the rules and their sub-rules:

  1. Understand the system: Read the manual, read everything in depth, know the fundamentals, know the road map, understand your tools, and look up the details.
  2. Make it fail: Do it again, start at the beginning, stimulate the failure, don't simulate the failure, find the uncontrolled condition that makes it intermittent, record everything and find the signature of intermittent bugs, don't trust statistics too much, know that "that" can happen, and never throw away a debugging tool.
  3. Quit thinking and look (get data first, don't just do complicated repairs based on guessing): See the failure, see the details, build instrumentation in, add instrumentation on, don't be afraid to dive in, watch out for Heisenberg, and guess only to focus the search.
  4. Divide and conquer: Narrow the search with successive approximation, get the range, determine which side of the bug you're on, use easy-to-spot test patterns, start with the bad, fix the bugs you know about, and fix the noise first.
  5. Change one thing at a time: Isolate the key factor, grab the brass bar with both hands (understand what's wrong before fixing), change one test at a time, compare it with a good one, and determine what you changed since the last time it worked.
  6. Keep an audit trail: Write down what you did in what order and what happened as a result, understand that any detail could be the important one, correlate events, understand that audit trails for design are also good for testing, and write it down!
  7. Check the plug: Question your assumptions, start at the beginning, and test the tool.
  8. Get a fresh view: Ask for fresh insights, tap expertise, listen to the voice of experience, know that help is all around you, don't be proud, report symptoms (not theories), and realize that you don't have to be sure.
  9. If you didn't fix it, it ain't fixed: Check that it's really fixed, check that it's really your fix that fixed it, know that it never just goes away by itself, fix the cause, and fix the process.
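As an illustration of rule 4 (a sketch that is not taken from the book), "narrow the search with successive approximation" and "determine which side of the bug you're on" amount to a binary search between a known-good and a known-bad state. The revision names and the `is_bad` check below are hypothetical; the sketch assumes the history is ordered oldest to newest and that the bug, once introduced, stays.

```python
from typing import Callable, Sequence


def first_bad(revisions: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest revision for which is_bad() is true.

    Assumes revisions[0] is known good, revisions[-1] is known bad,
    and every revision after the first bad one is also bad.
    """
    lo, hi = 0, len(revisions) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):
            hi = mid  # the bug was introduced at mid or earlier
        else:
            lo = mid  # the bug was introduced after mid
    return revisions[hi]


# Hypothetical history r1..r10 in which the bug first appears in r6.
history = [f"r{i}" for i in range(1, 11)]
culprit = first_bad(history, lambda rev: int(rev[1:]) >= 6)
print(culprit)  # r6
```

Each probe halves the remaining range, so the number of builds to test grows with the logarithm of the history length rather than with its length.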

This list by itself looks dry, but the detailed explanations and war stories make the entire book come alive. Many of the war stories jump deeply into technical details; some might find the details overwhelming, but I found that they were excellent in helping the principles come alive in a practical way. Many war stories were about obsolete technology, but since the principle is the point, that isn't a problem. Not all the war stories are about computing; there's a funny story involving house wiring, for example. But if you don't know anything about computer hardware and software, you won't be able to follow many of the examples.

After detailed explanations of the rules, the rest of the book has a single story showing all the rules in action, a set of "easy exercises for the reader," tips for help desks, and closing remarks.

There are lots of good points here. One that particularly stands out is "quit thinking and look." Too many people try to "fix" things based on a guess instead of gathering and observing data to prove or disprove a hypothesis. Another principle that stands out is "if you didn't fix it, it ain't fixed"; there are several vendors I'd like to give that advice to. The whole "stimulate the failure, don't simulate the failure" discussion is not as clearly explained as most of the book, but it's a valid point worth understanding.

I particularly appreciated Agans' discussions on intermittent problems (particularly in "Make it Fail"). Intermittent problems are usually the hardest to deal with, and the author gives straightforward advice on how to deal with them. One odd thing is that although he mentions Heisenberg, he never mentions the term "Heisenbug," a common jargon term in software development (a Heisenbug is a bug that disappears or alters its behavior when one attempts to probe or isolate it). At least a note would've been appropriate.

The back cover includes a number of endorsements, including one from somebody named Rob Malda. But don't worry, the book's good anyway :-).

It's important to note that this is a book on fundamentals, and different from most other books related to debugging. There are many other books on debugging, such as Richard Stallman et al.'s Debugging with GDB: The GNU Source-Level Debugger. But these other texts usually concentrate primarily on a specific technology and/or on explaining tool commands. A few (like Norman Matloff's guide to faster, less-frustrating debugging) have a few more general suggestions on debugging, but are nothing like Agans' book. There are many books on testing, like Boris Beizer's Software Testing Techniques, but they tend to emphasize how to create tests to detect bugs, and less on how to fix a bug once it's been detected. Agans' book concentrates on the big picture of debugging; these other books are complementary to it.

Debugging has an accompanying website at debuggingrules.com, where you can find various little extras and links to related information. In particular, the website has an amusing poster of the nine rules you can download and print.

No book's perfect, so here are my gripes and wishes:

  1. The sub-rules are really important for understanding the rules, but there's no "master list" in the book or website that shows all the rules and sub-rules on one page. The end of the chapter about a given rule summarizes the sub-rules for that one rule, but it'd sure be easier to have them all in one place. So, print out the list of sub-rules above after you've read the book.
  2. The book left me wishing for more detailed suggestions about specific common technology. This is probably unfair, since the author is trying to give timeless advice rather than a "how to use tool X" tutorial. But it'd be very useful to give good general advice, specific suggestions, and examples of what approaches to take for common types of tools (like symbolic debuggers, digital logic probes, etc.), specific widely-used tools (like ddd on gdb), and common problems. Even after the specific tools are gone, such advice can help you use later ones. A little of this is hinted at in the "know your tools" section, but I'd like to have seen much more of it. Vendors often crow about what their tools can do, but rarely explain their weaknesses or how to apply them in a broader context.
  3. There's probably a need for another book that takes the same rules, but broadens them to solving arbitrary problems. Frankly, the rules apply to many situations beyond computing, but the war stories are far too technical for the non-computer person to understand.

But as you can tell, I think this is a great book. In some sense, what it says is "obvious," but it's only obvious as all fundamentals are obvious. Many sports teams know the fundamentals, but fail to consistently apply them - and fail because of it. Novices need to learn the fundamentals, and pros need occasional reminders of them; this book is a good way to learn or be reminded of them. Get this book.


If you like this review, feel free to see Wheeler's home page, including his book on developing secure programs and his paper on quantitative analysis of open source software / Free Software. You can purchase Debugging: The 9 Indispensable Rules for Finding Even the Most Elusive Software and Hardware Problems from bn.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.

290 comments

Don't forget... (-1, Troll)

SCO$699FeeTroll (695565) | more than 10 years ago | (#8376746)

...to pay your $699 licensing fee you cock-smoking teabaggers.

i hate debugging (5, Funny)

Anonymous Coward | more than 10 years ago | (#8376753)

cause when i do it, it is often re-bugging

Effective Technique (5, Funny)

Rick the Red (307103) | more than 10 years ago | (#8377038)

I find the best way to uncover bugs is to do a demo for your boss's boss.

Good thing this comes out now, (-1, Flamebait)

Anonymous Coward | more than 10 years ago | (#8376783)

Instead of when Slashdot was originally started, otherwise we'd have had a nice, well written messageboard here.

Hardware *Debugging*? (-1, Informative)

freeze128 (544774) | more than 10 years ago | (#8376786)

I think the term you want is TROUBLESHOOTING.

Kids these days.... (1, Flamebait)

Anonymous Coward | more than 10 years ago | (#8376875)

the original bug was a hardware problem.

Re:Kids these days.... (0, Troll)

incubusnb (621572) | more than 10 years ago | (#8377001)

why is this modded funny? it should be informative because it's true

Re:Hardware *Debugging*? (3, Informative)

scatterbrained (144748) | more than 10 years ago | (#8376888)

there's a distinction (in real life) and
in the book between troubleshooting something
that's supposed to work (think TV repair) and
debugging something that's never been made
before (hardware design).

Troubleshooting lends itself more to scripted
debugging, and "real debugging" is a bit more
free-form

Re:Hardware *Debugging*? (5, Insightful)

Mick Ohrberg (744441) | more than 10 years ago | (#8376889)

My boss has three standard trouble-shooting questions:
  1. Is it plugged in?
  2. Are you logged in?
  3. Is it spelled right?
Works in 9 cases out of 10.

Re:Hardware *Debugging*? (0)

Anonymous Coward | more than 10 years ago | (#8377020)

If it's Windows, add:

4. Try rebooting.

That fixes almost half of all Windows problems for me.

Negative. (1, Insightful)

Anonymous Coward | more than 10 years ago | (#8376904)

Chips have bugs, why do you think there are re-spins? We are talking from a design point here, not a "techie-fix-this-shit" point. Different ballgame.

Re:Hardware *Debugging*? (5, Insightful)

pclminion (145572) | more than 10 years ago | (#8376922)

I think the term you want is TROUBLESHOOTING.

Troubleshooting is what you do to fix your mom's ethernet card. "Oooh, it's on the bottom PCI slot, has no interrupt line. I'll just move it up one slot..."

Debugging is what you do with an oscilloscope to figure out why a particular circuit design isn't working as anticipated. You don't "troubleshoot" a circuit design. You debug it.

Or, to put it another way, "troubleshooting" is what a tech support monkey does. "Debugging" is what an engineer does.

MOD PARENT UP!!! (0)

Anonymous Coward | more than 10 years ago | (#8376969)

OP and the mods that modded OP up are morons and obviously not anywhere near the hardware design industry.

Re:Hardware *Debugging*? (3, Insightful)

wondafucka (621502) | more than 10 years ago | (#8376926)

Get off it. I can't think of a single reason why someone can't "debug" hardware or anything else for that matter. The origin of the word comes from a troubleshooting situation anyways. Why should someone be able to debug a relational database but not a relationship?

Re:Hardware *Debugging*? (1, Informative)

SamiousHaze (212418) | more than 10 years ago | (#8376940)

Actually,
the first computer "bug" was a hardware bug, as it was a moth that flew into a relay and jammed it. Removing the bug physically was debugging. http://www.maxmon.com/1945ad.htm is a reference.

Besides, when you are building a machine and dealing with logic gates, it's the same type of debugging as with software logic.

Re:Hardware *Debugging*? (2, Interesting)

Anonymous Coward | more than 10 years ago | (#8377063)

Besides being highly apocryphal - that was the first use of the word bug in context of computing. It is not the first hardware bug by a long shot. Actually you would have known that if you actually read the page you linked to.

Re:Hardware *Debugging*? (0)

Anonymous Coward | more than 10 years ago | (#8376967)

Considering that, as of now, no less than six people have pointed out the wrongness of this statement, can somebody please mod him back down to reality? Thanks.

Anecdote (2, Funny)

ackthpt (218170) | more than 10 years ago | (#8377059)

A dishonest computer repairman dies and finds himself in Hades. The Devil smiles and says, "we've been waiting for you and have your place for eternity all ready." The repairman shudders, but follows the Devil as he is led down a tunnel. They pass several doors along the way and the repairman peers through portals to see the other condemned up to their necks in feces, languishing in pools of acid and being prodded by lesser demons with red hot pokers. The devil finally comes to a door and rubs his hands together. "Here you are, your eternal damnation." The repairman cringes as the door is flung open, but sees only a vast cavern filled with PCs, Macs, Sun SparcStations, etc. "What? That's it?", he enquires, "I shall spend eternity fixing these then?" "Oh, yes", says the Devil. "Well that's not so bad," the repairman cracks his knuckles and strides into the cavern. "Just one thing", says the Devil as he closes the door, "they've all got intermittent problems."

"AAAHHHHHHHHHHHHH!!!!!!"

Re:Anecdote (0)

Anonymous Coward | more than 10 years ago | (#8377249)

you suck at joke telling. Please die. kthxbye.

me me me! (-1, Offtopic)

Anonymous Coward | more than 10 years ago | (#8376799)

I'm first!

debug this (-1, Troll)

Anonymous Coward | more than 10 years ago | (#8376801)

I received the email first thing in the morning from the IT department. Our network would be undergoing a major overhaul to correct the ad hoc growth it had experienced in the last year, and starting next week Internet access would be sporadic. There would also be a new firewall and security measures, replacing the old OpenBSD system I'd managed to get installed last Spring. Happy for the heads-up, I went to work right away to make sure Linux had no place on our network. This was not the first time that I had faced this threat.

Since the Open Source Mullet had been canned, a new threat had arisen at my workplace: the Fat Perl Hacker had assumed most of the Open Source Mullet's system and network administration duties, and it was no mystery to anyone at my workplace that he had a hard-on for Linux tucked away under his enormous, cascading gut. Since he was a major suck-up and workaholic, he had a lot more credibility than the Open Source Mullet; this would be a real challenge for once. Dealing with the Open Source Mullet had been cake.

One day about a year ago our network guy gets asked to draw up firewall plans for this subnet of servers we have. Our network guy was your typical GNU-slinger save that he had a cascade of flowing hair down the back of his head and not a beard hanging from his face. And yeah, you can guess what he thought those firewalls were gonna run. Fast forward two days. I'd caught wind of the plans and had charts, graphs, and comparisons written up detailing OpenBSD and Linux security. Since this GNU guy had a mullet and dressed like a slob, I got taken seriously. Not to mention my data, impenetrable by any hippy "logic." OpenBSD was the more secure, even to the beancounters and idiot management. So thanks to me, our firewalls happily run OpenBSD and not Linux, which would have buffer-overflowed into no-man's land every other hour. The Open Source Mullet gives me a lot of dirty looks lately.

That night, I went to work on my strategy. First, I would document the changes in Linux and OpenBSD since a year ago when we last went with a security plan. Linux was still at version 2.4, while OpenBSD had raced from version 2.8 to 3.1, a major revision! This was good so far, and I included the relevant diffs for each. I wondered what the Fat Perl Hacker was up to and pushed ahead with my preparations.

Tuesday morning, I went to talk with the VP of Operations, who had final say on the network project. I wouldn't leave anything to chance. But after chatting with him for a few minutes, I learned of a major monkey-wrench I hadn't expected: instead of a Unix firewall system, he was planning on installing a dedicated firewall box running Windows XP. Thankful for my fortuitous social engineering, I went back to my desk and began making over my strategy to deal with this new threat. Not only would I have to deal with Linux, I'd have to eschew the Windows option now.

Sitting in front of my iBook after work, I realized that taking on Windows XP in the same manner I was going to deal with Linux would be foolish if not wasteful. Obviously the Windows option was not about numbers, anecdotes, or experience. It was a bean-counting decision and all of the security statistics in the world wouldn't matter. Since I hadn't the foggiest about how our accountants viewed the whole operation and didn't have time to learn, I'd have to implement a rapid-fire real-life assault on the Windows box, which was sitting on the VP's desk awaiting its place on the network. It was time to put on my Black Hat, and that night I stayed up until 02:00 researching Windows XP vulnerabilities. Linux would have to wait.

With just two days before the network changeover was to take place, I marched into work Wednesday morning knowing that what I did in the next few hours would decide the fate of our network security. To my surprise, just moments after I had sat down, the Fat Perl Hacker asked me to join him for a cigarette outside away from the ears and eyes of the office. 15 minutes later, I was fully aware of the precarious situation I was in.

Joining forces with the Fat Perl Hacker was something I had thought about but hadn't wanted to consider. It was a double-edged sword, and I wasn't about to kid myself. Although I am damn good, he had another full decade of experience over me and that included office politics. If we aided each other I ran the risk of pushing for Linux, even if inadvertently. And I certainly wasn't about to reveal my anti-Linux research to him. After doing some quick scheming, I agreed to help the Fat Perl Hacker dissuade the VP from using Windows XP but I had my own twist to what would follow after. Knowing my shortcomings, I decided to do the only thing that would give me an edge. And that was doing something that I knew better than anyone else at my office: playing dirty.

After a power-lunch of strategizing, the Fat Perl Hacker and I went to work on cracking the Windows XP box into oblivion. We then called back to the VP and told him to load the web administration page on the firewall box. A few minutes later he was standing in my cubicle smiling. I already had a print-out of the exploits we had used and handed them to him without a word. After looking it over for a minute, he shook his head and chewed his lip. He looked at the Fat Perl Hacker and me and told us to have something more secure ready by tomorrow morning before returning to his office. Now it was crunch time. The Fat Perl Hacker smiled at me in victory, and I smiled back at him in anticipation of putting my grand plot to work.

Now early Thursday morning, I revised my anti-Linux, pro-OpenBSD presentation into an airtight backup. I would use it as my last-ditch effort in case my primary plan failed. And that primary plan just happened to be underhanded, dirty, scandalous, unfair, and full of treason. After closing PowerPoint X I carefully downloaded and burned Slackware and OpenBSD 3.1 on the same brand of blanks the Fat Perl Hacker used. I happened to know, thanks to some late-night "overtime" I put in the night before, that the Fat Perl Hacker was planning on presenting a burnt CD of Slackware as the solution to our firewall problem. Now if only I wasn't so scatter-brained and mislabeled burnt CDs so easily!

After a few brief hours of sleep, I waltzed into the VP's office, asking when we would have our meeting about the firewall. He asked me if 30 minutes was OK, to which I said was fine, and also asked that I go and ask the Fat Perl Hacker if that was good for him as well. Back in the cubicle farm, I told the Fat Perl Hacker that the VP wanted to talk to him about the meeting. I had about 45 seconds in his empty cubicle to find his Slackware CD, replace it with my mislabeled OpenBSD CD, and book it back to my cubicle to put on an innocent face. I just barely made it as I passed him on the way back to my seat. Wiping the sweat from my brow, I read my email for the next 28 minutes.

The moment of truth had finally arrived as I sat down in the conference room in front of a newly-purchased, bare Pentium4 PC. The Fat Perl Hacker joined me and the VP moments later and we got down to business. The VP smiled and said he knew we both probably had our own ideas about network security, and he wanted to hear them both. Playing the fool I volunteered to let the Fat Perl Hacker present his solution first. I tried vainly to suppress a smile as he slipped his CD from its sleeve. Holding it up, he said the magic words I had counted on him saying:

This is all we'll ever need to keep the network secure.

A few beeps and whirs later from the PC and the Fat Perl Hacker was greeted by OpenBSD 3.1, ready to format and install on the hard drive. Not waiting a second for his jaw to unslacken, I jumped up, slapped the table, and exclaimed that I couldn't have picked better myself, shaking my own burnt CD in the air. What a coincidence! And things just got better from there. So much better, in fact, that I didn't even need to bust out my PowerPoint presentation. It turned out that Fiscal wanted an answer right then and there, I heard through the freshly-answered phone, and the VP didn't waste an instant telling them he was on his way. That is, before informing the Fat Perl Hacker that he was about to get assigned a bunch of new security modules to customize and that I'd have to do the firewall install and configuration. The L-word hadn't even been uttered during the meeting and I was homefree.

The weekend overtime didn't bother me at all. I got time-and-a-half for it and the firsthand opportunity to make sure OpenBSD would oversee the sanctity of our network. Things went so well that we didn't even have any network hiccups the next Monday morning. Despite the unexpected Windows XP push, the Fat Perl Hacker's Linux obsession, and a few variables left to chance, I had come through with flying colors and even impressed myself.

The Fat Perl Hacker, however, never invited me to join him for a cigarette again.

you a stupid asshole (-1, Offtopic)

Anonymous Coward | more than 10 years ago | (#8376921)

ill hack your mother goatse style, perl ownz you

Re:debug this (1)

stuffduff (681819) | more than 10 years ago | (#8376993)

Interesting and fairly well written.

Re:debug this (-1, Offtopic)

pinchhazard (728983) | more than 10 years ago | (#8377113)

This is lovely!

Taco in the news again (-1)

Muda69 (718162) | more than 10 years ago | (#8376805)

from Ananova [ananova.com] :

Pooing burglar sentenced to toilet cleaning

A US burglar has been sentenced to clean 100 prison toilets after raiding a house and defecating on the floor.

A Saginaw County judge has ordered Rob Malda to carry out the sanitation duties at Saginaw County Prison after he admitted stealing a gun from a home in Richland Township, Michigan. Malda is best known as the creator and primary editor of the popular technology website Slashdot [slashdot.org] .

The 23-year-old Washtenaw county native was also sentenced to two years' probation and has been ordered to pay about 350 to clean the property, reports MLive.com.

Story filed: 12:14 Monday 23rd February 2004

#9 is wrong (5, Funny)

Anonymous Coward | more than 10 years ago | (#8376808)

What if someone else fixes it?

Re:#9 is wrong (1)

Neil Blender (555885) | more than 10 years ago | (#8376894)

What if someone else fixes it?

Rule 0: Use a bug tracking system and assign yourself the bug before starting anything.

yuck (4, Funny)

theMerovingian (722983) | more than 10 years ago | (#8376810)

Make it fail: Do it again, start at the beginning, stimulate the failure, don't simulate the failure, find the uncontrolled condition that makes it intermittent, record everything and find the signature of intermittent bugs, don't trust statistics too much, know that "that" can happen

Isolate the key factor, grab the brass bar with both hands (understand what's wrong before fixing), change one test at a time, compare it with a good one, and determine what you changed since the last time it worked.

Does anyone else feel dirty after reading this?

Re:yuck (2, Insightful)

kooso (699340) | more than 10 years ago | (#8376950)

Not me. It would be interesting to have a rule of thumb for the real economic cost of debugging this way.

We all (except Dijkstra [utexas.edu] , perhaps) take trade-offs, for a reason. Perhaps that reason is only ignorance, but then we wouldn't get anything done.

Change one thing at a time (5, Insightful)

tcopeland (32225) | more than 10 years ago | (#8376822)

> Change one thing at a time: Isolate the
> key factor, grab the brass bar with both
> hands (understand what's wrong before fixing),
> change one test at a time, compare it with a
> good one, and determine what you changed
> since the last time it worked.

This is helpful with unit tests, too. If I find a bug, I want to figure out which unit test should have caught this and why it didn't. Then I can either fix the current tests, or add new ones to catch this.

Either way, if someone reintroduces that particular bug it'll get caught by the unit tests during the next hourly build [ultralog.net] .
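To make that concrete, here's a minimal sketch of what such a regression test might look like, using Python's standard unittest module. The `parse_port` function and the bug it once had (accepting port 0) are hypothetical, made up purely for illustration.

```python
import unittest


def parse_port(text: str) -> int:
    """Hypothetical function that once had a bug: it accepted port 0."""
    port = int(text)
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return port


class ParsePortRegressionTest(unittest.TestCase):
    def test_rejects_port_zero(self):
        # Added after the (hypothetical) bug report; this test fails
        # again if someone reintroduces the bug.
        with self.assertRaises(ValueError):
            parse_port("0")

    def test_accepts_valid_port(self):
        self.assertEqual(parse_port("8080"), 8080)


# Run the suite without exiting the interpreter, so a build script
# can inspect the result object.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ParsePortRegressionTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The point is that the test is named after the failure it guards against, so when it goes red in a later build the history of the bug is easy to find.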

NOTICE: Fag weddings outlawed (-1, Flamebait)

Anonymous Coward | more than 10 years ago | (#8376826)

President Bush has announced his intention to put a stop to fag weddings. You guys are free to pack fudge and go to the opera as often as you want, you just won't have your "relationship" legally recognized. Same goes for the ladies. This is America, deal with it or leave.

Heisenbugs... (5, Informative)

Aardpig (622459) | more than 10 years ago | (#8376829)

...are always the worst: bugs which disappear when you look for them. Insert a print statement? The bug disappears. Use a debugger? The bug reappears, but in a different place.

Heisenbugs are almost always caused by buffer overflows. They can often be prevented (at least in Fortran 77/90/95/03) by enabling array-bounds checking at compile time; but before I knew about this, I had a hell of a time tracking them down.

Re:Heisenbugs... (5, Funny)

AndroidCat (229562) | more than 10 years ago | (#8376953)

When I was working on arcade games, we had a sure-fire method of making bugs go away. However, shipping each coin-op game with an engineer and $40k worth of testing equipment connected to it wasn't really cost-effective.

Sonuvabitch! (3, Interesting)

Anonymous Coward | more than 10 years ago | (#8376985)

Like 15 years ago in my intro CSE class my first Fortran program which found "edges" in a text file filled with numbers did this. Everything looked good. It would compile. But wouldn't print out its little thing. So I insert statements to print out status of where it is, and it works! I take out the statements and it doesn't. In/out in/out. So I go ask the TA for help. He says it's one of the damnedest things he's seen, sorry, Fortran isn't something he's really an expert at.

I have hated fortran for years, having written a single program in it, based on this.

Re:Sonuvabitch! (3, Informative)

Aardpig (622459) | more than 10 years ago | (#8377072)

I have hated fortran for years, having written a single program in it, based on this.

Fortunately, things have changed a lot since then. With the introduction of modules and array arithmetic in Fortran 90/95, situations where routines are called with the wrong arguments, or arrays are subscripted incorrectly, are much less frequent. I haven't been bitten by a Heisenbug for a couple of years now; and when I am, switching on checking at compile and run time usually reveals the problem pretty quickly.

Re: Heisenbugs... (3, Interesting)

gidds (56397) | more than 10 years ago | (#8376998)

You're describing bugs which are reproducible, but only on the unchanged code.

Worse even than those are bugs which aren't reproducible at all, where there's no way to determine the conditions that caused them, or be sure you've fixed them. The only way to handle them is to fill the code with assertions and defensive code, and hope that at some point it'll catch something for you...
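For what it's worth, a rough sketch of what "assertions and defensive code" can look like in practice. Everything here (the `remove_stock` function, the inventory dictionary, the logger name) is a made-up example, not anything from the book: inputs are checked at the point of misuse, the state is logged when something looks wrong so the failure leaves a signature, and an assertion guards the invariant.

```python
import logging
from typing import Dict

log = logging.getLogger("inventory")  # hypothetical logger name


def remove_stock(stock: Dict[str, int], item: str, qty: int) -> None:
    """Decrement stock for item, checking inputs and invariants loudly."""
    # Defensive checks: fail where the bad value arrives, not much later.
    if qty <= 0:
        raise ValueError(f"qty must be positive, got {qty!r}")
    if item not in stock:
        raise KeyError(f"unknown item {item!r}")
    before = stock[item]
    if before < qty:
        # Record the state so an unreproducible failure leaves evidence.
        log.error("oversell: item=%s before=%d qty=%d", item, before, qty)
        raise ValueError(f"not enough {item}: have {before}, need {qty}")
    stock[item] = before - qty
    # Invariant: stock never goes negative. If this fires, the logic
    # above is wrong, and we learn it here rather than downstream.
    assert stock[item] >= 0, f"negative stock for {item}: {stock[item]}"


stock = {"widget": 3}
remove_stock(stock, "widget", 2)
print(stock)  # {'widget': 1}
```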

Re:Heisenbugs... (4, Insightful)

WayneConrad (312222) | more than 10 years ago | (#8377029)

Heisenbugs are almost always caused by buffer overflows.

They are also almost always caused by race conditions, the most insidious of which is thread-safe code that turns out only to be safe on a uniprocessor system.

And don't forget the phase of the moon, or for the truly unlucky, intermittently glitchy hardware.
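A small sketch of the race-condition case, written in Python rather than whatever language the original code was in. The `Counter` class and `hammer` function are made up for illustration. The increment is a read-modify-write; without the lock, two threads can read the same old value and one update is silently lost, and whether that happens depends on scheduling, which is exactly why adding a print statement can make it vanish.

```python
import threading


class Counter:
    """Shared counter whose increment is protected by a lock."""

    def __init__(self) -> None:
        self.value = 0
        self._lock = threading.Lock()

    def increment(self) -> None:
        # Critical section: read, add, and write back as one unit.
        with self._lock:
            self.value += 1


def hammer(counter: Counter, n: int) -> None:
    """Increment the shared counter n times."""
    for _ in range(n):
        counter.increment()


counter = Counter()
threads = [
    threading.Thread(target=hammer, args=(counter, 10_000)) for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.value)  # 80000 with the lock in place
```

Removing the `with self._lock:` line may still print 80000 on many runs, which is the insidious part: the code looks safe until the timing changes.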

Re:Heisenbugs... (0)

Anonymous Coward | more than 10 years ago | (#8377033)

Yeah it's weird isn't it? In my operating system class my group's program caused an error at one of the delete[] statements and it disappeared and reappeared depending on whether we ran it in the debug environment or not.

Re:Heisenbugs... (3, Interesting)

pclminion (145572) | more than 10 years ago | (#8377128)

In my operating system class my group's program caused an error at one of the delete[] statements and it disappeared and reappeared depending on whether we ran it in the debug environment or not.

I'll tell you with 99% certainty that this was caused by a piece of code overrunning the end (or beginning) of a new[]'d buffer, clobbering the memory allocation meta-data. This causes delete[] to crump when it hits a bogus pointer and flies off into never never land.

By running in the debug environment you changed the memory layout of the allocation in such a way that the problem was masked.

These kinds of bugs only seem weird the first time you encounter them. They're actually some of the most common types of bugs. With enough experience you'll be finding them in your sleep.

Re:Heisenbugs... (4, Insightful)

kzinti (9651) | more than 10 years ago | (#8377107)

Heisenbugs are almost always caused by buffer overflows.

In my experience, Heisenbugs are almost always caused by stack problems. That's why they go away when you put print statements in the code - because you're causing the usage of the stack to change.

Buffer overflows (to arrays on the stack) are one good way to munge the stack. Returning the address of an input parameter or automatic variable is another way, because these are declared on the stack and cease to exist when the enclosing block exits. Anybody else using such an address is writing into the stack in an undefined manner, and chaos can result!
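A short C++ sketch of the "returning the address of an automatic variable" case. The broken version is left as a comment on purpose, since running it is undefined behaviour; `make_label` is a hypothetical name.

```cpp
#include <string>

// BROKEN (kept as a comment): returns the address of an automatic variable,
// which stops existing as soon as the function returns.
//
//   const char* make_label_broken(int id) {
//       char text[32];
//       std::snprintf(text, sizeof text, "item-%d", id);
//       return text;   // dangling pointer into a dead stack frame
//   }

// Safer: return by value, so the caller owns the result.
std::string make_label(int id) {
    return "item-" + std::to_string(id);
}
```

The broken version often appears to work until an unrelated call, such as an added print statement, reuses that part of the stack.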

Re:Heisenbugs... (2, Interesting)

morcheeba (260908) | more than 10 years ago | (#8377152)

that's funny... I just tracked one of these down that existed in our software - the optimized version ran differently than the non-optimized version. It turns out the bounds checker is in the non-optimized version, and a couple of places in the code used x[rand()]=y ... the bounds-checker (implemented as a macro which had side effects) *caused* the heisenbug!
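A sketch of how a bounds-checking macro can change behaviour, assuming the macro mentions its argument twice (the actual macro in that code base is not shown above). `CHECKED_INDEX`, `checked_index` and `next_index` are invented for illustration, and `next_index` stands in for `rand()` so the number of calls can be counted.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kSize = 8;

// Hypothetical bounds check as a macro: the argument appears twice, so an
// argument with side effects is evaluated twice when it is in range.
#define CHECKED_INDEX(i) (((i) < kSize) ? (i) : 0)

// Function version: the argument is evaluated exactly once.
inline std::size_t checked_index(std::size_t i) {
    assert(i < kSize);
    return i;
}

// Stand-in for rand(): counts how often it is called.
std::size_t calls = 0;
std::size_t next_index() {
    return calls++ % kSize;
}
```

With the macro, `x[CHECKED_INDEX(next_index())]` checks one value and indexes with a different one; with the function, the checked value and the index are the same.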

The Slashdot 9 (0, Funny)

grub (11606) | more than 10 years ago | (#8376833)


(this IS slashdot after all)

1) Check your registry.
LINUX ain't got no registry crap!

2) Check your FAT32/CXFS filesystems.
LINUX is JOURNALLED and can do that in the background!

3) Verify your drivers are current.
LINUX is stable with drivers written in COBOL back in the 50's!

4) Defrag your disks.
Defrag?! You must be a WINDOZE LOOSER!!

5) Check your connections on the back of the PC.
HAHAHA! LINUX does that AUTOMATICALLY!!! LOOSERSSSSS!!!

6) Are your cards well seated? Power down and reseat.
HAHAHAAHA! LINUX can HOTSWAP EVARYTHING EVEN CPUs, LOOSERS!

7) Is your OS up to date? Perform a Windows Update.
HAHAAHAHA!!! LINUX can update itself automatically cuz of its LEET HEURISTICS and COOLNESS that MS aint got, LOOSERS!!!

8) Start in "SAFE MODE"
HAHAHA! What's the other? UNSAFE MODE!?!?! LINUX is always safe, LOOSERS!

9) Reinstall Windows.
HAHAHAH! LINUX NEVER NEEDS INSTALLING! Pour the blood from a freshly sacrificed penguin on the disk and it installs AUTOMATICALLY THROUGH AIR!!!!! LOOOOOOOSERS!!!!!

Re:The Slashdot 9 (-1, Offtopic)

Anonymous Coward | more than 10 years ago | (#8376851)


Check your FAT32/CXFS filesystems.

Whoops, s/CXFS/NTFS.. too much time on our SGI machines at work.. :)

Finally, grub posts something truly funny (0)

Anonymous Coward | more than 10 years ago | (#8376936)

because it's truly true.

How Lame. (1)

BigChigger (551094) | more than 10 years ago | (#8376956)

Maybe if you spent a few weekends dealing with MS "bugs" you would have a new appreciation.

BC

Re:How Lame. (0)

Anonymous Coward | more than 10 years ago | (#8377005)

Maybe if you spent a few weekends dealing with MS "bugs" you would have a new appreciation.

And realize that you are better off starting at step 9.

I'd agree (5, Informative)

scatterbrained (144748) | more than 10 years ago | (#8376839)

I've read it and it's a good book, but I would just borrow it from the library and then print out the poster to remember the 'rules'.

There's not enough meat to keep it on my precious shelf space.

I don't need a book... (4, Funny)

garethwi (118563) | more than 10 years ago | (#8376848)

...to learn how to debug. I only need my own sloppy code.

Re:I don't need a book... (2, Funny)

Dukael_Mikakis (686324) | more than 10 years ago | (#8377026)

Yeah who needs a book?

System.out.println("1");
ComplexClassInstantiator _cci = new ComplexClassInstantiator((UtilType)ClassGrabber.getObjectFromDefaults(_a, _b, _kl1, _z56), new UtilSocket(_p23876, _p5541), new Runnable() { public void run() { runDataSetAnalysis(_p1, _p2, _paramClass); } });
System.out.println("2");

Output: 1
[Error message]
So obviously the error is in the line between the two print statements.

So, I repeat, who needs a book?

Re:I don't need a book... (1, Funny)

Anonymous Coward | more than 10 years ago | (#8377144)

If that's how you code all the time, you do.

Re:I don't need a book... (0)

Anonymous Coward | more than 10 years ago | (#8377109)

"Daddy made him for good, but he's turned out Evil." - Wallace and Gromit? First time I've ever seen a quote from those used on Slashdot... nice one! :D

He forgot regression tests (5, Insightful)

mark99 (459508) | more than 10 years ago | (#8376853)

Regression test suites (if possible) should be maintained so that when bugs get fixed, they stay fixed.

Just my 2 cents.
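A minimal sketch of a regression test that pins a fixed bug in place. The function `strip_newline` and the bug it supposedly once had are hypothetical; any test framework would do the same job with less ceremony.

```cpp
#include <string>

// Hypothetical function that once had a bug: it dropped the last character
// of inputs that had no trailing newline.
std::string strip_newline(const std::string& line) {
    if (!line.empty() && line.back() == '\n') {
        return line.substr(0, line.size() - 1);
    }
    return line;
}

// One check per bug that was ever fixed, kept in the suite from then on.
// Returns true when every check passes.
bool regression_tests_pass() {
    bool ok = true;
    ok = ok && strip_newline("abc\n") == "abc";  // original behaviour
    ok = ok && strip_newline("abc") == "abc";    // the fixed bug: no newline
    ok = ok && strip_newline("") == "";          // edge case found while fixing
    return ok;
}
```

Running `regression_tests_pass()` on every build means a later change that reintroduces the bug fails immediately instead of at a customer site.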

Good read (5, Insightful)

GoMMiX (748510) | more than 10 years ago | (#8376857)

"
If you didn't fix it, it ain't fixed: Check that it's really fixed, check that it's really your fix that fixed it, know that it never just goes away by itself, fix the cause, and fix the process."

I can think of a WHOLE lot of techs and admins who really need to follow number 9 a lot closer.

Especially those Windows admins/techs who think 'restart' is the ultimate fix-all. Though, sadly, I suppose in many cases that's about all you can do with proprietary software. Well, that and beg vendors to fix the problem. (We all know how productive that is....)

Re:Good read (1)

Dukael_Mikakis (686324) | more than 10 years ago | (#8377096)

Yeah, I can't tell you how many times, at all levels of my company, we are told simply to try it again, and to change the parameters and how the error must be "configurational".

Try telling the client that what they want to do with our software and how they want to use it is a "configurational" problem and they're using our software incorrectly, and 9 times out of 10 the clients (in our case, major banks) will drop our software.

But then again, Microsoft uses the "configuration" argument all the time with its customers, so I guess it works sometimes.

Re:Good read (5, Insightful)

swb (14022) | more than 10 years ago | (#8377126)

No, it's number *5* that EVERYONE needs to remember to follow. I see way too many people (including myself in a hurry) changing more than one thing at a time and then immediately wondering what fixed it or why it didn't get fixed.

This is especially important when changing a second variable can actually mask the fix of the change of the first variable or cause a second failure that appears to be the same as the initial failure.

I guess they should have added a rule 10: be patient and systematic. Obvious problems usually have non-obvious solutions, and a thorough examination of the situation is time consuming. Don't take short cuts or you might miss the problem.

Re:Good read (1)

Ytsejam-03 (720340) | more than 10 years ago | (#8377253)

What I find most frustrating is working on a team with someone who does not follow number nine. If they can't reproduce a problem, then they assume it has gone away. Then a customer reports it, it becomes a major catastrophe, and we have to debug it using information from the customer because we can't reproduce it in the lab.

but how do you know it's fixed? (4, Insightful)

sohp (22984) | more than 10 years ago | (#8376858)

Nothing about writing code for a test case that exercises the bug, then rerunning it every time you make a change you think will fix the bug? Seems like a big oversight. Any program of reasonable size is going to require wasting a significant amount of time restarting and re-running to the point of failure, and with every manual check of the result, there's an increasing probability that a fallible human will make a mistake.

More programmers need to get Test Infected [sourceforge.net] .

Re:but how do you know it's fixed? (1)

scatterbrained (144748) | more than 10 years ago | (#8376939)

this is covered under the heading of 'make it fail', IIRC under one of the subheadings.

My Favorite Debugging Tale (2, Interesting)

stuffduff (681819) | more than 10 years ago | (#8376882)

Soul of a New Machine [amazon.com] by Tracy Kidder [bookbrowse.com] (book teaser) [businessweek.com] My favorite chapter was The Case Of The Missing NAND Gate.

for every cell phone provider out there (1)

NumLk (709027) | more than 10 years ago | (#8376885)

know that it never just goes away by itself

I can't tell you the number of times I've heard something along the lines of "Resetting your account will fix the problem." Guess what- it doesn't. Then again, after this [slashdot.org] , I guess I shouldn't expect much.

The first law of debugging (5, Funny)

ToSeek (529348) | more than 10 years ago | (#8376892)

"The most likely source of the current bug is the fix you made to the last one."

The first rule of debugging (1, Funny)

XiChimos (652495) | more than 10 years ago | (#8377171)

No,

The first rule of bugs is, you do not talk about bugs. The second rule of bugs is, YOU DO NOT TALK ABOUT BUGS!

-From a memo to Microsoft's new employees

Hey...the chicken bones are a valid fix too.... (4, Funny)

Dr_Marvin_Monroe (550052) | more than 10 years ago | (#8376895)

These "rules" are great, but nothing beats the mystic power of a little goat blood and chicken bones waved over a misbehaving system.

Without these, the average user might be tempted to try and fix it themselves.... Next thing, my job is being "offshored" to a phone bank in India.

No, the chicken bones and a little incantation will keep my job right here, where it belongs.

Re:Hey...the chicken bones are a valid fix too.... (2)

pnatural (59329) | more than 10 years ago | (#8377003)

RMS, is that you?

note to moderators

i love what RMS has done for Free Software. the comment above is a joke, take it as such.

Re:Hey...the chicken bones are a valid fix too.... (2, Funny)

Saeed al-Sahaf (665390) | more than 10 years ago | (#8377098)

i love what RMS has done for Free Software. the comment above is a joke, take it as such.

I might consider your request if your comment was... funny?

And the final solution (4, Funny)

aliens (90441) | more than 10 years ago | (#8376898)

10) Hammer.

if 10 fails

11) Shotgun.

Congrats problem solved, human destressed.

Re:And the final solution (0)

Anonymous Coward | more than 10 years ago | (#8377162)

I assume you mean, shotgun to the face?

Re:And the final solution (2, Funny)

Rahga (13479) | more than 10 years ago | (#8377239)

Most solutions only go up to ten.... These go to eleven.

Time (5, Insightful)

quarkoid (26884) | more than 10 years ago | (#8376906)

One thing's clear from looking at that list - spend more time on testing your code.

Unfortunately, speaking as an ex-programmer, time is one luxury that PHBs don't afford their minions. A project needs to be completed and knocked out of the door as soon as possible. The less time spent on unnecessary work, the better.

It is also unfortunate that PC users have been brought up expecting to have buggy software in front of them and expecting to have to reboot/reinstall. What motivation is there to produce bug free code when the users will accept buggy code?

Ho well, at least I run my own company now - master of my own wallet - and can concentrate on quality solutions.

Re:Time (2, Interesting)

Dukael_Mikakis (686324) | more than 10 years ago | (#8377216)

Yeah, the sad truth seems to be that when prioritizing, general and regression testing rank low on the list because they don't actually create new product (testing is of course necessary, but we aren't selling our testing, we're selling our new code).

With marketers and product managers and sales people all pushing our product and making wild promises about delivery dates and patch dates it becomes a fruitless effort to keep on top of the regression testing, and I've found that with the software at my company, it's sort of ramped up until it'll reach a breaking point where we'll just need to scrap big portions of our system and release a whole new build, likely using "Buzzwords" or cryptic acronyms that are supposed to indicate progress.

... and it doesn't help that a big chunk of our source code was recently leaked [slashdot.org] .

Sounds interesting (4, Interesting)

pcraven (191172) | more than 10 years ago | (#8376907)

Teaching people how to debug isn't that easy. It requires some experience before they get the hang of it.

I'm a stickler for labeling code often, and tracking changes released to production. Because of this, I often seem to be a stick in the mud when it comes to refactoring.

Heavy refactoring makes your code nicer. But when you have to do a lot of debugging on something that worked before refactoring, you can start to appreciate that keeping the change set manageable is a 'good thing'. (I do financial apps, so this may not work for everyone.)

The thing I see people fail at most is the ability to 'bracket' the problem. Go between code that works and doesn't work, filtering the problem down to something simple.

The second thing is the inability of some people to go 'deep' in their debugging. Decompile the java/C#/whatever code, trace through the library calls, whatever.

It's nice to see another good book on the market that seems to cover these topics.
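Bracketing a problem between a version that works and one that doesn't is essentially a binary search. A minimal sketch, assuming revisions are numbered, that every revision before the first bad one passes and every one after it fails; `is_bad` is a hypothetical callback that builds and tests one revision.

```cpp
#include <cassert>
#include <functional>

// Narrow a failure down to the first bad revision between a known-good
// and a known-bad one. Each loop iteration halves the bracket.
int first_bad_revision(int good, int bad,
                       const std::function<bool(int)>& is_bad) {
    assert(good < bad);
    while (bad - good > 1) {
        int mid = good + (bad - good) / 2;
        if (is_bad(mid)) {
            bad = mid;    // failure was introduced at or before mid
        } else {
            good = mid;   // failure was introduced after mid
        }
    }
    return bad;
}
```

With 100 revisions between the two ends, this needs at most 7 builds instead of up to 99, which is why keeping labelled, small change sets pays off.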

Re:Sounds interesting (0)

Jotaigna (749859) | more than 10 years ago | (#8377000)

There is another book, but this time a novel by Ellen Ullman. It's called "The Bug" [amazon.com] and it deals with a software developer fighting this kind of critter.
There is some insight into the character's mind and his obsessing over the bug. I saw this in Spectrum Magazine; I don't know if it has been reviewed here.

Rule 0 (5, Funny)

Anonymous Coward | more than 10 years ago | (#8376913)

0. If you're a software guy blame it on hardware, if you're a hardware guy blame it on software.

0.1. Blame it on the user.

0.2. Blame it on your colleague.

0.3. Blame it on your manager.

0.4. Yell at the computer and tell it to work dammit!

0.5. Put head on keyboard and sob.

0.6. Read Slashdot.

0.7. Post on Slashdot.

0.8. Call it a feature not a bug.

Re:Rule 0 (1)

Patrik_AKA_RedX (624423) | more than 10 years ago | (#8377222)

0.9. Get an hexorcist

0.A. Light candles in the form of a polygon around monitor. Pray to the Holy Electron (Or the Holy Proton if you don't believe in Leptons)

0.B. Burn several Windows CD's (with fire that is)

0.C. Pull the plug, claim a power blackout and call it a day.

Remain focused. Don't let others' WAGs get to you (1)

PornMaster (749461) | more than 10 years ago | (#8376914)

I find that when troubleshooting systems with which other people have worked longer, I have had better luck just asking them simple facts and troubleshooting myself rather than listening to their wild-ass guesses and having to shoot them down.

You can read a sample chapter in PDF format (5, Informative)

TheCrayfish (73892) | more than 10 years ago | (#8376915)

You can read a sample chapter from the Debugging Rules book in PDF format by going here [debuggingrules.com] . (Requires the free Adobe reader [adobe.com] .)

Rule #10 (0)

UncleBiggims (526644) | more than 10 years ago | (#8376920)

Market the bug as a "random feature".

Are you Corn Fed? [ebay.com]

Re:Rule #10 (2, Funny)

Dukael_Mikakis (686324) | more than 10 years ago | (#8377245)

... either that or add a comment right before the pertinent code:

/* Code used with permission: Microsoft Corporation */

(Not that your clients would have your source code to look at, but ...)

Top 10 Rules of Debugging (5, Funny)

ackthpt (218170) | more than 10 years ago | (#8376930)

10. Code is _always_ Beta. It's never done until it's no longer in use or support no longer exists.

9. The better the SDK, the more sophisticated the bugs.

8. There's always more bugs in the other guy's (girl's) code.

7. Declaring code bug-free is asking for it to fail at the worst possible time with the greatest visibility.

6. A good design is as likely to have bugs as a bad one. Bugs are equal opportunity.

5. Debugging time is inversely proportional to coding time.

4. If it works the first time, there's a bug, but you won't find it until you roll it out.

3. Debugging is fun. Really! It's when you run out of bugs that you should wonder if you got them all, that's not fun.

2. The most difficult bugs to find are in the most straightforward looking code.

1. That's not a bug, that's a feature.

Re:Top 10 Rules of Debugging (3, Interesting)

kooso (699340) | more than 10 years ago | (#8377091)

10. Code is _always_ Beta. It's never done until it's no longer in use or support no longer exists.

What about the opposite? Anyone against versioning? Tried and failed in Google to find an "Against versioning" campaign. I mean, somebody must be out there who only wants version 1.0 for all software.

I guess the issue is in the meaning we attach to version numbers. What about a program as a well-specified function that, once it is implemented (at least for a fixed platform), needs no "enhancements"?

(E.g. Don Knuth adds a digit to each version of TeX, implying that he doesn't plan to add anything substantial, or else he'll be running into very long version numbers).

Re:Top 10 Rules of Debugging (1)

ackthpt (218170) | more than 10 years ago | (#8377121)

I guess the issue is in the meaning we attach to version numbers. What about a program as a well-specified function that, once it is implemented (at least for a fixed platform), needs no "enhancements"?

Then it becomes Bug Ver. 1.1

First posdt (-1, Offtopic)

Anonymous Coward | more than 10 years ago | (#8376954)

up today! If you [mit.edu] fou8d is dying and its Invited back again.

FP for Nader! (-1, Offtopic)

Anonymous Coward | more than 10 years ago | (#8376964)

Don't forget to Vote Nader [slashdot.org] in 2004!!!!!!!

Number one (2, Interesting)

Jooly Rodney (100912) | more than 10 years ago | (#8376977)

Okay, haven't read the book, and I guess dwheeler is distilling the rules down to a sound bite, but isn't #1 the most important and difficult part of debugging? I mean, if I knew system Foo ver. Bar had such-and-such an idiosyncrasy, I could code around it, but Googling for hours to find the one message board post that lets you Understand The System can be aneurysm-inducing. It's not even always the idiosyncrasies of a system -- the sheer volume of stuff you have to learn about I/O conventions, operating systems, etc., in order to write a useful program in a non-toy language boggles the mind. I'm surprised people are able to write programs in the first place.

Race Conditions? (4, Insightful)

Speare (84249) | more than 10 years ago | (#8377015)

Make It Fail is pretty hard to do when it comes to race conditions. This has got to be the most frustrating kind of bug. Others are referring to the Heisenbug which comes in a variety of flavors.

Sometimes you don't KNOW when there are multiple threads or processes, or when there are other factors involved.

Have you noticed that a new thread is spawned on behalf of your process when you open a Win32 common file dialog? Have you noticed that MSVC++ likes to initialize your memory to values like 0xCDCDCDCD after operator new, but before the constructor is called? It also overwrites memory with 0xDDDDDDDD after the destructors are called. And that it ONLY does these things when using the DEBUG variant build process? Did you know that .obj and .lib can be incompatible if one expects DEBUG and the other expects non-DEBUG memory management?

Someone on perlmonks.org was just asking about a Heisenbug where just the timing of the debugger threw off his network queries. Add the debugger, it works. Take away the debugger, it fails. I've got a serial-port device which comes with proprietary drivers that seem to have the same sort of race condition.

The top 9 rules mentioned here look great. But you could write a whole book on just debugging common race conditions for the modern multi-threaded soup that passes for operating systems, these days.
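Knowing the two debug-build fill patterns mentioned above makes them easy to spot in a debugger or a log. A tiny sketch; `classify_fill` is a made-up helper, and it only knows the two values named in this comment.

```cpp
#include <cstdint>
#include <string>

// Classify a 32-bit value against the two MSVC debug-build fill patterns
// described above. Anything else is reported as "unrecognized".
std::string classify_fill(std::uint32_t value) {
    switch (value) {
        case 0xCDCDCDCDu: return "allocated but never initialized";
        case 0xDDDDDDDDu: return "already freed";
        default:          return "unrecognized";
    }
}
```

A pointer or counter that reads as one of these values in a DEBUG build is usually a strong hint about which kind of memory bug to look for; a release build would show arbitrary leftovers instead.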

Re:Race Conditions? (0)

Anonymous Coward | more than 10 years ago | (#8377202)

And this is why application programs should avoid threads at all costs.

I really liked the book, but I would have... (4, Insightful)

mykepredko (40154) | more than 10 years ago | (#8377044)

probably added a step stating that the problem symptoms and causes should be articulated clearly (probably between #3 and #4) before trying to fix anything. I've seen too many engineers/programmers/technicians list symptoms and attack them individually, only to discover that they were related.

On the surface, this flies in the face of "divide and conquer" - but what I'm really saying here is make sure you have the problem bounded before you attack it.

Also, with Step 9, I would have liked to see more emphasis on ensuring that nothing else is affected by the "fix". Making changes to code to fix a problem is often a one step forward and two steps backwards when you don't completely understand the function of the code that was being changed.

All in all, an excellent book in a little understood area.

myke

Missed one: explain it to someone (5, Insightful)

deanj (519759) | more than 10 years ago | (#8377060)

They missed a good one: explain the bug to someone.

If you start explaining the bug to someone, there's a good chance in mid-explanation you'll realize a solution to the problem.

Some school (can't remember which) had a Teddy Bear in their programming consulting office... There was a sign. "Explain it to the bear first, before you talk to a human". Silly as it sounds, people would do it, and a large portion of the time they'd never actually have to consult the staff... by explaining it to the bear, they solved the problem.

Weird, but true.

The Three R's of Windows Debugging (1, Funny)

iguana (8083) | more than 10 years ago | (#8377061)

Retry
Reboot
Reinstall

And that's why I love having source code!

Re:The Three R's of Windows Debugging (0)

Anonymous Coward | more than 10 years ago | (#8377178)

I always thought it was:

Reboot (Windows)
Reinstall (Your app)
Reformat (Your hard drive)

Re:The Three R's of Windows Debugging (0)

Anonymous Coward | more than 10 years ago | (#8377240)

ROR ROFL LMAO WinDOZE!!! Micro$oft!!! I'm so fucking 31337 cos I use Linux.


MS VC++ has some of the best debugging tools around period. You can take your open source gdb and ddd and stick it up your Linux Zealot asshole.


When tools like purify, quantify and clearcase are available for free on Linux, then we can talk. Until then, STFU you don't know what you are talking about.

Missing rule (3, Insightful)

timdaly (539918) | more than 10 years ago | (#8377062)

He missed a rule: Explain the bug to someone else. The second pair of eyes often finds the problem even if they don't have a clue what you are talking about.

A missing rule (5, Insightful)

Tired and Emotional (750842) | more than 10 years ago | (#8377124)

One rule he's missed is very important: Before making a measurement (like printing the value of a variable or changing something about the code) work out what answer you expect to see. Note well - do this before you look at the result. When you see something different, either it's a symptom of the bug, or a symptom of you not yet understanding the system. Resolving this will either improve your understanding or turn up the problem.
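One way to force yourself to do this is to make the prediction part of the debug output. A minimal sketch; `check_prediction` and its message format are invented for illustration.

```cpp
#include <sstream>
#include <string>

// Write the prediction down first, then compare it with the measurement.
// Returns an empty string on a match, otherwise a note worth investigating.
std::string check_prediction(const std::string& what, long expected, long actual) {
    if (expected == actual) {
        return "";
    }
    std::ostringstream note;
    note << what << ": expected " << expected << ", saw " << actual;
    return note.str();
}
```

For example, `check_prediction("queue length", 3, queue.size())` has to be written with the 3 filled in before the program runs, so a surprise cannot be rationalized away after the fact.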

my review... (2, Informative)

chmod_localhost (718125) | more than 10 years ago | (#8377153)

Mr. Agans' book presents real life experiences, or as he calls them war stories and humor filled comment/anecdotes.

I found myself chuckling and giggling along while reading this book. Some of what he said brought back my own memories of debugging my own software bug(s), or other people's bug(s) that I have somehow 'inherited' because they left the company, or are too busy on other projects to debug their own code. I like the metaphors that he uses to explain ideas or concepts that seem a bit too complicated to understand.

Mr. Agans made this very clear in the beginning of his book; the book is not a cover-it-all book, it is a general concept book on how to isolate, find, and debug something that has gone wrong. The principles presented by Mr. Agans can be applied to situations covering everyday life. He presented examples of well pump and light bulb, etc...

More experienced software/hardware engineers or more experienced problem solvers who read this book might find it covering bases that they already know, but the humor makes it worth while.

One Rule For 90% of Bugs (5, Informative)

BinBoy (164798) | more than 10 years ago | (#8377156)

4. Divide and conquer: Narrow the search with successive approximation, get the range, determine which side of the bug you're on, use easy-to-spot test patterns, start with the bad, fix the bugs you know about, and fix the noise first.

That's a very useful rule. In nearly 20 years of programming I haven't found any tool or technique that works better than printf / std::cout / MessageBox and logging.

Logging is especially important if your users aren't conveniently in the same building as you. When a customer has a problem I've never seen before, I usually tell them to run the program with the -log switch and send me the log. Nearly always this leads to the problem and I can fix the bug within minutes.

Add logging to your app and you'll increase the number of hours you can sleep.
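A minimal sketch of an opt-in log along these lines. How the real program handles its `-log` switch is not shown above, so `logging_requested` and `Logger` are assumptions made for illustration.

```cpp
#include <ostream>
#include <sstream>   // std::ostringstream, handy as an in-memory sink in tests
#include <string>

// Returns true when a "-log" switch appears on the command line.
bool logging_requested(int argc, const char* const* argv) {
    for (int i = 1; i < argc; ++i) {
        if (std::string(argv[i]) == "-log") return true;
    }
    return false;
}

// Minimal logger: writes numbered lines to `out` only when enabled.
class Logger {
public:
    Logger(std::ostream& out, bool enabled) : out_(out), enabled_(enabled) {}

    void log(const std::string& message) {
        if (!enabled_) return;
        // Flush every line so the log survives a crash right afterwards.
        out_ << ++line_ << ": " << message << '\n' << std::flush;
    }

private:
    std::ostream& out_;
    bool enabled_;
    int line_ = 0;
};
```

In a real program the stream would typically be a `std::ofstream` opened in `main` when `logging_requested(argc, argv)` is true, and that file is what the customer sends back.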

Now That It's Written Down (5, Interesting)

severoon (536737) | more than 10 years ago | (#8377164)

Well, even though I think most people 'round these parts would agree with me that the book covers the fairly obvious, I will say this: it's absolutely necessary to have an "expert" write these things down because all too often, us developers try to proceed and get blocked by management. At my last job, we had a big problem with WebLogic transaction management, some bizarre confluence of events was causing a HeuristicMixedException to be thrown by the platform--by the way, WebLogic people, thanks a lot for naming this exception this way and taking the time to make sure it gets thrown in no less than six totally unrelated (as far as I can tell) circumstances. I love it when exceptions originate ambiguously, from several sources, and no one part of the platform has authority over the problem.

This was a big enough problem that we had to set up a separate, isolated environment to figure out what was going on. 4 out of the 5 architects involved on the project (no it wasn't a huge project--you can see HME wasn't the only problem here) had cemented ideas about what was going wrong...none of them agreed of course...and we had no less than 3 managers with theories based on the idea that the Earth sweeps through an aether against which all things can be measured.

The biggest issue with this testing environment was keeping everyone's mitts off of it, especially those people who didn't have to ask for permissions to the system (the architects, managers...in other words everyone). And the managers didn't agree that it was particularly important to record every step methodically, or limit the number of people making changes to the system to 1 at a time. Instead, they set up a war room and engaged in what I like to call: Fix By Chaotic Typing. (It's chaotic in the sense that, there are definitely patterns to the activity taking place, but you have to be Stephen Wolfram to find and understand them.)

Needless to say, that didn't work. If I'd had access to this book, an authority willing to put the obvious in print might have bolstered my argument that we needed to take resources OFF this issue, not add more. Alas, it was not to be. The bigwigs decided that, since the current manpower wasn't able to track down this bug, it was time to bring in the high-priced WebLogic consultants. We got some 1st generation WebLogic people, 3 of them eventually, and they came in and immediately set themselves to the task of learning our business, telecommunications. And at a mere $150/hour, why not? (Management decided the bug was non-deterministic at some point and this assembly of people was given the informal team moniker: the Heuristics team. I preferred "the Histrionics team".)

So I eventually teamed up with the lead architect on the project and we solved the problem by subterfuge. We had to intentionally set these people working in a direction--everyone, employees and WebLogic consultants alike--that was so off-the-track they actually didn't interfere with any part of the system likely containing the error. This gave us a reasonable amount of time and space to track down the bug in 3 days' time. At only the loss of 6 weeks and several thousand dollars in expenses alone for the WL consultants.

sev

riiiight (1, Funny)

Anonymous Coward | more than 10 years ago | (#8377195)

"9. If you didn't fix it, it ain't fixed: Check that it's really fixed, check that it's really your fix that fixed it, know that it never just goes away by itself , fix the cause, and fix the process."

Obviously, the author has never used C++Builder.

An extra rule (4, Insightful)

MythMoth (73648) | more than 10 years ago | (#8377218)

"Describe the problem to someone else."

This is so effective that it doesn't require the person to whom you're explaining it to pay attention, or even understand. A manager will do ;-) Even when the person to whom you're explaining it is smart, alert, and interested, it's almost never them that fixes the bug.

The process of describing the behaviour of the program as it ought to be versus the behaviour it is exhibiting forces you to step back and consider only the facts. This in turn is often enough to give you an insight into the disconnect between what's really happening and what you know should be happening.

If you catch yourself saying "that's impossible" when debugging some particularly freaky bit of behaviour, it's definitely time to try this.

The input of the other party is so irrelevant in this process that we used to joke about keeping a cardboard cut-out programmer to save wear and tear on the real ones...

Common sense strikes again (3, Funny)

ValentineMSmith (670074) | more than 10 years ago | (#8377248)

Was it just me or did anyone else get to the bottom of that bullet list and feel let down? Here I was expecting some sort of earth-shattering revelation, and all that the list shows are common sense rules. At the risk of sounding elitist, maybe this was epiphany for someone else. Lord knows it would be a revelation in our QA department, where the list consists of exactly one rule:

  1. -Grab a programmer

But still... Someone made money off of that? Heck, look for my new book next week, "Walking to Peoria in 3,976,554 steps", with each step being "Place your rear foot 1.5 feet in front of your front foot."
