Comair Done In by 16-Bit Counter

michael posted more than 9 years ago | from the whoopsie dept.

Bug 441

Gogo Dodo writes "According to the Cincinnati Post, the Comair system crash was caused by an overflowed 16-bit counter. Perhaps Comair should have paid for the software upgrade to MaestroCrew." You heard it here first...

441 comments

Forget Y2k... (4, Funny)

Cytlid (95255) | more than 9 years ago | (#11218201)

This was Y32k!

Re:Forget Y2k... (1)

Bigby (659157) | more than 9 years ago | (#11218444)

you mean Y65K?

A 16-bit unsigned value (unsigned short in C++ on x86) has 2^16 possible values, from 0 to 65535.

Re:Forget Y2k... (3, Informative)

stupidfoo (836212) | more than 9 years ago | (#11218483)

RTFA

It was a signed integer. The problem occurred at 2^15 (32768), although the article reported it as 32,000.
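For illustration, here is a minimal C sketch of the two counter widths being argued about. It assumes a two's-complement machine and a mainstream compiler such as gcc or clang. The helper names next_signed16 and next_unsigned16 are made up for this example and are not taken from the Comair software.

```c
#include <stdint.h>

/* Hypothetical helper: advance a signed 16-bit counter by one, the way an
 * unchecked increment typically behaves on two's-complement hardware.
 * The addition is done in unsigned arithmetic so it is well defined; the
 * final conversion back to int16_t is implementation-defined before C23,
 * but wraps around on mainstream compilers. */
int16_t next_signed16(int16_t counter)
{
    return (int16_t)(uint16_t)((uint16_t)counter + 1u);
}

/* The same step for an unsigned 16-bit counter. Unsigned wraparound is
 * fully defined by the C standard. */
uint16_t next_unsigned16(uint16_t counter)
{
    return (uint16_t)(counter + 1u);
}
```

Under those assumptions, next_signed16(32767) comes back as -32768, while next_unsigned16(32767) is simply 32768 and the unsigned counter only wraps to 0 after 65535.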

Well... (5, Funny)

Tuxedo Jack (648130) | more than 9 years ago | (#11218202)

It seems that 16 bits and 640K wasn't enough for them after all.

actually... (2, Funny)

erroneus (253617) | more than 9 years ago | (#11218204)

...I heard it on BugTraq first...

Re:actually... (1)

Chris Mattern (191822) | more than 9 years ago | (#11218289)

I heard it on comp.risks.

Signed or unsigned (-1, Offtopic)

Jamesie (615784) | more than 9 years ago | (#11218206)

Signed or unsigned? (didn't rtfa)

Re:Signed or unsigned (4, Informative)

Vengeance (46019) | more than 9 years ago | (#11218222)

I believe this will answer your question:

Tom Carter, a computer consultant with Clover Link Systems of Los Angeles, said the application has a hard limit of 32,000 changes in a single month.

"This probably seemed like plenty to the designers, but when the storms hit last week, they caused many, many crew reassignments, and the value of 32,000 was exceeded," he said.


So it sounds like a signed int.

Re:Signed or unsigned (1)

Thowllly (529311) | more than 9 years ago | (#11218232)

Signed. Had the developers used unsigned then it might never have overflowed at all (They were going to replace the system in a few months anyway.)

Re:Signed or unsigned (1)

jejones (115979) | more than 9 years ago | (#11218236)

Since TFA quoted someone who cited a limit of 32,000 reassignments, I would guess signed (and that they were just truncating--if it's really a 16-bit counter, the upper bound would be 32,767).

Re:Signed or unsigned (2, Insightful)

Evangelion (2145) | more than 9 years ago | (#11218270)


Since 2^16 = 65536, I'm guessing signed.

Maybe it had "worked just fine" for them? (0, Troll)

EvilStein (414640) | more than 9 years ago | (#11218219)

Maybe the existing system was working just fine? Upgrades too expensive?

Perhaps this was something that they never anticipated in a thousand years?

I bet *now* they'll upgrade, but until this particularly hairy situation arose, they didn't really see a need to upgrade a computer scheduling system that had been working great for them.

Dunno why this is interesting, aside from seeing "16 bit" in the headline.

Re:Maybe it had "worked just fine" for them? (5, Insightful)

kirun (658684) | more than 9 years ago | (#11218264)

It's interesting because it provides a lesson in software design - arbitrary limits will trip you up eventually. It's not as if nobody knew to avoid them [catb.org] before, though.

Re:Maybe it had "worked just fine" for them? (5, Insightful)

jedidiah (1196) | more than 9 years ago | (#11218269)

This assumes that they had the resources. Given the current competitive environment in terms of consumer price and fuel costs, it would not be surprising if IT got the short end of things.

Re:Maybe it had "worked just fine" for them? (1)

barzok (26681) | more than 9 years ago | (#11218300)

Not surprising? I've come to expect it. If a department isn't actively making money, they're considered an expense. Management is usually too shortsighted to notice that without IT, the rest of the departments can't make money (or can't make as much).

Re:Maybe it had "worked just fine" for them? (2, Insightful)

TopShelf (92521) | more than 9 years ago | (#11218351)

According to the article, the system was on track to be replaced in the coming months...

That said, it's very true that many businesses get by "just fine" with existing, antiquated systems. Justifying system upgrades can be difficult from a conventional cost-benefit standpoint, when a large part of the benefit is based on preventing theoretical problems like this one.

Re:Maybe it had "worked just fine" for them? (1)

EvilStein (414640) | more than 9 years ago | (#11218495)

You're right, it's supposed to be upgraded in the near future. I missed that bit, sorry 'bout that.

It's also true that many organizations don't upgrade things because they continue to run worry-free for ages, therefore upgrades aren't seen as necessary.

Upgrades & security patches often get overlooked because of the old saying - "If it isn't broken, don't fix it!"

Re:Maybe it had "worked just fine" for them? (5, Informative)

Remlik (654872) | more than 9 years ago | (#11218371)

I bet *now* they'll upgrade, but until this particularly hairy situation arose, they didn't really see a need to upgrade a computer scheduling system that had been working great for them.

RTFA RTFA RTFA - The new system goes live in January. Good god, it's like herding cats around here.

Gotta love /. when you can get modded +5 insightful without RTFA AND posting verbal vomit....

I did RTFA (2, Funny)

EvilStein (414640) | more than 9 years ago | (#11218446)

"The computer software that crashed and grounded Comair's entire fleet on Christmas Day was an antiquated system due to be replaced in the coming months."

First paragraph. I had just forgotten about it by the time I got to the *end* of the article. 6am + ADD - caffeine = me missing that bit. My bad. :P

Re:Maybe it had "worked just fine" for them? (1)

Saven Marek (739395) | more than 9 years ago | (#11218456)

Well, the good thing now is they have to respond quickly. They had problems in December which FORCED them to upgrade, and they expect to have it done by January.

So what would have happened if they hadn't been forced to? Things would just have gotten worse, and quickly.

Online Anime Gallery's [sharkfire.net]

Re:Maybe it had "worked just fine" for them? (1)

GoofyBoy (44399) | more than 9 years ago | (#11218422)

>I bet *now* they'll upgrade, but until this particularly hairy situation arose, they

Thank you Mr. Monday-Morning-Quarterback.

It's all different once you are on the field.

Re:Maybe it had "worked just fine" for them? (2, Funny)

EvilStein (414640) | more than 9 years ago | (#11218466)

Oh, you're quite welcome. Be sure to stay tuned for my next opinion piece regarding "World Peace in 6 Easy Steps."

Re:Maybe it had "worked just fine" for them? (2, Funny)

jcr (53032) | more than 9 years ago | (#11218453)

Maybe the existing system was working just fine?

Apparently not.

-jcr

Common problem (3, Insightful)

confusion (14388) | more than 9 years ago | (#11218227)

Well, not this specific problem, but businesses have a common problem of outgrowing the systems that run their business. OTOH, this was an outsourced solution, so this case is pretty hard to explain away, other than sheer incompetence.

Re:Common problem (4, Insightful)

Anonymous Coward | more than 9 years ago | (#11218298)

That's not true.

Outsourcing a system doesn't guarantee a company a 100% stable system. Frequently businesses define the type of system they want (hardware and software) and the amount they're willing to pay for it.

I work in a company that provides outsourced solutions. Monthly we provide info to businesses about their system. Also, we frequently make recommendations to augment the systems to improve performance. Businesses often choose to ignore our reports and recommendations.

Nothing's more frustrating than a meeting where a business tells us we mucked it up, and in return we drop off the last 6 months of upgrade recommendations for additional hardware to meet their growing requirements and ask why they chose to ignore them.

Now I'm not saying the provider didn't muck up. But what I am saying is that your statement that it's all the provider's fault may not be the case, as the airline probably chose to stay on that system because it 'met' their needs as they saw them.

Re:Common problem (0)

confusion (14388) | more than 9 years ago | (#11218349)

My point was more that it is much much harder to upgrade a system when it's managed internally. That is one of the value props of outsourcing. Organizationally, it is easier to do when it is a hosted system. Having said that, I was referring to the incompetence of Comair, not SBS.

98 (0, Troll)

kg_o.O (802342) | more than 9 years ago | (#11218231)

That's what you get for using a buggy and OLD OS for such important tasks. Grats!

Re:98 (1, Insightful)

Anonymous Coward | more than 9 years ago | (#11218328)

Ah, Slashdot. Where even after the true culprit is unveiled, the problem somehow continues to be on Microsoft's end.

I'm not sure how the open source movement can pride itself with the quality of imbecile fanboys it attracts.

Re:98 (1)

kg_o.O (802342) | more than 9 years ago | (#11218459)

Ah, Slashdot. Where even after the true culprit is unveiled, the problem somehow continues to be on Microsoft's end.

What I was referring to was the California case, similar to this one (described in a bugtraq post). Mixing them up was of course my bad. Anyhow, the main problem was not the fact that the bug exists, but the fact that they're using an OS with a known bug where people's lives depend on it, and that's what my comment was referring to.

Re:98 (1, Funny)

Anonymous Coward | more than 9 years ago | (#11218415)

The paperclip popped up and said, "It looks like you're writing a letter... you should use a 16 bit variable here!"

Comair? (-1, Troll)

babbage (61057) | more than 9 years ago | (#11218235)

For those of us that weren't necessarily spending the holidays keeping up with Slashdot, or the news in general, would it be that bad to spell out what Comair is and what happened to them in the article summary? Would it?

Come on, basic 4th grade reporting rules should apply to Slashdot as well: every article summary should mention Who, What, Where, and When; bonus points for How and Why, though those usually take longer to explain and can be omitted from a 100 word summary. But can we at least cover the basics?

Nah, forget it, Slashdot's editors obviously have no interest in improving the editorial quality of the site...

Re:Comair? (2, Insightful)

bje2 (533276) | more than 9 years ago | (#11218257)

just RTFA linked in the summary ("Comair system crash")...

Re:Comair? (0, Troll)

babbage (61057) | more than 9 years ago | (#11218360)

Yeah, but that's my point -- the majority of Slashdot's summaries don't adequately explain what they're pointing to; covering Who / What / Where / When would do it, but one or more of these is almost always missing.

I, and most other people, don't have time to read every single article just to figure out what the Slashdot editor was posting about. Moreover, in a lot of cases -- not this one, but lots of others -- the befuddled herd of Slashdot visitors has trampled the site in question, so it's not even possible to look up the original article. Admittedly, that isn't the case here, but it's true more often than not.

All of this could be avoided easily if the Slashdot editorial staff were forced to sit through the first week of an introductory journalism class, where the grad student teaching the class on a professor's behalf will drill in the mantra of Who / What / Where / When until the students finally get it.

Re:Comair? (1)

bje2 (533276) | more than 9 years ago | (#11218435)

The "Comair system crash" link was just a link to a previous slashdot story (no chance of /.'ing that), and already contained a summary...it would have been (-1) redundant to provide a new summary for what originally happened, when you can just link to the original summary...

Time is valuable. (1)

oneiros27 (46144) | more than 9 years ago | (#11218433)

As much a troll as the parent might've been, he does make a valid point in that people's time is valuable.

I can't stand it when someone posts a URL to some mailing list, telling everyone to go and look at it, without telling us why we should care about it.

When taken without neighboring information, the only clues that Slashdot gave about the article were that it was in the 'IT' section, and had a 'bug' picture next to it, so we know it was a technology problem, which most computer geeks would have known from 'overflowed 16-bit counter'.

It doesn't take that much extra effort to add a little more detail so that people can make a decision if it's of sufficient interest to spend our time reading the article:
According to the Cincinnati Post, the Comair system crash that resulted in 1,100 airline flight cancelations on Christmas day was caused by an overflowed 16-bit counter.

Re:Time is valuable. (1)

babbage (61057) | more than 9 years ago | (#11218490)

As much a troll as the parent might've been, he does make a valid point in that people's time is valuable.

I can't stand it when someone posts a URL to some mailing list, telling everyone to go and look at it, without telling us why we should care about it.

Thank you -- that's all I'm trying to say. I'm frustrated at how amateur most of the Slashdot editors are, but I honestly wasn't trying to be a troll about it.

A simple amendment to the statement they used would have been enough to clarify everything for everyone, and would put the link in the proper context for deciding if we want to read more about it.

Why has Slashdot never managed to get this right? They've certainly had enough time and enough complaints about it by now; I can only assume that they're being willfully sloppy. That's again not meant as a troll; it's disappointment in how Slashdot is less useful and credible than it should be simply because such journalistic basics aren't taken seriously here.

Re:Comair? (2, Funny)

slapout (93640) | more than 9 years ago | (#11218265)

The human Slashdot editors were replaced long ago. I think it's some Google News beta program that currently posts the stories.

Re:Comair? (0)

Anonymous Coward | more than 9 years ago | (#11218271)

Comair is an airline. A Google "I'm Feeling Lucky" search would have brought you straight to their site. A straight Google search would have brought up the following under Google News:

Comair Says Back to Normal Daily Schedule - Los Angeles Times (subscription) - 16 hours ago
Comair Downed By Computer Counting Limit - Information Week - 17 hours ago
Comair Faces Government Investigation - InternetNews.com - Dec 28, 2004

Which should pretty quickly get you to a summary account of the problem. A moment or two listening to the news on any reasonably competent radio station in the US would do it, too. And no, Slashdot is not a news reporting site: it's an aggregator.

Re:Comair? (1)

airdrummer (547536) | more than 9 years ago | (#11218278)

what, the link [http://it.slashdot.org/it/04/12/26/052212.shtml?tid=128]
doesn't work 4 u?

Jayson Blair (-1, Troll)

AtariAmarok (451306) | more than 9 years ago | (#11218284)

"Nah, forget it, Slashdot's editors obviously have no interest in improving the editorial quality of the site"

Don't worry. Eventually, Slashdot will be up to the standards of the "New York Times" that you so love. In fact, they have hired Jayson Blair as a consultant to work on the problem. Not only that, we'll have to login as "elmer fudd" with pw "90210" in order to read even the worst trolled response.

Re:Comair? (5, Insightful)

buckeyeguy (525140) | more than 9 years ago | (#11218285)

Potential trolls aside, Comair is a regional air carrier, based at the Greater Cinci airport, that was bought up by Delta, and turned into their secondary route provider. They handle both short and medium-range non-stop flights (i.e. Ohio to Atlanta or Orlando). So it's more closely-related than the code-sharing arrangement that some carriers have.

Now my question would be, since they're owned by Delta, why wouldn't Comair flights be handled within Delta's own reservation/flight tracking system?

p.s. I've traveled through CVG, on Delta, during the holidays. Not anymore... One weather-delayed flight and the whole system falls apart.

Re:Comair? (1, Insightful)

babbage (61057) | more than 9 years ago | (#11218443)

Potential trolls aside, Comair is a regional air carrier, based at the Greater Cinci airport, that was bought up by Delta, and turned into their secondary route provider. They handle both short and medium-range non-stop flights (i.e. Ohio to Atlanta or Orlando). So it's more closely-related than the code-sharing arrangement that some carriers have.

Thank you. So if the article just said something like...

Gogo Dodo writes "According to the Cincinnati Post, the Comair airline shutdown in the midwest last weekend was caused by an overflowed 16-bit counter. Perhaps Comair should have paid for the software upgrade to MaestroCrew." You heard it here first...

...then it would have been enough. That one amendment -- "airline shutdown in the midwest last weekend" -- would have been enough to make the article perfectly clear to anyone who wasn't up to date with the story so far.

Is this really so much to expect? I wasn't trolling, honest, I'm just increasingly frustrated with how addled the editorial review of articles is around here. They've had years to figure this stuff out, but it's just as bad now as it was when Slashdot got started. Are they ever going to start taking their jobs as editors seriously & professionally? (And this isn't meant to be a blanket complaint -- Pudge for one seems to be aware of these basics and tries to do right with his articles, but other editors consistently muck this up...)

Re:Comair? (1)

Alioth (221270) | more than 9 years ago | (#11218503)

Probably so the crews from both parts of the airline don't get mixed-n-matched - there is strict seniority in airline hierarchies, and the system Delta has probably can't handle two seniority lines at once for the same job. Your seniority affects what schedules you get - the more senior you are, the more likely you're going to get the schedule you want to fly. New pilots/cabin crew tend to get all the shitty routes no one else wants to fly.

Bugtraq covered this as well.. (5, Informative)

EvilStein (414640) | more than 9 years ago | (#11218238)

Here's the original post [neohapsis.com] :

Hi,

On Christmas Day last Saturday, Comair Airlines had to completely stop flying
all of its planes due to computer problems. Comair blamed the computer
problems on their pilot scheduling software being overloaded after bad
weather earlier in the week forced many flights to be rescheduled. Comair
now hopes to have all of its 1,100 daily flights restored by tomorrow.

An article which was published today at the Cincinnati Post Web site
provides some interesting details of a software failure in Comair's pilot
scheduling software:

How it happened
http://www.cincypost.com/2004/12/28/comp12-28-2004.html

According to the article, Comair is running a 15-year old scheduling
software package from SBS International (www.sbsint.com). The software has
a hard limit of 32,000 schedule changes per month. With all of the bad
weather last week, Comair apparently hit this limit and then was unable to
assign pilots to planes.

It sounds like 16-bit integers are being used in the SBS International
scheduling software to identify transactions. Given that the software is 15
years old, this design decision perhaps was made to save on memory usage.
In retrospect, 16-bit integers were probably not a good choice.

An anonymous message posted to Slashdot the day after Christmas first
described the software failure at Comair:

http://slashdot.org/comments.pl?sid=134005&cid=11185556

Earlier this year, an overflow of a 32-bit counter in Windows shut down air
traffic control over southern California for 3 hours:

Microsoft server crash nearly causes 800-plane pile-up
http://www.techworld.com/opsys/news/index.cfm?NewsID=2275

This problem occurred because of a known design flaw in older versions of
Windows:

http://tinyurl.com/5n9gc

Richard M. Smith
http://www.ComputerBytesMan.com

Re:Bugtraq covered this as well.. (5, Insightful)

dmccarty (152630) | more than 9 years ago | (#11218370)

It sounds like 16-bit integers are being used in the SBS International scheduling software to identify transactions. Given that the software is 15 years old, this design decision perhaps was made to save on memory usage. In retrospect, 16-bit integers were probably not a good choice.

Rubbish. Don't judge yesteryear's programs by today's standards. Back then 4MB RAM cost more than $200. That's how important memory conservation was. In 1989 using an int was a perfectly acceptable choice. If you were programming back then you'd know how loath programmers were to use longs when they didn't have to. (Granted an unsigned int would've worked better here, but that 64K limit could've also been reached.)

The software spec probably says something to the effect of "Don't attempt to schedule more than 32,767 crew changes." If you're running software that's more than a decade old you need to know what the limits of your software are.

From Another article... (4, Interesting)

bje2 (533276) | more than 9 years ago | (#11218245)

from information week [informationweek.com]

"The computer failure that grounded an airline's entire fleet over the Christmas weekend and stranded thousands of travelers was due to creaky software that couldn't count higher than 32,768." ...

According to the Post, the software -- which tracks all details of crew scheduling, including how long they have flown (an FAA regulation restricts airtime), and logs every change -- has a 16-bit counter that limits the number of changes to 32,768 in any given month. ...

to be fair (although it's not an excuse), 32K crew changes in a month? that's like 1,000 a day? that's crazy!...

Re:From Another article... (0)

Anonymous Coward | more than 9 years ago | (#11218299)

"32K crew changes per month should be enough for anybody." -- Bill Gates

Re:From Another article... (1)

Weissmohr (165967) | more than 9 years ago | (#11218302)

Well, if it logs all changes to the "I've flown this long" counter, supposedly to make sure not to overstep an FAA restriction, I can imagine that many changes would happen, yes. :-P

Re:From Another article... (1)

Eudaemonic Pie (821484) | more than 9 years ago | (#11218308)

Yes --

Comair has roughly 1,100 flights, with an average of 3 crew members per flight.

That's 3,300 changes in a day if everyone's affected by the weather, which is about 10% of the counter's limit.
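The back-of-the-envelope estimate above can be written out as a small C sketch. The figures (1,100 flights, 3 crew members per flight) are the poster's rough numbers, the limit of 32,767 assumes a signed 16-bit counter, and the function names are hypothetical.

```c
/* Figures from the comment above; treat them as rough estimates. */
enum {
    DAILY_FLIGHTS   = 1100,
    CREW_PER_FLIGHT = 3,
    COUNTER_LIMIT   = 32767   /* largest value of a signed 16-bit counter */
};

/* Changes per day if every crew member on every flight is reassigned once. */
long worst_case_changes_per_day(void)
{
    return (long)DAILY_FLIGHTS * CREW_PER_FLIGHT;
}

/* Whole days a monthly counter lasts at a steady rate of changes.
 * Returns -1 if the rate is not positive. */
long full_days_until_limit(long limit, long changes_per_day)
{
    if (changes_per_day <= 0)
        return -1;
    return limit / changes_per_day;
}
```

With these numbers the counter survives 9 full days of total disruption (29,700 changes) and runs out during the tenth, so a bad week plus the normal churn of a month would plausibly be enough.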

Re:From Another article... (5, Funny)

Anonymous Coward | more than 9 years ago | (#11218310)

>... 32K crew changes in a month? that's like 1,000 a day? that's crazy!

You aren't by any chance the original developer of this software?

Re:From Another article... (3, Interesting)

mikesmind (689651) | more than 9 years ago | (#11218343)

Legacy systems will often contain such hard limits. Usually, they are buried deep in the code and sometimes no one knows that they exist. Any point where such hard limits exist must be discovered. A solution needs to then be designed for each situation. If you are a manager or a maintainer of such a system, it is your responsibility to do this. When you are questioned, just point out the Comair computer disaster.

Re:From Another article... (0)

Anonymous Coward | more than 9 years ago | (#11218363)

Depends on how it internally handles re-shuffling schedules. If it makes multiple passes, solving at least some of the crew assignment problems on each one until all assignments are OK (like bubble sort), I can see the number of changes to be tracked growing faster than you might expect. The user would only see that "Joe" was moved from flight 123 to 789, not that the system tentatively assigned him to three or four others before it found one that worked with the global solution for all flights and all crew. Add to that a fluid situation with changing flight delays, and it could get out of hand.

make the change, take it back, and repeat (0)

Anonymous Coward | more than 9 years ago | (#11218448)

If the system is designed so that the only way to know if a change will work is to make it, and then take the change back if you don't like it, then I could see how this could very quickly create a problem.

Let's not be too hard.. (4, Interesting)

Staplerh (806722) | more than 9 years ago | (#11218250)

This was a horrible chain of events that severely inconvenienced a lot of people for Christmas, and I would be hoppin' mad if I was in any of their places. However, let's not jump on ComAir too hard, IMHO. From TFA:

"This probably seemed like plenty to the designers, but when the storms hit last week, they caused many, many crew reassignments, and the value of 32,000 was exceeded," he said.

It's true, it was an extreme confluence of circumstances... horrid weather (heck, there was snow in some Texas town for the first time in like 80 years or something, read it in some glurge article) coupled with the winter holidays. They should redesign their system and admit that they've grown to a level where their system is unable to handle extreme circumstances, and this should serve as a great wake-up call for them.

In the past I've always chuckled at the thought of 'upgrading for the sake of upgrading', but I suppose this is one case where an earlier upgrade could have saved them millions and made a lot of people's holidays better.

Re:Let's not be too hard.. (0)

Anonymous Coward | more than 9 years ago | (#11218340)

I haven't used a database with a 16bit integer in almost 10 years.

This problem should never have happened in this day and age.

Re:Let's not be too hard.. (1)

j0217995 (597878) | more than 9 years ago | (#11218398)

I agree, but didn't the stewardesses/stewards(?) and baggage handlers for Delta, which ComAir is a part of, go on strike the same day as well? I think I read that somewhere.

Re:Let's not be too hard.. (2, Informative)

danheskett (178529) | more than 9 years ago | (#11218449)

It was an unofficial job action (aka not a strike) - about 1/3 of the flight crew personnel called in sick, or did not show for work.

It was a very, very selfish thing to do - stranding thousands of people on Christmas to complain about pay cuts. Will it be effective? Time will tell...

It shows what unions are all about (0)

Anonymous Coward | more than 9 years ago | (#11218464)

' It was a very, very selfish thing to do '

The "job action" shows the laziness and greed. That's what labor unions are all about.

Re:Let's not be too hard.. (2, Informative)

afidel (530433) | more than 9 years ago | (#11218482)

They HAD outgrown their current system, and they knew it. That's why the new system was scheduled to go online in the next couple months. Unfortunately they met with a perfect storm of problems just at the wrong time. If you've ever worked with retail you know that NOTHING gets changed from mid November to early January unless god and the CEO both say it has to be so, I imagine airlines are pretty much the same. Heck airlines probably have an even larger freeze window since few people book flights at the last minute for holiday travel.

How old was this software? (1, Interesting)

wiredog (43288) | more than 9 years ago | (#11218251)

I stopped using 16 bit ints for anything 10 (or more) years ago when I had the joy of migrating systems from a 16 bit OS to a 32 bit OS.

Re:How old was this software? (0)

Anonymous Coward | more than 9 years ago | (#11218322)

I learned on a 32-bit OS and never used 16-bit ints. Man, I've made some dumb mistakes, but this one takes the cake.

Re:How old was this software? (2, Insightful)

adlaiff6 (810221) | more than 9 years ago | (#11218489)

The article said it was 15 years old. I guess 16-bit systems are really named for their expiration date.

Re:How old was this software? (1)

Bigby (659157) | more than 9 years ago | (#11218498)

I still use shorts and unsigned shorts (and for measure, chars and unsigned chars) for 'for' loops that are guaranteed to be small. There's no reason to take up 2 extra bytes of memory.

So after Y2K is this ... (4, Funny)

adzoox (615327) | more than 9 years ago | (#11218256)

what Initech handles?

Yeahhhhhh! Mmmmmmkay!

Did you get that memo?

Re:So after Y2K is this ... (1)

bje2 (533276) | more than 9 years ago | (#11218277)

wouldn't "Initrode" handle that stuff now?

Re:So after Y2K is this ... (1)

justkarl (775856) | more than 9 years ago | (#11218445)

You mean, "Penetrode"?

Re:So after Y2K is this ... (1)

bje2 (533276) | more than 9 years ago | (#11218492)

actually, both Initrode and Penetrode were competitors to Initech...i believe Michael & Samir went to work for Initrode at the end, right?

Hilarious (1)

October_30th (531777) | more than 9 years ago | (#11218274)

"Perhaps Comair should have paid for the software upgrade to MaestroCrew." (in the Simpsons Comic Book Guy voice)

Wasn't it Nic Cage? (1, Funny)

AtariAmarok (451306) | more than 9 years ago | (#11218275)

I thought the Comair crashed when Nicholas Cage steered it into the window of a Las Vegas casino.

Re:Wasn't it Nic Cage? (2, Funny)

Zorilla (791636) | more than 9 years ago | (#11218392)

I knew Delta should have left the bunny alone!

Let's try to remember (5, Funny)

CodeWanker (534624) | more than 9 years ago | (#11218281)

That when you are talking about an airline, a COMPUTER crash is by far the least traumatic kind you can have.

Re:Let's try to remember (2, Funny)

thetroll123 (744259) | more than 9 years ago | (#11218442)

COMPUTER crash is by far the least traumatic kind

What about two of those little baggage carts crashing in the arrivals area? Surely that's even less traumatic?

Heard it here first? (1)

Eudaemonic Pie (821484) | more than 9 years ago | (#11218286)

Heard it here first? Sorry, I heard it first on the web/digest version of comp.risks: http://catless.ncl.ac.uk/Risks

It's an excellent place to read about the risks of modern computing equipment, and the risks to society from using it, usually arising from mistakes like the 16-bit counter.

idiots (-1, Offtopic)

Anonymous Coward | more than 9 years ago | (#11218293)

What's up with you idiots and all your dumb jokes?

65535+2 post (1)

bjb (3050) | more than 9 years ago | (#11218297)

It looks like it was a signed 16-bit value from the detail in the article (32,767 maximum schedule changes a month). Why they would possibly need the sign is beyond me. Regardless, having 32767 schedule changes in a month? Must track every flight in the world.

Re:65535+2 post (2, Insightful)

arkanes (521690) | more than 9 years ago | (#11218462)

Hypothetical: There's some function that accepts a crew change and returns either the number of schedule changes to date or an error code. The error code is a negative value. This is a really common paradigm in C code.
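A rough C sketch of that hypothetical convention, to show why a signed return type is tempting: the same 16-bit value carries either a running total or a negative error code. The function name, the error constant, and the signature are all invented for illustration and say nothing about how the SBS software is actually written.

```c
#include <stdint.h>

#define ERR_TOO_MANY_CHANGES (-1)   /* hypothetical error code */

/* Hypothetical: record one crew change against the monthly counter.
 * Returns the new number of changes to date, or a negative error code
 * if the counter cannot go any higher. The counter is left untouched
 * on error. */
int16_t record_crew_change(int16_t *changes_to_date)
{
    if (*changes_to_date == INT16_MAX)
        return ERR_TOO_MANY_CHANGES;

    *changes_to_date = (int16_t)(*changes_to_date + 1);
    return *changes_to_date;
}
```

The price of the convention is that half of the 16-bit range is reserved for errors, which is exactly the 32,767 ceiling discussed in this thread.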

Damn you 2s Complement! (2, Insightful)

jellomizer (103300) | more than 9 years ago | (#11218303)

It could have worked if it weren't for the 2s complement; an unsigned counter would have been good for twice what they had. I think programming languages should make numbers unsigned unless asked otherwise, so that we can take advantage of that extra bit. For things like counters, where negative numbers just won't happen, a signed type is like having a 15-bit number taking 16 bits of space.

Re:Damn you 2s Complement! (1)

ThogScully (589935) | more than 9 years ago | (#11218367)

You could also suggest that changing the default behaviour would be a confusing matter. And programmers worth their salt will probably specify signed or unsigned if it's going to matter or if there's an obviously better choice.
-N

Don't mod me up. Mod up the Zucker brothers. (3, Funny)

AtariAmarok (451306) | more than 9 years ago | (#11218305)

Can't help but remember the scene in one of the "Airplane" movies where a kid sneaks into the hi-tech air traffic control room. He sees one of the airline's shuttle-like planes on a screen, and grabs the nearest joystick and begins to (he thinks) play a videogame.

When the shuttle on the screen blows up, and is accompanied by a very loud explosion sound outside the building, the kid looks sheepish and sneaks away.

Error checking is the real culprit (5, Insightful)

Anonymous Coward | more than 9 years ago | (#11218312)

So it turned out to be problematic to use a signed 16-bit integer.

But the real problem is a lack of error checking. It sounds like the code had something like:

int num_crew_changes; ...
crew_change_list[++num_crew_changes] = blah;

And the counter wrapped and the system crashed.

The code should have said:

if (num_crew_changes == MAXINT)
{
ERROR(E1234, "too many crew changes");
}

The system is still degraded after 32767 crew changes. It might be so degraded as to be unusable. But at least the company would know the extent of the degradation and could pull out the appropriate "Plan B". It's much safer and better to work around a known problem of known scope than to work around a system crash when you don't know the exact problem.
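A slightly fuller sketch of the same idea, assuming a fixed-size list and a made-up record type (`crew_change`, `add_crew_change` and the field names are all hypothetical): the bounds check happens before the write, so a full list produces a reportable failure instead of a wrapped index.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_CREW_CHANGES 32767   /* the monthly limit reported in the article */

/* Hypothetical record; the real contents are unknown. */
typedef struct {
    int crew_id;
    int flight_id;
} crew_change;

static crew_change crew_change_list[MAX_CREW_CHANGES];
static size_t num_crew_changes = 0;

/* Appends one change. Returns false and leaves the list untouched
 * when the list is already full. */
bool add_crew_change(crew_change change)
{
    if (num_crew_changes >= MAX_CREW_CHANGES) {
        return false;   /* caller can log "too many crew changes" */
    }
    crew_change_list[num_crew_changes++] = change;
    return true;
}
```

The caller decides what "Plan B" is when `add_crew_change` returns false; the point is only that the failure is detected at a known place with a known cause.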

The "Real" culprit... (1)

Nick Driver (238034) | more than 9 years ago | (#11218472)

...is the management of software companies who ceased using real computer scientists to design and write their apps because disposable code monkeys work for so much cheaper. And outsourcing is even cheaper... in the short term (which seems to be all that matters in this industry anymore)

Hire white or pay the price. (0)

Anonymous Coward | more than 9 years ago | (#11218484)

' is the management of software companies who ceased using real computer scientists ....monkeys work for so much cheaper. And outsourcing is even cheaper '

Yeah, you got it. Programming should be left to white Americans!

Hmmmm.... Maybe I should edit my software (0)

Anonymous Coward | more than 9 years ago | (#11218313)

I wrote a small POS system for multiple businesses in my city. It uses an integer with the same limit to count the number of transactions in a day.

My thinking behind it is that a transaction takes, on average, about 30s to complete, start to finish. There are 86400 seconds in a day (24h * 60m * 60s), or 2880 "groups" of 30 seconds (prev math / 30).

As such, there wouldn't be a case where I'd hit the magic 32767 limit. Maybe I should remove it anyway though. Suggestions?
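The arithmetic above can be written down as a check, which also shows where the assumption breaks. This is only a sketch: the idea that several registers might feed one shared counter is an assumption added here, not something stated in the post, and the function names are invented.

```c
#include <stdint.h>

#define SECONDS_PER_DAY (24L * 60L * 60L)   /* 86400 */
#define SECONDS_PER_TRANSACTION 30L          /* the average quoted above */

/* Upper bound on transactions per day if `registers` tills run
 * back to back all day (an assumed scenario). */
long max_daily_transactions(long registers)
{
    return registers * (SECONDS_PER_DAY / SECONDS_PER_TRANSACTION);
}

/* Nonzero if that bound still fits in a signed 16-bit counter. */
int fits_in_int16(long registers)
{
    return max_daily_transactions(registers) <= INT16_MAX;
}
```

One register tops out at 2880 transactions a day, well under the limit. Under the shared-counter assumption, eleven registers give 31680, which still fits, and twelve give 34560, which does not.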

Re:Hmmmm.... Maybe I should edit my software (1)

Jamesie (615784) | more than 9 years ago | (#11218409)

I haven't used 16-bit integers since the early '90s. On a modern PC, with the number of transactions you are talking about, the change to 32 bits would do no harm, and if you reuse the code to record greater numbers it will already be safer (unless your numbers are greater than a few billion).

It's times like this... (4, Funny)

AtariAmarok (451306) | more than 9 years ago | (#11218315)

It's times like this when you begin to realize that the Vic-20 (duct-taped to the bulletin board and surrounded by haywires) might not be the best choice anymore as mission-critical hub of your operations.

Re:It's times like this... (1)

tomstdenis (446163) | more than 9 years ago | (#11218379)

Oh but didn't you hear? The airline industry is bankrupt. ... Bankrupt to the CEO's million dollar tropical mansion that is.

If my seat literally costs $300 and they sell it to me for $225 that's just stupid. Even if I only fly that airline it still COSTS THEM MONEY to fly me. ...

So why don't airlines just charge what it actually costs to fly. The others who don't will die anyways because well they're losing money with each "heart, mind and body" they "win" with lower prices.

As for this particular bug it's a matter of being cheap. They could spend the time and money to fix it [say like a decade ago] but instead they wait until it causes trouble and then spend 10x to fix it in a shorter amount of time.

Go people with MBAs!!!

Teh Winner is you!

Tom

Re:It's times like this... (0)

Anonymous Coward | more than 9 years ago | (#11218404)

Apparently, you've never heard of the concept of competition.

Re:It's times like this... (3, Insightful)

tomstdenis (446163) | more than 9 years ago | (#11218440)

Ok dude do the math.

A sells tickets for $0 loss.
B sells tickets for $75 loss.

B gains many customers. However, the more customers the more loss they incur. Recall EVERY SEAT costs them $75. Eventually B just runs out of money and ups the costs.

Now A and B sell at the same cost. Customers notice the price hike and get upset [because for some reason people think air travel is a god given right so they get insanely upset at everything].

Sure some won-over customers will stay with B but many will spread out [many are also not particularly loyal they just use whatever cheaptickets.com tells them to].

Tell me I'm wrong. Tell me that most airlines haven't been filing for protection. Come on, tell me ;-)

Tom

Re:It's times like this... (0)

Anonymous Coward | more than 9 years ago | (#11218493)

Your understanding of economy is primitive. If you HAVE a facility just sitting there, you are often making a total loss on it by not using it. You make less of a loss by operating the plane at a loss.

Plane not flying at all: money lost = 100%
Plane flying at a loss: money lost 100%

This is why factory closures don't make sense unless the factory is being closed to be sold to someone else. Those closures where the facility is shut and mothballed and the workers fired, but the asset is still kept around, make very little economic sense - you're just getting less and less return on capital employed keeping the damn thing around.

Re:It's times like this... (0)

Anonymous Coward | more than 9 years ago | (#11218507)

ARGH! that should obviously be money lost less than (<) 100% in the plane flying at a loss case - why is it called "plain old text", if it's going to be interpreted as wonky pseudo-HTML?

New CIO? (3, Interesting)

Mr. BS (788514) | more than 9 years ago | (#11218319)

I wonder how fast this CIO is going to be on his butt.

"Well... we were holistically mitigating our financial stance outside the box of current processes while trying to forecast our future technological stability within the transport industry."

"Well... you're fired! NEXT?!"

Re:New CIO? Better than SCO. (1)

One of the abnormals (817423) | more than 9 years ago | (#11218427)

Better than Mr. Darl McBride. He would've said something like...

"Well... we were holistically mitigating our funds so we would have the money to sue the pants off every person who has ever ridden our planes, while trying to forecast how we plan to make a profit....."

Once did IT support for Comair (3, Informative)

Anonymous Coward | more than 9 years ago | (#11218335)

Having once done tech support for the Maestro program used by Comair (and other scheduling software for other airlines as well), I think the software is junk. The employees undoubtedly said "I told you so!" when it broke, because they hated it as much as the support team did. IMO the airline didn't bother upgrading because they didn't think the old version was broken enough or outdated enough to warrant it.

unsigned (1)

lophophore (4087) | more than 9 years ago | (#11218338)

Hmmm.

why would anybody make an event counter a signed value?

short numberScheduleChanges;

hello?

unsigned short numberScheduleChanges;

fixes the problem.
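A small sketch of what the unsigned version buys, with `next_count` as a made-up helper name: the counter now passes 32767 cleanly, but it still wraps, just later and silently.

```c
#include <stdint.h>

/* Increments an unsigned 16-bit counter. Unsigned arithmetic in C is
 * defined to wrap, so 65535 + 1 comes back as 0. */
uint16_t next_count(uint16_t count)
{
    return (uint16_t)(count + 1u);
}
```

So `next_count(32767)` gives 32768 as hoped, while `next_count(65535)` gives 0. The switch doubles the headroom rather than removing the limit, and a wrap to zero may be harder to notice than a negative number.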

Playing with fire (2, Insightful)

ravingidiot (798346) | more than 9 years ago | (#11218369)

Why was Comair using signed shorts to track their scheduling changes anyway? It seems to me that a company of that magnitude should expect to run into more than 32000 schedule changes within one month more than once. I mean, I can understand that the counter was probably designed with space constraints in mind, but for Christ's sake, it would've only been two extra bytes to fix this. That brings the total up to some 4 billion unsigned, if I'm not mistaken. Technically, they could've used just three bytes, but then again, I wouldn't expect them to, because how many languages have 24-bit integers built in as primitives? Of course, like someone else said, I guess we can't blame this all on the programmers either. I just wouldn't consider it very comforting that such a system could become crippled because the programmers didn't think to allocate enough memory to allow for enough flexibility in scheduling.

Maestro sucks. (4, Informative)

Anonymous Coward | more than 9 years ago | (#11218373)

Maybe Maestro should just die. My friend is a flight attendant for Southwest and has to use Maestro to plan her schedule. To use it she has to citrix into their main server and wait for an open client (I assume they have either a license or horrible programming restriction on concurrent users). On the very day that the new schedules are posted, it can take hours to log in. It's a joke.

This stuff could be handled by a team of a dozen web based programmers (Java? C? ASP? LAMP? You pick.) in a few months. It's not difficult.

The other Maestro (1)

AtariAmarok (451306) | more than 9 years ago | (#11218417)

When Mike Oldfield recently came out with his Maestro game [sean.co.uk] , I did wonder that the "Maestro" name was not used for software before. Now I know better.

Not only that, a lot of Oldfield's game involves piloting glider- or airplane-like avatars.

/. acquired by The Sun (UK) ? (1)

rokzy (687636) | more than 9 years ago | (#11218421)

seems possible with the ever-decreasing quality of journalism seen here...

"yeah what 'appened right was this computer-me-thingy went all pear-shaped and these silly buggers Comair went and got themselves 'Done In'..."

ComAir Now Hiring IT People (5, Funny)

JavaDev04 (844747) | more than 9 years ago | (#11218428)

Hey everybody! Comair is hiring Unix System Administrators and IT Software Engineers! http://www.comair.com/hr/other/ [comair.com]

There was a high profile example of this problem (4, Interesting)

hey! (33014) | more than 9 years ago | (#11218488)

back in the early '80s. There was a big financial company that had an automated system that watched the prices of certain commodities and issued automated trade orders. The transactions were stored in arrays addressed by 16-bit signed integers, with the (now) highly predictable result on the first day that trading volume exceeded 16384 transactions. Since in C arrays are just syntactic sugar for pointer arithmetic, the system started executing trades based on "data" from random bits of heap memory. This apparently went on for some time before a human being figured out something had gone wrong, and (reportedly) the company lost billions in a single day. This might be somewhat exaggerated, since the event has now passed into folklore.

In any case, this is one of those incidents like the Therac-25 accidents that experienced programmers should always have in mind.
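For anyone who hasn't seen why a bad index reads "random" memory: `trades[index]` is the same as `*(trades + index)`, so a negative or oversized index simply points somewhere else. A minimal defensive sketch follows; `trade_at` and its parameters are hypothetical names, and this is not a reconstruction of the system in the story.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns a pointer to trades[index], or NULL when the index is
 * negative or past the end of the n-element array. */
const int *trade_at(const int *trades, size_t n, int16_t index)
{
    if (index < 0 || (size_t)index >= n) {
        return NULL;
    }
    return trades + index;   /* same address as &trades[index] */
}
```

With the range check in front, an index that has wrapped negative yields NULL that the caller has to handle, rather than whatever happens to sit in memory before the array.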
