Dublin Air Traffic Control Brought Down By Faulty NIC

timothy posted more than 6 years ago | from the can-go-wrong-can-go-wrong-nothing-can dept.

Bug 203

Not so very long ago after passengers were left hanging by a similar glitch at LAX, Gilby4mPuck writes with another story of NIC failure leading to a disruption of air traffic, this time in Ireland, excerpting: "Data showing the location, height and speed of approaching planes disappeared from screens for 10 minutes each time. ... Thales ATM stated that in 10 similar air traffic control Centres worldwide with over 500,000 flight hours (50 years), this is the first time an incident of this type has been reported. ... '[They] confirmed the root cause of the hardware system malfunction as an intermittent malfunctioning network card which consequently overcame the built-in system redundancy,' said an IAA spokeswoman."

203 comments

testing and QA (0, Redundant)

hostyle (773991) | more than 6 years ago | (#24238867)

Whatever happened to testing of installed hardware? You'd think they might consider that sort of thing important when it involves the lives of thousands of people. Then again, maybe they were drunk at the time.

Re:testing and QA (4, Insightful)

Thanshin (1188877) | more than 6 years ago | (#24238935)

Testing doesn't confer prescience.

Re:testing and QA (5, Funny)

Hal_Porter (817932) | more than 6 years ago | (#24239273)

Only The Spice confers prescience.

Re:testing and QA (1)

Thanshin (1188877) | more than 6 years ago | (#24239501)

Only The Spice confers prescience.

Actually, it confers "the ability to fold space. That is, travel to any part of universe without moving."

Well, at least they got the "not moving" part right.

Re:testing and QA (2, Informative)

putaro (235078) | more than 6 years ago | (#24239677)

That was in the movie. Read the book, it's much better.

Re:testing and QA (3, Informative)

david.given (6740) | more than 6 years ago | (#24239729)

Actually, it confers "the ability to fold space. That is, travel to any part of universe without moving."

Actually actually, the space folding is done using the Holtzman drive, which is a perfectly ordinary machine. The Navigator merely navigates, plotting a safe path through the non-space/time foldspace. The spice grants the Navigator the limited prescience required to do this.

Eventually the Navigators become obsolete, replaced by Ixian semisentient machines known as Compilers that perform the same task without needing melange. A good thing too, because by that point Arrakis is rubble and sandworms are pretty much extinct.

Details courtesy of Wikipedia (and my lack of a social life).

Re:testing and QA (1)

TheSunborn (68004) | more than 6 years ago | (#24239761)

Not according to the book. According to the book the spice is needed to predict what will happen when you arrive, that is: to ensure that you don't arrive inside a planet, or some other dangerous place. The point is that when you do something that amounts to traveling faster than light, the only way to know anything about where you arrive is to predict the future.

This is also a big reason that the guild never took over Dune. They were so conditioned to always seek the safe path (because that was what their ships needed) that they could not imagine starting an operation that was not known to be 100% safe for the guild.

(Only slightly offtopic)

Re:testing and QA (1)

ebolaZaireRules (987875) | more than 6 years ago | (#24239931)

I'm sure that it's not merely the destination, but also the journey that needs to be known beforehand... 1 in 10 'disappeared', not ended up 1/2 parked inside of an asteroid.

Re:testing and QA (2, Interesting)

zach_d (782013) | more than 6 years ago | (#24238937)

I think the issue is one of maintenance. things need to be replaced after their life-cycle is over, even if they seem to be functioning at the time.

Re:testing and QA (5, Insightful)

HungryHobo (1314109) | more than 6 years ago | (#24239207)

I'm inclined to trust the card which has been working fine for 5 years over a card which was put in yesterday.

Re:testing and QA (3, Interesting)

zach_d (782013) | more than 6 years ago | (#24239257)

in a high noise/vibration/dust environment?

Re:testing and QA (1)

leuk_he (194174) | more than 6 years ago | (#24239269)

If it works after 5 years. sure.

If not there always is a backup. isn't there? Well, in that case there is a backup of the backup.

Re:testing and QA (3, Insightful)

DaedalusHKX (660194) | more than 6 years ago | (#24239287)

The article says "it overcame the built in system redundancy"... how the hell does ONE failing card in a redundant setup "overcome" the redundant backup parts/systems ??

I call "CYA kissass excuse maker" to the stand!

Someone screwed up big, and they're Covering Their Asses now.

Re:testing and QA (4, Insightful)

mowall (865642) | more than 6 years ago | (#24239653)

The article says "it overcame the built in system redundancy"... how the hell does ONE failing card in a redundant setup "overcome" the redundant backup parts/systems ??

I suspect it's because, as mentioned in the summary, it was "an intermittent malfunctioning network card". i.e. the failover system must have thought the card was functioning.

Re:testing and QA (3, Insightful)

methamorph (950510) | more than 6 years ago | (#24240025)

The article says "it overcame the built in system redundancy"... how the hell does ONE failing card in a redundant setup "overcome" the redundant backup parts/systems ??

If the card had failed completely the redundant one would probably have kicked in. What I think happened is the card malfunctioned in a way causing the system to still think that the card is fine and there is no need for the redundant one to kick in.

Re:testing and QA (1)

witherstaff (713820) | more than 6 years ago | (#24240063)

I recall some DEC NICS that when they started to fail, all got the same MAC. Talk about a fun thing to troubleshoot on a network! If it was a plain old switch using MAC switching, you can cause havoc pretty easily.

Re:testing and QA (2, Informative)

diskis (221264) | more than 6 years ago | (#24239419)

Air traffic towers generally are not noisy or dusty. And in any case, disregarding the ports, the NIC itself is practically eternal. Compared to the rest of the system, and the lifetime of the system, that is.

Two lessons learned from years of technical support. The NIC isn't broken, unless the computer has been dragged from the network cable. And that the CPU is not broken as long as the system has not been overclocked, and the heatsink is still in place.

Re:testing and QA (1)

xiox (66483) | more than 6 years ago | (#24239571)

I always thought that CPUs could never be broken. We had an Athlon 64 processor 4600+, it was never overclocked and always used with a standard fan/heatsink, in a well ventilated case. After a year of work, it then started randomly crashing every few weeks. Replacing all the components except the CPU didn't fix the problem (different motherboard, memory, etc). Replacing the CPU did fix the problem. They can die randomly but it is very rare.

Re:testing and QA (1)

somersault (912633) | more than 6 years ago | (#24239811)

If you didn't apply the heatsink yourself, you don't know if it's been done correctly. On my first PC that I bought with my own money (well, half bought and my dad paid the rest), it kept locking up randomly, and after lots of IRQ and driver troubleshooting my dad removed the heatsink only to find that they hadn't applied it correctly. One reapplication of thermal paste and proper connection to the CPU later, and everything was fine (until that system got messed up in a lightning storm a few years later, but I still use the case when building my own machines)

Re:testing and QA (1)

xiox (66483) | more than 6 years ago | (#24239947)

We did - we applied the heatsink several times, when we moved the CPU between different motherboards. Proper thermal transfer compound was used. The temperature of the CPU was fine.

Re:testing and QA (1)

LiquidCoooled (634315) | more than 6 years ago | (#24239873)

CPUs with correctly seated heatsinks which stay within their prime operating temperature usually have no problems.
However, it's rather easy to get the wrong amount of goop, or something else wrong with the airflow, or just a marginal chip, etc.

I had an AMD t'bird 1.4ghz which would NOT run happily at 1.4ghz no matter how much I tried.
In the end I gave up and ran it happily at 1.33 for years.

Re:testing and QA (4, Informative)

MortenLJ (686173) | more than 6 years ago | (#24238959)

The possibility of failure can be reduced, but never completely removed. It's a simple matter of probabilities. E.g. a certain component fails on any day with probability p, if we add n redunndant fail-overs, the total system will fail with probability 1-p^n, an equation which will never be one, but it can get close.

Re:testing and QA (1)

Hal_Porter (817932) | more than 6 years ago | (#24239289)

This is very true. I worked on a system where we had lots of redundancy in critical places. But given enough tests, sometimes the bugs would get through, usually in ways that you hadn't thought of.

Re:testing and QA (5, Funny)

Hognoxious (631665) | more than 6 years ago | (#24239383)

if we add n redunndant fail-overs, the total system will fail with probability 1-p^n

Any number raised to the power 0 is 1. So if you don't install anything, hence n is 0, it will always work since the probability of failure is 1-1 = 0.

Re:testing and QA (1)

somersault (912633) | more than 6 years ago | (#24239821)

Sometimes, pure intuition can be more handy than maths.

No system, no failure (1)

duyn (1178197) | more than 6 years ago | (#24239955)

G-GP:

if we add n redunndant[sic] fail-overs, the total system will fail with probability 1-p^n

GP:

Any number raised to the power 0 is 1. So if you don't install anything, hence n is 0, it will always work since the probability of failure is 1-1 = 0.

P:

Sometimes, pure intuition can be more handy than maths.

Only if you're not good at the math.

The way the G-GP described the system, the number of redundant fail-overs includes the primary system. With n=0, you have no system in place. No system, no possibility of system failure.

Re:testing and QA (2, Insightful)

TheThiefMaster (992038) | more than 6 years ago | (#24239397)

If one fails with probability p, and you have n of them, a total system failure is probability p^n, not 1-p^n. Well technically it's Mult(p,1->n) where p1 is the probability of the first failing, p2 the probability of the second, etc, multiplying them all together to get the chance of a total system failure.
The probability of any one device in a redundant system failing is (1-((1-p)^n)). This equation rapidly approaches 1, so in larger setups failures will be a common occurrence, but they'll largely be harmless due to redundancy.

Of course this all assumes the failure mode of the device is "off" or "non-functioning". If it fails in a way which routes 15A of mains power into a network cable, redundancy might not help a whole lot.

Obviously that's not what happened, but it's not outside possibility for one device to take down an entire redundant system.
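The two formulas in this comment can be checked numerically. What follows is a minimal sketch, assuming every unit fails independently with the same probability p; the function names `p_all_fail` and `p_any_fail` are made up for illustration, and the independence assumption is exactly what a NIC that takes its neighbours down with it would violate.

```python
def p_all_fail(p: float, n: int) -> float:
    """Probability that all n independent units fail at once (total outage): p^n."""
    return p ** n


def p_any_fail(p: float, n: int) -> float:
    """Probability that at least one of n independent units fails: 1 - (1 - p)^n."""
    return 1 - (1 - p) ** n


# Illustrative per-unit failure probability; an assumption, not a measured figure.
P_UNIT = 0.01
for units in (1, 2, 3, 10):
    print(units, p_all_fail(P_UNIT, units), p_any_fail(P_UNIT, units))
```

As n grows the first value shrinks toward zero without reaching it, while the second climbs toward one, which matches the point that individual failures become routine even as total outages become rare.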

Re:testing and QA (0)

Anonymous Coward | more than 6 years ago | (#24239531)

if we add n redunndant fail-overs,

if we have n redundant fail-overs

the total system will fail with probability 1-p^n

the probability for a fail with n redundant fail-overs is

p^n

no?

Re:testing and QA (5, Funny)

tinkertim (918832) | more than 6 years ago | (#24239003)

Whatever happened to testing of installed hardware? You'd think they might consider that sort of thing important when it involves the lives of thousands of people. Then again, maybe they were drunk at the time.

Well, when we set up some cheap NAS boxes with redundant nics .. some load balancers and other goodies .. we tested it by yanking cables on the bonded nics and making sure everything still worked.

This was for an e-commerce site.. I would agree in hoping more testing with real failures would be done on systems that monitor air traffic.

Also, we were very drunk when yanking cables during our test .. so I don't think intoxication is really a factor. In fact, turning a drunken monkey loose in a data center with a clearance to pull cables is _very_ good fail over testing :)

Re:testing and QA (5, Insightful)

kitgerrits (1034262) | more than 6 years ago | (#24239213)

The problem is not that redundancy wasn't implemented.
The problem is that redundancy doesn't handle 'flapping' hardware very well.
The NIC intermittently failed, causing the redundancy to switch cards several times.
This can play havoc on systems that work on a LAN and assume the MAC address to stay the same.
Also, a NIC that does not report an error, doesn't fail completely and simply swaps a few bits around can be nigh-on impossible to diagnose.

This could have been caught with real-time hardware and log-monitoring, but I have to confess even I only check the logs daily, not real-time. While some monitoring systems can mail the admin in the event of failure, not all systems are usually configured that way ('workstations' being a prime candidate).

There is a line you draw between monitoring and cost-effectiveness. Every company takes a calculated risk in this and they got bitten.
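One way to catch a 'flapping' link in real time is to count up/down transitions inside a sliding time window and raise an alarm once there are too many. The sketch below is only an illustration of that idea: the class name `FlapDetector`, the thresholds, and the way link state is sampled are all assumptions, not anything from the IAA or Thales system.

```python
from collections import deque


class FlapDetector:
    """Flags a link as flapping when it changes state too often within a time window."""

    def __init__(self, max_transitions: int = 3, window_s: float = 60.0):
        self.max_transitions = max_transitions  # hypothetical threshold
        self.window_s = window_s                # hypothetical window length
        self._last_state = None
        self._transitions = deque()             # timestamps of recent up/down changes

    def observe(self, link_up: bool, now: float) -> bool:
        """Record one link-state sample; return True if the link counts as flapping."""
        if self._last_state is not None and link_up != self._last_state:
            self._transitions.append(now)
        self._last_state = link_up
        # Drop transitions that have aged out of the window.
        while self._transitions and now - self._transitions[0] > self.window_s:
            self._transitions.popleft()
        return len(self._transitions) >= self.max_transitions
```

A monitoring loop would call `observe()` with each poll of the interface state and a timestamp; once it returns True, the card could be taken out of the failover rotation instead of being allowed to trigger yet another switch.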

Re:testing and QA (2, Insightful)

tinkertim (918832) | more than 6 years ago | (#24239381)

The problem is not that redundancy wasn't implemented. The problem is that redundancy doesn't handle 'flapping' hardware very well.
The NIC intermittently failed, causing the redundancy to switch cards several times.
This can play havoc on systems that work on a LAN and assume the MAC address to stay the same.

That's what got me curious, it looked like they were using takeover instead of bonding devices.

The most well engineered system in the world can not hope to escape a ~9 minute ARP cache upstream, which makes me wonder why it was designed the way that it was.

I'm not thinking in an antagonistic sense, I'm more wondering what changed in the network _after_ the system was deployed.

Re:testing and QA (1)

kitgerrits (1034262) | more than 6 years ago | (#24239803)

Indeed.

I've seen Network Bonding in RedHat Enterprise Linux with HP hardware use a 'fake' MAC address that is bound to several interfaces to avoid just this problem.
Unfortunately, it may confuse the switch it is connected to, because of said ARP cache (CAM table, ours was 16 hours).

Really-HA systems require genuine engineers with tons of real-life experience, just to know what bits work and what bits you want to avoid.
I hope to become one, one day ;-)

Re:testing and QA (2, Informative)

Phroggy (441) | more than 6 years ago | (#24239441)

The problem is, NICs can fail in all kinds of ways that yanking cables won't simulate. In this case it sounds like if they had yanked the cable, the backup system would have come online exactly like it was supposed to, but because the faulty NIC was kinda-sorta-almost-but-not-really working, it didn't. That's a difficult thing to test in the lab.

Re:testing and QA (1)

HJED (1304957) | more than 6 years ago | (#24239595)

Whatever happened to testing of installed hardware? You'd think they might consider that sort of thing important when it involves the lives of thousands of people. Then again, maybe they were drunk at the time.

Well, when we set up some cheap NAS boxes with redundant nics .. some load balancers and other goodies .. we tested it by yanking cables on the bonded nics and making sure everything still worked.

This was for an e-commerce site.. I would agree in hoping more testing with real failures would be done on systems that monitor air traffic.

Also, we were very drunk when yanking cables during our test .. so I don't think intoxication is really a factor. In fact, turning a drunken monkey loose in a data center with a clearance to pull cables is _very_ good fail over testing :)

It is pointless to destroy a system testing it unless you have a big budget and can rebuild the system exactly the same.
intoxicated monkey destroying data center != testing
intoxicated monkey destroying data center == (waste_of_money && destroying_data_center)
destroying_data_center == waste_of(time && money)
destroying_data_center != testing

Re:testing and QA (5, Insightful)

mcrbids (148650) | more than 6 years ago | (#24239027)

The very best planned of redundant systems can be brought to its knees by hardware that "mostly works".

It's not hard to have system B check that system A is on/off line, and step in if the latter is the case. But what happens when A is *mostly* or *sorta* online? Does system B check that ALL functionality done by A is being done appropriately? Almost never.

And that's why, even in the best, most carefully designed, fully redundant high-availability systems, you never, ever see 100% uptime. It's just not possible to anticipate everything that can go wrong.
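The gap between "A is online" and "A is doing its job" can be narrowed, though never closed, by having B check a few signs of useful work rather than just a heartbeat. A rough sketch, assuming system A exposes the age of its last data update and a recent frame error rate; the `Status` fields and the thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Status:
    reachable: bool           # did A answer the heartbeat at all?
    last_update_age_s: float  # seconds since A last published fresh data
    error_rate: float         # fraction of recent frames that failed a checksum


def should_fail_over(s: Status, max_age_s: float = 2.0,
                     max_error_rate: float = 0.01) -> bool:
    """Decide whether B should take over, treating 'sorta online' as failed."""
    if not s.reachable:
        return True           # hard failure: the easy case
    if s.last_update_age_s > max_age_s:
        return True           # A answers, but its data has gone stale
    return s.error_rate > max_error_rate  # A answers, but corrupts too much
```

This only covers the failure modes someone thought to measure, so it illustrates the problem as much as it addresses it.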

So design a system that fails gracefully! That's what nature did.

Take a look at your own body. It's a gorgeous example of a high-availability, high-redundancy system. There are literally BILLIONS of cells in your body, each operating as a semi-independent unit, such that any of them can fail without bringing down the whole, or even affecting it noticeably. Your body is an excellent example of a cheap, redundant, high-availability system.

Yet catastrophic failures still occur. Whether by cancer, diabetes, or heart disease, even a well-designed, tested-for-millions-of-years high-redundancy system with billions of individual, replaceable parts fails catastrophically from time to time.

It's the nature of the beast.

Mother nature has compensated by making not only the system redundant, but the need for the system also redundant. Rapid reproduction is nature's friend! Not just redundancy, but redundant redundancy.

High availability - it's much, much, MUCH harder than you thought.

Re:testing and QA (2, Interesting)

seifried (12921) | more than 6 years ago | (#24239315)

Yes but if _one_ NIC can bring the entire system down what other single failures in a component could bring the entire system down? Obviously the system with the malfunctioning NIC can do any number of things that may result in a similar failure mode. Or what happens if the network switch it is attached to fails (I assume they use multiple paths... but if one nic can nuke it all, imagine if a switch went bonkers).

Re:testing and QA (4, Interesting)

jimicus (737525) | more than 6 years ago | (#24239695)

Yes but if _one_ NIC can bring the entire system down what other single failures in a component could bring the entire system down? Obviously the system with the malfunctioning NIC can do any number of things that may result in a similar failure mode. Or what happens if the network switch it is attached to fails (I assume they use multiple paths... but if one nic can nuke it all, imagine if a switch went bonkers).

You don't need to bring the entire system down to cause havoc. What if there's a hitherto unknown bug in one of the CPUs which under some very specific set of circumstances causes aircraft altitude to be misreported on the operator's screen? As the GP said, most redundant systems only ensure that the components appear to be broadly working. They seldom check that all the components are doing something sensible.

Re:testing and QA (2, Insightful)

Anonymous Coward | more than 6 years ago | (#24239369)

-- "The very best planned of redundant systems can be brought to its knees by hardware that "mostly works"."--

--NO, you are wrong there. What this indicates is that someone skimped. Techniques for processing and getting reliable signals through systems that only mostly work are very well known and used routinely. What this event means is that someone, either explicitly or implicitly assumed that NICs are binary - they either work or they don't, and designed accordingly.

What should have been used is multiple (more than two) parallel simultaneous communication paths with comparison voting at the far end to determine if the information received can be regarded as valid or not. Assuming that a NIC will fail gracefully is so boneheaded that in a safety critical application like this someone could likely be prosecuted for negligence.
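Comparison voting at the receiving end can be sketched in a few lines. This is a minimal illustration, assuming each path delivers either a reading or nothing (`None`) and that a strict majority of all paths must agree; the function name `vote` and the sample values are invented for the example.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def vote(readings: Sequence[Optional[Hashable]]) -> Optional[Hashable]:
    """Return the value reported by a strict majority of paths, or None if there is none."""
    received = [r for r in readings if r is not None]
    if not received:
        return None
    value, count = Counter(received).most_common(1)[0]
    # Silent paths count against the majority, so one surviving path is never enough.
    return value if count > len(readings) / 2 else None


# Three parallel paths carrying an altitude reading; one path has flipped some bits.
print(vote([35000, 35000, 35100]))
```

With three paths, a single NIC that corrupts or drops data is outvoted; two simultaneous faults yield `None`, which the receiver can treat as "data invalid" rather than displaying a wrong value.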

Unfortunately, using the right techniques is expensive. Given that systems like this are provided by tender processes that favour low-bidders, it is not surprising that problems appear.

Of course, somebody may have done a cost-benefit analysis and decided the risk of one (or several) aircraft accidents didn't merit the extra expenditure. That's unlikely to be publicised 'though - although it would be a correct calculation to run.

As for testing - well running around pulling cables out at random doesn't really do it. Unplugging and plugging cables at various frequencies/intervals, swapping cables, plugging them into incorrect sockets, injecting noise, dropping the voltage on the power supply, overvoltage/spikes are all things that could and should be done - and in some cases mathematical formal proof that the system will work as required. All of this (and more) is done for safety critical applications.

Re:testing and QA (1)

mcrbids (148650) | more than 6 years ago | (#24239479)

And this is where the "cheap" part of my comment "cheap, redundant, high-availability system" comes into play.

See, the likelihood of failure in a redundant system goes *up* as the number of units increases. But as the number of units in a redundant system increases, the likelihood of a *complete* failure drops to a number never equalling zero. In other words, no matter how much redundancy you build in, you'll never achieve zero downtime over the long haul.

The human body achieves zero downtime over a few decades in many cases, and close to 5 nines (99.999%) over 6-7 decades in most cases. This is very, very good uptime and is very noteworthy, but requires BILLIONS of redundant units and expensive external intervention (AKA the "hospital") to achieve.

You'll never get 100%. So get off it, already. Instead, prepare for the 1% to 0.1% of downtime and call it a day!

Re:testing and QA (1)

diskis (221264) | more than 6 years ago | (#24239445)

> But what happens when A is *mostly* or *sorta* online?

You have dual NICs on system A, with an etherkiller [google.com] connected to the second card.

When B takes over, it then can make sure that A stays down :)

Re:testing and QA (0)

Anonymous Coward | more than 6 years ago | (#24239853)

My method is foolproof: I do processing with millionfold redundancy. I partition the nodes into segmented networks and when the results within a segment disagree (which is the usual case) I go with the plurality for that segment. The individual segments then get weighted scores proportionate to the number of units in that segment and finally the computation is performed by averaging the result of each segment weighted by the percentage of all nodes that are in that segment.

This computation is then compared (for historical reasons) with a separate result computed by nine "supreme" nodes, and finally I return the latter value, ignoring the former.

Foolproof, I tell you!

Re:testing and QA (1)

boaworm (180781) | more than 6 years ago | (#24239071)

Quite likely it did work at the time of FAT, SAT, Shadow operation and when going into live operation.

If it breaks down later on is another issue, that's not possible to test for beforehand. Isn't that pretty obvious? It is like testing a car to see if it will ever be in an accident. You sir, are the drunk one :)

Re:testing and QA (2, Interesting)

Valehru (1021601) | more than 6 years ago | (#24239197)

I had an engineer stuck in Germany for three days due to this stupidity. He got his fill of beer, good hotel rooms and sightseeing done, so in his mind it was a decent holiday. The insane thing was that this issue had already happened a few weeks earlier; there was an investigation, but it did not discover the faulty NIC then either.

Re:testing and QA (0)

Anonymous Coward | more than 6 years ago | (#24239497)

Yes, because as soon as we get up in the morning before we even have breakfast we have a pint of Guinness and by the time we get to work we can barely even stand, never mind operate a radar system. Then again it might have been because we left the job up to the leprechauns who were too focused on protecting their crock of gold.

My Own Meat Tastes Good - Balls Deep In Momma (-1, Offtopic)

Anonymous Coward | more than 6 years ago | (#24238917)

I can suck the clown paint off my own dick, bitch, just you watch..

yeah, that's good stuff!

Don't throw momma from the train, I'm goin balls deep in that snatch!

MOD PARENT UP (-1, Offtopic)

Anonymous Coward | more than 6 years ago | (#24238943)

It's more insightful than most of the shit here.

Re:MOD PARENT UP (-1, Offtopic)

oodaloop (1229816) | more than 6 years ago | (#24238981)

No, mod THIS parent up. That was way more insightful.

The way the summary reads (0, Redundant)

Centurix (249778) | more than 6 years ago | (#24238951)

Makes it sound like the NIC was fighting against The Man! Go NIC!

There's only one way to solve this (5, Funny)

anomnomnomymous (1321267) | more than 6 years ago | (#24238955)

Put all those NIC's on the terror watchlist!

Re:There's only one way to solve this (1)

eclectro (227083) | more than 6 years ago | (#24239063)

Put all those NIC's on the terror watchlist!

Why would anyone listen to you? Somebody who was just put on the terror watchlist by a bad NIC.

Re:There's only one way to solve this (-1, Offtopic)

Anonymous Coward | more than 6 years ago | (#24239347)

With using several kinds of ne2000 NIC's be damned sure THEY WILL BE putting them on the list (well, at least some of them then)!

Re:There's only one way to solve this (0)

Anonymous Coward | more than 6 years ago | (#24239789)

You joke, but we really should put the manufacturer on our personal list. It's a NIC! We should be able to rely on it. This is like paper turning black if you leave it in sunlight. Fuck them. Anyone know who the vendor is?

Last time (-1, Troll)

Anonymous Coward | more than 6 years ago | (#24238977)

The last time I was on a flight which stopped in Dublin, we were delayed hours while little customs officials in green suits searched everyone's bags. Apparently the Prime Minister had lost his Lucky Charms.

Re:Last time (1, Redundant)

HungryHobo (1314109) | more than 6 years ago | (#24239233)

we don't have a prime minister and I'm fairly sure customs don't wear green.

Re:Last time (0, Redundant)

JohnHegarty (453016) | more than 6 years ago | (#24239263)

And I am not entirely sure what a Lucky Charm is ...

Re:Last time (0)

Anonymous Coward | more than 6 years ago | (#24239649)

Outside ireland, lucky charms [wikipedia.org] is an american-style (sickly sweet and artificial, featuring sort of freeze-dried marshmallow lumps) breakfast cereal, with a hollywood-irish-accented mascot "lucky the leprechaun" (akin to "tony the tiger" from "frosties"), with a surrounding advertising campaign [youtube.com] that could be considered vaguely offensive (on grounds of nauseating cutesyness if nothing else), at least if irish people were excessively thin-skinned (fortunately, they're generally not, and since they're also white-skinned people in america probably wouldn't care if they were upset anyway). It's not hugely offensive, it's not "Bloody Sunday Breakfast Snacks" or something, but all the same, it's not sold in the Republic of Ireland, and was withdrawn from the UK market fairly rapidly upon introduction.

As part of that campaign, the leprechaun obsessively worries about people trying to take his lucky charms, or at least used to, now he just seems to be resigned to thieving kids running off with it.

Re:Last time (-1, Redundant)

Anonymous Coward | more than 6 years ago | (#24239387)

Well, a taoiseach is a prime minister in all but name.

Re:Last time (0)

Anonymous Coward | more than 6 years ago | (#24239403)

we don't have a prime minister and I'm fairly sure customs don't wear green.

Lordy, someone's lost the craic! Quick get this man a pint of the black stuff and a plate of annoying stereotype! /me dances a jig to up-beat folk music.

More seriously: had I said Taoiseach the joke would have flown over everyone's heads. Secondly, taoiseach is just Irish for prime minister, so get off your high horse. Thirdly, I can exploit Irish stereotypes for a cheap joke, because I am Irish. Personally my favourite examples of Irish stereotypes are the Monkey Dust sketches: Diary of Anne Frank [youtube.com] and The Crusades [youtube.com] , always worth a plug.

Re:Last time (0)

Anonymous Coward | more than 6 years ago | (#24239503)

Customs do wear green, but it is hard to prove since the camera does not manage to capture them.

Besides that, it is not an area where photographing is allowed. If you do so anyway the X-ray machine will wipe your camera (it is not supposed to do that, but it will anyway!)

redundant? (0)

Anonymous Coward | more than 6 years ago | (#24238993)

if they were really smart they would have two separate machines dedicated to this information for more redundancy.

Goatse (-1, Offtopic)

Anonymous Coward | more than 6 years ago | (#24239007)

Goatse [goatse.ch]

More scary stories. (2, Interesting)

rixster_uk (1216414) | more than 6 years ago | (#24239069)

People - I am trying to collect airport related scary stories. I haven't got many yet but if you have some then please let me know - you can email me at admin@scareports.com or just visit the site (blatant pimping) here [scareports.com] .

Intermittent problems are the worst (2, Interesting)

niks42 (768188) | more than 6 years ago | (#24239101)

I'd have to have some sympathy that it was an intermittent problem. They can really cause confusion to automated systems that are designed to cope with hard failures. I've had many occasions in my latter career in Service Delivery and support where it's taken human conviction to sort out issues caused by the cluster software trying to cope with intermittent connections

ten minutes (1)

Iamthecheese (1264298) | more than 6 years ago | (#24239157)

Ten minutes at a time? That doesn't sound like a "mostly broken" problem to me, that sounds like a 10 minute fail-over time. Shit happens, but if it takes you 10 minutes for your stuff to automatically start working again you're doing it wrong, especially since it's all in one data center. And whatever happened to redundant off-site systems? New law: As a conversation progresses, the chance of someone saying "terrorist" approaches 100%

Re:ten minutes (5, Informative)

wintermute000 (928348) | more than 6 years ago | (#24239249)

there are plenty of examples of 10 minute failover

Older cisco ATAs take 10 minutes to swing onto SRST if keepalives are lost to the callmanager cluster.

a complex routing protocol refresh (big BGP networks) can take many minutes

a faulty NIC can easily bring down a LAN segment, with or without redundant switching paths - and it makes it look like a router failure as the router overloads trying to deal with the broadcast storm

NICtzche (3, Funny)

cornjchob (514035) | more than 6 years ago | (#24239125)

if this piece of hardware was capable of "overc[oming] the built-in system redundancy", perhaps its ilk ought to be patrolling the transistorized wunderplatz of interconnected morsels governing our most hubris means of transportation? I, for one, would certainly feel safer.

Well its a step above the old AppleTalk (2, Interesting)

LM741N (258038) | more than 6 years ago | (#24239155)

When I was administering a small network in Marin, every time we had a small earthquake, all of the AppleTalk connectors would come loose. Took hours to find the faults and push them together. I guess we should have used duct tape.

I suppose at an airport as each jet came in creating vibrations, those same connectors would have dislodged.

It's a success story. (4, Funny)

Farmer Tim (530755) | more than 6 years ago | (#24239281)

"...an intermittent malfunctioning network card which consequently overcame the built-in system redundancy"

But it's one of the lucky ones.

Every year, thousands of NICs fall victim to built-in system redundancy; if you know a card whose activity indicators are darkened and lifeless, it may have a redundancy problem. With your support and donations, we at Ethernetics Anonymous can help more network cards beat the scourge of built-in system redundancy, and make them feel like a useful part of society again.

My idea of fault tolerance (1)

Bromskloss (750445) | more than 6 years ago | (#24239283)

in this case would be the ability to run air traffic control without all those fancy computrons, should the need arise.

Re:My idea of fault tolerance (2, Insightful)

a_real_bast... (1305351) | more than 6 years ago | (#24239365)

Unfortunately, this NIC's fault showed up as the radar not working. What were they supposed to fail-over to? Binoculars?

Re:My idea of fault tolerance (2, Interesting)

Bromskloss (750445) | more than 6 years ago | (#24239475)

Unfortunately, this NIC's fault showed up as the radar not working. What were they supposed to fail-over to? Binoculars?

I suppose so, if it's possible to do it that way. Also, have the planes do the old-fashioned "circle the airport and keep an eye out for other traffic" if that works with big, heavy planes. It sure gives you (the pilot) a nice sense of being a free and sovereign person anyway, like on small airfields. :-)

Re:My idea of fault tolerance (1)

a_real_bast... (1305351) | more than 6 years ago | (#24239525)

And gives the Comptroller a fit about the extra fuel expenditure? (",)

Re:My idea of fault tolerance (1)

clickety6 (141178) | more than 6 years ago | (#24239505)

[quote]What were they supposed to fail-over to? Binoculars?[/quote]

And a giant relief model of the airport with young ladies pushing around little model aircraft with billiard cues. And a big glass panel with people marking up aircraft positions with wax crayons.

Re:My idea of fault tolerance (1)

zmollusc (763634) | more than 6 years ago | (#24240017)

That is the stupidest plan ever, it is snooker cue _rests_ with which the ladies push the little model aircraft around.

In the queue (3, Funny)

davew (820) | more than 6 years ago | (#24239301)

I was due to fly the evening it all went wrong. Here's a lesson: if you're standing in a three-hour queue for the Ryanair desk, and they tell people to rebook on the web, and you take out a laptop and 3G modem, be prepared for a stampede.

Re:In the queue (1)

a_real_bast... (1305351) | more than 6 years ago | (#24239389)

You made two fatal mistakes:

1) You didn't do it where no-one could see you.
2) You flew Ryanair.

Re:In the queue (0)

Anonymous Coward | more than 6 years ago | (#24239585)

3. you didn't display your premium usage rates

Re:In the queue (1)

bernywork (57298) | more than 6 years ago | (#24239663)

On top of that..

You are flying Ryanair, everything is an extra, I am surprised they don't charge to use the bathroom onboard (Having said that, they probably will now).

5 a person to rebook on your laptop, would have paid for a new laptop!

Re:In the queue (0)

Anonymous Coward | more than 6 years ago | (#24239755)

Serves you right for flying Ryanair ( or Not So Easyjet for that matter )
Ryanair bumped me off a flight earlier this year. No warning except a refusal to let me board.
Ok they offered me a later flight but my almost 5 month pregnant partner was already on the flight and we missed our connecting flight.
They then called security when I kicked up a stink.
They Suck royally

Re:In the queue (1)

caluml (551744) | more than 6 years ago | (#24239785)

You could make a pretty penny with that. £5 a shot, or whatever. Plus your keystroke logger would have tonnes of valid credit card information. :)

Re:In the queue (1)

heathen_01 (1191043) | more than 6 years ago | (#24239857)

Did your laptop do you any good?

The Ryanair website was unusable (for me) during that time.

First time? (0)

Anonymous Coward | more than 6 years ago | (#24239305)

"Thales ATM stated that in 10 similar air traffic control Centres worldwide with over 500,000 flight hours (50 years), this is the first time an incident of this type has been reported."

Is the LAX incident not of the same type, then?

Re:First time? (0)

Anonymous Coward | more than 6 years ago | (#24239509)

Thales don't have any comparable ATC systems in the US.

One card "overcame the redundancy"??? (3, Insightful)

gweihir (88907) | more than 6 years ago | (#24239319)

If they have good redundancy, they have two separate networks and two independent, preferably different network cards, in all systems. Then they would do fail-over. Seems to me that if one card can bring this down, then the people that designed the redundancy screwed up badly.
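A variant of this idea avoids an explicit fail-over step: send every message over both networks and let the receiver accept whichever copy arrives first. The sketch below assumes each message carries a sequence number; the class and field names are hypothetical, and nothing here describes how the Dublin system actually works.

```python
class DualNetworkReceiver:
    """Hypothetical receiver fed by two independent networks, "A" and "B"."""

    def __init__(self):
        self.seen = set()       # sequence numbers already delivered
        self.delivered = []     # (seq, payload, network) in delivery order

    def receive(self, network, seq, payload):
        """Deliver the first copy of each sequence number, drop the duplicate."""
        if seq in self.seen:
            return False        # the other network already delivered this one
        self.seen.add(seq)
        self.delivered.append((seq, payload, network))
        return True
```

With this arrangement a dead card on one network costs nothing but the duplicate copy. A card that sends garbage rather than going silent is a different problem and would need integrity checks on top.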

Re:One card "overcame the redundancy"??? (1)

jacquesm (154384) | more than 6 years ago | (#24239533)

second that... sorry, I missed your post before I wrote mine. Whoever built the system goofed, and to screw up with flight control systems at this level should be grounds for termination and never ever to get work in mission critical systems again. There really is no room for error in systems like this.

I've worked a bit in the aerospace industry, specifically on software that would estimate the amount of fuel required for a flight taking into account alternative landing areas, winds and so on.

The amount of checking I did on that code bordered on the paranoid but I really could not live with some plane going down somewhere because of a stupid error in design.

Come to think of it, mission critical software should probably be open source, *always* so you can see what you're entrusting your life to and so that the 'many eyes' out there can point out the flaws. (assuming they're not eyes that will use that knowledge to bring down your system...).

Why!? (4, Funny)

damburger (981828) | more than 6 years ago | (#24239443)

I am flying to Florida tomorrow, it will only be my fifth plane flight in total and my first transatlantic flight. Despite being a rational scientist, who knows how safe it is statistically, I am having trouble suppressing my anxiety.

And at this point, fate sees fit to bombard me with horror stories about flying. This news about air traffic control comes on the heels of a headline I just saw on the front page of the Independent about pilots not reporting faults on aircraft and thus unsafe ones still flying about. I can't remember the exact wording because my brain parsed it as "TOMORROW YOU WILL DIE IN FLAMES"

Re:Why!? (3, Funny)

FrostedWheat (172733) | more than 6 years ago | (#24239593)

A long time ago I went on a school trip to London, and it was the first time I had ever been on a plane so I was a bit nervous. In the airport shop there was a magazine (can't remember which now) with a plane in flames on the front cover, with the large headline "Why Planes Crash". Whoever put them out must have had an evil streak too, they had spread them out to fill the entire top shelf.

Re:Why!? (1)

damburger (981828) | more than 6 years ago | (#24239611)

Damn, that's cold

Your signature, however, gives me something else to focus on. Fucking software patents! Idiotic corporate pandering EU! Grrrrrr! I'm not afraid of flying, I'm angry about IP abuse!

Re:Why!? (0)

Anonymous Coward | more than 6 years ago | (#24239673)

When I took a trip over to the Emerald Isle last, the night before we were due to fly over to Blighty I made everyone in the party watch a Discovery Channel programme on Helios Flight 522.

Needless to say, anxiety has never been an issue for me.

I got away with it by pointing out that there was no conceivable way in which watching a TV show could alter the probabilities of the plane dropping out of the sky the next day.

Re:Why!? (2, Funny)

damburger (981828) | more than 6 years ago | (#24239689)

Depends, was the pilot at your house?

Re:Why!? (0)

Anonymous Coward | more than 6 years ago | (#24239953)

Then I guess this [scareports.com] won't help very much :)

See, this is why people are stupid, because all you see is this. Also, I've noticed that page contains too many "terrorists" words, really. Yes, there are bad things happening but those are the only ones that hit the news, the good stories where you were asked to go through the X-ray machine and you were just scared of X-rays so they let you pass don't make the news or they make the news as "guy walks through airport without being scanned! terrorists could do that."

Just ignore this /. story and move along because good things have always happened and will happen to you in the future and feeling good is all you should expect from flying. While you're up there, instead of thinking of what could happen (and probably never will), think of what is really happening in that moment: you're flying!

Thanks for flying with us, have a nice trip ;)

that redundancy (0)

Anonymous Coward | more than 6 years ago | (#24239511)

was not 'built in' properly, the system should have been isolated after the first fault and not put back into service until the fault was diagnosed and fixed.

pretty sloppy if you ask me...

Truth? (1)

Fri13 (963421) | more than 6 years ago | (#24239527)

"[They] confirmed the root cause of the hardware system malfunction as an intermittent malfunctioning network card which consequently overcame the built-in system redundancy,' said an IAA spokeswoman."

And when we edit a little bit, can we have the truth?:

They confirmed the root caused the hardware system malfunction using an intermittent malfunctioning network card which consequently overcame the built-in system redundancy.

Confusing terminology (5, Informative)

ddrichardson (869910) | more than 6 years ago | (#24239565)

I work in aviation and wonder if the terminology being used by the newspaper articles is correct.

It appears to be talking about mode S IFF (Identification Friend or Foe) or SIFF radar systems which identify aircraft and append height data. The speed is the only thing that needs calculating, as it isn't encoded in the pulse train.
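As a rough illustration of what "calculating" the speed could involve, here is a sketch that derives ground speed from two successive radar plots. It assumes each plot is a (range in nautical miles, bearing in degrees, time in seconds) tuple on a flat plane and ignores altitude and slant range; the function name and the tuple layout are hypothetical.

```python
import math


def ground_speed_knots(plot_a, plot_b):
    """Estimate ground speed in knots from two (range_nm, bearing_deg, time_s) plots."""

    def to_east_north(range_nm, bearing_deg):
        # Bearing is measured clockwise from north, so east uses sin and north uses cos.
        theta = math.radians(bearing_deg)
        return range_nm * math.sin(theta), range_nm * math.cos(theta)

    east_a, north_a = to_east_north(plot_a[0], plot_a[1])
    east_b, north_b = to_east_north(plot_b[0], plot_b[1])
    dt_hours = (plot_b[2] - plot_a[2]) / 3600.0
    if dt_hours <= 0:
        raise ValueError("second plot must be later than the first")
    # Distance travelled in nautical miles divided by elapsed hours gives knots.
    return math.hypot(east_b - east_a, north_b - north_a) / dt_hours
```

For example, an aircraft that moves from 10 NM to 11 NM along the same bearing in 10 seconds has covered 1 NM in 1/360 of an hour, which works out to about 360 knots.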

This is weird because bus technologies much older than current network technology, such as MIL-STD-1553 [wikipedia.org], are normally used to transfer this data.

This makes me wonder if it was one of two things: a system inputting to an Ethernet PC system that calculates and displays the information, or more likely they are talking about a DLTU type stub connector (or remote terminal) used in such typical buses. This is unlikely because, on the bus systems they are employed on, the bus controller would have picked up on the failure during continuous built-in test and pulled in an alternative.

If it's the former then someone needs shooting. ATC is a realtime application and the overhead involved here would be unacceptable. I'm not even sure of the benefit of a network; multiple self-contained individual terminals would be safer.

Re:Confusing terminology (1)

ledow (319597) | more than 6 years ago | (#24239905)

A quick google turns up:

http://en.wikipedia.org/wiki/Avionics_Full-Duplex_Switched_Ethernet [wikipedia.org]

Which suggests that Ethernet-derived products are, indeed, used in critical systems (although this seems to be on-aircraft rather than in ATC). It (apparently) has seen wide deployment on common "famous" aircraft.

And the UK has been "upgrading" its air traffic control for years and years, so much so that they now appear to be nothing more than an office with some multi-head display if the footage shown on news reports of a year or so ago is to be believed. It's conceivable that this is truer than you would think.

However, I bow down to your knowledge as I know nothing about aviation at all.

Re:Confusing terminology (2, Interesting)

ddrichardson (869910) | more than 6 years ago | (#24240005)

While you're right, the key phrase from the article you give is:

ARINC 664 Specification which defines how Commercial Off-the-Shelf networking components will be used for future generation Aircraft Data Networks (ADN).

Specifically, this standard is aimed at use on aircraft not in ATC, in fact because of the weight reduction it offers.

Also, not to split hairs, but Dublin is not in the UK; this seems trite but is valid as there are different agencies involved. Moreover, the appropriation of new technologies is obsessive in the UK at present and has been for some time (except in the financial sector). There is a perception that newer is better and that answers to questions nobody asked are best solved by combining off the shelf components in a similar topology to older generation systems.

There is an argument to upgrade ATC due to higher volumes of aircraft but I can't help wonder if there is a bigger drive towards efficiency rather than safety.

annoying.. (0)

Anonymous Coward | more than 6 years ago | (#24239573)

This is the first time in 50 years that this has happened. And the first time they had accurate information on screens was 10-15 years ago..

Irish Examiner, ha! (3, Funny)

PinkyDead (862370) | more than 6 years ago | (#24239759)

Everyone in Ireland knows that the Irish Examiner used to be the Cork examiner - and they never miss an opportunity to point out how Dublin is doing a bad job.

This is because Cork thinks that it's the centre of the friggin' universe. The 'Real Capital', my arse! Just a bunch of thunderin' ejits, living in their little Blarney fantasy land. Sure they can't even talk right. What the hell is a 'langer', anyway. They wouldn't even know how to spell NIC.

The fact that they are right is quite beside the point.

(For a North American cultural equivalent, please see http://en.wikipedia.org/wiki/South_Park:_Bigger%2C_Longer_%26_Uncut [wikipedia.org] )

Anyone who mods me down is from Cork - believe it!
