Failed Software Upgrade Halts Transit Service

Soulskill posted about 10 months ago | from the ay-carumba dept.

Transportation 125

linuxwrangler writes "San Francisco Bay Area commuters awoke this morning to the news that BART, the major regional transit system which carries hundreds of thousands of daily riders, was entirely shut down due to a computer failure. Commuters stood stranded at stations and traffic backed up as residents took to the roads. The system has returned to service and BART says the outage resulted from a botched software upgrade."

125 comments

I Guess (3, Funny)

CheezburgerBrown . (3417019) | about 10 months ago | (#45496781)

They should have brought their skateboards to work.

Re:I Guess (2)

noh8rz10 (2716597) | about 10 months ago | (#45496795)

wow first it's the unions that are shutting them down and now a software update? I wonder what will happen next.

Re:I Guess (1)

CheezburgerBrown . (3417019) | about 10 months ago | (#45496825)

San Fran will turn into Detroit?

Re:I Guess (0)

Anonymous Coward | about 10 months ago | (#45496901)

We can only hope.

Lived there for 4 years.. Never again. Out of the five cities I lived in, it was the worst.

Re:I Guess (0)

Anonymous Coward | about 10 months ago | (#45496959)

You and me both. I grew up there and got out as fast as I fucking could after High School.

Re:I Guess (0)

Anonymous Coward | about 10 months ago | (#45497119)

Well, welcome to Detroit! Most strip malls in the successful suburbs are half to 90% empty. Hope you like coney dogs!

Re:I Guess (2, Interesting)

Trax3001BBS (2368736) | about 10 months ago | (#45496997)

San Fran will turn into Detroit?

Although this was posted on Reddit a day ago, it's so on topic to your post that I had to put it in my reply:

http://www.reddit.com/r/explainlikeimfive/comments/1r6f8w/eli5_americans_what_exactly_happened_to_detroit_i/ [reddit.com]
Very good read if you want to know about Detroit

Re:I Guess (-1)

Anonymous Coward | about 10 months ago | (#45497983)

San Fran will turn into Detroit?

Ah yes Detroit. 90% or more black due to White Flight. Ever since the blacks had overwhelming majorities and elected black officials for their black city, the city went bankrupt! Why its as though blacks can't do it on their own. Not like what happened in Haiti or most places in Africa huh? Oh wait, thats exactly the same thing. Its as though there's a common denominator here... oh yeah, majority blacks! Downmod me now, since you cannot argue with facts. There. Feel better? That -1 will make it all go away, all those nasty facts. Heaven Forbid you change your worldview to match them, that would not be PC! Come back into the Fold, sheep, believe the Right Beliefs, they are totally equal despite all evidence to the contrary.

Re:I Guess (1)

JustOK (667959) | about 10 months ago | (#45499961)

Yah, white people have NEVER fucked up a government.

Re:I Guess (1)

Anonymous Coward | about 10 months ago | (#45496877)

I wonder what will happen next.

People will buy cars. Only so much of this nonsense can be tolerated when it fucks with your livelihood. When the boss shows up and all the people with cars are getting it done and all the people with train tickets are at home making excuses... well, you shouldn't need any help figuring this part out, even if you don't like it.

Re:I Guess (1)

somersault (912633) | about 10 months ago | (#45497011)

Except the boss probably couldn't get to work either, unless maybe he has a bike.

Re:I Guess (1)

tompaulco (629533) | about 10 months ago | (#45497871)

Except the boss probably couldn't get to work either, unless maybe he has a bike.

What? The executive class condescend to ride in Public Transportation? Scoff!

Re:I Guess (1)

_Shad0w_ (127912) | about 10 months ago | (#45499495)

They do here. They just have First Class tickets instead.

And the ones who drive just get stuck on the M25 instead.

Re:I Guess (1)

OldeTimeGeek (725417) | about 10 months ago | (#45497277)

Never been to San Francisco, have you?

Let's say all of the BART riders start driving in. They will find themselves adding more traffic to an already congested highway system that will never, ever, get any larger. There simply isn't the space. And once they get to work, good luck finding some place to park...

Re:I Guess (-1)

Anonymous Coward | about 10 months ago | (#45497997)

Twice. Drove while I was there both times. Market street and everything. Got lost once and ended up at Haight and Ashbury.

So fuck you.

Re:I Guess (2)

milkmage (795746) | about 10 months ago | (#45497921)

most of them already have cars. BART serves the Bay Area. 50 miles south and east of SF.

the week long strike earlier this year caused havoc on the roads- people were on the road at 0400, and still late for work. extra busses, extra boats, not enough.

https://www.google.com/search?q=bart+strike+traffic&espv=210&es_sm=119&tbm=isch&tbo=u&source=univ&sa=X&ei=EhyQUtq2FYb9iQKq2oG4CQ&ved=0CDYQsAQ&biw=1354&bih=647 [google.com]

Re:I Guess (4, Funny)

RabidReindeer (2625839) | about 10 months ago | (#45497087)

wow first it's the unions that are shutting them down and now a software update? I wonder what will happen next.

Unionized software.

Ironic, isn't it? Silicon Valley commutes wrecked due to bad IT practices!

Re:I Guess (1)

mrchaotica (681592) | about 10 months ago | (#45497763)

You do realize you've just summoned an earthquake, right?

Re: I Guess (0)

Anonymous Coward | about 10 months ago | (#45497485)

I did, it's only a 3 mile skate to work...

Re:I Guess (1)

_Shad0w_ (127912) | about 10 months ago | (#45499491)

There's a guy who catches one of the trains I catch in the morning who always gets on with his skateboard. Although I work in North London.

Strange times (5, Insightful)

nightsky30 (3348843) | about 10 months ago | (#45496869)

Why was a weekday selected for this software update?

Re:Strange times (0)

Anonymous Coward | about 10 months ago | (#45496915)

They may just have a higher load on the weekend.

Re:Strange times (4, Informative)

TWX (665546) | about 10 months ago | (#45496935)

Well, based on my own experience with bureaucracies, there is some existing rule that ensures that certain types of staff have certain days off unless there's an emergency, and a software update probably didn't previously count as an emergency.

From one standpoint, it makes sense, especially if those doing the work need technical support from a vendor. On the other hand, it probably makes more sense to have a QA lab set up if one is going to operate this way, so that one can test a rollout in advance, hopefully forestalling such problems going live.

Re:Strange times (4, Insightful)

B33rNinj4 (666756) | about 10 months ago | (#45497211)

Man, my company hasn't had a QA environment that mirrored production in over a decade. I'd like to think that they had something set up, but the few state-run departments I've seen have been sorely lacking.

Not so much the bureaucracies (1)

rsilvergun (571051) | about 10 months ago | (#45497665)

it's more the contractors refusing to train and keep their hires. Nobody wants to keep someone around. They cost more every year. But for programmers that means nobody knows how anything works. It keeps profits high for the guy running the sub-contractor, but it means crummy software...

Re:Not so much the bureaucracies (0)

Anonymous Coward | about 10 months ago | (#45498013)

it's more the contractors refusing to train and keep their hires. Nobody wants to keep someone around. They cost more every year. But for programmers that means nobody knows how anything works. It keeps profits high for the guy running the sub-contractor, but it means crummy software...

As long as purchasing decisions are made by MBA types who do not understand the technology they are buying, it will remain this way. Cutting costs is only ever more important than EVERYTHING ELSE EVER! when your customers are too ignorant to distinguish quality from crap. Then they have no choice but to go with whatever the slick sales guy tells them.

Re:Not so much the bureaucracies (0)

Anonymous Coward | about 10 months ago | (#45499821)

Maybe in some specific cases, but in general how can they?

You're asked to supply resource X to work on Y for 6 months, then they contractually could stop paying you (assuming they didn't request that the contractors took an unplanned furlough due to a budget hole).

You are then going to pay X to hang around in case you do get a subsequent project on Y?

If you are going to outsource stuff which is time-boxed and reconstructible, fine; if you outsource stuff which you should keep in house, who's to blame?

Re:Strange times (5, Insightful)

girlintraining (1395911) | about 10 months ago | (#45498009)

On the other hand, it probably makes more sense to have a QA lab set up if one is going to operate this way, so that one can test a rollout in advance, hopefully forestalling such problems going live.

And that's pretty hopeful. The thing is, in the real world, you just don't test all your patches. You can't; in any non-trivially sized network you're going to have hundreds of them to go through every week, and the workload is the same for a small or large business. That's why large businesses tend to do better (strangely enough) than small ones when it comes to patch management. And this is an attitude that is backed up by the numbers -- I would say over 9 times out of 10, a break/fix patch has no consequences being pushed into the production environment. It goes out. The version increments. The end. It's that 1 time that screws everyone up -- but it happens infrequently enough that management doesn't update its policies.

Most managers operate under a triage approach to maintenance -- that is, throw resources at a problem when something breaks and complaints start coming in, rather than throwing resources at prevention. In the short run, this is the right approach -- in a crisis you want all hands on deck. The problem is that over time, neglecting preventative maintenance procedures, which show up only as a cost without a defined benefit, results in departments moving to a triage model all the time. Basically, the problem is short-term prioritization over long-term cost reduction.

And I've seen it in almost every IT department I've worked for. I've even sat down with managers and explained to them that when 35% of their workflow is emergency break/fix and that number is trending upwards, we have a process control issue. They invariably agree with me, but say they can't get out from under the workload. Of course, when I come back three months later and it's now at 47% and the workload is now a third higher, they say the same thing.

I would lay money that this is how project management is happening at BART, and it has now deteriorated to the point where it's starting to impact its core business. The problem is, while it is still likely at a point where effective project management can right this sinking ship... it almost never happens. Unfortunately, the solution most of the time here is to throw someone under the bus, blaming them for the failure, and insisting that as the system has worked up until this point, it does not need an overhaul.

They couldn't be more wrong. Unfortunately, it will take several people being thrown under the bus and a few more high-profile failures before senior management fires the mid-level manager responsible for the project, brings on someone with a strong background in project management, and restructures the department from the ground up following the best practices of change management. Of course, they'll overdo it in the attempt and the pendulum will have to start swinging back the other way, but... that's what happens.
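To put rough numbers on the break/fix figures quoted in the comment above (35% emergency work, then 47% three months later with the workload a third higher), here is a minimal arithmetic sketch. It assumes "a third higher" means the total workload grew to 4/3 of the earlier baseline; the function name `break_fix_load` is made up for illustration.

```python
def break_fix_load(total_workload, emergency_share):
    """Split a workload into (emergency, planned) parts, in the same units."""
    emergency = total_workload * emergency_share
    return emergency, total_workload - emergency


# Baseline: workload normalised to 1.0, of which 35% is emergency break/fix.
before_emergency, before_planned = break_fix_load(1.0, 0.35)

# Three months later: workload assumed to be 4/3 of baseline, 47% emergency.
after_emergency, after_planned = break_fix_load(4.0 / 3.0, 0.47)

# Relative growth of each kind of work over the three months.
emergency_growth = after_emergency / before_emergency - 1.0
planned_growth = after_planned / before_planned - 1.0

print(f"emergency work grew by {emergency_growth:.0%}")
print(f"planned work grew by {planned_growth:.0%}")
```

Under that reading, emergency work grows by roughly four fifths while planned work grows by less than a tenth, which is the "triage all the time" drift the comment describes.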

Re:Strange times (0)

Anonymous Coward | about 10 months ago | (#45499641)

Wish I had mod-points.

I have been in the business over 20 years and your post exactly matches my own experience.
Right now the company I work for is on the other side of the pendulum.
Any change takes a metric fuck-ton of paperwork and you have to wade through 4 levels of project-managers for approval, bringing progress and innovation to a complete standstill.

It is going to be the end of the company I'm afraid.
You can't move in the mobile apps markets if it takes 4-6 months to approve and roll-out a new software development tool-chain.

I've got another job lined up already so personally I'm not worried, but many of my co-workers have been so hammered into the corporate mold they can't see the writing on the wall.

Re:Strange times (2)

DavidClarkeHR (2769805) | about 10 months ago | (#45496943)

Why was a weekday selected for this software update?

Should have been a tuesday. Then our windows updates and our transit updates would match! (... 14% ... for ... ever ...)

Re:Strange times (-1)

Anonymous Coward | about 10 months ago | (#45497091)

You know, I honestly don't give a fuck about global warming. I figure by the time it happens I'll already be dead. Fuck the future generations. And I don't give a fuck if Obama can see me post this. I'm going to shit debt, CO2 and eyesores on them. I'm living for me. Not some fucking little brat who keeps crying while I'm at a restaurant, trying to enjoy a simple fucking meal.
 
Fuck them. Fuck this planet.

Re:Strange times (-1)

Anonymous Coward | about 10 months ago | (#45497347)

> I honestly don't give a fuck about global warming

Typical Republican, but that doesn't explain why you people want to speed the process. Destroying the Earth is one thing, but to want it to happen faster makes no sense to us non-Repukians. Why are you people like that?

Re:Strange times (0)

Anonymous Coward | about 10 months ago | (#45497699)

> I honestly don't give a fuck about global warming

Typical Republican, but that doesn't explain why you people want to speed the process. Destroying the Earth is one thing, but to want it to happen faster makes no sense to us non-Repukians. Why are you people like that?

Actually, I'm a Democrat. (Or was. Never again.)

Re:Strange times (1)

Anonymous Coward | about 10 months ago | (#45498041)

You know, I honestly don't give a fuck about global warming. I figure by the time it happens I'll already be dead. Fuck the future generations. And I don't give a fuck if Obama can see me post this. I'm going to shit debt, CO2 and eyesores on them. I'm living for me. Not some fucking little brat who keeps crying while I'm at a restaurant, trying to enjoy a simple fucking meal. Fuck them. Fuck this planet.

This is a typical Baby Boomer. Imagine it. In all of American history, the Baby Boomers are the first generation to leave their children with a worse, more fucked-up world than what they had. This is more than a mere "fail at life". This is a fail at present AND future life. That's unprecedented in this country.

And the average Baby Boomer is so arrogant and entitled too. If I were them I'd be a lot more humble and try to stay out of the way and stop running up debt and stop ranting about the youth and try not to hold up traffic going 20 below the limit. Maybe eventually the younger generations are going to get fed up and will forcibly remind the Baby Boomers that the Boomers need the youngers, the youngers do not need the Boomers.

Re:Strange times (2)

x181 (2677887) | about 10 months ago | (#45497019)

so they can purposely botch it and justify the need to have human operators. in case you don't know, BART is currently going through a tense union battle resulting in a few worker strikes and contract disputes.

Re:Strange times (1)

Hamsterdan (815291) | about 10 months ago | (#45497189)

Why was a *production* system chosen to test the upgrade would be a better question. Why were there no fallbacks an even better one...

Re:Strange times (2)

s1d3track3D (1504503) | about 10 months ago | (#45497449)

Yes, and I bet there was at least one developer saying the exact same thing who was overruled by mgmt, who proceeded with the push regardless!

Re:Strange times (0)

Anonymous Coward | about 10 months ago | (#45497907)

Yes, of course, it's always clueless management ignoring the brave developer who warns of catastrophe.

It's just as likely the developers hung around with their thumbs up their butts, waiting to be told what to do.

Re:Strange times (2)

causality (777677) | about 10 months ago | (#45498047)

Yes, of course, it's always clueless management ignoring the brave developer who warns of catastrophe.

If management wants the power in the form of the final decisions (which they have), and the ability to take most of the credit (which is often the case), then they also get to keep the responsibility.

Sounds fair to me. Power and responsibility should never be separated. Ever.

Re:Strange times (0)

Anonymous Coward | about 10 months ago | (#45498725)

It's not the developers who raise red flags. It's the systems people. Been there, done that 2 days ago.

Re:Strange times (3, Funny)

Salo2112 (628590) | about 10 months ago | (#45497641)

Patch *Tuesday*. Duh.

Re:Strange times (1)

SeaFox (739806) | about 10 months ago | (#45498849)

Why was a weekday selected for this software update?

The same reason your cable company does maintenance in the middle of the day when at night they would disrupt far fewer customers -- the managers are tightwads and don't want to pay the rank-and-file employees for the extra hours outside their normal schedules, and the ones on salary are among that group that refuses to work outside 9-5 M-F.

BART has drivers. (0)

Anonymous Coward | about 10 months ago | (#45496911)

BART has real drivers and I would assume a legacy intercom system. Why do they need computers at all? It's just another thing to go wrong and break down.

Re:BART has drivers. (1, Interesting)

Anonymous Coward | about 10 months ago | (#45497127)

Because there is no means in the "cockpit" to actually make the train go. There are three buttons in a BART rail car:

Open Doors
Go to next stop
Emergency Stop

Not even a "close doors" button - that is handled by door sensors and the computer when "Go to the next stop" is pressed.

Everything is automated. A chimpanzee could operate a BART train.

Re:BART has drivers. (-1)

Anonymous Coward | about 10 months ago | (#45497187)

A chimpanzee could operate a BART train.

Yey, it's a jobs program for minorities!

Re:BART has drivers. (0)

Anonymous Coward | about 10 months ago | (#45497411)

See that's exactly the problem! Complicated mass-transit rail systems existed before computer-control, and they worked quite well. People seem to insist on deploying technology whether it actually solves problems or just introduces more problems.

Re:BART has drivers. (4, Interesting)

bluemonq (812827) | about 10 months ago | (#45497965)

You've almost certainly never ridden BART, much less seen the driver's cab. Why do I say this? Because there's a section of the BART system (the Oakland Wye, bane of commuters who want to get anywhere during rush hour) where drivers are instructed to go to manual control, limited to 25 MPH. It's the result of your vaunted "automated" system designed in the '60s never having worked properly in the past 50 years, and one of the contributing factors to a crash in 2009 (thankfully no one was seriously injured). There are many well-documented incidents of entire train sets disappearing from the computer system, as well as "ghost" trains randomly appearing.

Here is what an actual BART cab looks like:
http://i.imgur.com/IbYtYTa.jpg [imgur.com]

computers run the track switches (2)

Joe_Dragon (2206452) | about 10 months ago | (#45497979)

computers run the track switches

Hello, IT. (3, Funny)

tech.kyle (2800087) | about 10 months ago | (#45496923)

Have you tried turning it off and on again?

Re:Hello, IT. (1)

gagol (583737) | about 10 months ago | (#45497735)

Reynholm Industries, successful makers of [insert_your_guess_here]. Great quote!

Never upgrade (0)

Anonymous Coward | about 10 months ago | (#45496947)

This is why I don't upgrade shit. If it isn't broke, don't fix it.

Re:Never upgrade (1)

s1d3track3D (1504503) | about 10 months ago | (#45497463)

So you're posting from an unpatched Windows 98 box? Or are you still on 3.1?

Re:Never upgrade (0)

Anonymous Coward | about 10 months ago | (#45499011)

Well, you got it right when you guessed Windows 3.1. It allows me to run the full Office suite and a lot of games from the Windows Entertainment Pack. I have been planning the 3.11 upgrade just to get the workgroup features. But would I use those features that much? It's not worth the risk, considering what else it might break in the process.

Re:Never upgrade (1)

bluemonq (812827) | about 10 months ago | (#45497975)

It was broke (and remains so) decades ago. The automated system never really worked properly.

Thanks Obama (-1)

Anonymous Coward | about 10 months ago | (#45497017)

Did they make the healthcare website too?

BART (5, Interesting)

Anonymous Coward | about 10 months ago | (#45497027)

BART is run by the dumbest people on Earth. First off, it takes a special kind of stupid to create a rail system that goes almost, but not quite, all the way to the airport. 30 years later they extended to one of them but you still have to transfer to a bus for the last mile on another. Then you have to wonder what kind of idiot puts light carpet and cloth seating on public transport. 35 years later they start testing non-porous flooring/seating and maybe in another five years all of the trains will be switched over. Then, some bean counter got a bonus when they closed all the station bathrooms when 9/11 happened, ostensibly for security. Now a fifth of the escalators are out of service at any one time because they are clogged with human shit.

I also heard there was some sort of labor dispute.

Re:BART (3, Insightful)

Jane Q. Public (1010737) | about 10 months ago | (#45497161)

"BART is run by the dumbest people on Earth."

Well, you really do have to wonder when they say they worked through the whole night only to discover that this new, mysterious problem was caused by the update they'd made the night before.

I mean, wow. Wouldn't that be the first thing that popped into your mind?

Re:BART (2)

gagol (583737) | about 10 months ago | (#45497755)

To suspect something is one thing; to be sure of it you need to gather and analyse data at best. A night to confirm it is reasonable. And a bathroom in a metro is a luxury; how many undergrounds have those facilities (don't know, none in Montreal, Canada)?

Re:BART (0)

Anonymous Coward | about 10 months ago | (#45497837)

Budapest, Hungary had bathrooms in the subway. And they were staffed by mean looking women outside who took payment. I didn't go in one, but the women I was traveling with said they were so-so. Considering how horrible public restrooms normally look in the states, so-so in Hungary (which is still climbing out of a Soviet-era infrastructure hole) sounds pretty nice.

I think that's fundamentally the problem--we don't have attendants.

Considering that few nations will be able to attain the civic good nature of Japan or their colonial off-spring, Taiwan; we don't want to give the death penalty for infractions like in Singapore; and because technology still can't close the gap; it makes sense to staff public restrooms with people who can police behavior. But in most developed countries wages are too high.

Re:BART (1)

phantomfive (622387) | about 10 months ago | (#45498989)

Japan has them all over the place in Tokyo.

Re:BART (2)

xaxa (988988) | about 10 months ago | (#45499683)

London Underground toilet map [tfl.gov.uk] (not so great in the centre, but pretty good elsewhere).

They're in probably half of European underground stations, on average. Expect to pay 0-50c, depending on the country.

My local station (in London) has one, it's always very clean. I don't think many people use it.

Re:BART (4, Informative)

MrEricSir (398214) | about 10 months ago | (#45497241)

The Bart-SFO extension was a matter of politics, you can't blame the people who run Bart for that. You also can't blame the initial designers for not building the OAK extension, since OAK was a much smaller airport in those days (and had very few passenger flights.)

The train design was done by an aerospace company with absolutely no rail experience, which explains Bart's quirky design elements. But you can't blame Bart's current management for construction contracts awarded in the 1960's.

Re:BART (5, Insightful)

Anonymous Coward | about 10 months ago | (#45497523)

Plus, BART is not exactly a metro system like in Boston, Chicago, or New York. It's somewhere between a metro and commuter rail, but closer to the latter. It's a product of 1960s thinking, where people were trying to deal with the population shift out of the urban core. So part of the idea was to create high-speed transit from bed-room communities to downtown Oakland and San Francisco.

Connecting the airports probably never figured much into the equation. It wasn't built to supplement the transportation needs of carless San Francisco residents. It was built to shuttle people around the Bay Area. If you needed to get to the airport, you got there like everybody else--you drove your car.

Re:BART (1)

drinkypoo (153816) | about 10 months ago | (#45498771)

It wasn't built to supplement the transportation needs of carless San Francisco residents. It was built to shuttle people around the Bay Area. If you needed to get to the airport, you got there like everybody else--you drove your car.

But this just comes right back to how BART is stupid. Because when you build public transportation, it's going to be used by people who don't have cars, and to not take them into account is fucking stupid. Also, it's just stupid not to have the rail be able to take commuters from an airport to downtown no matter how you slice it. That should have been an initial design goal.

Re:BART (2)

SeaFox (739806) | about 10 months ago | (#45498895)

If you needed to get to the airport, you got there like everybody else--you drove your car.

But this just comes right back to how BART is stupid. Because when you build public transportation, it's going to be used by people who don't have cars, and to not take them into account is fucking stupid.

Maybe the assumption was if you couldn't afford a car, you probably couldn't afford to be going on many flights either. Keep in mind air fare was a bit pricier in the 60's and gas was quite a bit cheaper. Financial bar for car ownership was lower.

Re:BART (2)

drinkypoo (153816) | about 10 months ago | (#45498947)

Well, what I meant was that they should have taken both classes of passenger into account.

Ideally this means having lines segregated by socioeconomic status. You don't want to go to the airport and the ghetto.

Re:BART (0)

Anonymous Coward | about 10 months ago | (#45498945)

Exactly.

The car that you took onto the aircraft with you, and drove out of your destination airport so you didn't need public transportation either side.

That's what we have all been doing since 1959 or so.

Re:BART (3, Funny)

Anonymous Coward | about 10 months ago | (#45497365)

So people take a dump while riding the escalator? That's actually a cool idea.

Re:BART (2)

gagol (583737) | about 10 months ago | (#45497777)

Let us know how it went for you!

Re:BART (1)

Anonymous Coward | about 10 months ago | (#45498275)

It was certainly a moving experience; quite uplifting. The person behind me didn't seem to fully appreciate the view; or having to climb backwards when I stopped at the top to wipe --- especially once certain stairs came 'round again full loop. I suppose if I wasn't a Republican, I might have cared about their distress --- but, screw it, shitting on people just feels so good. Made riding on the peons' transit system feel totally worth it.

Re:BART (0)

Anonymous Coward | about 10 months ago | (#45498111)

They only shit on the executive one.

Re:BART (2)

bluemonq (812827) | about 10 months ago | (#45498015)

> 30 years later they extended to one of them but you still have to transfer to a bus for the last mile on another.

Pity you didn't have a spare $100 million a couple decades ago. I'm SURE you'd have been willing to pay for it, right? The extension to SFO wasn't built until recent times because back in the '60s San Mateo County quit the BART project, and the money wasn't around until the tech bubble started growing; ground was broken in 1997. The Oakland extension wasn't started until recently (opens in 2014) because again, there wasn't any money for it. The only reason it's getting built now is because Feds are footing a good chunk of the bill. OAK wasn't even all that popular an airport until last decade, after their renovation.

This is really surprising to me. (1)

tlambert (566799) | about 10 months ago | (#45497093)

This is really surprising to me.

For all the "can not fail" systems I've worked on, there has been an identical set of hardware, along with other hardware to simulate load, on which you could try upgrades before you put them on a live system and cost the local economy tens of millions of dollars by screwing up.

Re:This is really surprising to me. (1)

DexterIsADog (2954149) | about 10 months ago | (#45497935)

Most of the "cannot fail" and "mission critical" and "we're betting the company on this" systems I have seen have one (1) production environment, and one (1) development environment that sort of looks like production, with light servers on each developer's system.

I recently attempted to test the implementation of a client unlike any of those we had previously hosted, and the CIO and his Development VP told me, "we don't have the resources for that, we'll test it in production". It failed in production. I'm still picking up the pieces.

Software Has No Union Rep (1)

Bob_Who (926234) | about 10 months ago | (#45497101)

I guess you can't always save by eliminating humans and their expensive unions. Although, I'm sure the software was intended to pick up the financial slack for all of those expensive peeps. Don't worry, Wall Street is highly motivated to eliminate the humans with the software, eventually...

Snapshots? (2)

Neo-Rio-101 (700494) | about 10 months ago | (#45497131)

First I'm not going to plug any VM vendor.... but with certain VM backends, snapshots are possible, and it's a godsend when crap like this happens.

Re:Snapshots? (2)

Runaway1956 (1322357) | about 10 months ago | (#45497503)

You have to realize how few people even know what a VM is. Or a snapshot. Where I work, there is one backup made each week, on the server. No other machine has a snapshot, a disk image, a backup, there are no VM's - nothing. If/when a disk fails, that machine comes to a halt until a vendor is called in to replace the disk, the OS, and all the software.

We have some fool who is referred to as "the IT guy". I can't even say that with a straight face. This is one of those who got a Microsoft-centric education, and proved to be pretty adept at accomplishing Microsoft-centric tasks - and just happens to be related to the company president.

I know that our situation isn't unique.

Re:Snapshots? (1)

rubycodez (864176) | about 10 months ago | (#45497525)

You can do snapshots by means other than VM software. Many volume managers and filesystems can do it, and some disk array controllers have it built in.

Re:Snapshots? (2, Interesting)

Anonymous Coward | about 10 months ago | (#45497561)

No. Just no.

Have you ever actually tried this on a production system? I haven't (I'm not stupid enough to do that), but I've seen many others try. In almost every case, the resulting mess from "rolling back" a VM was greater than the mess of a botched software update to begin with. In one particular case, I witnessed a certain VM running some very expensive enterprise software totally hose itself and then proceed to blow away the majority of a database hosted on another VM after it was restored following a broken update. Despite their attempts to restore both VMs and bring them back in sync, they eventually determined that the data couldn't be trusted on either and the entire system had to be restored from backup. The downtime this cost them was greater than the downtime would have been had they simply called the vendor and said "your update broke our stuff, fix it" (they had the support contracts and the fix would have taken 10 minutes instead of 8 hours).

Another time I saw someone restore a VM that was running a network daemon for a cluster of hardware locks attached to one of the nodes (of course, this VM was locked to that particular node since it required passthrough access to the USB dongles). That was a good one- not only did none of the licenses get checked back into the network daemon (so they basically lost all the capacity they had in use at the time of restore), but the licensing software freaked out and shat itself when the time stamps coming off the hardware were suddenly in the future (as the clock had not yet been synchronized back to local time). It took those guys several days of pleading with the software vendor to send them new keys and get the licensing system sorted out and working again (snapshots were permanently disabled on that VM thereafter).

Now, it's an awesome feature to have for testing and development stuff- but for production, you should have procedures in place to deal with this kind of thing rather than reaching for the Big Red Button and nuking everything from orbit. I keep hearing about this kind of thing- "oh just restore the VM from snapshot in prod", and it makes me cringe every time I hear it. You don't restore a server from tape unless you absolutely have to. I fail to see why anyone thinks that restoring a VM from snapshot is any different- the only difference is that it takes seconds to complete, instead of hours.

Re:Snapshots? (0)

Anonymous Coward | about 10 months ago | (#45497971)

So the first example you gave was with a replicated database, that's really an obvious case where you don't use a snapshot. Microsoft tells you the same thing when deploying AD controllers in a virtual environment.

The second example you gave could have easily happened outside of a virtual environment. Imagine somebody did a restore from backup, or accidentally fucked up the system clock - the same thing would have occurred. That is just shitty software and not a problem related to virtual machines.

Don't blame the technology just because you don't understand it or because you have shitty software.

Re:Snapshots? (1)

Anonymous Coward | about 10 months ago | (#45499107)

> The second example you gave could have easily happened outside of a virtual environment. Imagine somebody did a restore from backup, or accidentally fucked up the system clock - the same thing would have occurred. That is just shitty software and not a problem related to virtual machines.

Because people just love to take down a system for hours restoring from tape at random? My point was that they restored the VM from snapshot because it was a quick and easy process. The system itself went down for about a minute (the clients didn't even notice until the licensing manager started to refuse floating license checkouts) and then it was back online. The snapshot was recent (only 6 hours old), so no harm done, right? Wrong.

In my experience, VM snapshots are dangerous precisely because they're so easy to implement, use, and abuse. Right clicking on a VM and selecting "Restore Snapshot" is infinitely easier than firing up your backup package and waiting hours for a server to restore from tape. The end result is mostly the same, save for the fact that VM snapshots will also store the CPU and RAM state to disk. Yet, people are rarely hesitant to roll back an entire VM when they should be treating it as a really quick full system restore.

I'm not saying it's a bad feature. I use it a lot. Lots of people I know use it a lot. But we all use it responsibly, and it is never, EVER the answer to "something broke" or "something isn't quite working properly". It is one of if not the last resort after attempting to troubleshoot the problem properly inside the VM itself.

Re:Snapshots? (2)

Todd Knarr (15451) | about 10 months ago | (#45498637)

Gods, no. Just... no. Think for a minute. If your VM's running a database server and you roll back to a snapshot, what happens? Well, the snapshot doesn't know anything about the database since that's an application-level thing, so it'll roll back to being mid-operation (times however many database operations were in progress). The problem is that since the clients haven't been rolled back to the same moment down to the nanosecond, the database is now mid-operation while the clients that're supposedly performing those operations... aren't. From here things proceed to go pear-shaped in a big way.

It can be done safely, but it requires either intimate knowledge of the application by the VM host or bringing the applications to a safe idle state before starting the snapshot. Basically snapshots are far less useful than they're made out to be because the problem you're trying to solve is far more complex than just taking a snapshot.
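A minimal Python sketch of the "safe idle state" idea, for illustration only. Everything in it is hypothetical: `App`, `quiesced`, and `take_snapshot` are stand-ins for a real application and a real snapshot mechanism, and no actual VM or database API is involved. It just shows the difference between a snapshot taken mid-operation and one taken after new work is refused and in-flight work is drained.

```python
from contextlib import contextmanager
import copy


class App:
    """Hypothetical application with in-flight operations (illustration only)."""

    def __init__(self):
        self.committed = []    # operations that have fully completed
        self.in_flight = []    # operations started but not yet finished
        self.accepting = True  # whether new operations may start

    def begin(self, op):
        if not self.accepting:
            raise RuntimeError("application is quiesced; retry later")
        self.in_flight.append(op)

    def finish_all(self):
        # Drain: complete every operation that is currently in flight.
        self.committed.extend(self.in_flight)
        self.in_flight.clear()


@contextmanager
def quiesced(app):
    """Refuse new work and drain in-flight work for the duration of the block."""
    app.accepting = False
    app.finish_all()
    try:
        yield app
    finally:
        app.accepting = True


def take_snapshot(app):
    """Stand-in for a VM or volume snapshot: copies whatever state exists right now."""
    return copy.deepcopy({"committed": app.committed, "in_flight": app.in_flight})


app = App()
app.begin("debit account A")
app.begin("credit account B")

unsafe = take_snapshot(app)      # captured mid-operation
with quiesced(app):
    safe = take_snapshot(app)    # captured at a clean boundary
```

Restoring `unsafe` would bring back two half-done operations whose clients have long since moved on; restoring `safe` would not. The hard part in real systems is everything the sketch waves away: getting every application on the box to drain reliably.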

Good redundancy (1)

bob_super (3391281) | about 10 months ago | (#45497183)

"assistant general manager for operations, said the system's backup computer had gone down at the same time its central supervisory computer crashed."
Redundancy is not just running two boxes... How many times do we need to point out that there's a reason true redundancy is hard and expensive?

TFA (sorry for reading it) states that the problem showed up 12 hours after the upgrade. That's why it's time-consuming to test hi-rel stuff, whatever bean counters say...
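One way to keep a backup from dying of the same bug as the primary is to stagger the upgrade: leave the backup on the old version until the primary has run the new one for longer than the slowest failure you expect. A rough Python sketch, assuming a hypothetical `Node` record and a 24-hour soak window picked only because it exceeds the 12 hours mentioned in TFA; nothing here reflects how BART actually runs its systems.

```python
from dataclasses import dataclass
from typing import Optional

SOAK_HOURS = 24.0  # assumption: chosen to exceed the ~12 h delay reported in TFA


@dataclass
class Node:
    name: str
    version: str
    upgraded_at_hour: Optional[float] = None  # None means never upgraded


def upgrade(node, new_version, now_hour):
    """Record that a node now runs new_version (stand-in for a real deployment)."""
    node.version = new_version
    node.upgraded_at_hour = now_hour


def may_upgrade_backup(primary, new_version, now_hour, soak_hours=SOAK_HOURS):
    """Allow the backup upgrade only after the primary has soaked on new_version."""
    if primary.version != new_version or primary.upgraded_at_hour is None:
        return False
    return now_hour - primary.upgraded_at_hour >= soak_hours


primary = Node("central", "v1")
backup = Node("backup", "v1")

upgrade(primary, "v2", now_hour=0.0)
at_12h = may_upgrade_backup(primary, "v2", now_hour=12.0)  # still inside the soak window
at_24h = may_upgrade_backup(primary, "v2", now_hour=24.0)  # soak window has elapsed
```

With these numbers the backup is still on `v1` when a 12-hour time bomb goes off on the primary, so there is something sane to fail over to. The cost is running mixed versions for a day, which is exactly the kind of expense the bean counters object to.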

arrrrrgh (-1, Troll)

Rixel (131146) | about 10 months ago | (#45497227)

I was on my way to sign up for Obamacare!!!!!!!!

I put my hand upon your hip When I dip you dip we (0)

Anonymous Coward | about 10 months ago | (#45497231)

I put my hand upon your hip When I dip you dip we dip

I put my hand upon your hip When I dip you dip we dip
I put my hand upon your hip When I dip you dip we dip
I put my hand upon your hip When I dip you dip we dip

fire them (-1)

Anonymous Coward | about 10 months ago | (#45497239)

all - lying union scum

What else would you expect... (0, Offtopic)

Anonymous Coward | about 10 months ago | (#45497289)

From a transit authority born from a constellation of institutions based on a bunch of "educated" people all telling each other they are right. The fact is most of today's institutions are completely out of touch with reality.

a) the medical profession
b) the legal profession
c) academia
d) state and federal law enforcement
e) etc...

Everything from dietary recommendations which have led to an increase in diabetes and cancer, recommending yet more carbs and less fats, etc. Most doctors couldn't find their way through a human metabolic map. Anyway let them build a robotic army, we will see who ends up in control of them.

my .0..1 BTC ;)

Looks like Terry Childs had a point (4, Funny)

Somebody Is Using My (985418) | about 10 months ago | (#45497349)

See what happens when you give these guys root access? ;-)

Re:Looks like Terry Childs had a point (1)

bluemonq (812827) | about 10 months ago | (#45498025)

BART is a metropolitan transit system. The city government of San Francisco has practically nothing to do with day-to-day operations.

It's not the software upgrade (0)

Anonymous Coward | about 10 months ago | (#45497607)

It's the lack of a decent rollback plan and making sure they had enough time and resources to rollback.
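"Enough time to roll back" can be made concrete by working backwards from the end of the maintenance window. A small Python sketch; the function names and every number in it (a 240-minute window, 45 minutes to roll back, 30 minutes to verify) are made up for illustration and are not taken from the BART incident.

```python
def latest_rollback_start(window_end_min, rollback_min, verify_min):
    """Last minute at which a rollback can begin and still finish inside the window."""
    return window_end_min - rollback_min - verify_min


def decide(now_min, upgrade_healthy, window_end_min, rollback_min, verify_min):
    """Go/no-go decision at a given minute of the maintenance window."""
    deadline = latest_rollback_start(window_end_min, rollback_min, verify_min)
    if upgrade_healthy:
        return "proceed"
    if now_min < deadline:
        return "continue troubleshooting"
    return "roll back"


# Hypothetical window: 240 minutes, 45 to roll back, 30 to verify the old version.
deadline = latest_rollback_start(240, 45, 30)
early = decide(now_min=60, upgrade_healthy=False,
               window_end_min=240, rollback_min=45, verify_min=30)
late = decide(now_min=170, upgrade_healthy=False,
              window_end_min=240, rollback_min=45, verify_min=30)
```

The point of computing the deadline up front is that nobody has to argue about it at 3 a.m. Note that it does nothing for a fault that only shows up hours after the window has closed.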

Manual operation (2)

manu0601 (2221348) | about 10 months ago | (#45497657)

I have seen quite efficient manual train network operation, but the workers behind the success could explain it was only possible because they had a few old timers who were still able to organize train flows using paper and pencil. Younger workers had always worked with computers, and once the old timers have all retired, the know-how will be lost.

Re:Manual operation (0)

Anonymous Coward | about 10 months ago | (#45498709)

Nah, they still teach it in middle school: Train X is heading towards Train Y at 60 mph. Train Y is heading at Train X at 25 mph. At a distance of 30 miles, when and where will they hit?

Though we were never given the other problems: Train X is heading north at 80 mph. Train Y is north and heading north at 30 mph. Train Y switches tracks in 20 miles and Train X reaches the same switch in 70 minutes. Will Train Y make the switch off before it gets hit by Train X?

So you're probably right.
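For the record, both problems work out with a few lines of Python. This sketch assumes constant speeds, treats the trains as points (length ignored), and assumes Train Y is clear of the switch the moment it arrives there. In the second problem Train X's 80 mph is not needed, because its arrival time at the switch is given directly.

```python
def head_on_meeting(speed_x_mph, speed_y_mph, gap_miles):
    """Return (minutes until the trains meet, miles Train X has travelled by then)."""
    closing_speed_mph = speed_x_mph + speed_y_mph  # trains approach each other
    hours = gap_miles / closing_speed_mph
    return hours * 60, speed_x_mph * hours


def clears_switch_in_time(y_speed_mph, y_distance_miles, x_arrival_min):
    """Return (True if Y reaches the switch before X does, Y's arrival time in minutes)."""
    y_arrival_min = y_distance_miles * 60 / y_speed_mph
    return y_arrival_min < x_arrival_min, y_arrival_min


# Problem 1: 60 mph and 25 mph, starting 30 miles apart.
minutes, miles_from_x = head_on_meeting(60, 25, 30)

# Problem 2: Y is 20 miles from the switch at 30 mph; X gets there in 70 minutes.
y_clears, y_arrival_min = clears_switch_in_time(30, 20, 70)
```

Problem 1: the closing speed is 85 mph, so they meet after 30/85 of an hour, a bit over 21 minutes, about 21 miles from where Train X started. Problem 2: Train Y reaches the switch after 40 minutes, half an hour ahead of Train X, so it makes it.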

So does somebody go to jail? (1)

dbIII (701233) | about 10 months ago | (#45497677)

Terry Childs was locked up on the off chance that something far less disruptive than this would happen. At least that was the excuse.

Re:So does somebody go to jail? (1)

bluemonq (812827) | about 10 months ago | (#45498033)

BART is not under the governance of San Francisco.

Terry Childs pissed off the city and he worked for (1)

Joe_Dragon (2206452) | about 10 months ago | (#45498155)

Terry Childs pissed off the city and he worked for them.

Likely in this case some outside vendor / contractor messed up.

Re:So does somebody go to jail? (0)

Anonymous Coward | about 10 months ago | (#45498313)

Terry Childs was locked up on the off chance that he was an arrogant little unprofessional fuck who thought he had private ownership over his employer's infrastructure.

FTFY.

wtf? (0)

Anonymous Coward | about 10 months ago | (#45497867)

Using sharepoint?

Only one letter difference (0)

Anonymous Coward | about 10 months ago | (#45498151)

Only one letter difference between outage and outrage.

I Lov The SF Bay Area Don't YOU (0)

Anonymous Coward | about 10 months ago | (#45498211)

It is like a magical mystery tour of 1960's technology, including communications and information, and mind-think on display for all to enjoy.

A Dodgy software update. Ah! Bart IT runs on COBOL (beloved of US Federal DoD contractors), the IT of the Future! See. It is all very simple. Bart is Light-Years ahead of the mere humans who try to "run" it by leaps and bounds. Yet the Bart IT team encourages Sabots age, i.e. the tossing of Sabots into the gears of the "machine." We will have to wait for the intrepid Bart IT engineers to evolve to a sufficient brain capacity and comprehension level to understand the IT of the Future, COBOL.

COBOL, a gift from the GODs themselves no doubt.

QED

adventure tours in vietnam (0)

Anonymous Coward | about 10 months ago | (#45498563)

Well, you really do have to wonder when they say they worked through the whole night only to discover that this new, mysterious problem was caused by the update they'd made the night before.

