Slashdot: News for Nerds

State of Virginia Technology Centers Down

Soulskill posted more than 3 years ago | from the your-tax-dollars-at-work dept.

Government 190

bswooden writes "Some rather important departments (DMV, Social Services, Taxation) in the state of Virginia are currently without access to documents and information as a technology meltdown has caused much of their infrastructure to be offline for over 24 hours now. State CIO Sam Nixon said, 'A failure occurred in one memory card in what is known as a "storage area network," or SAN, at Virginia's Information Technologies Agency (VITA) suburban Richmond computing center, one of several data storage systems across Virginia.' How does the IT for some of the largest departments in a state come to a screeching halt over a single memory card? Oh, and also, the state is paying Northrop Grumman $2.4 billion over 10 years to manage the state's IT infrastructure." Reader miller60 adds, "Virginia's IT systems drew scrutiny last fall when state agencies reported rolling outages due to the lack of network redundancy."

190 comments

HA fail (4, Insightful)

Anonymous Coward | more than 3 years ago | (#33393622)

How does a fault in a single SAN controller cause an outage of the entire data storage network? Expensive SAN solutions are expensive & highly redundant for a reason. This smells like a "Let's buy the cheaper solution" and/or an infrastructure design fail.

Re:HA fail (0)

Anonymous Coward | more than 3 years ago | (#33393772)

It is far worse than that. The summary says it is a meltdown! I don't know how IT could cause that, but terrorism must be involved. From what I've heard, they are evacuating New Jersey and calling in the National Guard.

Re:HA fail (4, Interesting)

cgenman (325138) | more than 3 years ago | (#33394116)

Also, this can happen when you hire an external firm to manage something that you should be managing yourself. External managers for projects like this are motivated by extracting as much money as possible from you. Internal departments of technology, by comparison, are motivated by convincing co-workers to not shout at them.

Re:HA fail (2, Funny)

Daniel Dvorkin (106857) | more than 3 years ago | (#33394514)

> Also, this can happen when you hire an external firm to manage something that you should be managing yourself. External managers for projects like this are motivated by extracting as much money as possible from you. Internal departments of technology, by comparison, are motivated by convincing co-workers to not shout at them.

B-b-but you're saying that the bloated corrupt government that takes money from people at gunpoint and has no incentives for efficiency might have done a better job than a private contractor that works on the God-given free enterprise system that rewards efficiency and punishes waste! That's unpossible!

Re:HA fail (2, Interesting)

Even on Slashdot FOE (1870208) | more than 3 years ago | (#33394138)

Step 1) Design system so a single SAN controller is the only thing keeping the network running.
Step 2) Use money saved by not adding redundancy/designing the system correctly to give self money.
Step 3) Expect one component to last long enough for you to leave the job before it fails.
Step 4) ????
Step 5) Profit anyway because they don't get the concept of failures==bad things and keep paying you.

Re:HA fail (1)

Wyatt Earp (1029) | more than 3 years ago | (#33394540)

Did the dude from the City of SF design this network so that if he wasn't there to SSH in with a modem he had hidden in his toaster oven, the RAM in a SAN would bring the whole network down?

Re:HA fail (1, Insightful)

g0bshiTe (596213) | more than 3 years ago | (#33394326)

I live in Virginia, it's more like business as usual for a Commonwealth.

Re:HA fail (0)

Anonymous Coward | more than 3 years ago | (#33394674)

Agreed. Cheaper is better and worth 1-2 days downtime a year. How does the Cost per Downtime compare to other states?

Re:HA fail (1)

NotBornYesterday (1093817) | more than 3 years ago | (#33394426)

Because there was more than one failure. FTFA:

The system was built with redundancies and backup storage. It was hailed as being able to suffer a failure to one part but continue uninterrupted service because standby parts or systems would take over. But when the memory card failed Wednesday, a fallback that attempted to shoulder the load began reporting multiple errors, Nixon said.

Cheap solution problem? Possibly. Infrastructure design fail? Possibly, but not likely. Couldn't critique it without seeing their setup, but it sounds like they designed some redundancy in. I wonder what kind of "memory card" failed. From the description, it sounds like it might be a cache module.

Re:HA fail (1)

Local ID10T (790134) | more than 3 years ago | (#33394498)

> Because there was more than one failure. FTFA:

> The system was built with redundancies and backup storage. It was hailed as being able to suffer a failure to one part but continue uninterrupted service because standby parts or systems would take over. But when the memory card failed Wednesday, a fallback that attempted to shoulder the load began reporting multiple errors, Nixon said.

> Cheap solution problem? Possibly. Infrastructure design fail? Possibly, but not likely. Couldn't critique it without seeing their setup, but it sounds like they designed some redundancy in. I wonder what kind of "memory card" failed. From the description, it sounds like it might be a cache module.

Regular testing of redundant systems is critical. Anyone who has done disaster planning knows this.

Re:HA fail (0)

Anonymous Coward | more than 3 years ago | (#33394480)

Too many Quantum BigFoot drives

Re:HA fail (3, Informative)

Wyatt Earp (1029) | more than 3 years ago | (#33394520)

Sweet Zombie Jesus.

If the RAM in our 8TB Netgear SAN fries it doesn't blow up my office, what the hell are they and Northrop Grumman doing?

Re:HA fail (0)

Anonymous Coward | more than 3 years ago | (#33394658)

>If the RAM in our 8TB Netgear SAN fries it doesn't blow up my office, what the hell are they and Northrop Grumman doing?

If I had to speculate, I'd guess they are thinking far beyond the 'single office' blowup scenario and dealing with an order of magnitude more volume.
At least I hope so. If the whole state government used a single, consumer-grade SAN, that wouldn't really surprise me but it bothers me a lot that people get billion dollar contracts for things where I'd do a far better job at $200/hr for a few months.

Re:HA fail (1)

donnyspi (701349) | more than 3 years ago | (#33394638)

Yeah really. Before we got away from traditional hardware (NAS, SAN, etc.) we had piece of crap Dot Hill arrays and they had redundant power supplies and redundant controllers. There must be more to this story.

card? (1)

The Lyrics Guy (539223) | more than 3 years ago | (#33393624)

What is a SAN memory card?

Re:card? (3, Insightful)

snookerhog (1835110) | more than 3 years ago | (#33393644)

sounds like nobody in Virginia knows either

Re:card? (3, Informative)

Culture20 (968837) | more than 3 years ago | (#33393666)

A technically correct term, albeit against normal colloquialism which calls them memory chips. Memory chips are the black things on the cards.

Re:card? (2, Informative)

NotBornYesterday (1093817) | more than 3 years ago | (#33394446)

From the awkward phrasing, my completely uninformed guess is they are referring to a cache module on a controller somewhere.

It's always money (2, Interesting)

Anonymous Coward | more than 3 years ago | (#33393636)

I'll tell you exactly how. Some manager somewhere said that it cost too much to add redundancy. It's happened over and over at my extremely large company, and it will continue to happen as long as money is the prime concern.

Re:It's always money (1)

jsnipy (913480) | more than 3 years ago | (#33393724)

At least now they can quantify their (bad) decision with their loss of productivity and perhaps loss of revenue.

Re:It's always money (1)

snookerhog (1835110) | more than 3 years ago | (#33393732)

why do we need redundancy when the MTBF is 500000 hours? That's more than 57 years! Surely we will replace the whole system in less than that time, so why bother with redundancy?

Re:It's always money (-1, Offtopic)

bluefoxlucid (723572) | more than 3 years ago | (#33393792)

I've got a bitch in brooklyn that wants to sleep with you, too.

Re:It's always money (0)

Anonymous Coward | more than 3 years ago | (#33393898)

No, we don't want to sleep with your mom.

Re:It's always money (2, Funny)

Anonymous Coward | more than 3 years ago | (#33394670)

Oh yeah? Well YOUR momma so stank, she lay down on train tracks and nothing happened 'cause not even the train would hit that.

Re:It's always money (1)

jeffmeden (135043) | more than 3 years ago | (#33393844)

What does mean mean again? Ah nevermind. Odds are 2 out of 3 it will fail outside of business hours anyway. And if that's the case, no one will notice!

Re:It's always money (2, Interesting)

cgenman (325138) | more than 3 years ago | (#33394172)

I love how people can determine a MTBF of 50 years after testing a piece of hardware for a month.

For my money, the only computer that should be able to claim a 50 year MTBF is the Univac. And that's really, really not accurate.

Re:It's always money (1)

jeffmeden (135043) | more than 3 years ago | (#33394312)

What does mean mean again? Oh, that's right. If you want a MTBF of 50 years, you can either get one unit and run it for 50 years to prove yourself, or you can get 100 units and run them for 6 months... To be sure, it doesn't automatically take into account mechanical wear but any engineer worth their salt can extrapolate acceptable wear rates with 6 months of data (and that's only if you are talking about systems with moving parts)...
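The unit-hours arithmetic behind that claim can be sketched in a few lines. This is only an illustration: the function names are made up here, and the failure-probability helper assumes a constant failure rate (an exponential model), which ignores the wear-out effects mentioned above.

```python
import math

HOURS_PER_YEAR = 8760


def mtbf_hours(unit_count, hours_per_unit, failures):
    """Estimate MTBF as total unit-hours divided by observed failures."""
    if failures <= 0:
        raise ValueError("need at least one observed failure to estimate MTBF")
    return unit_count * hours_per_unit / failures


def failure_probability(mtbf, hours):
    """Chance of at least one failure within `hours`, assuming a constant failure rate."""
    return 1.0 - math.exp(-hours / mtbf)


# 100 units run for half a year with a single failure: 50 unit-years per failure.
estimate = mtbf_hours(unit_count=100, hours_per_unit=HOURS_PER_YEAR / 2, failures=1)
print(estimate / HOURS_PER_YEAR)  # MTBF expressed in years

# A 500,000-hour MTBF is a population average, not a promise that one box lasts 57 years.
print(failure_probability(500_000, HOURS_PER_YEAR))  # chance of failing in any one year
```

Under that model, a 500,000-hour MTBF still leaves a bit under a 2% chance that a given unit fails in any single year, which adds up quickly across a rack full of parts.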

Re:It's always schedule (2, Interesting)

rwa2 (4391) | more than 3 years ago | (#33393754)

Heh, it shouldn't be about the money, though... they should have specified high availability from the very beginning. They often throw it out during the prototyping stage, saying they need to Keep It Simple Stupid just to get things working, but then all the software is never designed to be able to handle redundancy, and shoehorning it in later becomes pretty much like starting again from scratch.

Also, designing in redundancy is usually worse than having no redundancy at all if it's never tested. There should be a pretty simple test plan, where, say, the CTO comes in and is allowed to pull any single random wire or component out of the rack and see how the system reacts / recovers. But unfortunately people are usually using the system by that time, and it's too much of a hassle to come in off-hours and pay everyone overtime for such a test.
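The "pull any single component" test described above can at least be rehearsed on paper before anyone touches the rack. A minimal sketch, assuming a made-up inventory where each role (controller, power, switch) needs at least one healthy component; the component names are hypothetical:

```python
def service_is_up(roles, failed):
    """The service is up if every role still has at least one healthy component."""
    return all(
        any(component not in failed for component in components)
        for components in roles.values()
    )


def single_points_of_failure(roles):
    """Pull each component in turn and list the ones whose loss takes the service down."""
    all_components = {c for components in roles.values() for c in components}
    return sorted(c for c in all_components if not service_is_up(roles, {c}))


# Hypothetical rack layout, for illustration only.
rack = {
    "controller": ["ctrl-a"],         # only one controller, no standby
    "power": ["psu-1", "psu-2"],
    "switch": ["sw-1", "sw-2"],
}

print(single_points_of_failure(rack))  # ['ctrl-a']
```

This only checks the design as documented. It says nothing about whether the standby parts actually work when called on, which is why the physical pull test still matters.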

Re:It's always money (3, Insightful)

Daniel_Staal (609844) | more than 3 years ago | (#33393764)

Add in politics: Get a couple of representatives arguing over where the money (if any) should be spent, and all possibility of real redundancy and fault-tolerance goes out the window.

It's true in larger government organizations than this. The failures just haven't occurred yet.

Re:It's always money (1, Insightful)

Anonymous Coward | more than 3 years ago | (#33393804)

There's not a lot of money left over for redundancy after you take out the kickbacks, graft and bribes.

Re:It's always money (3, Interesting)

geekoid (135745) | more than 3 years ago | (#33394528)

This is a private sector failure. NG is the culprit here, not the government.

This is why you should be very wary of bidding out work to a third party. They don't care about your city. They are not thinking about how their decisions impact the city in 10-20-50 years.

And while infrastructure is far more complex and expensive than people who don't deal with it realize, 2.4 billion over 10 years? 240 million a year? That is a price where they should have a tested redundancy system. A single-point SAN failure? Shame on NG.

I hate to burst your preconceive bubble, but as my years in the private sector and public sector have taught me, most government agencies are far better at keeping their own infrastructure. More reliable and long standing.

Re:It's always money (2, Insightful)

Daniel_Staal (609844) | more than 3 years ago | (#33394678)

My 'preconceive bubble' is based on my current job for the US government, and the situation we have in our department.

It might be true on average that government agencies are better at keeping their own infrastructure, especially if they can manage to keep their accounting and design of that infrastructure at a lower level. However, once those decisions pass the level from the internal to the external (or: From those hired for the job, to those elected/appointed into it), that long-term planning appears to break down, in favor of political squabbles.

Re:It's always money (0)

Anonymous Coward | more than 3 years ago | (#33394594)

Yeah, the failures have occurred, but they've been spun better.

Re:It's always money (1)

joebok (457904) | more than 3 years ago | (#33394030)

Money is always the prime concern for a business. If the cost of adding redundancy is higher than the expected cost of dealing with network failure, then why would a business do it?

That being said, I often see the cost of dealing with a significant network interruption being underestimated - either the $ cost or the probability of it happening.

Re:It's always money (4, Insightful)

cgenman (325138) | more than 3 years ago | (#33394250)

Everyone seems to think that a network outage is no big deal, until the network goes down. That's when people start thinking of the burn rate of an entire organization sitting on their thumbs while that network of off-the-shelf Linksys routers is replaced by some kid at Best Buy. Or how that 5k dollars per year for a backup external line suddenly pales in comparison to the 5k dollars per hour your organization is wasting because you were a cheap bastard.
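The trade-off being argued here is a plain expected-cost comparison. A rough sketch using the $5k/year and $5k/hour figures from the comment above; the outage frequency and duration are assumptions picked for illustration, not data from the article:

```python
def expected_outage_cost(outages_per_year, hours_per_outage, cost_per_hour):
    """Expected yearly loss from outages: frequency x duration x burn rate."""
    return outages_per_year * hours_per_outage * cost_per_hour


def redundancy_pays_off(redundancy_cost_per_year, outages_per_year,
                        hours_per_outage, cost_per_hour):
    """True when the expected yearly outage loss exceeds the yearly cost of redundancy."""
    loss = expected_outage_cost(outages_per_year, hours_per_outage, cost_per_hour)
    return loss > redundancy_cost_per_year


# Assumed: one 24-hour outage every four years, at $5k per hour of downtime.
loss = expected_outage_cost(outages_per_year=0.25, hours_per_outage=24, cost_per_hour=5_000)
print(loss)  # 30000.0

# Compared against a $5k/year backup line.
print(redundancy_pays_off(5_000, 0.25, 24, 5_000))  # True
```

At those figures the backup line breaks even after a single hour of avoided downtime per year. The usual mistake is underestimating either the probability or the hourly cost.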

Northrop Grumman (1)

elrous0 (869638) | more than 3 years ago | (#33393638)

Northrop Grumman already runs the U.S. military. Might as well turn over IT to them too.

Re:Northrop Grumman (1, Informative)

Anonymous Coward | more than 3 years ago | (#33393880)

Northrop Grumman is actually only the 4th largest player in defense contracting. If you want to look at the big players, look at Boeing, Lockheed Martin, and BAE.

Northrop Grumman, to some of the other contractors, is also known to be a screw-up that puts out mediocre quality work with a high price tag.

LMT, Raytheon, SAIC, and Honeywell would have all been better choices for making quality products.

Re:Northrop Grumman (0)

Anonymous Coward | more than 3 years ago | (#33394002)

Yeah, that Lunar Module they used to put men on the moon was a real piece of shit.

Re:Northrop Grumman (0)

Anonymous Coward | more than 3 years ago | (#33394106)

Lunar module, right. Got any examples that aren't pushing 40 years old?

Re:Northrop Grumman (1)

brainboyz (114458) | more than 3 years ago | (#33394182)

B2 Bomber?

Re:Northrop Grumman (1)

Amouth (879122) | more than 3 years ago | (#33394406)

Rephrase - Got any examples that aren't more than 20 years old?

Re:Northrop Grumman (1)

Wyatt Earp (1029) | more than 3 years ago | (#33394588)

USS Ronald Reagan (CVN-76) and USS George H.W. Bush (CVN-77) are two things they build that don't break easily.

Re:Northrop Grumman (0)

Anonymous Coward | more than 3 years ago | (#33394630)

The construction of your moms titanium composite dildo practically reinvented the field of tolerance engineering. That has to count for something.

They need a better network admin (4, Funny)

Nemesisghost (1720424) | more than 3 years ago | (#33393664)

Maybe they should hire Terry Childs, at least he won't let their network go down for something like this.

Re:They need a better network admin (0)

Bogtha (906264) | more than 3 years ago | (#33393856)

Who modded this insightful? The problems that Virginia are experiencing right now are caused by a single point of failure. Terry Childs, regardless of your opinions of his conviction, made himself a single point of failure. So out of every sysadmin in the entire fucking world, you picked the absolute least appropriate person for the job.

Re:They need a better network admin (2, Insightful)

Anonymous Coward | more than 3 years ago | (#33394334)

That's insane. Terry Childs failed (he was arrested and unable to make changes to the network)--and the city kept running.

Well..... (0)

Anonymous Coward | more than 3 years ago | (#33393686)

HAHAHAHHAHAHHAHHA - stupids

"This is supposed to be the best system you can buy, and it's never supposed to fail, but this one did," he said

And I've got a bridge for sale in San Francisco...

Re:Well..... (2, Funny)

jeffmeden (135043) | more than 3 years ago | (#33393806)

> HAHAHAHHAHAHHAHHA - stupids

> "This is supposed to be the best system you can buy, and it's never supposed to fail, but this one did," he said

> And I've got a bridge for sale in San Francisco...

Throw in your city's cisco-powered WAN and I'll take it!

Redundancy (2, Funny)

CmdrPorno (115048) | more than 3 years ago | (#33393708)

Silly state, expecting to get redundancy for only $2.4 billion. Don't they realize they're going to have to pay a lot more than that to get a reliable network?

Re:Redundancy (2, Interesting)

Wonko the Sane (25252) | more than 3 years ago | (#33393892)

What makes you think that the legislators expect redundancy? When that kind of money changes hands the only thing they care about is getting favors and campaign contributions.

Even funnier (3, Interesting)

SteveFoerster (136027) | more than 3 years ago | (#33394398)

As a leftover from when Virginia-headquartered AOL was the king of connectivity, you see license plates here in Virginia touting us as the Internet Capital [virginia.gov].

Offer (1)

XanC (644172) | more than 3 years ago | (#33394536)

I'll do it for $2.3 billion!

Re:Redundancy (1)

Darth_brooks (180756) | more than 3 years ago | (#33394722)

Your sig makes that comment *that* much more hilarious.

Awful. (4, Insightful)

boneclinkz (1284458) | more than 3 years ago | (#33393716)

Our primary concern should be a complete audit of World of Warcraft server hardware, to ensure that this vulnerability does not exist in other, more vital networks.

Sorry, has to be said... (2, Funny)

Omega Hacker (6676) | more than 3 years ago | (#33393780)

I think the id10ts who pulled off this stunt are rather DIMM....

Question. (2, Insightful)

U8MyData (1281010) | more than 3 years ago | (#33393786)

Umm, so what's the point of having a SAN if it weren't redundant? Methinks there is more to this story.

Re:Question. (1)

bswooden (1574815) | more than 3 years ago | (#33393886)

> Umm, so what's the point of having a SAN if it weren't redundant? Methinks there is more to this story.

From working with VITA on a daily basis, I can assure you there is probably not much more to the story than this. I have never seen a more disorganized bunch of clowns in my life.

Re:Question. (2, Insightful)

MightyMartian (840721) | more than 3 years ago | (#33393902)

Probably involving executives vacationing in nice tropical locales by rewarding themselves with hefty bonuses. Meanwhile some poor IT guys weren't given the budget that reflected how much the State was paying out, and had to cobble together a SAN solution, or pick the cheapest one off the shelf. The IT guys will, of course, be the patsies for this whole episode, with the CEO and CTO all huffing and puffing and vowing to State officials and lawmakers that they're doing everything they can to get to the bottom of this.

Re:Question. (1)

EmperorKagato (689705) | more than 3 years ago | (#33394018)

Even the cheapest solutions in the $20k to $40k range have redundancy.

It's not a SAN if it has 1 point of failure, it's just a virtual storage box or NAS. Hell they could have spent just $10k and just run a windows file server with a bunch of disks in a redundancy configuration.

I run a SAN network and you have no idea how much I'm raging over the stupidity of this incident right now.

Re:Question. (1)

Necron69 (35644) | more than 3 years ago | (#33394190)

Ditto. I test SAN configurations for a living, and I'm stumped by this one. I'd love to know some details.

Necron69

Re:Question. (2, Funny)

jeffmeden (135043) | more than 3 years ago | (#33394370)

"What could possibly be the difference between raid0 and raid1? Come on, who would put those radio button choices so close together if they really meant opposite things!"

Re:Question. (3, Informative)

MightyMartian (840721) | more than 3 years ago | (#33394380)

Well, as Sherlock Holmes' greatest axiom goes "When you have eliminated the impossible, whatever remains, however improbable, must be the truth." Using that logic, the answer is simple. They're not using a SAN. Somewhere along the line someone is bullshitting, and my gut tells me it's management. A lot of folks who get government contracts pretty much view them as an opportunity to skim off the top. Why, take what should be a $50,000 solution and mock something up for $10,000, and that's $40,000 profit.

Re:Question. (0)

Anonymous Coward | more than 3 years ago | (#33394134)

The main server boots from a Sandisk SD card, because the admin had an old one on his camera and the brass wouldn't pay for something more reliable.

Re:Question. (0)

Anonymous Coward | more than 3 years ago | (#33394486)

It's not actually all that odd to boot a server from a cheap USB key. HP blades come with a little internal USB cradle just for this purpose. If all you're doing is booting the machine into some sort of appliance, e.g. a NAS controller or a VM host, the root filesystem doesn't get touched once it's booted. The integrity of the root filesystem is only an issue when you reboot the machine, which shouldn't be happening often in the above scenarios.

Re:Question. (1)

Locke2005 (849178) | more than 3 years ago | (#33394324)

You'd think they'd at least do RAID 1 Mirroring. Then they could just hot swap in another drive, sync it, and be on their merry way. Why centralize your data services if you're not going to do it right?
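For a sense of scale on striping versus mirroring, here is a back-of-the-envelope sketch. It assumes disks fail independently with the same probability `p`, which is a made-up figure, and the incident in the article (a fallback that started throwing errors as soon as it took the load) is a reminder that real failures are often correlated.

```python
def raid0_loss_probability(disk_failure_p, disks):
    """Striping: losing any one disk loses the whole array."""
    return 1.0 - (1.0 - disk_failure_p) ** disks


def raid1_loss_probability(disk_failure_p, disks):
    """Mirroring: data is lost only if every mirror fails before a rebuild."""
    return disk_failure_p ** disks


p = 0.02  # hypothetical chance that one disk fails within the window of interest

print(raid0_loss_probability(p, 2))  # roughly 0.04, about double a single disk
print(raid1_loss_probability(p, 2))  # roughly 0.0004, a hundredth of the RAID 0 figure
```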

Re:Question. (3, Interesting)

Darth_brooks (180756) | more than 3 years ago | (#33394582)

Depends on the SAN. The article (as most tech articles are) is very short on scope & details. So "one chip" went bad. Should that bring everything to a screeching halt? The answer should be "no" but in practice we can all say that it's more often a case of "not usually." From TFA:

It was hailed as being able to suffer a failure to one part but continue uninterrupted service because standby parts or systems would take over. But when the memory card failed Wednesday, a fallback that attempted to shoulder the load began reporting multiple errors, Nixon said.

So Array Alpha shits the bed. You follow your failover procedures and start running on Array Zappa. That immediately starts throwing errors. Ok armchair QBs, let me switch to my Keanu Reeves voice and ask "What do you do?" You built a pretty damned redundant system there and you're still down. Sure, it'd be nice if they had a backup in another DC they could fail to, but they don't. Doesn't matter, eventually you're playing the double / triple / quadruple hulled oil tanker game. Either way, redundant SANs aren't cheap and aren't all that easy (it's not exactly a "the boss's nephew who 'knows all about computers' set it up last weekend" level of complexity.) The TFA also has these points:

Full function may not be restored until Monday.

Experts who examined the system determined that no data were lost except for those being keyed into the system at the moment it failed, Nixon said.

Other than the fact that proofreading and the usage of proper grammar are no longer requirements to work for a Virginia newspaper, what do those points tell us? Sounds to me like they hit the last line in the DR procedures: Restore from backup. Depending on what their backup strategy is (maybe they're splitting several terabytes across a tape robot that only supports 200/400gig tapes because that robot is the only device the vendor supports.) and how truly important the affected system is (This may be a system where the powers that be said "fsck it, they can process renewals by hand and we'll bring everything back up on Monday after we test on Saturday") a return to business on Monday might be SOP. But that wouldn't sell newspapers (or make talking points with the voters...) now, would it?

Maybe there was a major screwup here. Maybe they never tested their failovers and maybe that 2nd SAN was bad out of the box. I'm a little more willing to cut some slack and say "man, that sucks. Glad it's not my ass on the line." Karma's a bitch like that. I like to take these stories as an opportunity to rethink what my own single points of failure are rather than point & laugh and tell everyone how I'll never lose any data because I'm running RAID 5......

Re:Question. (1)

geekoid (135745) | more than 3 years ago | (#33394600)

Here is an educated guess:

When getting the bid, NG promised redundancy.
NG stalled and then was behind schedule.
The redundancy system became less 'important' due to time.
NG went live.
NG let a bunch of contractors go.
NG says their in-house staff will take care of it.
NG new hires get stuck at the end of the project, do enough to consider it 'done'. Several amateur mistakes were made.

What's happening right now:
People who work for the state IT are showing everyone the email they got from NG saying the system was redundant.
They then show all the emails clearly detailing why it wasn't.
Politicians who don't want to blame anyone they may have to deal with give lip service until American Idol is back on.

Safely Travel in VA (0)

Anonymous Coward | more than 3 years ago | (#33393854)

Excellent, I guess this means I'll be able to safely travel through Virginia without risking getting picked up on all my outstanding warrants.

Money (0)

Anonymous Coward | more than 3 years ago | (#33393872)

The only thing I can think of is that they decided it cost too much money. This is the problem with letting penny-counters make these decisions. "Oh, this one costs a fraction as much, and they're pretty much all the same. Right???"

When are people going to stop trusting business people for technical decisions? When are they going to figure out that they hired us for our knowledge, and not to just push buttons? We don't talk about backups and failovers just to sound cool. We're trying to save their butts from a meltdown like this. My advice is that if you're in a position that you have a penny-counter telling you what to buy, then just point at this story to give your opinion more weight -- especially if you've been trying to tell them for years or something.

I guess I'm lucky that I have a boss who used to work in IT, and so she gives my opinion a lot more weight than most supervisors do. We have several redundant backups, and we have two servers that can each pick up the slack of the other at a moment's notice (it's not that big a network). Not the best solution, but far better than the State of Virginia, apparently. We've already had a couple of hiccups that this arrangement worked great through. The users didn't even notice.

I'm not saying this to brag. We have a non-profit-sized budget (read: shoestring budget). If we can do it on our budget, then so should a US state.

Re:Money (1)

PaulIsTheName (1646771) | more than 3 years ago | (#33394146)

> When are people going to stop trusting business people for technical decisions?

The moment tech people accept that taking risk of system failure to save cost is an acceptable business decision sometimes. I agree that this story proves that you need reasonable risk assessment to do that.

> We don't talk about backups and failovers just to sound cool.

Yes we do. Too.

Single points of failure... (1)

forkfail (228161) | more than 3 years ago | (#33393882)

.... rrrr bad, m'kay?

Typical liberal overreaction (5, Funny)

BitHive (578094) | more than 3 years ago | (#33393930)

Guys, accidents happen. This "Northrop Grumman", whoever they are, will no doubt be fired and not receive any more contracts once word of this gets out. This will put pressure on them to provide better services, or be out-competed by other entrepreneurs. Our free market system works, you just need to expect this kind of thing when it's government doing the hiring.

Re:Typical liberal overreaction (1)

Ben Chu (24542) | more than 3 years ago | (#33394054)

How is not incorporating basic redundancy into your SAN an "accident"?

Re:Typical liberal overreaction (1)

Fjandr (66656) | more than 3 years ago | (#33394112)

Wooooooooosh!

Re:Typical liberal overreaction (1)

Mr.Intel (165870) | more than 3 years ago | (#33394176)

Woooooosh!

Re:Typical liberal overreaction (0)

Anonymous Coward | more than 3 years ago | (#33394188)

They put in the bid for a non-redundant network. They won the bid and began building. Some people got worried about the non-redundant thing, asked how much it would cost to add redundancy, and got quoted a huge number.

The system was born a clusterfuck. Still a clusterfuck. Can never be changed from a clusterfuck. Cultural forces beyond my comprehension insist on a Clusterfuck.

Typical Republican Corruption (0)

jedidiah (1196) | more than 3 years ago | (#33394102)

> Guys, accidents happen. This "Northrop Grumman", whoever they are, will no doubt be fired
> and not receive any more contracts once word of this gets out. This will put pressure on
> them to provide better services, or be out-competed by other entrepreneurs. Our free market
> system works, you just need to expect this kind of thing when it's government doing the hiring.

What? Are you joking? Do you even know who these people are?

At worst they will get a pat on the back after this. They are
an incestuous government contractor. That's why they got this
job and someone else didn't to begin with. The real IT outfits
can't, because of the great advantage that legacy players have here.

Re:Typical Republican Corruption (1)

oodaloop (1229816) | more than 3 years ago | (#33394230)

Yes, he was joking. I think there's a whoooosh around here somewhere for you.

Re:Typical Republican Corruption (0)

Anonymous Coward | more than 3 years ago | (#33394268)

Actually, even though this sub-contract has been enthusiastically supported by Republicans all along the way, the original privatization of the Virginia IT infrastructure was spear-headed by Democratic governor Mark Warner [blogspot.com] (now the senior senator from Virginia), and has been supported just as enthusiastically by Democrats. Also, it will be very hard to "crack the whip" on Northrop Grumman, since the present Virginia administration bent over backwards to get NG to locate their world headquarters in Virginia rather than Maryland, both of which are near Washington DC, where the real money lives, as opposed to the chump change 2.4 billion contract with VA. "Incestuous" in this case is far too mild a word for what is actually going on.

Re:Typical Republican Corruption (0)

Anonymous Coward | more than 3 years ago | (#33394388)

Wait, what color is this state again? How about it being political corruption, not Republican corruption. Last I checked the Democrats had more lead news stories over corruption recently than Republicans, and that's saying something. If there's a politician involved, it's probably corrupt or on its way to being corrupt.

Note: I'm not Democrat or Republican, I just hate the idiotic pot calling the kettle black BS.

Re:Typical liberal overreaction (1)

plbowler (813973) | more than 3 years ago | (#33394158)

uhhhhhhhh This is a joke right? If you really don't know who they are, then I can understand why you think they are at risk of losing business over this. And on what planet does the U.S. operate in a free market system? wow

Re:Typical liberal overreaction (0)

Anonymous Coward | more than 3 years ago | (#33394160)

How do you know Grumman didn't already advise them of the problem and was ignored?

Re:Typical liberal overreaction (1)

EmperorKagato (689705) | more than 3 years ago | (#33394256)

> How do you know Grumman didn't already advise them of the problem and was ignored?

I bet you the warning was brushed aside several times.

Re:Typical liberal overreaction (1)

idiotnot (302133) | more than 3 years ago | (#33394618)

They get all defensive (no pun intended) when you point out that this was one of Mark Warner's crowning achievements as governor......

Re:Typical liberal overreaction (1)

mounthood (993037) | more than 3 years ago | (#33394620)

> Guys, accidents happen. This "Northrop Grumman", whoever they are, will no doubt be fired and not receive any more contracts once word of this gets out. This will put pressure on them to provide better services, or be out-competed by other entrepreneurs. Our free market system works, you just need to expect this kind of thing when it's government doing the hiring.

The problem is that it's the government selecting the vendor. If the government would just get out of the vendor-hiring-business maybe the Free Market could fix this mess.

Re:Typical liberal overreaction (0)

Anonymous Coward | more than 3 years ago | (#33394730)

Given Northrop Grumman's history with the state of Virginia, they will probably be rewarded with an extension to their contract and a bonus, rather than being fired.

Work in DMV (0)

Anonymous Coward | more than 3 years ago | (#33394048)

I work in the DMV with each jurisdiction. It is sad, but Virginia is head and shoulders above Maryland and DC. Maryland's access to criminal records goes down weekly for extended periods. DC has been working to update their system to NCIC 2000 standards for 10 years. Virginia has put in more money than either jurisdiction, and usually they are the most coordinated.

Re:Work in DMV (1)

EmperorKagato (689705) | more than 3 years ago | (#33394282)

Meanwhile in the 49 other states....

Ok, this really sucks!!!!!!! I know why and can (1)

Anon-Admin (443764) | more than 3 years ago | (#33394132)

not say. The F***Ing NDA stops me from saying anything about the stuff I saw in NGC's IT.

Well, I guess I can say it is BROKE NOW and you have to fix it. Told you so!

Re:Ok, this really sucks!!!!!!! I know why and can (1)

Ironhandx (1762146) | more than 3 years ago | (#33394454)

NDAs are such a bitch.

I think you should talk to Julian Assange at Wikileaks so that those of us that want the juicy details can get them.

P.S. There's a fat unmarked manila envelope in it for you. We all chipped in. It's a really nice envelope.

Re:Ok, this really sucks!!!!!!! I know why and can (1)

NeutronCowboy (896098) | more than 3 years ago | (#33394488)

There's a Post Anonymously button for that reason. Given the state of their IT department, I doubt they'll be able to figure out who broke their NDA, even if the police manage to give them an IP.

Re:Ok, this really sucks!!!!!!! I know why and can (1)

Wyatt Earp (1029) | more than 3 years ago | (#33394634)

This. Go down to a Starbucks, post anonymously and dish.

Their network is down. 2.4 billion dollars had everything running through a single DIMM in a Netgear box they got off eBay; they won't figure out who it was.

Northrop Grumman? That's why... (1)

Nadaka (224565) | more than 3 years ago | (#33394192)

My company works on a project that N G lost on a re-compete bid. I cannot go into much detail, but suffice it to say: I am not at all surprised that they screwed up maintenance and management, based on what I have had to deal with in the software they developed.

Northrop Grumman (1)

fermion (181285) | more than 3 years ago | (#33394234)

This is what you get for hiring a military contractor to do a civilian person's job. All 2.5 billion gets you in the military is a manager and a toilet seat. You don't start getting functional hardware until the budget reaches 100 billion.

What brand? (1)

CambodiaSam (1153015) | more than 3 years ago | (#33394244)

Anyone know what brand of SAN went down? My company had a similar issue where our SAN had a major outage, and the vendor claimed it was "an error that never happens, we swear".

Re:What brand? (0)

Anonymous Coward | more than 3 years ago | (#33394580)

I have a confession: I slept with your wife. But it NEVER happens, I swear! You believe me, right? Surely you can't be mad until there are *two* smoking guns...

Northrop hiring event (1)

confused one (671304) | more than 3 years ago | (#33394308)

Funny that I should receive an email today inviting me to a Northrop Grumman Information Systems Hiring Event. The event occurred on the 25th of August and I received the email on the afternoon of the 27th. Failed there too!

Who is the vendor? (0)

Anonymous Coward | more than 3 years ago | (#33394336)

I wonder if they are using 3PAR.

It's not always the bureaucracy (1)

roc97007 (608802) | more than 3 years ago | (#33394462)

Ok, in this case it probably is the bureaucracy at fault. But it isn't in all cases. In my previous job we had an architect who would take it upon himself to "value engineer" a vendor's solution, with unpredictable results. I'm not sure why -- we had budget. Maybe it was his way of seeming more valuable? This led to "solutions" like a SAN cobbled together from disk arrays, controllers and switches from three different vendors that were not meant to work together, had never been tested in the chosen configuration, and had to be integrated and maintained in-house. Word rapidly got around that if you wanted reliable access to your data, you didn't put it on the corporate SAN.

What I don't fully understand is how NG could get what amounts to a quarter billion dollars a year to manage the state's IT infrastructure and still allow a situation like this to occur. I mean, I understand how it can HAPPEN, I don't understand why it's allowed to. Over and over again I've seen companies who have outsourced their infrastructure enter into a "battered wife" relationship with the vendor, lacking anyone with the authority, cojones and understanding to bring the vendor to heel and get the uptime they've paid for. Instead, corporate IT management will often enter into a dark relationship with vendor sales management to spin downtime to the stockholders as teething issues, inadequate documentation, out of scope, or some other hand-waving to explain why the savings from outsourcing have been more than offset by loss of revenue, with IT management essentially working for the vendor while drawing a paycheck from the company. But don't get me started...

Hardware fails, Salesman fail and budgets=tight (1)

cjdavis618 (1809874) | more than 3 years ago | (#33394684)

While there is more to this story than meets the eye, there is no excuse for not having redundancy if you are a state body. It could be that there is a backup of the data and that the parts needed to fix the issue are not available yet or haven't been delivered. Nevertheless, without details of the infrastructure, we cannot jump to conclusions about why this happened. Most contracting companies like that clearly state that they will design the system to run and build it, but backup management is the responsibility of the customer. That is the normal CYA tactic. We don't know that NG was even tasked with the backup or redundancy. I will be looking for more of this to come to light in regard to the actual cause and resolution.