
More Uptime Problems For Amazon Cloud

Soulskill posted more than 2 years ago | from the stormy-weather dept.


1sockchuck writes "An Amazon Web Services data center in northern Virginia lost power Friday night during an electrical storm, causing downtime for numerous customers — including Netflix, which uses an architecture designed to route around problems at a single availability zone. The same data center suffered a power outage two weeks ago and had connectivity problems earlier on Friday."


183 comments


Cloud takes down cloud (5, Funny)

AlienIntelligence (1184493) | more than 2 years ago | (#40505785)

Nuf said

Largest non-hurricane related power outage ever (5, Informative)

Anonymous Coward | more than 2 years ago | (#40505797)

I live in the affected area and that's what they're saying. May take 7 days for the last person to have their power restored.

Re:Largest non-hurricane related power outage ever (5, Interesting)

jrmcferren (935335) | more than 2 years ago | (#40505897)

That really shouldn't matter though, as long as the data center's generators are running and they can get fuel. It seems that they are not performing the proper testing and maintenance on their switchgear and generators if they are having this much trouble. The last time the data center in the building where I work went down for a power outage was when we had an arc flash in one of the UPS battery cabinets and they had to shut the data center (and the rest of the building's power, for that matter) down.

Re:Largest non-hurricane related power outage ever (4, Insightful)

John Bresnahan (638668) | more than 2 years ago | (#40505939)

Of course, the network only works if every router in between the data center and the customer has power. In a power outage of this size, it's entirely possible that more than one link is down.

Re:Largest non-hurricane related power outage ever (1)

Anonymous Coward | more than 2 years ago | (#40505989)

If an individual customer has a power outage AND a failure of their backup power, that's not Amazon's fault. As far as who to blame, I highly doubt that EVERY network provider had both failures in grid power AND failures in their backup systems. So this is likely still Amazon's fault.

The problem is that a lot of people cheap out on their backup power. Generators and UPSes are expensive. In a generating system, the most common single point of failure is often the automatic transfer switch. It's quite possible that they had a single generator feeding both the A and B sides of power in all or part of the data center - the failure of the transfer switch (or of the generator itself to come on) would cause total loss of power after the UPS(es) drained their batteries.

Re:Largest non-hurricane related power outage ever (5, Informative)

jrmcferren (935335) | more than 2 years ago | (#40506039)

The automatic transfer switch(es) would be the first component I would check, even without knowing anything. In order to maintain the UL listing on the transfer switch, it must be tested monthly. The idea is that if it is tested monthly, everything gets operated and is less likely to seize and fail than if the device is never tested. Modern systems can be designed so that the generators start BEFORE the transfer switch operates when in test mode, to reduce the impact of the test (milliseconds without power versus 30 seconds or so).

Re:Largest non-hurricane related power outage ever (2)

ILongForDarkness (1134931) | more than 2 years ago | (#40506049)

I don't know if the state or even just the city is without power; it's quite possible the ISPs are borked in the area. After all, why bother with too much redundancy? If your customers don't have power for their computers, they aren't using the internet anyway. Then Amazon plops down a $200M datacentre in town and... shit happens.

with cable the nodes need power and their batteries (1)

Joe_Dragon (2206452) | more than 2 years ago | (#40506089)

With cable, the nodes need power, and their batteries will run down; then the cable co needs to have on-site portable generators at the nodes with no power.

The phone systems have RTs (fewer of them than cable systems have nodes) that are the same way.

Re:with cable the nodes need power and their batteries (1)

ILongForDarkness (1134931) | more than 2 years ago | (#40506371)

Why exactly would a cable operator bother with backup power? I mean, if the neighborhood has no power, then people aren't running TVs or computers (except laptops, but their modem would still be down). It is probably a different beast with something the size of an Amazon datacentre, though; they can probably go to the ISP and say "hey look, we'll buy $5M a month of internet from you, but we need redundancy. Piss on all your home users for all we care, but we get internet no matter what."

Re:with cable the nodes need power and their batteries (1)

Joe_Dragon (2206452) | more than 2 years ago | (#40506533)

Well, there are long runs from the headend to each neighborhood, so some areas may have power but lose cable hours later as the lines pass through areas that don't have power.

Re:with cable the nodes need power and their batteries (2)

Alex Zepeda (10955) | more than 2 years ago | (#40506681)

Why exactly would a cable operator bother with backup power?

Because that cable operator also provides phone service.

Re:Largest non-hurricane related power outage ever (3, Interesting)

fuzzyfuzzyfungus (1223518) | more than 2 years ago | (#40506119)

The problem is that a lot of people cheap out on their backup power. Generators and UPSes are expensive.

I wonder, comparing the price/performance numbers on the invoices from Dell and the invoices from APC (hint: one of these has Moore's law at its back, the other... doesn't), what it would take in terms of hardware pricing and software reliability design to make these backup power systems economically obsolete for most of the 'bulk' data-shoveling and HTTP cruft that keeps the tubes humming...

Obviously, if your software doesn't allow any sort of elegant failover, or you paid a small fortune per core, redundant PSUs, UPSes, generators, and all the rest make perfect sense. If, however, your software can tolerate a hardware failure and the price of silicon and storage is plummeting and the price of electrical gear that is going to spend most of its life generating heat and maintenance bills isn't, it becomes interesting to consider the point at which the 'Eh, fuck it. Move the load to somewhere where the lights are still on until the utility guys figure it out.' theory of backup power becomes viable.

Re:Largest non-hurricane related power outage ever (2)

GPLHost-Thomas (1330431) | more than 2 years ago | (#40506471)

Data center redundancy isn't "cheap" to write for complex software. You need to be risking a lot of money per hour of downtime to justify investing in it. I don't think that's something a lot of companies can afford, unless they start their software design with this in mind to begin with. So the problem, to me, is that data center redundancy is often an afterthought, and IaaS hardly has easy answers to this problem yet.

Re:Largest non-hurricane related power outage ever (1)

turbidostato (878842) | more than 2 years ago | (#40506527)

"So the problem to me, is that data center redundancy is often an after though, and IaaS hardly has easy answers to this problem yet."

It won't. For a very basic physical reason: it's always cheaper to move data a short distance than a long one. If you have a given piece of data in one place, you will either lose it if that place goes nuts, or you will need to spend heavily to make sure that piece of data is replicated out of that place fast enough.

IaaS can help commoditize compute and storage resources, but it has nothing to offer when it comes to moving data cheaply from place A to place B, and only a minority of businesses have the luck of managing mostly low-value (e.g. Google) or read-only (e.g. Netflix) data.

Re:Largest non-hurricane related power outage ever (1)

TubeSteak (669689) | more than 2 years ago | (#40506823)

it becomes interesting to consider the point at which the 'Eh, fuck it. Move the load to somewhere where the lights are still on until the utility guys figure it out.' theory of backup power becomes viable.

The answer mostly depends on the cost of downtime for you.
The real problem is getting your (customer) data to the same place as your failover solution.
Some websites generate enormous amounts of data, and it's not trivial or cheap for them to constantly keep it backed up at another data center.
A station wagon full of hard drives is still faster than any link 99% of us could afford.

Re:Largest non-hurricane related power outage ever (3, Informative)

Salgak1 (20136) | more than 2 years ago | (#40506697)

Well, as of current reports... 2.5 million are without power in Virginia [foxnews.com], 800,000 in Maryland [chicagotribune.com], 400,000+ in DC [wtop.com]. I've seen numbers in the 3.5 million region between Ohio and New Jersey. We got power back early this morning, ~0400, but we STILL don't have phone, net, or cable at home. The real question, since some areas in DC Metro are not supposed to get power back for nearly a week, is... do the emergency generators have sufficient fuel bunkers???

Re:Largest non-hurricane related power outage ever (2)

thePowerOfGrayskull (905905) | more than 2 years ago | (#40506559)

But then the question must be asked...

[cue Psycho screeching violins]
How are you posting this now?!

Re:Largest non-hurricane related power outage ever (0)

Anonymous Coward | more than 2 years ago | (#40506687)

"Largest ever" must be qualified somehow. At the very least, https://en.wikipedia.org/wiki/North_American_ice_storm_of_1998 was bigger.

Infrastructure (5, Insightful)

TubeSteak (669689) | more than 2 years ago | (#40505807)

We need to invest trillions in roads, water, and electrical infrastructure to keep this country going.
If you let the basic building blocks of civilization rot, don't be surprised when everything else follows suit.

Re:Infrastructure (4, Insightful)

rubycodez (864176) | more than 2 years ago | (#40505935)

War is the basic building block of our particular civilization. If we waste money on your frivolities, how will we afford war & keep up war-machine shareholder value?

Re:Infrastructure (1, Insightful)

Anonymous Coward | more than 2 years ago | (#40506073)

Governments don't engage in war to make sure bullets sell. They engage in war to gain control of the natural resources the other country has.

The distinction is subtle, but significant.

Re:Infrastructure (2, Insightful)

Anonymous Coward | more than 2 years ago | (#40506321)

I would say Laos would argue otherwise... The most bombed country in the world because America felt like it and had a lot of extra stock! Oh and they were officially a neutral country.

GO USA!

Re:Infrastructure (1)

tukang (1209392) | more than 2 years ago | (#40506465)

(Defense and energy) Companies get governments to engage in war to make sure bullets sell and to gain control of the natural resources the other country has.

Re:Infrastructure (2)

AliasMarlowe (1042386) | more than 2 years ago | (#40506817)

They engage in war to gain control of the natural resources the other country has.

The distinction is subtle, but significant.

Tell us again what natural resources the US wished to control when it engaged in war against Grenada [wikipedia.org] in 1983, or when it engaged in war against Panama [wikipedia.org] in 1989, or when it engaged in war against Afghanistan [wikipedia.org] starting in 2001.

There are many reasons for one state to go to war against another. Gaining control of natural resources is only one (e.g. Iraq's invasion of Kuwait [wikipedia.org] ), and is not the commonest.

Re:Infrastructure (1)

Anonymous Coward | more than 2 years ago | (#40505981)

It seems like many of these jumbo datacenters built to support the top web sites are located in rural areas chosen for the ability to minimize costs (real estate acquisition, taxes, energy, cooling). Surprise... they may be more vulnerable to tornadoes and more isolated from repair crews, compared to an in-campus data center.

Re:Infrastructure (0)

Anonymous Coward | more than 2 years ago | (#40506101)

Being in a rural area does not make you statistically more likely to be hit by a tornado.. Tornadoes don't have any sort of inborn preference. Tornado danger is a function of geography, not population density.

The only drawback of being in the sticks is it is harder to access multiple power feeds. A good data center will have at least two feeds coming in from different directions. You can still do it, but it costs more since those power lines are being run just for YOU.

Quick action from power repair crews is not really an issue. If your generators are maintained and functioning properly, they should be able to run for weeks with a steady supply of fuel. Barring a major disaster that inhibits access to refuel your generators or a much bigger regional catastrophe, it's a non-issue.

Re:Infrastructure (1)

turbidostato (878842) | more than 2 years ago | (#40506489)

"Being in a rural area does not make you statistically more likely to be hit by a tornado.. Tornadoes don't have any sort of inborn preference. Tornado danger is a function of geography, not population density."

You can't be that dense, can you? Don't you think that being in a tornado area might have something to do with people avoiding such a place, especially given that, thanks to the geography required, tornado areas tend to be in the middle of nowhere?

"The only drawback of being in the sticks is it is harder to access multiple power feeds [...] You can still do it, but it costs more since those power lines are being run just for YOU."

So you are going to spend a big chunk of the savings from placing your datacenter in the middle of nowhere on the recurring costs of a utility that you will rarely if ever need, for a service that *on purpose* relies on multiple placements so it can be served from run-of-the-mill hardware and capabilities.

"If your generators are maintained and functioning properly, they should be able to run for weeks with a steady supply of fuel."

Which is easier said than done when a) your site is in the middle of nowhere and b) the "steady supply of fuel" crew is diverted to hospitals, banks, and other important places *not* in the middle of nowhere.

It is the cloud, you fool! Why do you think companies like Amazon spend a lot to be able to offer you a *distributable* service? If your service is minor and you can't redeploy it to another datacenter within a few hours, you are doing it *WROOOONG*. If your service is producing a lot of money 24x7 and you can't reroute on the fly out of a failing datacenter, you are doing it *WROOOONG*. In the end, if you believe the weasels who sold you on the idea that having a virtual private server (or a few) in a (unnamed) datacenter will magically protect you from a failure in that (unnamed) datacenter just because "it's a cloud provider", you are doing it *WROOOONG*.

Re:Infrastructure (2, Insightful)

Anonymous Coward | more than 2 years ago | (#40506423)

Dude, if you think a datacenter in Northern Virginia was plopped down here because of the insanely attractive price of real estate or energy, or because of the business-friendly tax rates, you're out of your freaking mind. Datacenters are built here because of pre-existing backbone access. Period.

Re:Infrastructure (0)

Anonymous Coward | more than 2 years ago | (#40506009)

What sort of public investment do you think we need to make in the power grid to prevent this type of situation from happening in the future?

Should we be investing in hardened power lines that can stand tornadoes and large trees falling on them? I'm not exactly sure what you're looking for here.

The blame for this outage falls squarely on Amazon. Did they have multiple feeds from the grid? Do they have multiple generators? Multiple UPSes? Do they test their backup systems regularly and thoroughly?

Chances are Amazon made some kind of call to save money and cheaped out somewhere in their electrical facilities.

Re:Infrastructure (0)

Anonymous Coward | more than 2 years ago | (#40506443)

...Should we be investing in hardened power lines that can stand tornadoes and large trees falling on them?...

Yes. They're called underground utilities. Tree falling = who the hell cares. Tornado = so what. (OK, maybe an F5 would dig up a little real estate, but aside from that...) I live in a neighborhood in NoVA with underground utilities, and my only power interruptions for the last 20 years have been 100% due to an above-ground failure somewhere upstream from me.

Re:Infrastructure (1)

DarkTempes (822722) | more than 2 years ago | (#40506669)

I live in a hurricane prone area. In my experience with massive power outages like this it's typically high voltage transmission towers going down.
It's not really economical to bury those.
Fixing something like this [nola.com] is apparently not easy and takes time.

Re:Infrastructure (1)

Anonymous Coward | more than 2 years ago | (#40506043)

Hell yeah! The US should privatize its infrastructure, including the maintenance of roads, electricity, and internet. Down with the potholes, down with the evil socialists in Europe who manage to do these things more cheaply and affordably using their communist-era ideology.

Re:Infrastructure (0)

Anonymous Coward | more than 2 years ago | (#40506069)

we also need to invest trillions in gas-powered dildos, but you don't hear the legislators complaining, do you?

Re:Infrastructure (0)

roman_mir (125474) | more than 2 years ago | (#40506921)

You can't invest in any infrastructure worth a shit if you don't produce anything that can pay for that expense, and all of your credit is used to buy wars and foreign-made goods.

Until you restructure the debt and get gov't out of the business of regulating business and doing all of this stuff (including wars, infrastructure, business regulations, all the nonsense that has been going on for over 100 years now), you won't have any new infrastructure that makes any sense.

Oh, sure, you can have gov't come up with work projects, but none of them will be sustainable or useful. They will put you further into debt and won't give you any competitive advantage, since that infrastructure won't be built to satisfy real demand, only to let gov't create more make-work and spend more.

Seems like anything takes down the cloud... (5, Interesting)

Anonymous Brave Guy (457657) | more than 2 years ago | (#40505809)

It seems that recently, anything can take down the cloud, or at least cause a serious disruption for any of the major cloud providers. I wonder how many more of these it takes before the cloud-skeptics start winning the debates with management a lot more often.

You can only argue for so long that the extra costs and admin involved with cloud hosting outweigh the extra costs of self-hosting and paying competent IT staff. If you read the various forums after an event like this, the mantra from cloud evangelists already seems to have changed from a general "cloud=reliable, and Google's/Amazon's/whoever's people are smarter than your in-house people" to a much more weasel-worded "cloud is reliable as long as you've figured out exactly how to set it all up with proper redundancy etc." If you're going to pay people smart enough to figure that out, and you're not one of the few businesses whose model really does benefit disproportionately from the scalability at a certain stage in its development, why not save a fortune and host everything in-house?

Re:Seems like anything takes down the cloud... (1)

Anonymous Coward | more than 2 years ago | (#40505853)

Storms bring down clouds all right. It rains and everyone is miserable, except the greenies. :P

Cloud computing brings availability to the "small guys". It also allows for quick scalability. You can't really accomplish similar things in-house unless you use 100s of servers, but then you have logistical issues as you have to ship these servers all over the place. If you just have one data drop to your 100 servers at one location, guess what? Your infrastructure is no better than hosting everything with a 3rd party in one data center.

Re:Seems like anything takes down the cloud... (2)

Anonymous Brave Guy (457657) | more than 2 years ago | (#40505933)

Cloud computing brings availability to the "small guys". It also allows for quick scalability. You can't really accomplish similar things in-house unless you use 100s of servers

Sure, but probably 99% of small businesses don't actually need to scale that fast, or anywhere close. The cloud hosting proposition for most (not all, but most) small businesses is an appeal to wishful thinking, like the bank guy who tells you how they can give you a starter current account today, but they do have several tiers of service and once you're making over 10,000,000 in a year you'll have a dedicated account manager available to make you a coffee any time you want one.

Re:Seems like anything takes down the cloud... (1, Interesting)

Anonymous Coward | more than 2 years ago | (#40505881)

You realise that this took out one data center? That is, all of those other AWS data centers are working still just fine? If anything, this is proving the reliability of cloud providers!

Why not save a fortune and host everything in-house?

You really think hosting your own hardware in your own data centers spread across the world will save you a fortune? Have you even bothered to run those figures?

Even if you have more money than sense, once you've got your hardware spread across the globe, you've still got to build the systems on top to survive an outage in one of them, i.e. exactly what you have to do if you use a cloud provider anyway. So what have you saved, precisely?

Re:Seems like anything takes down the cloud... (1)

Anonymous Brave Guy (457657) | more than 2 years ago | (#40506003)

You realise that this took out one data center? That is, all of those other AWS data centers are working still just fine?

Well, OK then, next time I'll just tell all of those people who can't use their home-grown Heroku-based apps for a few hours to go watch a movie on Netflix instead. It's probably just the little guys who got in trouble on this one, and it's their own dumb fault for not setting up more than one AZ or using different regions or something. Oh, no, wait, loads of people couldn't watch the movie either, and Netflix are HUGE AWS customers with an army of people to maintain a redundant infrastructure.

You really think hosting your own hardware in your own data centers spread across the world will save you a fortune?

False dichotomy. Most on-line businesses don't need redundant access in data centres all over the world to avoid a problem like this. Having a primary and a stand-by in different geographic locations would have done just fine, and we've been doing that since long before the marketing people invented terms like "cloud computing".

Have you even bothered to run those figures?

Several times and for multiple businesses. Have you?

Re:Seems like anything takes down the cloud... (0)

Anonymous Coward | more than 2 years ago | (#40506047)

So your argument is: Netflix fucked up, so cloud is shit? Brilliant.

Several times and for multiple businesses. Have you?

Yes. Cloud is usually cheaper and easier at small to medium scale, i.e. the vast majority of use cases.

Re:Seems like anything takes down the cloud... (1)

Anonymous Brave Guy (457657) | more than 2 years ago | (#40506147)

So your argument is: Netflix fucked up, so cloud is shit?

No, my argument is that saying this only affected one AWS data center and people elsewhere are fine is clearly not the whole story.

Cloud is usually cheaper and easier at small to medium scale

Cheaper and easier than what? Cloud technologies are basically useful for two things: outsourcing hardware and staff resources so you can adapt to very fast changes in the level of requirements, and being a glorified CDN. What proportion of small/medium businesses ever need to scale so fast that doing it in-house is impractical, or need the generalised capabilities of services like Amazon's rather than a straight-up CDN provider like Akamai?

Re:Seems like anything takes down the cloud... (1)

MrBandersnatch (544818) | more than 2 years ago | (#40506107)

"Several times and for multiple businesses. Have you?"

I'd actually be interested in hearing your analysis and experience. I'm looking at this myself and finding that the cost advantages differ depending on the scenario - there just doesn't seem to be a clear-cut point at which one solution costs less than the other for all but the most trivial scenarios.

Re:Seems like anything takes down the cloud... (1)

Anonymous Brave Guy (457657) | more than 2 years ago | (#40506495)

OK. Obviously I'm posting pseudonymously so I can't give a lot of specifics, but FWIW...

I agree that this isn't a straightforward question, and I think one big problem is that people sometimes start by assuming a false dichotomy: either we're hosting in the cloud or we're kitting out a whole new server room. In reality, there is a broad scale to consider, with all kinds of managed hosting and colo options where a lot of the sysadmin overhead can be outsourced but you basically get to use real hardware with proper root access at a much more sensible cost-per-resource-unit than any cloud hosting provider is going to offer.

For a lot of small/medium sized businesses (anyone who is going to run Netflix 2 successfully doesn't need advice from me ;-)) the sweet spot seems to be somewhere in the middle. If you can find a service provider with geographically diverse hosting facilities and sensible connectivity, you can either lease machines from them or buy your own and use their colo services, and basically make the hosting service into your on-site IT people. If you're just starting out and don't have dedicated IT people yet, a lot of these services will also offer basic sysadmin support for a nominal fee, to help with installing/patching your OS or standard cloned images, and setting up things like firewalls, load balancing, database replication, distributed filesystems and all that stuff that you probably don't care about if you're trying to build a new service that actually does something useful.

The key thing seems to be finding a host who will let you outsource the mundane stuff that you would do via a console in a cloud-based system -- chances are that's basically what they've set up on their own systems anyway -- but keeping the increased flexibility and lower cost-per-resource-unit of leasing/buying your own dedicated hardware with real root access. This approach seems to work pretty well up to a scale of dozens/hundreds of machines, as long as your resource needs grow reasonably predictably or you can afford a day or two to catch up in the event of an unexpected spike.

Of course if you need a global CDN then you're probably not going to beat a real CDN provider this way, but you can combine that with some sort of managed hosting/colo arrangement in various sensible ways. And if you really do need to scale up and down within a matter of minutes/hours, perhaps because your service has wildly different usage patterns at different times of day, then probably Amazon-style cloud hosting is your only viable option without spending a fortune on hardware you won't be using efficiently.

Not sure if that just repeats things you already figured out, but I hope it helps.

Re:Seems like anything takes down the cloud... (1)

lucifuge31337 (529072) | more than 2 years ago | (#40506519)

"Several times and for multiple businesses. Have you?"

I'd actually be interested in hearing your analysis and experience. I'm looking at this myself and finding that the cost advantages differ depending on the scenario - there just doesn't seem to be a clear-cut point at which one solution costs less than the other for all but the most trivial scenarios.

Because it really depends on the business and the application. It also depends on how much bandwidth you use and if you have geographical limitations which would make accessing that bandwidth more costly in one or more locations.

If you are in it for the long haul, why not have control over your own cheap commodity machines and "scale into the cloud" for overages until you acquire more hardware? Then you can actually have control of those little things that let you switch between datacenters easily, like... you know, your BGP and other trivial things like that.

There's definitely no one-size-fits-all for this, but the bulk of the startups I see that are cloud-based appear to be 1) a bunch of developers first and foremost, so not data center or network engineers at all, and 2) not capitalized well enough in the beginning to be able to afford leasing and equipping space in multiple data centers. And there's nothing at all wrong with that. It's a valid choice if you recognize the reality of what you are buying rather than believing the marketing hype hook, line, and sinker.

How many private clouds went down? (1)

Anonymous Coward | more than 2 years ago | (#40506151)

Amazon is a huge target - but how many other data centers went down in the Virginia area also? Did they come back up as fast as Amazon?

And Netflix is an Amazon Cloud customer... What's the matter with them? Are they just too dumb to host in house?

Re:Seems like anything takes down the cloud... (1)

girlintraining (1395911) | more than 2 years ago | (#40505883)

It seems that recently, anything can take down the cloud,

It wasn't just anything that took down the cloud: it was another cloud.

Re:Seems like anything takes down the cloud... (3, Interesting)

tnk1 (899206) | more than 2 years ago | (#40505889)

And this is ridiculous. How are they not in a datacenter with backup diesel generators and redundant internet egress points? Even the smallest service business I have worked for had this. All they need to do is buy space in a place like Qwest or, even better, Equinix, and it's all covered. A company like Amazon shouldn't be taken out by power issues, of all things. They are either cheaping out or their systems/datacenter leads need to be replaced.

Re:Seems like anything takes down the cloud... (2)

girlintraining (1395911) | more than 2 years ago | (#40505901)

How are they not in a datacenter with backup diesel generators and redundant internet egress points?

Something about maximizing profits... by cutting corners... perhaps.

it seems like the switching system failed (3, Informative)

Joe_Dragon (2206452) | more than 2 years ago | (#40505973)

It seems like the switching system failed and/or the backup power generators did not kick on.

Maybe natural gas ones are better. The firehouses have them, and I also see them at a big power substation.

Re:it seems like the switching system failed (1)

tnk1 (899206) | more than 2 years ago | (#40506109)

While failure of the backup systems is a possibility (just look at Fukushima), the backup systems are usually fairly redundant and tested as well. I know most datacenters I have been in test their generators periodically, something like every month or two. Unless there's a fairly large natural disaster, or someone sets off a very large bomb, backup power should be available for at least 24-48 hours. At that point, things could start breaking down because you have to start getting fuel shipped in, but after last night's storm, power should have been back up at those sites in less than an hour. It's not like Virginia is serviced by Pepco.

Re:it seems like the switching system failed (1)

Deekin_Scalesinger (755062) | more than 2 years ago | (#40506659)

Hehe - normally I'd agree, but Pepco did all right last night as far as I'm concerned. I flickered for about 2 seconds, but I'm in the downtown Capitol Wastelands - I don't know what grid I'm on, but it seems to be a good one. Oh - and yay for personal UPSes! They did what they should have done.

Re:it seems like the switching system failed (1)

Relayman (1068986) | more than 2 years ago | (#40506213)

Natural gas fails when there is an earthquake. Depending on where your data center is located, diesel may be a better choice.

Re:it seems like the switching system failed (1)

drinkypoo (153816) | more than 2 years ago | (#40506729)

Natural gas fails when there is an earthquake.

Natural gas generators (or even fuel cells) are commonly used within city limits for a broad number of reasons. First and foremost, you're not permitted to store quantities of flammables in most cities. Another is that the emissions are relatively benign.

OUTSIDE of a city, you can use a propane generator, which can be a converted gasoline generator if you prefer. You can even convert one to be dual-mode so it will run on either gasoline or propane, but that's quite a bit more work. Common dual-mode generators run on natural gas or propane, which practically is a bit like saying high or low octane. It takes only minor changes to convert an appliance from one to the other. (Your car can run on one or the other with a timing change...)

uh forgot something important (1)

drinkypoo (153816) | more than 2 years ago | (#40506737)

whoops, I forgot to say OUTSIDE of a city you can use a propane generator FROM A PROPANE TANK. Which, of course, means it can still function after a 'quake. And if you live in someplace where it's legal to have a tank AND where you can get city gas, you can get the best of both worlds.

Re:Seems like anything takes down the cloud... (2)

ILongForDarkness (1134931) | more than 2 years ago | (#40506099)

They expect the customers to pay for the redundancy by using multiple servers in different geographical locations. People buying one server, or a bunch only in one datacentre, are taking a risk already. I'm assuming someone at Amazon said let's build a few datacentres and skimp on the redundancy at each one. The redundancy is at the multi-datacentre level, not at the multi-UPS, multi-connection, etc. level at each datacentre.

Re:Seems like anything takes down the cloud... (5, Insightful)

hawguy (1600213) | more than 2 years ago | (#40505907)

It seems that recently, anything can take down the cloud, or at least cause a serious disruption for any of the major cloud providers. I wonder how many more of these it takes before the cloud-skeptics start winning the debates with management a lot more often.

I think it's more because a cloud outage affects thousands of customers, so it has more visibility. When Amazon has problems, the news is reported on Slashdot. When a smaller colocation center has an accidental fire suppression discharge taking hundreds of customers offline, it doesn't get any press coverage at all.

But the biggest takeaway from this is - never put all of your assets in one region. No matter how much redundancy Amazon builds into a region, a local disaster can still take out the datacenter. That's why they have availability zones *and* regions. I have some servers in us-east-1a and they weren't affected at all. If they were down, I could bring up my servers in us-west within about an hour. (I could even automate it, but a few hours or even a day of downtime for these servers is no big deal.)
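The kind of automation hawguy mentions could look something like this minimal boto3 (Python) sketch - a sketch under stated assumptions, not anything Amazon or the poster published. It assumes an AMI has already been copied to the standby region; the region name, AMI ID, and instance type are illustrative placeholders.

# Minimal cross-region failover sketch (assumes boto3 and an AMI already
# copied to the standby region; region, AMI ID, and instance type are placeholders).
import boto3

STANDBY_REGION = "us-west-2"             # assumed standby region
STANDBY_AMI = "ami-0123456789abcdef0"    # assumed pre-copied machine image

def fail_over(count=2):
    """Launch replacement instances in the standby region and wait for them."""
    ec2 = boto3.client("ec2", region_name=STANDBY_REGION)
    resp = ec2.run_instances(
        ImageId=STANDBY_AMI,
        InstanceType="m1.small",         # illustrative size
        MinCount=count,
        MaxCount=count,
    )
    ids = [i["InstanceId"] for i in resp["Instances"]]
    ec2.get_waiter("instance_running").wait(InstanceIds=ids)
    return ids

if __name__ == "__main__":
    print("Standby instances running:", fail_over())

Wiring this into monitoring (so it fires when the primary region's health checks fail) is left out; the point is only that the "bring it up in us-west" step itself is a few API calls.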

Re:Seems like anything takes down the cloud... (1)

MrBandersnatch (544818) | more than 2 years ago | (#40506159)

Almost spot on - in fact, don't put all of your assets into the same cloud either, because the day IS going to come when an infrastructure issue takes out even the largest of providers.

Re:Seems like anything takes down the cloud... (1)

fuzzyfuzzyfungus (1223518) | more than 2 years ago | (#40505977)

While the nimbostratus salesweasels are (obviously, these are salesweasels) lying, an incident where a datacenter gets taken down good and hard by weather won't do the in-house guys much good either... 'Cloud' or not, a datacenter (and probably a fair few smaller ones, and a veritable legion of various converted-broom-closet small business setups) was taken down by weather.

It certainly has become increasingly hard to hide that most of the 'cloud' providers do, er, rather less magic-distributed-reliability than their glossy brochure might insinuate. The decent ones generally make it possible; but they generally leave making it happen up to the customer. Anybody who expects 'cloud' to magically save them is naive or lying. However, that doesn't change the fact that it does make buying capacity in other regions, on short notice, convenient, so long as you can bully the vendor into admitting what you are actually buying.

However, the in-house approach is largely in the same boat, only more visibly. Anybody's in-house operation in that part of the electrical grid would also have been good and hosed without redundancy in some other region. Whether it is cheaper/easier to provide that redundancy via traditional means or by purchasing the requisite 'cloud' stuff is a different issue...

Re:Seems like anything takes down the cloud... (1)

andy1307 (656570) | more than 2 years ago | (#40506261)

I wonder how many more of these it takes before the cloud-skeptics start winning the debates with management a lot more often.

This sort of thing never ever happens when you host everything in-house?

Re:Seems like anything takes down the cloud... (1)

lucifuge31337 (529072) | more than 2 years ago | (#40506549)

I wonder how many more of these it takes before the cloud-skeptics start winning the debates with management a lot more often.

This sort of thing never ever happens when you host everything in-house?

Obviously they do. But at least you have some control over the recovery, rather than sitting around watching for carefully-worded email and Twitter updates from Amazon about when you just might get access to the shit you are paying for again. That makes communicating real information to your customers a bit easier.

Of course, you can always use the excuse that it's not your fault and blame Amazon ("see...look at all the other people who are down"). But that's largely a marketing decision I suppose.

Re:Seems like anything takes down the cloud... (0)

MobileTatsu-NJG (946591) | more than 2 years ago | (#40506315)

They'd only 'win' the debate until a power failure at their location. Or a hardware failure... Or a malware outbreak... etc.

What, you thought "cloud" meant "no outage"? (4, Insightful)

ebunga (95613) | more than 2 years ago | (#40505821)

Cloud computing is nothing more than 1960s timesharing services with modern operating systems. Unless you design for resilience, you're not resilient to problems.

Re:What, you thought "cloud" meant "no outage"? (2)

rubycodez (864176) | more than 2 years ago | (#40505949)

The laugh is that those 1960s systems had, for additional money, configurations for 24x7 uptime. Here we supposedly design for that with the cloud architecture, and fail. I would not be surprised at all if the modern mainframe were a cost-effective alternative to this bloated, expensive cloud.

Re:What, you thought "cloud" meant "no outage"? (1)

Anonymous Coward | more than 2 years ago | (#40505983)

The laugh is that those 1960s systems had, for additional money, configurations for 24x7 uptime.

If you cut all the power to those "redundant" systems, they went down.

Funnily enough, that's what's happened here. Except the other AWS data centers are all working, unlike your 1960's system.

Are people on Slashdot really too stupid to understand cloud, or are they just deliberately disingenuous?

Re:What, you thought "cloud" meant "no outage"? (1)

rubycodez (864176) | more than 2 years ago | (#40506125)

Are you too stupid to research before spouting off? Cutting "all the power" was rather difficult, as it came from two utilities plus onsite generation.

Re:What, you thought "cloud" meant "no outage"? (1)

dkf (304284) | more than 2 years ago | (#40506619)

Are you too stupid to research before spouting off? Cutting "all the power" was rather difficult, as it came from two utilities plus onsite generation.

Never underestimate the power of the universe to shit on you. It's still quite possible to get a perfect storm of problems that takes things offline, such as the main onsite generator being down for scheduled maintenance that overruns, the backup generator only having limited capacity, and a major storm wiping out the power grid completely for 20 miles. At that point, stuff will go down, and at some point it becomes cheaper to have insurance to deal with the losses arising (including reputational losses) instead of building the vastly complex infrastructure that can't fail in ever less likely scenarios.

The other big change is that the vast majority of work now requires that datacenter be online (in a network sense), and at that point you're vulnerable to someone else being the weakest link. Doing it all yourself is fantastically expensive, and incredibly hard too given the number of different skills involved.

Mind you, most of Amazon's service provision didn't even bat an eyelid. They can lose a whole datacenter and only some customers are affected, and those customers can (if they adequately prepared) get back up and going in minutes. They could even have arranged things so that their customers would have hardly seen a thing at all, but that is admittedly more easily done with some types of service than others. Still, it's not something that Amazon fixes for you (and they explicitly tell you they don't if you read their docs; you've got no excuse there). One of the genius things about the Cloud is that these things are not totally hidden from you (as a direct customer of the main service providers, of course); it allows prices to be lower and it allows you to deal with issues at the application level (usually the easiest place). It lets you get the benefits of having multiple globally-distributed datacenters without the hassle of physically building out all over the world, but it isn't magic. Just engineering and business.

Re:What, you thought "cloud" meant "no outage"? (1)

ILongForDarkness (1134931) | more than 2 years ago | (#40506135)

I think most are just cheap bastards who are upset that their one-server, $30/month setup didn't buy a redundant datacentre, and that oops, maybe they should have listened when people said that geo-redundancy "is a good thing"(TM).

Re:What, you thought "cloud" meant "no outage"? (1)

lucifuge31337 (529072) | more than 2 years ago | (#40506567)

I think most are just cheap bastards who are upset that their one-server, $30/month setup didn't buy a redundant datacentre, and that oops, maybe they should have listened when people said that geo-redundancy "is a good thing"(TM).

Yeah... Netflix is totally one of those places. Oh... wait... no they aren't, and they were down anyway.

Re:What, you thought "cloud" meant "no outage"? (1)

MrBandersnatch (544818) | more than 2 years ago | (#40506281)

I suspect there is a lot of resistance to the concept due to the general early experiences of SaaS and hosting solutions being cloud-washed....

Re:What, you thought "cloud" meant "no outage"? (0)

Anonymous Coward | more than 2 years ago | (#40506469)

I think Amazon deserves some share of the blame for not being able to keep the DC online. Once they realized they were going to be in a long-term outage they should have moved some of the EC2 instances to other sites (Oregon and California are the other two iirc) while the generators were still running.

Most of the blame does fall on the customers though. If you pay to be hosted in only one site, you should know that there's risk you'll go down. This is true whether you host your own servers or have them hosted "in the cloud". I find it surprising these companies don't divide their capacity between sites just over bandwidth concerns. Being in just one site opens you up not only to downtime but also to complete data loss. Suppose a number of these servers were fried? I'm not sure how elaborate their data backups are, but hopefully those aren't single-site too.
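On the single-site backup worry above, here is a minimal sketch of copying EBS snapshots into a second region with boto3 (Python); the region names are placeholders and the script assumes the snapshots you care about are owned by your own account. It is an illustration, not Amazon's recommended tooling.

import boto3

SOURCE_REGION = "us-east-1"   # assumed primary region
BACKUP_REGION = "us-west-2"   # assumed backup region

def replicate_snapshots():
    """Copy this account's EBS snapshots into a second region."""
    src = boto3.client("ec2", region_name=SOURCE_REGION)
    dst = boto3.client("ec2", region_name=BACKUP_REGION)
    for snap in src.describe_snapshots(OwnerIds=["self"])["Snapshots"]:
        # copy_snapshot is called on the destination region's client
        dst.copy_snapshot(
            SourceRegion=SOURCE_REGION,
            SourceSnapshotId=snap["SnapshotId"],
            Description="DR copy of " + snap["SnapshotId"],
        )

if __name__ == "__main__":
    replicate_snapshots()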

Re:What, you thought "cloud" meant "no outage"? (1)

ColdWetDog (752185) | more than 2 years ago | (#40505965)

Cloud computing is nothing more than 1960s timesharing services with modern operating systems. Unless you design for resilience, you're not resilient to problems.

Cool. Can we get those old Teletype terminals back? The clattering ones that left little round bits of paper all over the place?

And 8-track tapes while we're at it.

Re:What, you thought "cloud" meant "no outage"? (4, Funny)

dkf (304284) | more than 2 years ago | (#40506653)

And 8-track tapes while we're at it.

We need those tape machines. Stick them in front of the real machines and get something hacked from a Raspberry Pi to spin them back and forth in an interesting pattern, with some extra blinkenlights for good measure, and we'll be able to once again prove to all the management types that we're doing serious computing so they can leave us alone and go back to their golf handicap.

Re:What, you thought "cloud" meant "no outage"? (1)

sweatyboatman (457800) | more than 2 years ago | (#40506189)

Cloud computing is nothing more than 1960s timesharing services with modern operating systems. Unless you design for resilience, you're not resilient to problems.

Cloud computing is a little more than 1960s timesharing services. Some minuscule differences, such as being accessible from anywhere in the world, providing enormously more power and exponentially more capacity, and being priced by the penny - but those are tiny differences that matter. Not to mention that, as other commenters have noted, the Amazon Cloud does provide more redundancy; the people using it just didn't want to pay for it.

The parent is the single stupidest comment possible for this thread and it's modded +5 insightful.

Re:What, you thought "cloud" meant "no outage"? (0)

Anonymous Coward | more than 2 years ago | (#40506373)

No, it really isn't. Modern-day cloud computing isn't much more advanced than it was in the 1960s.

Re:What, you thought "cloud" meant "no outage"? (1)

dkf (304284) | more than 2 years ago | (#40506719)

No, it really isn't. Modern-day cloud computing isn't much more advanced than it was in the 1960s.

All except for the data volumes, timescales, connectivity and pricing. In the '60s, timesharing services didn't ever have to deal with anything like the volume of data that would be found on a modern PC. They'd have a turnaround time of a few days, and connectivity was by courier if you were in a hurry, or driving over there yourself with your stack of punched cards (or paper tape) otherwise. I suppose it would be possible to think that pricing was comparable, especially if you were to ignore inflation, but really there's no comparison at all.

The net effect of these things is that people use the concepts of a timesharing service differently to back then. Human activity is not time- or space-scale invariant.

Re:What, you thought "cloud" meant "no outage"? (0)

Anonymous Coward | more than 2 years ago | (#40506539)

It certainly doesn't mean "reliability", does it?

Of course a lot of people will take the opportunity to say cloud sucks, but it seems the argument is (or should be) whether the reduction in costs/increase in scalability is worth the hassle of a less dependable system, and whether it really *is* less dependable... and whether that's just because it's still gaining acceptance.

If struggles by early adopters meant the whole concept was bad, we'd still be riding carriages.

Millions of dollars spent for nothing. (5, Interesting)

Anonymous Coward | more than 2 years ago | (#40505845)

So this is the second time this month that Amazon's cloud has gone down; serious questions should be asked about the sustainability of this service, given the extremely poor uptime record and extremely large customer base.

They would have spent millions of dollars installing diesel or gas generators and/or battery banks and who knows how much money maintaining and testing it, but when it comes time to actually use it in an emergency, the entire system fails.

You would think having redundant power would be a fundamental crucial thing to get right in owning and operating a data centre, yet Amazon seems unable to handle this relatively easy task.

Now before people say "well this was a major storm system that killed 10 people, what do you expect", my response is that cloud computing is expected to do work for customers hundreds and thousands of kilometres/miles from the actual data centre so this is a somewhat crucial thing that we're talking about - millions of people literally depend on these services; that's my first point.

My second point is it's not like anything happened to the data centre, it simply lost mains energy. It's not like there was a fire, or flood, or the roof blew off the building, or anything like that; they simply lost power and failed to bring all their millions of dollars in equipment up to the task of picking up the load.

If I were a corporate customer, or even a regular consumer, I would be seriously questioning the sustainability of at least Amazon's cloud computing. Google and Facebook seem to be able to handle it, but not Amazon - granted, they don't offer identical products, but their data centres overall seem to stay up 100 or 99.9999999% of the time, unlike Amazon's.

Re:Millions of dollars spent for nothing. (2)

turbidostato (878842) | more than 2 years ago | (#40505937)

A datacenter is a datacenter is a datacenter. You are not in "the cloud" if you can't escape from a datacenter-level incident.

Given that there is no "cloud" provider (not yet, at least) that will automagically protect your services from a datacenter-level incident, it is up to you, the customer, to do it.

It's certainly possible with current technology, but it's neither cheap nor straightforward, no matter what the "cloud" providers insist on selling and the PHBs insist on believing.

Re:Millions of dollars spent for nothing. (0)

Anonymous Coward | more than 2 years ago | (#40506241)

Someone mod parent up. "The Cloud" simply forces you to engineer reliability correctly. You can no longer throw money at a high-end single point of failure (a storage system or server or switch) and just hope that if it costs enough it will never fail. That option is gone in the cloud; all components are commodity and considered cheap and prone to failure.

Ultimately, whether you used high-end/low-failure-rate stuff or the cheap/crap stuff within one datacenter, you'll still need to engineer *real* reliability between multiple datacenters if you want to survive natural (and man-made) disasters. If you're doing that properly, there's no sense wasting money on high-end componentry within a single datacenter anymore.

Building architectures that scale and handle failure correctly and globally (deployed in multiple locations on commodity cheap stuff, virtually never goes down anyways) is the way of the future whether you run your own hardware or not. It's The Right Way To Do Things. At a certain scaling level it probably makes sense to build your own "cloud". For many companies, it makes more sense to use something like EC2, because they really aren't at a scaling level where they can do it as reliably and cheaply. I'd guestimate the cutoff is somewhere in the vicinity of having a permanent baseline need of ~5K+ instances. Either way, it's the same architectural goals for your applications to get the same reliability and scalability.

If your app fails due to a single Amazon datacenter failing, you're not architecting things correctly. Amazon has nothing to do with that; the same rules apply anywhere else.

Re:Millions of dollars spent for nothing. (0)

Anonymous Coward | more than 2 years ago | (#40506345)

It's certainly possible with current technology, but it's neither cheap nor straightforward, no matter what the "cloud" providers insist on selling and the PHBs insist on believing.

Any decent IaaS cloud provider will offer CDN and GSLB products at a reasonable price. It's totally possible to build a system on top of a cloud that can survive a data center ("Region") outage, using the services the cloud provider offers.
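As an illustration of the GSLB point, here is a minimal sketch of DNS-based failover between two regions using Amazon's Route 53 and boto3 (Python). The hosted zone ID, health check ID, record name, and IP addresses are placeholders, and a real setup would also need the health check created against the primary endpoint first.

import boto3

ZONE_ID = "Z_PLACEHOLDER"            # placeholder hosted zone ID
HEALTH_CHECK_ID = "hc-placeholder"   # placeholder health check on the primary

route53 = boto3.client("route53")

def record(identifier, role, address, health_check=None):
    """Build one half of a PRIMARY/SECONDARY failover record pair."""
    rrset = {
        "Name": "www.example.com.",
        "Type": "A",
        "SetIdentifier": identifier,
        "Failover": role,            # "PRIMARY" or "SECONDARY"
        "TTL": 60,
        "ResourceRecords": [{"Value": address}],
    }
    if health_check:
        rrset["HealthCheckId"] = health_check
    return {"Action": "UPSERT", "ResourceRecordSet": rrset}

route53.change_resource_record_sets(
    HostedZoneId=ZONE_ID,
    ChangeBatch={"Changes": [
        record("primary-us-east", "PRIMARY", "192.0.2.10", HEALTH_CHECK_ID),
        record("secondary-us-west", "SECONDARY", "198.51.100.20"),
    ]},
)

When the primary's health check fails, DNS answers switch to the secondary address; the application in the second region still has to be able to take the traffic.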

Re:Millions of dollars spent for nothing. (1)

turbidostato (878842) | more than 2 years ago | (#40506563)

"Any decent IaaS cloud provider will offer CDN and GSLB products at a reasonable price."

Which helps you with your authoritative dynamic data exactly how?

And even with your mostly read-only data, you will get only limited benefit if you go the "automagical" route: to get the most out of a CDN or GSLB, you need to engineer and develop your apps with those services in mind - which is exactly what I already said.

Re:Millions of dollars spent for nothing. (5, Informative)

hawguy (1600213) | more than 2 years ago | (#40505967)

So this is the second time this month that Amazon's cloud has gone down; serious questions should be asked about the sustainability of this service, given the extremely poor uptime record and extremely large customer base.

They would have spent millions of dollars installing diesel or gas generators and/or battery banks and who knows how much money maintaining and testing it, but when it comes time to actually use it in an emergency, the entire system fails.

You would think having redundant power would be a fundamental crucial thing to get right in owning and operating a data centre, yet Amazon seems unable to handle this relatively easy task.

Well, the entire system didn't fail; my servers in us-east-1a weren't affected at all.

Hardware fails, even well-tested hardware... especially in extreme conditions - don't forget that this storm has left millions of people without power, killed at least 10, and caused 3 states to declare an emergency. Amazon may have priority maintenance contracts with their generator and UPS system vendors, and fuel delivery contracts, but when a storm like this hits, those vendors are busy keeping government and medical customers online. Rather than spend millions more dollars building redundancy for their redundancy (which adds complexity that can itself cause a failure), Amazon isolates datacenters into availability zones and has geographically disperse datacenters.

Customers are free to take advantage of availability zones and regions if they want to (which costs more money), but if they chose not to, they shouldn't blame Amazon.

Re:Millions of dollars spent for nothing. (1)

ahodgson (74077) | more than 2 years ago | (#40506329)

ELB issues last night did cause problems to services with zone redundancy. We had services with zone redundancy that were experiencing issues because the ELB addresses being served were not functional even though they had working instances connected to them.

Amazon has also had at least one other outage in the last 18 months that affected more than one availability zone.

Region redundancy would be good. But it's quite a bit more complex and costly, what with security groups and ELBs not crossing regions and having to pay external data charges for every byte moved between regions. We do it for important services, but it is a pain.

Re:Millions of dollars spent for nothing. (5, Informative)

dbrueck (1872018) | more than 2 years ago | (#40506387)

Sorry, but "Amazon's cloud has gone down" is wildly incorrect. From the sounds of it, *one* of their many data centers went down. We run tons of stuff on AWS and some of our servers were affected but most were not. Most important of all is that we had *zero* service interruption because we deployed our service according to their published best practices, so our traffic was automatically handled in different zones/regions.

Having managed our own infrastructure in the past, it's these sort of outages at AWS that make us grateful we switched and that continue to convince us it was a good move. It might not be for everybody, but for us it's been a huge win. When we started getting alarms that some of our servers weren't responding, it was so cool to see that the overall service continued on its merry way. I didn't even bother staying up late to babysit things - checked it before bed and checked it again this morning.

Firing up a VM on EC2 (or any other provider) != architecting for the cloud.
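One concrete piece of those published best practices is simply fronting instances spread across zones with a load balancer. A minimal boto3 sketch with a classic ELB (the names and instance IDs are placeholders, and this isn't necessarily how the parent's service is actually built):

    # Sketch: a classic ELB spanning several availability zones.
    import boto3

    elb = boto3.client("elb", region_name="us-east-1")

    elb.create_load_balancer(
        LoadBalancerName="web-frontend",
        Listeners=[{"Protocol": "HTTP", "LoadBalancerPort": 80,
                    "InstanceProtocol": "HTTP", "InstancePort": 80}],
        AvailabilityZones=["us-east-1a", "us-east-1b", "us-east-1d"],
    )

    # Register instances that live in different zones behind the same ELB.
    elb.register_instances_with_load_balancer(
        LoadBalancerName="web-frontend",
        Instances=[{"InstanceId": "i-11111111"}, {"InstanceId": "i-22222222"}],
    )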

I live nowhere near Va (4, Interesting)

bugs2squash (1132591) | more than 2 years ago | (#40505891)

However "Netflix, which uses an architecture designed to route around problems at a single availability zone." seems to have efficiently spread the pain of a North Eastern outage to the rest of the country. Sometimes I think redundancy in solutions is better left turned off.

not just netflix, and not just "electrical storm" (2)

acroyear (5882) | more than 2 years ago | (#40505961)

Instagram's servers in that same cloud data center were also affected, and more people griped about that on my Facebook feed than about Netflix.

as for "an electrical storm", that's a bit of an understatement. The issue was actually more the 80 mph wind gusts as well as the lightning continuing on for 2 hours after the wind and rain had passed (meaning crews couldn't get out there overnight).

The result is some 2 million people without power, 1 million around DC alone. Dominion Power (which serves the area where the data center resides, about 5 miles from my house) lost power for more than half of its Northern Virginia customers, and even now has only restored power to about 60,000* out of the 461,000 that lost it. On the Maryland/DC side of the Potomac, half a million people may be without power for days through a heat wave of 100-degree days (and more storms like last night's coming...).

* Fortunately that would include me... though I'm writing this via my Sprint phone as a wifi hotspot 'cause our cable modem is still down ;-)

Re:not just netflix, and not just "electrical stor (1)

Onuma (947856) | more than 2 years ago | (#40506747)

You lucked out, then. I've driven around Fairfax, Arlington and PG counties as well as DC today. I haven't seen a major road without some kind of debris blocking it, nor an area which has 100% power restored at this point.

This was a bad storm, but could certainly have been far worse. Even still, the grocers and stores are out of ice and people are swarming out of their homes like rats abandoning ship in some areas. These same people would be fucked if the S really HTF.

Wasn't even a big storm (4, Informative)

gman003 (1693318) | more than 2 years ago | (#40506001)

I was in it - it was not a particularly bad storm. Heavy winds, lots of cloud-to-cloud lightning, but very little rain or cloud-to-ground lightning. I lost power repeatedly, but it was always back up within seconds. And I'm located way out in a rural area, where the power supply is much more vulnerable (every time a major hurricane hits, I'm usually without power for about a week - bad enough that I bought a small generator).

According to TFA, they were only without power for half an hour, and the ongoing problems were related to recovery, not the actual power loss. So their problems are more "bad disaster planning" than "bad disaster".

Still, you'd think a major data center would have the usual UPS and generator setup most major data centers have - half an hour without power is something they should have been able to handle. Or at least have enough UPS capacity to cleanly shut down all the machines or migrate the virtual instances to a different datacenter.

Re:Wasn't even a big storm (1)

Anonymous Coward | more than 2 years ago | (#40507067)

As reported on Wunderground, the storm was a derecho: a storm with a track at least 240 miles long and winds above 58 mph. The derecho started in northwest Indiana and tracked all the way to offshore Delaware and New Jersey, and a bunch of places had 80+ mph gusts. So exactly how bad the storm was depended on how much wind one got.
As for data center protection and backup, the question is where else along the route there were data centers, and whether they survived.

Poorly run datacenter (0)

nurb432 (527695) | more than 2 years ago | (#40506059)

If they don't have proper backup generators, they have no business running a data center.

Re:Poorly run datacenter (1)

turbidostato (878842) | more than 2 years ago | (#40506401)

"If they don't have proper backup generators, they have no business running a data center."

*Or* they are in a business that recognizes that shit happens, even at the datacenter level, and provides services so you can spread your load across more than one datacenter, making the 10x expenditure needed to go from a "decent" datacenter to a "top notch" one moot and avoidable.

Hey, doesn't that look like that funny "cloud" concept they are waving around so often?

Sooner or later ... (-1)

PPH (736903) | more than 2 years ago | (#40506259)

... you'll all be under water [slashdot.org] anyway. Time to start packing and heading for high ground.

double hit (0)

slashmydots (2189826) | more than 2 years ago | (#40506515)

Not only did I get annoyed for like 3 whole minutes last night at the tail end of the Netflix downtime, but I also couldn't download an important software patch from a vendor on Friday because it was hosted on the Amazon cloud download service thing. By the way, Netflix apparently doesn't have a damn thing for single-point-of-failure adaptation, seeing as how their entire website itself was down and wouldn't even respond to a ping. They can't even load a freaking "sorry, we're having problems" page on a backup host? Yeah, real adaptive. Oh, and good call hosting your site on the same service that your videos stream from. That's really smart.

How did this happen anyway? The cloud is magic...MAGIC!!!!!! You cannot destroy magic! It must have been a dark wizard. That or all the cloud product salesmen are full of shit.

My instance was down for 9hrs... (2)

geekymachoman (1261484) | more than 2 years ago | (#40506557)

Which is the problem. Not the power outage itself.
If the power outage happened and the servers were back in, let's say... 30 minutes, 1 hour... alright. But 9 freakin' hours?

In my specific case I didn't suffer as much because I have another instance in a different zone, with db replication and all that, serving as a backup server, and my project there, although very critical (20 people are getting wages out of it), is very low on resource usage... I can imagine there were quite a lot of people who lost quite a lot of money because of this. It's really unacceptable for a DC to have 9 hours of downtime, whatever the reason is... because that's just not the standard people are used to.
I never experienced anything like this at any other company in the last 10 years I've been working as a Linux admin... although at all those companies, I used real servers.
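For what it's worth, the read-path fallback described above can be as simple as something like this (PostgreSQL/psycopg2 chosen purely for illustration; hostnames and credentials are placeholders):

    # Sketch: try the primary database first, fall back to a replica in
    # another zone if the primary is unreachable.
    import psycopg2

    PRIMARY = {"host": "db-us-east-1a.example.internal", "dbname": "app",
               "user": "app", "password": "secret"}
    REPLICA = {"host": "db-us-east-1c.example.internal", "dbname": "app",
               "user": "app", "password": "secret"}

    def get_connection():
        """Return a DB connection, preferring the primary."""
        try:
            return psycopg2.connect(connect_timeout=3, **PRIMARY)
        except psycopg2.OperationalError:
            # Reads can still be served from the replica; writes have to wait
            # (or the replica has to be promoted by hand).
            return psycopg2.connect(connect_timeout=3, **REPLICA)

    conn = get_connection()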

Re:My instance was down for 9hrs... (4, Interesting)

PTBarnum (233319) | more than 2 years ago | (#40506701)

There is a gap between technical and marketing requirements here.

The Amazon infrastructure was initially built to support Amazon retail, and Amazon put a lot of pressure on its engineers to make sure their apps were properly redundant across three or more data centers. At one point, the Amazon infrastructure team used to do "game days" where they would randomly take a data center offline and see what broke. The EC2 infrastructure is mostly independent of retail infrastructure, but it was designed in a similar fashion.

However, Amazon can't tell their customers how to build apps. The customers build what is familiar to them, and make assumptions about the uptime of individual servers or data centers. As the OP says, it's "the standard people are used to". Since the customer is always right, Amazon has a marketing need to respond by bringing availability up to those standards, even though it isn't technically necessary.
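A "game day" of the sort described above can be approximated with a few lines against the EC2 API; this is a hypothetical sketch (boto3, made-up zone), not a description of Amazon's actual tooling:

    # Sketch: stop a random running instance in one zone and watch what breaks.
    # Do not point this at production without real guard rails.
    import random
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    TARGET_ZONE = "us-east-1d"  # hypothetical zone chosen for the drill

    reservations = ec2.describe_instances(
        Filters=[
            {"Name": "availability-zone", "Values": [TARGET_ZONE]},
            {"Name": "instance-state-name", "Values": ["running"]},
        ]
    )["Reservations"]

    instances = [i["InstanceId"] for r in reservations for i in r["Instances"]]

    if instances:
        victim = random.choice(instances)
        print("Game day: stopping", victim)
        ec2.stop_instances(InstanceIds=[victim])

If an app can't survive that on a quiet Tuesday, it won't survive a derecho on a Friday night.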

What Datacenter? (0)

Anonymous Coward | more than 2 years ago | (#40506771)

What datacenter was this? Was it a private Amazon datacenter or was it someone else's?

What real datacenter can't operate for a week without power? That's ridiculous!

Inevitable (0)

Anonymous Coward | more than 2 years ago | (#40506859)

All my data and media are still accessible. I never swallowed the cloud Kool-aid, though.

Defined My Saturday Morning (1)

pgn674 (995941) | more than 2 years ago | (#40506865)

My company uses Amazon Web Services to host some of our product, and I got a call at 7 am to help bring our stuff back up. A bunch of our instances were stopped, and a bunch of Elastic Block Store volumes were marked Impaired. We're working on making our environment more "cloudy" to make better use of multiple availability zones, regions, and automation to better survive an outage like this, but we're not there yet.
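The morning-after triage described above is the kind of thing that's easy to script; a hedged boto3 sketch of the same steps (region and filters are placeholders for whatever the real environment uses):

    # Sketch: find stopped instances and impaired EBS volumes, restart the instances.
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    # Instances the outage left in a stopped state.
    stopped = [
        i["InstanceId"]
        for r in ec2.describe_instances(
            Filters=[{"Name": "instance-state-name", "Values": ["stopped"]}]
        )["Reservations"]
        for i in r["Instances"]
    ]

    if stopped:
        print("Restarting:", stopped)
        ec2.start_instances(InstanceIds=stopped)

    # EBS volumes flagged as impaired usually need manual checks before reuse.
    impaired = [
        v["VolumeId"]
        for v in ec2.describe_volume_status()["VolumeStatuses"]
        if v["VolumeStatus"]["Status"] == "impaired"
    ]
    print("Volumes needing attention:", impaired)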