Amazon Outage Shows Limits of Failover 'Zones'

timothy posted about 3 years ago | from the my-cloud-smells-like-cat-food dept.

Cloud 125

jbrodkin writes "For cloud customers willing to pony up a little extra cash, Amazon has an enticing proposition: Spread your application across multiple availability zones for a near-guarantee that it won't suffer from downtime. 'By launching instances in separate Availability Zones, you can protect your applications from failure of a single location,' Amazon says in pitching its Elastic Compute Cloud service. But the availability zones are close together and can fail at the same time, as we saw today. The outage and ongoing attempts to restore service call into question the effectiveness of the availability zones, and put a spotlight on Amazon's failure to provide load balancing between the east and west coasts."

For a little extra money... (0)

Anonymous Coward | about 3 years ago | (#35898834)

You can sail on my new ship. It's got redundant hulls, a few feet apart.

Re:For a little extra money... (2)

techsoldaten (309296) | about 3 years ago | (#35898866)

For a little extra money, you can get a seat in my biplane, with the extra wings.

Re:For a little extra money... (0)

Anonymous Coward | about 3 years ago | (#35899982)

For a little extra money, you can get a seat in my biplane, with the extra wings.

For even just a little bit more extra money, you can get a seat in my twin-engined biplane ;-)

Twice the engines plus twice the wings.

I even have parachutes too!

Re:For a little extra money... (1)

lgw (121541) | about 3 years ago | (#35899158)

Amazon: where failover meets overfail.

It has to be embarrassing that a single incident brought down multiple "availability zones" (at least for EBS, maybe other parts of EC2), as that's just what they were supposed to be safe from. Hmm, "overfail", I like it.

Yay (1)

recoiledsnake (879048) | about 3 years ago | (#35898886)

Re:Yay (1)

calderra (1034658) | about 3 years ago | (#35899000)

So setting up a server to remote desktop into your home computer is cloud computing? I have a sinking feeling that "cloud computing" is a lot like web2.0, aka "broadband Geocities".

Re:Yay (1)

DigiShaman (671371) | about 3 years ago | (#35899656)

Cloud Computing generally implies redundancy and non-locality 24/7. Computer hardware that makes up the cloud would normally be provisioned to act as a resource and not a point of failure for the entire infrastructure. The idea with Cloud Computing is that the Cloud is an organ while the hardware acts as cells. A few could die off and/or be replaced without any disruption to the user.

Unfortunately, everyone has their own idea and implementation to creating Cloud based content and services. So we end up with a lot of bullshit marketing and thus rendering the entire concept to nothing better than a buzzword. Do you feel like trusting it now?

Re:Yay (1)

marcello_dl (667940) | about 3 years ago | (#35899964)

The incident might be eye opening for some people but the cloud cannot theoretically work because it's not a paradigm. Grid computing is a paradigm. Cloud computing is, as you said, marketspeak describing how providers organize their resources internally. Well that's irrelevant because the provider is the single point of failure. Piss off Amazon for whatever reason, your data becomes unavailable no matter how cloudy it was. It's more "cloudy" to simply replicate data locally and on two different providers.

Re:Yay (1)

Fulcrum of Evil (560260) | about 3 years ago | (#35900070)

the cloud cannot theoretically work because it's not a paradigm.

What the fuck is that supposed to mean? It's got words in it, but is entirely vacuous.

Re:Yay (1)

DigiShaman (671371) | about 3 years ago | (#35902654)

He's wrong about Cloud not being able to work. But I think I understand his POV. If I'm right, he's basically saying what I've stated. That is to say, Cloud computing is a business solution based idea with the word coined for a marketing purpose. However, Cloud computing is not required to use any specific paradigm to achieve that goal. Grid computing is such a paradigm, and I believe it to be the proper one to use for Cloud computing.

Re:Yay (0)

Anonymous Coward | about 3 years ago | (#35903668)

I think he's pointing out that "Cloud Computing" is marketing-speak, so defining what it is would be like figuring out where Margaritaville is, to borrow an example from Jimmy Buffett. Cloud computing is whatever you want it to be.

Re:Yay (2)

dkf (304284) | about 3 years ago | (#35900254)

Cloud Computing generally implies redundancy and non-locality 24/7.

No. It generally implies that you can hire resources (cpu, disk) on short notice and for short amounts of time without costing the earth. You can build high-availability systems on top of that, but HA is not trivial to set up and typically requires significant investment at many levels (hardware, system, application) to attain. Pretend that you can get away with less if you want; I don't care.

Re:Yay (1)

schnikies79 (788746) | about 3 years ago | (#35899756)

No need to have a sinking feeling, it's always been that way. The "Cloud" is a buzzword, nothing more, nothing less.

Let us learn from Xzibit (4, Funny)

Anonymous Coward | about 3 years ago | (#35898892)

Amazon should put their cloud in a cloud, so the cloud will have the redundancy of the cloud.

Re:Let us learn from Xzibit (1)

Anonymous Coward | about 3 years ago | (#35903690)

That's funny. But this whole thing is funny. What the blazes is the cloud for, if it fails? Is it not a cloud but a lead balloon? Sure, I understand no system is perfect, but to steal a line from Seinfeld (you're a car rental agency - you're supposed to *hold* the reservation): Amazon WS - you are a cloud! You are supposed to have 99.99% uptime! That's *all* you are supposed to do! Especially when mainframes have 99.9999% uptime, I believe.
  Even distributed systems - which your average web site has had available for at least ten years, and even that wasn't new technology. What is it, we move to the "cloud" and forget all the stuff we learned before? Sometimes watching these chaps, like twitter, facebook and google, reinvent and rewrite and relearn basic, standard, industry practices, you've got to really be puzzled. Let's put all our logic in the view layer, etc.
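
To put the uptime figures quoted above in perspective, here is a minimal sketch that converts an availability percentage into the downtime it allows per year. The function name is made up for illustration, and a 365-day year is assumed; nothing here reflects any provider's actual SLA terms.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumption: non-leap year


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the minutes of downtime per year implied by an availability figure."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


# Compare the "nines" mentioned in the comment above.
for pct in (99.9, 99.99, 99.9999):
    minutes = downtime_minutes_per_year(pct)
    print(f"{pct}% uptime allows about {minutes:.1f} minutes of downtime per year")
```

Read this way, an outage lasting most of a day uses up far more than a 99.99% budget for the year, which is roughly the commenter's complaint.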

Re:Let us learn from Xzibit (0)

Anonymous Coward | about 3 years ago | (#35903980)

Ever heard of vaporware, du0d?

Doh! (0)

Anonymous Coward | about 3 years ago | (#35898896)

Oh, you wanted failover for your web site! I thought you meant failure!

Cloud computing (5, Funny)

stopacop (2042526) | about 3 years ago | (#35898906)

Not ready for the desktop ;-)

Obligatory (0)

Anonymous Coward | about 3 years ago | (#35898918)

In Soviet Russia failover zones you.

sounds like TWCs DNS servers (1)

dltaylor (7510) | about 3 years ago | (#35898976)

I lose access to TWC's DNS servers regularly (yes, I will be setting up my own, when it becomes annoying enough). Although you can do a quick-and-dirty load-balancing by setting them up as follows, there's no redundancy for the customers when there's a link failure.

search socal.rr.com

Re:sounds like TWCs DNS servers (2)

michaelhood (667393) | about 3 years ago | (#35899428)

or just use and or and

Re:sounds like TWCs DNS servers (1)

DaftDev (1864598) | about 3 years ago | (#35899492)

Or OpenDNS:

Re:sounds like TWCs DNS servers (2, Informative)

samkass (174571) | about 3 years ago | (#35899624)

...and get slow performance on anything delivered via Akamai or similar services which try to use regional data centers.

OpenDNS and Google DNS are hacks that work increasingly badly.

Re:sounds like TWCs DNS servers (3, Interesting)

trapnest (1608791) | about 3 years ago | (#35899750)

Not that you're wrong, but that's not the fault of the DNS servers, Akamai should be using geolocation by IP, not by the location of DNS servers.
In fact, I'm not sure how they could be doing geolocation by the client's DNS servers... are you sure about that?

Re:sounds like TWCs DNS servers (1)

Anonymous Coward | about 3 years ago | (#35902040)

I'm personally completely sure. I run a recursive DNS server at work for DNS lookups, and I get very different answers for www.akamai.com when I manually query vs our own recursive DNS server.

It doesn't help that a RTT to the IP returned by is over 200 ms, but the latter is around 10 ms. (I'm in New Zealand, 200 ms RTTs to popular, US based, websites is very normal. Heck, I have 228 ms RTT pinging slashdot.org right now on my home DSL. Clearly, using our own recursive DNS, I'm hitting an in-country node of the akamai network.)

Of course, ISP provided recursive DNS servers are usually fairly unreliable - hence, why we run our own. (When we were previously connected to a large, multinational provider (that has basically since pulled out of the country), the resolving DNS server they gave had a considerably larger RTT time than the authoritative DNS servers for .nz domains themselves!)

Re:sounds like TWCs DNS servers (2)

The Bean (23214) | about 3 years ago | (#35903862)

Typically your computer asks your firewall/router for a DNS lookup. It relays that to your ISP's DNS server. Your ISP looks up the DNS server responsible for the domain and contacts that server and sends your original request. That request doesn't include your IP however, so Akamai's DNS servers are returning regional specific servers based on your ISP's DNS server IP/geo-location. That's usually perfectly acceptable, since presumably your ISP's DNS server would be located on a good route with a low ping.

So if you replace your ISP's DNS server with those of OpenDNS, google or whatever else, it is that server which determines your location when Akamai's DNS servers decide which IPs to give you.

You should be able to replace your DNS server with your own locally hosted one as well. ie, you contact the root-servers, hunt down the responsible server, then contact it directly for the IP. I'm not sure what the implications of that is though. The intent of the typical setup is that the ISP DNS servers can cache things and reduce the load on the central root servers.
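
The lookup flow described above can be sketched as a toy model. This is only an illustration of the idea that the authoritative server sees the resolver's address and not the end user's; the addresses (taken from the reserved documentation ranges), region table, and function name are all assumptions, not how Akamai or any real CDN is implemented.

```python
# Hypothetical data: the region each recursive resolver sits in, and the
# edge node that serves each region. None of these values come from the thread.
RESOLVER_REGION = {
    "192.0.2.53": "nz",     # stand-in for an in-country ISP resolver
    "198.51.100.53": "us",  # stand-in for a third-party resolver hosted abroad
}
EDGE_NODE = {"nz": "203.0.113.10", "us": "203.0.113.20"}
DEFAULT_REGION = "us"


def answer_query(resolver_ip: str) -> str:
    """Pick an edge node using only the resolver's address.

    The end user's own IP never appears here, mirroring the point that the
    relayed request does not carry it.
    """
    region = RESOLVER_REGION.get(resolver_ip, DEFAULT_REGION)
    return EDGE_NODE[region]


# The same user gets a different answer depending on which resolver asks.
print(answer_query("192.0.2.53"))
print(answer_query("198.51.100.53"))
```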

Re:sounds like TWCs DNS servers (2)

guruevi (827432) | about 3 years ago | (#35899838)

Actually, when you're on TWC you might get BETTER performance with OpenDNS than with their own DNS. When using the TWC DNS I can't get a 1080p without 10m of loading time or even a non-stuttering 720p stream from YouTube or Netflix. With OpenDNS or Google DNS I get much better performance. Also, if you're an AT&T Business customer, OpenDNS works much better with DNS-based RBL's like Spamhaus which AT&T blocks.

Re:sounds like TWCs DNS servers (1)

Anonymous Coward | about 3 years ago | (#35901406)

No. DNS-based geo-location caching schemes are the culprit. It works off a bad assumption that makes using an alternative DNS server a pain. I don't like my ISP's DNS servers. They hijack domain typos as a revenue stream, so I consider them hostile and ignore them when I can.

Using google or opendns, however, will cause havoc for a couple of surprisingly common things I've experienced problems with:
Rackspace hosted exchange service

Fortunately you can configure your network's DNS server to forward requests to your ISP's DNS servers (instead of your 3rd party DNS service) for specific domains (like *.apple.com *.netflix.com). This fixed the above issues where I work.
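
The per-domain forwarding described above boils down to a suffix match on the queried name. A minimal sketch of that decision follows; the resolver addresses are placeholders and the function name is hypothetical, so this is not the configuration syntax of any particular DNS server.

```python
ISP_RESOLVER = "192.0.2.1"          # placeholder for the ISP's resolver
THIRD_PARTY_RESOLVER = "192.0.2.2"  # placeholder for the third-party resolver

# Domains whose lookups should go to the ISP so geo-based answers stay local.
FORWARD_TO_ISP = ("apple.com", "netflix.com")


def pick_upstream(hostname: str) -> str:
    """Return the upstream resolver to use for a given hostname."""
    name = hostname.rstrip(".").lower()
    for zone in FORWARD_TO_ISP:
        # Match the zone itself or any subdomain, but not lookalike names.
        if name == zone or name.endswith("." + zone):
            return ISP_RESOLVER
    return THIRD_PARTY_RESOLVER


print(pick_upstream("movies.netflix.com"))
print(pick_upstream("slashdot.org"))
```

Matching on a leading dot avoids sending an unrelated name such as notnetflix.com to the ISP by accident.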

have your own servers (4, Insightful)

Dan667 (564390) | about 3 years ago | (#35898978)

or use a completely different company for redundancy. I think that is the lesson here.

Re:have your own servers (4, Insightful)

rudy_wayne (414635) | about 3 years ago | (#35899316)

This incident illustrates once again why you need to put your stuff on your own servers and not someone else's. All computer systems will fail occasionally. There's no such thing as 100% uptime. However, when your own servers fail you can get your own people working on it right away and it's their number one priority. When your stuff is on someone else's servers, you're at their mercy. It will get fixed when they get around to it, and, they have more customers than just you, so you might not be first on the priority list. Or second. Or third. Or tenth.

Re:have your own servers (1)

DdJ (10790) | about 3 years ago | (#35899528)

This incident illustrates once again why you need to put your stuff on your own servers and not someone else's.

Well. Or put your stuff on your own servers as well as someone else's. Cloning your services into various clouds isn't insane as a tool for handling some types of unplanned scaling requirements or some types of unplanned outages. Relying on those clouds introduces risks that were just demonstrated.

Re:have your own servers (2)

vrmlguy (120854) | about 3 years ago | (#35900236)

This incident illustrates once again why you need to put your stuff on your own servers and not someone else's.

Well. Or put your stuff on your own servers as well as someone else's. Cloning your services into various clouds isn't insane as a tool for handling some types of unplanned scaling requirements or some types of unplanned outages. Relying on those clouds introduces risks that were just demonstrated.

It's probably worth noting that EMC makes a cloud storage product called Atmos with an API essentially identical to Amazon's S3 service. The main difference is that the HTTP headers start with x-emc instead of x-amz, so a properly written application running on non-Amazon servers could switch fairly easily between the two for load balancing or redundancy.
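
As a rough illustration of the header-prefix point, an application could keep the vendor prefix in one place and build its headers from provider-neutral field names. This sketch takes the comment's description at face value; the field name and function are made up, and real clients would also have to deal with authentication and any other API differences, which are not modelled here.

```python
# Vendor header prefixes as described in the comment above.
HEADER_PREFIX = {"amazon": "x-amz-", "emc": "x-emc-"}


def vendor_headers(provider: str, fields: dict) -> dict:
    """Prefix provider-neutral field names with the chosen vendor's prefix."""
    prefix = HEADER_PREFIX[provider]
    return {prefix + name: value for name, value in fields.items()}


# "meta-owner" is a hypothetical field used only for this example.
print(vendor_headers("amazon", {"meta-owner": "ops"}))
print(vendor_headers("emc", {"meta-owner": "ops"}))
```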

Re:have your own servers (1)

dkf (304284) | about 3 years ago | (#35900352)

This incident illustrates once again why you need to put your stuff on your own servers and not someone else's.

Hosting everything yourself? Can we sell you a contract for us to build you a datacenter? Then there's the ongoing costs of actually operating it.

Or were you thinking that a scavenged rack in a old closet previously only used by the janitor was a substitute?

Re:have your own servers (2)

hey! (33014) | about 3 years ago | (#35900518)

Nah. It shows that when you buy a product or service you need to understand what you are paying for, not extrapolate from a buzzword like "cloud".

You can't make a blanket statement one way or another about using something like EC2 without considering the user's needs and capabilities. There may be users who'd find the recent outage intolerable; they probably shouldn't be using EC2. But if they have good reasons to consider EC2, chances are they are going to spend more money.

Re:have your own servers (2)

LordLimecat (1103839) | about 3 years ago | (#35902580)

Pop quiz:
You're a small company that does software development. You need servers to do deployment testing, basically just apache and the customized package. Uptime is a must, and your budget is limited.

Do you...
A) Spend tens of thousands on servers, plus backup power, plus racks, plus redundant switches, plus dual WAN links, plus a backup solution (for 10 servers, so far you're looking at ~$35k, plus a thousand a month on WAN links)
B) Trust that Amazon will have FAR better uptime than you could EVER dream of architecting on a budget, with far greater convenience, and a lower price tag to boot (the "cloud" is generally billed on CPU usage and bandwidth, which will be low for testing)?

Every time Google or Amazon or Rackspace suffers an outage, people start hollering that the cloud is a menace, a curse, a sham, or whatever. But if you look at the length of, for example, Google's outages over 8 years, their record is head and shoulders above anything that slashdot armchair engineers could throw together, especially given the load they carry.

Unless I missed a news story, this will be Amazon cloud's first outage, and the beauty here is that none of the "rebuild" or "restore from backup" burden will be on their customers. They have to pay technicians to come out and replace hardware; they have to provide the hardware. The downside, of course, is that their services are unavailable; but of course you would be facing that if your own setup failed, and you would be footing the bill to boot.

The real lesson here, I suppose, is that if you really really really need 100% uptime, you should be prepared to fire up a hot- or cold- standby system of your own, or that you should get a rather large budget approved to build a real redundant system-- but not that you can out-architect Amazon on anything less than a large budget.

NB-- I say this as a technician typically dealing with networks up to 100 users and up to 30 servers. If you have multi-million dollar budgets, certainly go ahead and build out that server room.

Re:have your own servers (1)

Anonymous Coward | about 3 years ago | (#35902988)

Really, it depends on the financial hit your organization will take by downtime/lost productivity/lost business/lost confidence in the ability of your organization to be able to deliver your product. If being down for 12-15 hours or more will cost you more than $35k then yes it makes sense for you to roll your own solution or to use traditional dedicated hosting providers in a H/A configuration. Every organization needs to perform their own risk analysis. If downtime that is out of your control is acceptable then a cloud provider is for you. If it isn't then keep it in house.

What really gets me is the marketing that goes into the "cloud"/SaaS/PaaS solutions. Sure they can save you a significant amount of money up front. However, to achieve the best uptime I have always subscribed to the KISS philosophy. Moving something to a public cloud solution is inherently adding a significant layer of complexity to delivering your application (which should be your primary objective, you can't make money if people can't use your application/service). Even Google has downtime on their services from time to time (although I don't know of a time when search has been unavailable).

For what it's worth, the services that I host in-house rarely, if ever, have unscheduled downtime. In the few instances of unscheduled downtime we were able to recover in minutes, not hours, because we have complete control over the environment. The one service that we have hosted has been down over 30 hours this year already (obviously not my choice, political decision by management).
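
The risk analysis suggested above can be started with simple arithmetic: how much would an hour of downtime have to cost before a 12-15 hour outage outweighs the roughly $35k build-out mentioned in the thread? The function name is hypothetical, and the sketch ignores ongoing costs and the chance of in-house outages, so treat it as a starting point only.

```python
def break_even_hourly_loss(build_cost: float, outage_hours: float) -> float:
    """Hourly downtime cost at which one outage equals the build cost."""
    return build_cost / outage_hours


BUILD_COST = 35_000  # figure taken from the thread

for hours in (12, 15):
    rate = break_even_hourly_loss(BUILD_COST, hours)
    print(f"A {hours}-hour outage costs ${BUILD_COST:,} at about ${rate:,.2f} per hour")
```

If your real hourly loss is well below those figures, one outage of this size does not by itself justify building in-house.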

Re:have your own servers (2)

The Bean (23214) | about 3 years ago | (#35903892)

Reddit's downtime has been a bit of a running joke for a while now, with most (all?) of it being blamed on Amazon.

The way they implemented things is one of the big issues. For example, things like setting up RAID volumes across multiple EBS volumes. They just magnified their exposure to any issues in the cloud. Any one machine goes down the system gets hosed and needs recovery. They also are constrained to a single availability zone in order to get the performance they need from their setup. (This is not intended to be a factual statement. ie, I didn't confirm the details, but I believe it captures the essence of the issue.)

To get the most from the "cloud" you need to build your infrastructure accordingly. You can't take old systems and throw them in the cloud and expect it to scale. Nor can you carry over all the old ideas; new tools will require new techniques, which the industry will learn as things mature.
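
The "magnified exposure" point can be made concrete with a back-of-the-envelope calculation. This sketch assumes a plain stripe with no redundancy (any single volume failing takes the array down) and independent failures; both are simplifying assumptions, the probability used is made up, and the commenter did not confirm the actual RAID level.

```python
def stripe_failure_probability(per_volume_failure: float, volumes: int) -> float:
    """Probability that at least one of `volumes` independent volumes fails."""
    return 1 - (1 - per_volume_failure) ** volumes


# Hypothetical 1% chance that any one volume has trouble in a given period.
for n in (1, 2, 4, 8):
    p = stripe_failure_probability(0.01, n)
    print(f"{n} volume(s): {p:.2%} chance the array is affected")
```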

Re:have your own servers (2)

outsider007 (115534) | about 3 years ago | (#35903080)

No. If a zone goes down in CA, I can have a new server up in Virginia within minutes. I would rather be on ec2 when I go down. I guarantee I will be back up faster than you.

Re:have your own servers (1)

craigbeat (706827) | about 3 years ago | (#35903948)

The company I work for hosts on another very large company (that had a lot of downtime for another reason a few years back), on dedicated servers. Believe me when I say we have as many problems with them. So far, there have been no problems for us using Amazon. I think it depends on your needs. Multiple redundancy is probably a better solution, but nothing is perfect yet.

Re:have your own servers (4, Informative)

gad_zuki! (70830) | about 3 years ago | (#35899480)

So wait. The cloud sales pitch is "no more servers-save money-cut IT staff" but now it's:

1. Virtualized servers in zone 1
2. Virtualized servers in zone 2
3. Virtualized servers from a different company altogether.

So I went from one solid server, good backups, maybe a hot backup, and talented staff running the show to outsourced to 3 different clouds with hour-long hold times with some Amazon support monkey? Genius.

Re:have your own servers (1)

Fulcrum of Evil (560260) | about 3 years ago | (#35899516)

One? Lose a raid stack and you're toast. It's always been at least N+1 redundancy for your tier one crap. The cloud stuff is there so you can scale up quickly. Shouldn't be base load or anything.

Re:have your own servers (1)

Artifex (18308) | about 3 years ago | (#35900156)

So I went from one solid server, good backups, maybe a hot backup, and talented staff running the show to outsourced to 3 different clouds with hour-long hold times with some Amazon support monkey? Genius.

I hope that one good server is in a disparate geographical location from its hot backup, using a separate transit provider, each server has redundant power supplies, and your talent has a bus factor of (#servers)+1 or more. You're gonna need backup for any load balancing as well, and whether that should be in yet another location is, well, something to consider.

Cloud services should give you the redundancy you need, as well as being easily scalable. Why are you trying to say the whole concept is bad just because Amazon's implementation is flawed?

Re:have your own servers (1)

dhasenan (758719) | about 3 years ago | (#35901394)

I would expect Amazon's marketing to indicate that these units within a region are a way to get fast communication between them without wholly losing redundancy. As such, it's a middle-tier option, not best at anything (you'd have the machines in the same data center if they really needed the bandwidth, and in separate regions if you really needed the redundancy). If I'm wrong about that, then the marketing people who handled that should be dismissed.

Re:have your own servers (0)

Anonymous Coward | about 3 years ago | (#35900822)

Genius for Amazon and genius for the moron IT director who shifted your company's entire IT infrastructure to the cloud and has since retired.

Re:have your own servers (1)

fermion (181285) | about 3 years ago | (#35899532)

If one can afford that kind of redundancy, then sure. Two independent lines coming in from two independent providers that individually will adequately handle all traffic for an extended period of time. Independent arrays of PCs hooked to independent load balancers that will not fall over if something happens to one line or a large number of computers. One could also have big iron with six-nines reliability hooked to redundant lines. In any case backup power to keep all the equipment up for a long period of time is critical. I knew companies that did these kinds of things back in the day. It was expensive and hard to fund.

If this is one day out of the year that EC2 is offline, then that is probably better reliability than a home spun server. It is better reliability than I have ever gotten with any of the shared hosting companies I have dealt with. For affordable, or ad sponsored, high profile internet services 3 nines is probably all we are going to get. It is probably good enough. The services that were down were not critical. Real quality of life was not meaningfully affected for any significantly large group of people. What this means to the common person is that one should have a redundant service. Use Gowalla and Foursquare. You may not be able to get to a dashboard, but maybe can get to the services. It was not like Google was down.

Re:have your own servers (1)

Fulcrum of Evil (560260) | about 3 years ago | (#35900130)

If one can afford that kind of redundancy, then sure. Two independent lines coming in from two independent providers that individually will adequately handle all traffic for an extended period of time.

Why would you do that? It's enough to do things like run two DCs that can each handle 60% load or three that each handle 40% load. Not that much more expensive, and downtime turns into "the site is slow". There are architectural concerns, especially with data replication, but this is definitely doable, and it doesn't cost a mint.
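
The 60%/40% sizing above is easy to check with a little arithmetic. This sketch only computes how much of peak load is still covered after losing one datacenter; the function name is made up, and it leaves out the data replication concerns the comment mentions.

```python
def surviving_capacity(datacenters: int, capacity_each: float) -> float:
    """Fraction of peak load still servable after losing one datacenter."""
    return (datacenters - 1) * capacity_each


# Both layouts provision 120% of peak in total.
for count, capacity in ((2, 0.60), (3, 0.40)):
    left = surviving_capacity(count, capacity)
    print(f"{count} DCs at {capacity:.0%} each: {left:.0%} of peak after one failure")
```

Under these assumptions the three-site layout keeps more capacity after a single failure for the same total provisioning, which is why "the site is slow" is the likely worst case rather than an outage.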

philosophical POV (2, Insightful)

Anonymous Coward | about 3 years ago | (#35899244)

I'll take the philosophical point of view on this and say failures are the best way to find and diagnose systemic weaknesses. Now Amazon knows the weakness in the AZs and can fix it.

Zones in different continents (0)

Anonymous Coward | about 3 years ago | (#35899290)

How about one zone in Europe and one in North America? Isn't that better?

Re:Zones in different continents (0)

Anonymous Coward | about 3 years ago | (#35899376)

That's a great plan... until the entire northern hemisphere is wiped out by an arctic EMP attack.

Nice business model (-1)

Anonymous Coward | about 3 years ago | (#35899358)

If you want more money, let a failure happen in order to encourage them to pay more for a more reliable service.

Kind of like letting an assassination happen: you don't have to orchestrate them, you just let them happen by standing down or looking the other way.

And thus the gullible managers who ignored IT... (2)

gestalt_n_pepper (991155) | about 3 years ago | (#35899384)

and only heard, "Cloud, cloud, cloud! It's new and shiny and cheaper than those annoying internal IT guys so I get a bonus!" learn to pay the stupidity tax.

Next up, learning just how *much* of your cloud data has been stolen and resold by those trustworthy souls in China and India.


Gullible manager doesn't care (2)

JaredOfEuropa (526365) | about 3 years ago | (#35899646)

Outsource IT and you outsource responsibility as well. If your own department fucks up, the top brass will come looking for you. However, if you outsource and the service provider messes up, you can shift the blame to them, especially in case of big disasters like these. As long as you can show that you've managed the SLAs well and that it's them who didn't keep to their promises, you're good. More likely you'll find that those SLAs were crap to begin with, which is also fine, because it's likely your boss and his boss signed off on the deal as well. Pass the buck...

Re:And thus the gullible managers who ignored IT.. (0)

Anonymous Coward | about 3 years ago | (#35900014)

Yes, that's the idea, and if you don't like it, you won't go higher in your company.

Business is business. Accept it or work for a Charity.

Re:And thus the gullible managers who ignored IT.. (1)

Tackhead (54550) | about 3 years ago | (#35900354)

And thus the gullible managers who ignored IT... and only heard, "Cloud, cloud, cloud! It's new and shiny and cheaper than those annoying internal IT guys so I get a bonus!" learn to pay the stupidity tax.

C'mon. All managers love cloud!

What rolls down stairs, fails over in pairs,
Leaks data when it's allowed?
A stupidity tax, it replaces your racks,
It's cloud, cloud, cloud!

It's cloud! It's cloud! It's new, it's shiny, it's cheap!
It's cloud! It's cloud! It's down, and now you'll weep.

Everything's in the cloud! You're gonna love it, cloud!
Outsource it to the cloud! Everyone needs a cloud!

Cloud! It goes blammo!

Re:And thus the gullible managers who ignored IT.. (1)

xtracto (837672) | about 3 years ago | (#35900720)

It is a sad joke. Even for sites like Reddit whose administrators are supposed to know better, the Amazon shit hit. And the terrible thing is that it is not the first time that Amazon's service has broken, this has happened quite a lot in the last months, and people still *pay* for the service. Crazy.

redundant (0)

Anonymous Coward | about 3 years ago | (#35899390)

i have a fully redundant cloud. its got all the best and greatest technologies, and i have it in this box... whoops i just dropped it. having your backup (cloud apps or files) in the same datacenter, is a lot like backing up your data to a separate partition on the same hard drive. im looking at you time machine.

Turning lemons into lemonade or...... (1, Funny)

i_want_you_to_throw_ (559379) | about 3 years ago | (#35899446)

maybe the failure was on purpose to promote another revenue stream. Hmmmmmmmmm......................

Re:Turning lemons into lemonade or...... (3, Insightful)

elohel (1582481) | about 3 years ago | (#35899672)

Okay, I had to log in simply to comment on the stupidity of this statement. Aside from now being in violation of their own ToS (probably, at least in transgression of up-time guarantees), they're undoubtedly fiscally liable for refunding payment for the period of time in which services were unavailable or degraded. Additionally, this dramatically hurts their brand name - I know if I ever have to host anything on 'the cloud' (I can't believe I said it), this incident will be on my mind when the time comes for me to choose a provider. And before I stop beating this dead horse - think about what kind of liability Amazon would have, fiscally, for intentionally dropping services for revenue producing sites. One would imagine that Amazon would be fiscally liable for revenue losses during that downtime if this outage was intentional. That's no small amount of coin.

Taking it too serious. (1)

Singularity42 (1658297) | about 3 years ago | (#35899926)

This "i_want_you_to_throw_" has an overactive ego and wants to appear clever. "i_want_you_to_throw" should take DXM.

Re:Turning lemons into lemonade or...... (1)

klui (457783) | about 3 years ago | (#35903824)

Reddit has been down for approx 24 hours--it's been on RO mode for most of the day. Pretty bad PR for Amazon.

Clouds are stupid. (-1)

Anonymous Coward | about 3 years ago | (#35899548)

All that data ready to be blown away. Reddit was just one big bubble ready to burst anyway. Without jailbait its market share would have dropped by now. After my /r/loseit spreadsheet finishes in July I will be leaving reddit forever unless it can afford to use real redundant hardware instead of cloud crap.

What if Amazon was down instead of Reddit, etc.? (1)

Anonymous Coward | about 3 years ago | (#35899638)

Do you think Amazon would allow its own sales and services to be impacted for 12 hours (and running) under any circumstances short of the recent disaster in Japan? EC2 customers, on the other hand, appear to be second-class citizens.

Availability zones must be done PERFECTLY (1)

sirwired (27582) | about 3 years ago | (#35899748)

"Availability Zones", "Failure Domains", etc. must be done with absolute perfection if you do them at all. If your gargantuan application has some single tiny side-feature that is not replicated across domains, your whole app is going down.

True Story: I was doing some consulting work for a large bank after they had a bunch of problems. Their main website had all the super-available trimmings: Oracle RAC, multi-site server clustering, storage mirroring, all the fancy, expensive, highly-available crap you could ask for. This is all well and good except... some dinky stylesheet (or something like that) for the bank's homepage resided on some dinky 1U non-clustered fileserver. When it went down, the pages simply would not display. Whoops! All that grand effort was for nought because there was one "leakage" that killed the whole app.

Re:Availability zones must be done PERFECTLY (1)

sockonafish (228678) | about 3 years ago | (#35899942)

You could code your application to be tolerant of those kinds of outages. The services backing feature X aren't available? Then don't render the controls for feature X on the page.
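
A minimal sketch of that idea, assuming each feature exposes a callable that raises when its backing service is unreachable. The names `render_page`, `balance_widget`, and `promo_widget` are made up for illustration and do not come from any real framework:

```python
from typing import Callable, Dict, List


def render_page(features: Dict[str, Callable[[], str]]) -> List[str]:
    """Render each feature's controls, skipping any whose backend fails."""
    rendered = []
    for name, fetch in features.items():
        try:
            rendered.append(f"<div id='{name}'>{fetch()}</div>")
        except Exception:
            # The service behind this feature is unavailable:
            # leave its controls off the page instead of failing the page.
            continue
    return rendered


def balance_widget() -> str:
    # Hypothetical feature whose backend is healthy.
    return "Balance: $100"


def promo_widget() -> str:
    # Hypothetical feature whose backend (say, a lone fileserver) is down.
    raise ConnectionError("fileserver down")


page = render_page({"balance": balance_widget, "promo": promo_widget})
```

Here `page` contains only the balance block. Whether silently dropping a feature is acceptable depends on the feature; a missing stylesheet probably needs a fallback rather than an omission.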

You mean, a company hyped something? (0)

Anonymous Coward | about 3 years ago | (#35900040)

You mean, a company hyped something as better than it really was, and too many people took it at face value? Oh, perish the thought! (sarcasm).

They're called "FAILOVER" zones for a reason... (1)

numbsafari (139135) | about 3 years ago | (#35900042)

You're supposed to FAILOVER between them, not load balance between them.

You can't hold Amazon accountable for your own stupidity.

Beyond that, you have to ask yourself the question: how many outages would you have had with your own facility in the past year compared to this outage? Did you apply the same approach to your use of EC2 as you would to your own facility?
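
To make the failover-versus-load-balancing distinction concrete, here is a rough sketch, assuming you already have some health signal per zone. The function name and the `healthy` map are hypothetical, not anything Amazon provides:

```python
from typing import Dict, List, Optional


def pick_failover_target(zones: List[str], healthy: Dict[str, bool]) -> Optional[str]:
    """Return the first healthy zone in priority order, or None if all are down.

    Unlike load balancing, all traffic goes to ONE zone at a time;
    later zones are only used when every earlier one is unhealthy.
    """
    for zone in zones:
        if healthy.get(zone, False):  # unknown zones count as unhealthy
            return zone
    return None


zones = ["us-east-1a", "us-east-1b"]
normal = pick_failover_target(zones, {"us-east-1a": True, "us-east-1b": True})
degraded = pick_failover_target(zones, {"us-east-1a": False, "us-east-1b": True})
```

With both zones healthy, `normal` is the primary, `us-east-1a`; with the primary down, `degraded` is `us-east-1b`. The standby has to be sized to carry the full load on its own, which is the part people tend to skip.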

Reddit's down, guess I'll check out slashdot. (1, Funny)

finkployd (12902) | about 3 years ago | (#35900244)

It's been like 7 years, how's everyone doing? :)

Re:Reddit's down, guess I'll check out slashdot. (0)

Anonymous Coward | about 3 years ago | (#35900422)

better drink your own piss

Re:Reddit's down, guess I'll check out slashdot. (0)

Anonymous Coward | about 3 years ago | (#35900640)

great, just don't say you're an Apple user.
posting anon for obvious reasons. (I'm on a mac.)

Re:Reddit's down, guess I'll check out slashdot. (0)

Anonymous Coward | about 3 years ago | (#35900830)

Doing well, thanks. But I seem to have forgotten my login here, hmmm....

Re:Reddit's down, guess I'll check out slashdot. (0)

EdotOrg (18710) | about 3 years ago | (#35902018)

Well, hey, nice to see you too. I stumbled over here in desperation too. :)

Re:Reddit's down, guess I'll check out slashdot. (0)

srsguy (1809414) | about 3 years ago | (#35902214)

I'm not sure I like this comment overlay over reddit's. They're all hidden by default and nobody's making any puns.

Re:Reddit's down, guess I'll check out slashdot. (0)

Anonymous Coward | about 3 years ago | (#35903452)

Yea, because reddit NEVER goes down.

Re:Reddit's down, guess I'll check out slashdot. (0)

Anonymous Coward | about 3 years ago | (#35904018)

It's been a long time.

Is this related to the DDoS of Change.Org? (1)

blair1q (305137) | about 3 years ago | (#35900332)

Change.Org says that for the past several days the Chinese have been DDoSing it over a petition they are posting to gather support for Ai WeiWei.

http://blog.change.org/2011/04/chinese-hackers-attack-change-org-platform-in-reaction-to-ai-weiwei-campaign/ [change.org]

But if you go to the Change.Org site to sign the petition, you get a message saying that something is wrong with their servers, which are at Amazon.

http://www.change.org/petitions/call-for-the-release-of-ai-weiwei [change.org]

http://status.aws.amazon.com/ [amazon.com]

http://www.computerworld.com/s/article/9216064/Amazon_gets_black_eye_from_cloud_outage [computerworld.com]

Could Amazon's outage be the result of Chinese hackers?

Re:Is this related to the DDoS of Change.Org? (0)

Anonymous Coward | about 3 years ago | (#35901130)

China started attacking Change.org on Monday. I spoke with the founder, and he says they've been screening out the attack more and more, which was coming primarily from behind the Great Firewall. It could be they've changed their approach -- they were using a botnet DDOS to blunt-force the site down with fake requests, but maybe they've got an Amazon EC2 exploit.

Re:Is this related to the DDoS of Change.Org? (0)

Anonymous Coward | about 3 years ago | (#35901272)

Could Amazon's outage be the result of Chinese hackers?

Yes! The Chinese hackers infiltrated Amazon to embed broken code into their deployment schedule so on April 21st, it goes POW!

So yes, it must be Chinese. There can't be any simple explanation, like not planning for some equipment to fail.

No need to speculate (2)

jc2brown (1997958) | about 3 years ago | (#35900732)

Since apparently no one's actually looked into the issue beyond "ZOMG the cloud is down," here's some info from Amazon:

8:54 AM PDT We'd like to provide additional color on what we're working on right now (please note that we always know more and understand issues better after we fully recover and dive deep into the post mortem). A networking event early this morning triggered a large amount of re-mirroring of EBS volumes in US-EAST-1. This re-mirroring created a shortage of capacity in one of the US-EAST-1 Availability Zones, which impacted new EBS volume creation as well as the pace with which we could re-mirror and recover affected EBS volumes. Additionally, one of our internal control planes for EBS has become inundated such that it's difficult to create new EBS volumes and EBS backed instances. We are working as quickly as possible to add capacity to that one Availability Zone to speed up the re-mirroring, and working to restore the control plane issue. We're starting to see progress on these efforts, but are not there yet. We will continue to provide updates when we have them.

So the engineers failed to foresee a potential hazard. Hardly something to get worked up about, especially for a relatively young technology.

Wow (-1)

Anonymous Coward | about 3 years ago | (#35901444)

Anonymous really took them out.

Change related? (1)

Biggerveggies (517226) | about 3 years ago | (#35901504)

I don't necessarily hate the marketing concept of 'The Cloud', but I am fascinated by the business decisions and risk acceptance that organisations are willing to take, i.e., the typical: "Demanding high availability and hot failover, instantaneous incident resolution, and 'we are your primary customer'... but also a low cost." I think that Amazon and their competitors *may* get there with their offerings, but until there is a bit more maturity, I expect to see more incidents like this.

My wild guess is that a change triggered this, which of course leads to why has the backout plan failed (and who signed off on the risk)? I can't imagine that this is not change related - otherwise there is a serious architectural design flaw here somewhere.

Re:Change related? (0)

Anonymous Coward | about 3 years ago | (#35902434)

My experience from sampling the Fortune 50 datacenter automation dreams of the last decade, in a vendor startup, is that there is a serious lack of appreciation of capacity management, risk tolerance, and the emergent properties of their systems.

Just like the "just in time supply chain" blew up in Japan following the recent tsunami, the elastic infrastructure crowd is pushing too hard on "reduce margin" and being optimistic about worst case scenarios. You don't find these autonomic computing people talking about control system theory, like you would expect them to be doing. You are lucky to find them understanding a basic notion like hysteresis or damping.

Our field is constantly making the same failures as our ancestors. Not many of the practitioners currently making cloud or mobile systems would really remember the many disasters of previous Internet features, exhibiting unanticipated resonance at large scale. And of those who remember, very few remember it with the pain necessary to avoid doing it again.
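
For anyone who hasn't met hysteresis in this context, here is a toy sketch of a scaling rule with a dead band between two thresholds. The thresholds and the function name are assumptions chosen for illustration, not taken from any real autoscaler:

```python
def next_capacity(current: int, load: float, high: float = 0.8, low: float = 0.4) -> int:
    """Return the instance count for the next step.

    `load` is measured in units of one instance's capacity. Scaling up and
    scaling down use different thresholds, so utilization sitting between
    `low` and `high` changes nothing. That gap is the hysteresis band: it
    keeps the system from flapping when load hovers near a single threshold.
    """
    utilization = load / current
    if utilization > high:
        return current + 1
    if utilization < low and current > 1:
        return current - 1
    return current


grown = next_capacity(4, 3.6)    # utilization 0.9, above the high threshold
steady = next_capacity(4, 2.4)   # utilization 0.6, inside the band
shrunk = next_capacity(4, 1.2)   # utilization 0.3, below the low threshold
```

Collapse the two thresholds into one and a load that wobbles around it adds and removes an instance on every step. Now imagine every volume in a zone deciding to re-mirror at once with no damping at all.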

Big deal (0)

Anonymous Coward | about 3 years ago | (#35902106)

Switching from one data center to another should be no big deal. Amazon has proven that big things fall down and stumble trying to get back up. Why should I pay more? They fell on their own. I have a better idea: don't fuck up the first time when you promise the world.

You get what you pay for... (0)

Anonymous Coward | about 3 years ago | (#35902112)

What do you expect... when you buy a consumer-grade product from a company that sells books for a living... you get what you pay for.
Any IT manager worth his salt would never ever put all their eggs in one basket (btw, Happy Easter).

Rarely do you ever see a medium to large enterprise completely outsource all operations to a single vendor, and if they do, they completely understand that provider's infrastructure and related redundancy models both within and outside the data center.

I advise anyone that is scared to move into a cloud compute infrastructure because Barnes & Noble... oops, I mean Amazon... cannot properly design/operate an infrastructure: go look at real enterprise-class cloud computing providers (IaaS/PaaS) and ask the top three for a 30-day demo (they will give it to you if you're a legit business) and run some off-the-shelf benchmarks. You will find that when you compare price and performance under load, Amazon, although cheap to turn up, will be beaten hands down in both categories. This is not only my opinion/real-world experience but also that of multiple Fortune 500 CIOs (more so their IT staff) that I have dealt with just in the last 12 months.

So in short, don't listen to all these ignorant (not stupid, just ignorant) cloud-bashing server huggers and do some research; there are some really good cloud infrastructure providers out there that may not take your Visa as payment, but will provide you with a much more robust, reliable and performance-oriented infrastructure compared to your Barnes & Noble... oops, I did it again... I mean Amazon service.

BTW, I love Amazon as an e-commerce site (books, electronics, music, etc.). I love my Prime membership!

Amazon and Microsoft (2)

kriston (7886) | about 3 years ago | (#35903164)

Amazon and Microsoft have two distinctly different views of "cloud computing."

When I first learned about "cloud computing" I automatically assumed it meant that there would be an arbitrary number of different services available to an arbitrary number of web servers which would then be served to the user. No one service would depend on the other.

Amazon's "cloud computing" is centralized upon the virtual machine as the hub of the "cloud." Microsoft Azure, on the other hand, originally offered the approach that I had thought about, where everything is just a service, no VM required.

Today Amazon still depends heavily on the VM concept. You can't have a web service on Amazon without one. This also makes it excessively difficult to "load balance" or provide "failover" because you are actually expected to stand up new VM instances to scale up and down and need separate VM instances on each "availability zone." In addition it's not easy or affordable to share data between availability zones. This isn't what I thought the cloud was going to be.

Microsoft eventually added VMs to its Azure service so they could compete with Amazon's VM-centralized concept. I still think the idea of separate, independent services talking to each other was what the "cloud" was supposed to be, and if these services didn't have to depend on these VMs (which they do not have access to because AWS is intermittently down) they would have still been working from the other data centers.

Cloud is mostly marketing-driven? (0)

Anonymous Coward | about 3 years ago | (#35903268)

Does anyone else feel like this cloud computing stuff is more marketing-driven than technology-driven?

Our company (a very large financial services firm) started the cloud computing stuff a few years ago, and it's been such a huge hassle that they've actually rolled out local versions of applications and backup info. Cloud computing just isn't reliable enough yet (ever?) to store/run mission-critical software. Is it? Anyone have a perspective here?

*Still A Happy, Paying EC2 Customer* (1)

CyborgWarrior (633205) | about 3 years ago | (#35903392)

As what I would consider a medium-weight AWS user (our account is about 4 grand a month) I am still quite happy with AWS. We built our system across multiple availability zones, all in us-east, and had zero downtime today as a result. We had a couple of issues where we tried to scale up to meet load levels and couldn't spin up anything in us-east-1a (or if we could, we couldn't attach it successfully to a load balancer because of internal connectivity issues), but we spun up a new instance in us-east-1b, attached it without trouble, and were able to handle the load just fine. The load balancers worked as expected (and hoped for) and the segregation of issues between availability zones was fairly successful.

I think that fixing these issues is just as high a priority for Amazon as it would be for any internal IT infrastructure, so I don't give much credence to the arguments that having your own servers and your own internal IT team would truly solve the problem any more effectively: I think it just gives you more of an illusion of control because you can see that you're working on it, as opposed to trusting that Amazon is working on it.

If there is any AWS lesson to be taken away from this it is that:

1) EBS may not be ready for prime time - most of our servers are instance-store anyway, both for performance reasons and for other reliability problems we have had in the past.

2) You should keep your server templates set up as up-to-date AMIs so you can deploy across any availability zone you want at any time you want. Right now, we have our load balancer attachment configuration all scripted as well, so spinning up new instances to feed a cluster is a single CLI execution with us specifying the availability zones.
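
Roughly, the "try the next zone" logic from point 2 looks like the sketch below. This is not our actual script: `launch` stands in for whatever call your own tooling uses to start an instance from an AMI, and `fake_launch` only simulates the failure we saw in us-east-1a.

```python
from typing import Callable, Iterable, Optional


def launch_in_first_available(zones: Iterable[str],
                              launch: Callable[[str], str]) -> Optional[str]:
    """Try each zone in order; return the first instance that comes up, else None."""
    for zone in zones:
        try:
            return launch(zone)
        except RuntimeError:
            # This zone could not give us an instance; move on to the next one.
            continue
    return None


def fake_launch(zone: str) -> str:
    # Hypothetical stand-in for a real launch call, used only for this demo.
    if zone == "us-east-1a":
        raise RuntimeError("insufficient capacity")
    return f"instance-in-{zone}"


instance = launch_in_first_available(["us-east-1a", "us-east-1b"], fake_launch)
```

In this simulated run `instance` comes from us-east-1b. The precondition is the one above: the AMI has to be current, or the instance you get in the second zone is not the server you wanted.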

Check out http://perfcap.blogspot.com/2011/03/understanding-and-using-amazon-ebs.html [blogspot.com] for a nice explanation of some of the issues you may come across with EBS and the internals of why.

Overall, I still give Amazon a good rating. This was a major outage and we felt barely a hiccup.

Re:*Still A Happy, Paying EC2 Customer* (1)

The Bean (23214) | about 3 years ago | (#35903934)

I'd give you the good rating. You used the service in a sane manner that exploited the strengths of the system and avoided the weaknesses.

I suspect many users of EC2 actually end up with less reliability than they'd get with a server in a closet, as they don't realize the true effort it takes to have an effective solution like you do.
