Major Outage At the Amazon Web Services



Posted by CmdrTaco on Thursday April 21, 2011 @11:45AM from the but-the-cloud-fixes-everything dept.

ralphart writes "The Northern Virginia datacenter for Amazon Web Services appears to be having a major outage that affects EC2 services. The Amazon Forums are full of reports of problems. Latest update from the status page: 2:49 AM PDT We are continuing to see connectivity errors impacting EC2 instances, increased latencies impacting EBS volumes in multiple availability zones in the US-EAST-1 region, and increased error rates affecting EBS CreateVolume API calls. We are also experiencing delayed launches for EBS backed EC2 instances in affected availability zones in the US-EAST-1 region. We continue to work towards resolution."

This discussion has been archived. No new comments can be posted.


No Way! (Score:5, Funny)

by Frosty Piss ( 770223 ) * writes: on Thursday April 21, 2011 @11:48AM (#35894908)

But how can this be possible? It's The Cloud . This sort of this simply doesn't happen.

- Re: (Score:3)
  
  by alphatel ( 1450715 ) * writes:
  
  It didn't happen. The cloud can erase history in a planck!
  - - Re: (Score:2)
      
      by Anonymous Coward writes:
      
      But it's not supposed to happen, because "if" (when!) it does, the impact is HUMONGOUS. "You're welcome to store all your data in our fast, easy and safe cloud storage. Downtime? Don't worry, it'll only experience hour long outages intermittently." Yeah, that's how they sold it in the first place, isn't it?
      This will become quite the event in data warehouse circles I bet, because the cost of 'being in the cloud' just doubled; it's not enough to buy storage from one provider. The "always there" quality that's
      - Re: (Score:3, Insightful)
        
        by cduffy ( 652 ) writes:
        
        This will become quite the event in data warehouse circles I bet, because the cost of 'being in the cloud' just doubled; it's not enough to buy storage from one provider. The "always there" quality that's supposedly the benefit of cloud storage is a facade.
        You can buy from one provider -- every major cloud provider has multiple availability zones. But yes, lots of people buy in only one zone because it's cheaper, and then suffer for that mistake -- in situations just like this.
      - Re:No Way! (Score:5, Insightful)
        
        by lgw ( 121541 ) writes: on Thursday April 21, 2011 @01:19PM (#35896572) Journal
        
        This will become quite the event in data warehouse circles I bet, because the cost of 'being in the cloud' just doubled; it's not enough to buy storage from one provider. The "always there" quality that's supposedly the benefit of cloud storage is a facade.
        The cloud doesn't have to be perfect - it just has to be as good in the eyes of VPs as the contractors they'd otherwise hire to run their internal datacenter. What's the value of an IT guy in the eyes of an MBA? Yeah, this sort of reality check won't faze them at all.
        
- Re: (Score:2)
  
  by pdbaby ( 609052 ) writes:
  
  Jokes aside, if people use The Cloud (I'm using this tongue in cheek...) rather than a cloud this thing doesn't happen.
  We use a number of providers which means that even if Amazon fell over completely our systems would be fine -- it looks like a lot of sites (reddit, for instance) don't bother to do this.
  - - Re: (Score:2)
      
      by pdbaby ( 609052 ) writes:
      
      I believe they generate a new HTML document each time a comment is added or up/downvoted - they could replicate the comment and vote data to another site.
      It'd be an increase in traffic but not necessarily a huge increase in load (since they wouldn't be generating HTML at the second site unless they're in failover mode).
      I don't know whether the increased reliability would be worth the extra load in their case, however, since I doubt they lose that much money from downtime (given how frequently they're dow
- Re: (Score:2)
  
  by ron_ivi ( 607351 ) writes:
  
  But how can this be possible? It's The Cloud . This sort of this simply doesn't happen.
  To be fair to Amazon - on a good cloud (incl. Amazon's) you can launch instances in completely different data centers, so your most critical services have somewhere to fail over to.
  Though, personally I'd feel even better if my nodes were distributed across two different clouds; to avoid the single-point-of-failure of the Amazon account itself. For example, despite running in both their East and West data centers, I'm still vulnerable to a sales/billing miscommunication that freezes my whole account.
  - Re: (Score:2)
    
    by Blakey Rat ( 99501 ) writes:
    
    Each data center also has independent zones.
    It looks like in this case, only one zone in one data center was affected-- that's bad, but that's not "end-of-the-world" bad. If sites are going down, they should have been more careful to distribute redundant servers in different zones.
    (Where this is a problem is if you're a small shop with a single DB server, and the zone holding your DB server goes down-- in that case you're kind of SOL.)
    - Re: (Score:2)
      
      by watanabe ( 27967 ) writes:
      
      As an example, we run our production servers on EC2 East; they have load balancers failing them between zones. The Database and webservers are fine, and have been fine today.
      The dev servers do not have load balancers running on them, and they have been choking in a miserable hell all morning.
    - Re: (Score:2)
      
      by ron_ivi ( 607351 ) writes:
      
      (Where this is a problem is if you're a small shop with a single DB server, and the zone holding your DB server goes down-- in that case you're kind of SOL.)
      IMHO the main beauty of a cloud is that you're NOT SOL.
      For one of the sites I manage, I am a small shop.
      The beauty of a cloud is that with Amazon's $0.02/hr micro instances, and $0.007 spot-priced micro instances I can *still* do things right (failover to remote data center, backups in different data center), even for clients that can only afford under $50/month in hosting.
- Re: (Score:2)
  
  by dkleinsc ( 563838 ) writes:
  
  This sort of this simply doesn't happen.
  Now we know: All it takes is one admin screwing up and replacing an "ng" with an "s".
- Re: (Score:2)
  
  by recoiledsnake ( 879048 ) writes:
  
  But how can this be possible? It's The Cloud . This sort of this simply doesn't happen.
  
  Yay, cloud!
- - Re:No Way! (Score:5, Informative)
    
    by 0123456 ( 636235 ) writes: on Thursday April 21, 2011 @12:10PM (#35895372)
    
    A major outage on most professional cloud setups means it is down for a few hours. A major outage at work means the full day. It is like saying driving my car is so much safer than flying because I never got into an accident.
    Last time I remember a day-long outage at work was 1994, and that was because the license server failed so we couldn't run our own software (we couldn't recompile it to remove the DRM because the compiler also needed a license to run).
    I seem to remember that the Mac guys at the company also had a long outage when they couldn't connect to one of their Mac servers, but eventually someone actually went to the server room and discovered that it had been stolen.
    Back on topic, I just don't see all these day-long outages that apparently seem to happen all the time in companies that haven't moved their servers to The Cloud(tm).
    
    - Re: (Score:3, Insightful)
      
      by Synn ( 6288 ) writes:
      
      "Back on topic, I just don't see all these day-long outages that apparenty seem to happen all the time in companies that haven't moved their servers to The Cloud(tm)."
      You must not get out much. With Atlantic.net I had an 11-hour outage due to the staff not understanding how to update a Cisco router. Then a 4-hour outage when they screwed up billing and shut down our service with no warning. Then there was that time they didn't like our DNS traffic and shut down DNS with no warning or notice. That was a fun hour o
    - Re: (Score:3)
      
      by im_thatoneguy ( 819432 ) writes:
      
      We were out for a good portion of the day Monday after a bird flew into the telephone pole outside our office and then caused a critical server to go wonky after the UPS battery ran out and we didn't have the auto-shutdown settings correct.
  - Re: (Score:3)
    
    by TooMuchToDo ( 882796 ) writes:
    
    But when it's your gear, you have some control over the situation. When it's "in the cloud", you sit and get yelled at by the CXO and sweat if you'll still have a job while cloud provider X works to fix the problem (and their liability? whatever you paid for the service).
    - - Re: (Score:2)
        
        by TooMuchToDo ( 882796 ) writes:
        
        Annoying > Shitcanned.
Severe weather in Virginia likely the culprit (Score:3, Informative)

by stopacop ( 2042526 ) writes: on Thursday April 21, 2011 @11:50AM (#35894944) Homepage

Severe weather hit the area. They shut down Surry Power Station in Surry County, Virginia after a tornado took the power out that powers the power station.

- Re:Severe weather in Virginia likely the culprit (Score:5, Informative)
  
  by getagrip ( 86081 ) writes: on Thursday April 21, 2011 @11:59AM (#35895128) Homepage
  
  I am in Northern Virginia. There is no power outage or severe weather here.
  
  - Re: (Score:2)
    
    by Wornstrom ( 920197 ) writes:
    
    it's true: http://www.examiner.com/progressive-in-richmond/surry-power-station-under-repair-the-aftermath-of-tornado [examiner.com]
    Tornado was Saturday. I live on the other side of the James River from Surry.
    - Re: (Score:2)
      
      by Drathos ( 1092 ) writes:
      
      Yeah, that may be true, but it has nothing to do with anything going on in Northern Virginia. Surry is in Southeastern Virginia, over 150 miles away.
  - Re: (Score:2)
    
    by Overzeetop ( 214511 ) writes:
    
    Well, that just about sums up the attitude of Northern Virginia towards the rest of the state.
    - Re: (Score:2)
      
      by krnpimpsta ( 906084 ) writes:
      
      Well, that just about sums up the attitude of Northern Virginia towards the rest of the state.
      There's a "rest of the state?" :)
      
      (Also in NoVA, no outages or severe weather here)
  - - Re: (Score:3)
      
      by xnpu ( 963139 ) writes:
      
      Those news reports do not rule out the possibility that he's in a place in Northern Virginia without severe weather or a power outage. How do you conclude that he is wrong?
      - Re: (Score:2)
        
        by recoiledsnake ( 879048 ) writes:
        
        N. Va is not really that big. All the articles cited talk about VA, not NVA.
  - - Re: (Score:2, Informative)
      
      by Anonymous Coward writes:
      
      First: Please look at a map. Surry County is east of Richmond on the way to VA Beach. An outage at Surry Power Station would not affect a data center over in Dulles, VA. That power station does not serve this area at all.
      Second: Read the news. Every comment above is wrong in one way or another. Here is a local news article about what happened down there, if you are curious:
      http://www.examiner.com/progressive-in-richmond/surry-power-station-under-repair-the-aftermath-of-tornado
      You people know nothing,
- Re: (Score:3)
  
  by pdbaby ( 609052 ) writes:
  
  Amazon's Availability Zones are designed to have separate power, cooling and network so I don't think this is the issue. It was (is) a problem with their disk subsystem in multiple availability zones so I suspect they were in the process of pushing out some new storage controller code and some bug didn't appear until the later stages of their rollout. From their status log it looks like they're manually correcting the issue with each disk.
- Re: (Score:3)
  
  by metrometro ( 1092237 ) writes:
  
  Amazon's comments on the outage do not mention weather as a cause: http://status.aws.amazon.com/ [amazon.com]
  "8:54 AM PDT We'd like to provide additional color on what were working on right now (please note that we always know more and understand issues better after we fully recover and dive deep into the post mortem). A networking event early this morning triggered a large amount of re-mirroring of EBS volumes in US-EAST-1. This re-mirroring created a shortage of capacity in one of the US-EAST-1 Availability Zones, whi
- Re: (Score:2)
  
  by alphatel ( 1450715 ) * writes:
  
  So they can't fail over like a normal ESX instance? So my cloud computer is actually just a rack in Virginia?
  - Re: (Score:2)
    
    by TooMuchToDo ( 882796 ) writes:
    
    Your cloud computer is a Xen instance in Virginia, and your "EBS block storage" is an iSCSI target. Magic it ain't.
    - Re: (Score:2)
      
      by alphatel ( 1450715 ) * writes:
      
      Essentially half-cloudassed clouding.
      - Re: (Score:2)
        
        by TooMuchToDo ( 882796 ) writes:
        
        Not really half-assed from an implementation perspective, but from a marketing perspective. Amazon likes people to think it's magic, which is fine if it worked flawlessly all the time. But it doesn't, because it's just a technical solution for a specific problem. Unless you run instances in multiple zones, use redundant EBS volumes, and your entire app is built to handle global redundancy, it's not just going to be 100% uptime out of the box. I fault Amazon for lying to technical-enough people.
      - Re: (Score:2)
        
        by Synn ( 6288 ) writes:
        
        "Essentially half-cloudassed clouding."
        EC2 is just tools. It's as cloudassed as you make of it.
        I can take ESX and use a Netapp for data storage, and if my Netapp cluster takes a dive, I can't fail over to anything since my data is down.
        On the other hand I can take EC2 and run apps and clustered DBs across the east and west coast and put ELB in front of it. If the east coast takes a nuke, everything will keep on running.
    - Re: (Score:2)
      
      by tunapez ( 1161697 ) writes:
      
      Your cloud computer is a Xen instance in Virginia, and your "EBS block storage" is an iSCSI target. Magic it ain't.
      There is no room for accurate or useful specifications in the flamboyant misrepresentation of marketing. Please enjoy the cuddly puppies and warm fuzzies.
- Re: (Score:2)
  
  by MobileTatsu-NJG ( 946591 ) writes:
  
  Severe weather hit the area. They shut down Surry Power Station in Surry County, Virginia after a tornado took the power out that powers the power station.
  Of course we all know that the not-cloud would have been impervious to that.
- Re: (Score:2)
  
  by Kamiza Ikioi ( 893310 ) writes:
  
  But the scanner says their power level is Over 9000!
- Re: (Score:2)
  
  by Jawnn ( 445279 ) writes:
  
  Wow, then it's understandable. Good thing they weren't running a nuclear power plant or something.
- - Re: (Score:2)
    
    by jtdennis ( 77869 ) writes:
    
    it was probably a distribution station, not a power generation facility.
  - Re: (Score:2)
    
    by MmmmAqua ( 613624 ) writes:
    
    If it's a substation, it doesn't have its own power.
  - Re: (Score:2)
    
    by kevinNCSU ( 1531307 ) writes:
    
    after a tornado took the power out that powers the power station.
    Does not compute. Once it's running why can't a power station use its own power?
    Because you tend to want to have power available to cool nuclear fuel even if you decide to stop producing power for whatever reason (maintenance, mechanical failure, tornado, earthquake, tsunami, Nazi zombie attack).
    - Re: (Score:2)
      
      by SecurityGuy ( 217807 ) writes:
      
      In which case being unable to use a secondary source (self-generated power) would be a bad thing, no?
      - Re: (Score:2)
        
        by kevinNCSU ( 1531307 ) writes:
        
        A secondary source would be the backup generators or off-site power from the grid. If you lose one of your secondary sources it becomes unacceptably risky to keep your reactor running at full steam because you no longer have the safety net of as many backup sources. The safe play is then to shut down the plant so it begins to cool immediately, before something can go wrong and you're left with no backup sources to provide cooling power.
  - Re: (Score:2)
    
    by hawguy ( 1600213 ) writes:
    
    News reports are spotty, but I imagine that the plant tripped the turbines offline after the tornado damaged the power distribution equipment.
    When it's generating 1GW of power and suddenly the load goes down to 0GW, the turbines have to trip offline automatically and immediately to prevent damage.
    This may have also triggered a shutdown of the nuclear reactor, and it may take days or longer to bring it online after an emergency shutdown.
  - - bean counts screw us again! (Score:2)
      
      by Thud457 ( 234763 ) writes:
      
      Cheesus Xist! Backup generators taken out again?!! After Chernobyl and Fukushima, I'm starting to think these "nuclear engineers" aren't rocket surgeons.
- - - Re: (Score:2)
      
      by Burdell ( 228580 ) writes:
      
      No, they're not (see Fukushima, Japan). Basically, you don't just flip a switch and have a power plant go dark; you have to follow a shutdown procedure that takes both time and power. I don't know the requirements for coal or natural gas plants, but US nuclear plants are required to have multiple backup power sources (IIRC at least two independent diesel generator systems as well as off-site power). If the plant loses one backup power source for more than a certain period, it is required to shut down. I
      - Re: (Score:2)
        
        by tlhIngan ( 30335 ) writes:
        
        I think you're ignoring the fact in the case of Fukushima, they were set up to be self-sufficient -- it's just that the tsunami knocked out their backup generators.
        Only due to cost savings. The tsunami wall built was half the height required (6 m instead of 12 m). Naturally, a 10 m high tsunami hit. And no placement of the generators would've helped (they were in the basement, and that got flooded, but if they were outside, they could've gotten washed away).
        
        Re: (Score:3)
        
        by inject_hotmail.com ( 843637 ) writes:
        
        Why not put them on the roof? I think any datacenter designer would say that, first thing...I mean, they stored their precious depleted uranium and plutonium on the roof...why not the generators too?
        
        The real problem everywhere...and I do see it everywhere...is that the people paid to be the people that 'know' simply don't know, or have no sense of creativity or foresight. I mean come on, they built a tsunami wall because they have a high probability of tsunamis, and then they go and put the most missio
Lucky (Score:2)

by denshao2 ( 1515775 ) writes:

My instance is on us-east-1d which is still up.
- Re: (Score:2)
  
  by pdbaby ( 609052 ) writes:
  
  Their API gives different names for the availability zones for each user (so your us-east-1d could be my us-east-1a) which complicates talking about issues (since all you can say is "two availability zones are experiencing problems"), especially when your system uses multiple accounts
  - Re: (Score:2)
    
    by Blakey Rat ( 99501 ) writes:
    
    Really? What's the purpose of that? Some kind of half-assed based-on-human-psychology load balancing?
    My servers are in us-east-1d as well, and they didn't go down, but maybe that's just dumb luck as my 1d is your 1b.
    I can't do a really redundant setup, though, because I need a MS SQL instance and we don't have the budget for a second one to mirror to, so ... if the zone with our MS SQL instance goes down, our app is sunk regardless of how distributed the web servers are.
    - Re: (Score:2)
      
      by pdbaby ( 609052 ) writes:
      
      Yeah, I think that's what they're trying to do. I suppose it makes sense in a way, they want to make sure load is evenly distributed across their availability zones . But it seems to me they could have prevented that through better API design (e.g. users expressing a constraint that 2 resources should be in the same zone where that's meaningful but otherwise not permitting the selection of a specific zone)
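
The per-account zone naming described in this thread can be illustrated with a small sketch. This is only a guess at the idea, not Amazon's actual mechanism (the thread does not say how the mapping is produced); the physical zone names, the hashing scheme, and the `zone_map` helper are all hypothetical.

```python
import hashlib
import random
from typing import Dict, List

# Hypothetical physical zones; the real names and count are not public in the thread.
PHYSICAL_ZONES: List[str] = ["zone-A", "zone-B", "zone-C", "zone-D"]
# Public labels every account sees.
LABELS: List[str] = ["us-east-1a", "us-east-1b", "us-east-1c", "us-east-1d"]


def zone_map(account_id: str) -> Dict[str, str]:
    """Return a stable, per-account mapping from public label to physical zone.

    Assumption: the shuffle is seeded from a hash of the account id, so one
    account always sees the same mapping while different accounts may differ.
    """
    digest = hashlib.sha256(account_id.encode("utf-8")).digest()
    seed = int.from_bytes(digest[:8], "big")
    shuffled = PHYSICAL_ZONES[:]
    random.Random(seed).shuffle(shuffled)
    return dict(zip(LABELS, shuffled))


if __name__ == "__main__":
    for account in ("account-1", "account-2"):
        print(account, zone_map(account))
```

Under a scheme like this, two users comparing notes about "us-east-1d" may be talking about different physical zones, which is the confusion described above.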
The dark side of outsourcing (Score:3)

by HangingChad ( 677530 ) writes: on Thursday April 21, 2011 @12:03PM (#35895224) Homepage

Slashdot and Digg have one day traffic surges because Reddit is down. I'm getting way too much done today not being distracted by the GoneWild girls. This productivity must cease at once!
Does go to show what can happen when your business depends on an outsource provider. Everyone has to depend on service providers to some extent, but sometimes it's a good exercise to see how many of your company eggs are in one basket. Redundancy is expensive, but so is losing business. Even Google has had Gmail interruptions, lost some customer data and experienced slowdowns.

Give me my Reddit back! (Score:2)

by Frederic54 ( 3788 ) writes:

Else I don't know what to do? I almost went to Digg! so please amazon guys, work on your stuff!
Emergency Plan (Score:5, Interesting)

by sycorob ( 180615 ) writes: on Thursday April 21, 2011 @12:11PM (#35895392)

I didn't even realize that one of our partners was using Amazon AWS until suddenly they were down all day. Amazon is really stable historically, but it's frustrating when you're out of business and all you can do is wait and see if Amazon will fix it soon.
In the "old school" thinking, smart companies have a redundant data center somewhere, humming along and waiting to be switched on if the main data center ever goes down. "The cloud" was supposed to solve that - massive redundancy within Amazon's services was supposed to protect you from outages. Not the case, apparently, since it looks like Amazon is going to fall below their promised 99.95% uptime (4.38 hours per year downtime).
I think the answer is to have redundant cloud services online, so you could switch from Amazon to Google or DevGrid if you had issues. The problem is, there's nothing quite like Amazon right now, it's not easy to switch from Amazon to some random service. This might be the biggest argument against virtual services - lack of standardization makes it hard to move from one to another, and hard to set up backup services in case of emergency.
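
For anyone checking the 4.38-hour figure above, the arithmetic is just the unavailable fraction of a year. A minimal sketch, assuming a 365-day year and treating the uptime target as a simple percentage of wall-clock time (the function name is made up for illustration, and a real SLA may measure availability differently):

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def allowed_downtime_hours(uptime_percent: float) -> float:
    """Hours per year a service can be down and still meet the uptime target."""
    return HOURS_PER_YEAR * (100.0 - uptime_percent) / 100.0


if __name__ == "__main__":
    # 99.95% leaves 0.05% of 8760 hours, i.e. 4.38 hours per year.
    for target in (99.0, 99.9, 99.95, 99.99):
        print(f"{target}% uptime -> {allowed_downtime_hours(target):.2f} h/year")
```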

- Re: (Score:3, Insightful)
  
  by MariusBoo ( 883340 ) writes:
  
  Actually in the case of EC2 the smart thing would have been to have your instances spread over different availability zones...
  - Re: (Score:3)
    
    by hey! ( 33014 ) writes:
    
    Actually, I'm more concerned about the *organization* as a single point of failure. If you rely on, say, Oracle (ugh), and Oracle goes bankrupt or a court orders them to stop selling their database or they simply decide to stop supporting some feature, you're still in business, and have a pretty good shot at moving to some similar database management system.
    If you built a mission critical system on Amazon's cloud services, a single court order not aimed at you could put you out of business. If Amazon was
- Re: (Score:3)
  
  by ron_ivi ( 607351 ) writes:
  
  Just using Amazon West as well as Amazon East would have saved customers from this outage.
  I think Amazon actually does great at covering all the technological single-points-of-failure.
  The only reason I'd want a second cloud vendor is for the sales/account related single-point-of-failure of the Amazon Account being frozen due to a sales miscommunication or a MPAA/RIAA takedown notice, etc.
- Re: (Score:2)
  
  by Synn ( 6288 ) writes:
  
  "In the "old school" thinking, smart companies have a redundant data center somewhere, humming along and waiting to be switched on if the main data center ever goes down. "
  The problem is that gets really really expensive and it's actually quite hard to do properly.
  You can do this with EC2 though, just have your application cross various geographical zones. Things like ELB even make this somewhat easier. But you still have to solve all the application problems that exist when your data stores exist across la
  - Re: (Score:2)
    
    by mikeytag ( 1835928 ) writes:
    
    Nail on the head here. We were affected today and while I have full offsite backups of everything we don't have a second datacenter to switch on because of cost and complexity. It's not too difficult to have webservers span different parts of the globe, but DB servers like MySQL are a whole different story and usually very crucial.
- Re:Emergency Plan (Score:4)
  
  by Alarash ( 746254 ) writes: on Thursday April 21, 2011 @12:31PM (#35895772)
  
  Even by using only AWS you can set up redundancy across multiple North American regions. Even across continents, with one data center in Ireland and one in Singapore. But obviously it costs extra as they bill you the bandwidth between the regions. That's how you use The Cloud (c) (tm) (R). Using a single data center to set up redundancy is dumb because it's not redundancy. You need high availability for your VMs, but also for your data center.
  
  This is why banks or large businesses, for instance, have two or more data centers they always keep synchronized and have at least 50 kilometers between them. Thinking "well it's in one AWS data center so it's safe" is wrong, and this incident is a fine example of that.
  
  - Re: (Score:2, Informative)
    
    by Anonymous Coward writes:
    
    50km is not a far enough distance. I witnessed this first hand for the employer I worked for on the Gulf Coast during Katrina. That storm jacked up about 120 miles, took down our primary AND failover sites.
- Re: (Score:2)
  
  by pdbaby ( 609052 ) writes:
  
  Amazon have complete isolation between Regions and good isolation between Availability Zones.
  At work we'd recommend people use 2 cloud providers for their important services (which could be 2 Amazon regions or it could be Amazon and Rackspace) to prevent this sort of failure taking your business offline. You can't rely on any particular cloud provider to be reliable but it's a reasonably safe bet that a selection of cloud providers won't have significant overlapping downtime.
  It's also worth pointing out tha
  - Re: (Score:3)
    
    by Slashdot Parent ( 995749 ) writes:
    
    It's also worth pointing out that all cloud SLAs are basically useless: if Amazon falls below their advertised uptime they'll refund you some of your charges - but they'll never refund more than what you've paid them: they don't compensate you for all the money you're losing (and the AWS charges are likely pocket change compared to this)
    FYI, I don't think this outage even falls under EC2's SLA. The Region was still technically on line. Only EBS was down.
    Granted, many customers depend heavily on EBS, but the SLA doesn't cover an outage in just one specific EC2 feature. That being said, I wonder if AWS will honor SLA claims anyway, as a PR move. This outage is just so clearly Amazon's fault: a network hiccup causes EBS to overload in one Availability Zone, which cascades into all Availability Zones in the Region.
    Personally, I think that they
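
The "second region or second provider" advice running through this thread boils down to trying endpoints in priority order and using the first healthy one. A rough sketch under stated assumptions: the endpoint URLs are hypothetical, and the health check is passed in as a function rather than tied to any real monitoring API.

```python
from typing import Callable, Iterable, Optional

# Hypothetical endpoints in priority order: primary region, second region, second provider.
ENDPOINTS = [
    "https://app.us-east.example.com",
    "https://app.us-west.example.com",
    "https://app.other-cloud.example.com",
]


def pick_endpoint(
    endpoints: Iterable[str],
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Return the first endpoint that passes the health check, or None if all fail."""
    for endpoint in endpoints:
        try:
            if is_healthy(endpoint):
                return endpoint
        except Exception:
            # A health check that raises is treated the same as an unhealthy endpoint.
            continue
    return None


if __name__ == "__main__":
    # Simulate the primary region being down.
    print(pick_endpoint(ENDPOINTS, lambda url: "us-east" not in url))
```

This only covers routing. As several posters note, keeping the data stores behind those endpoints in sync across regions is the hard and expensive part, and this sketch does nothing for that.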
Not so bad.. (Score:2)

by kevinNCSU ( 1531307 ) writes:

I was wondering why it took longer to start up my hadoop cluster this morning on EC2, but it still beats the living hell out of buying and configuring large numbers of machines for short term testing.
Judgement Day (Score:2)

by treerex ( 743007 ) writes:

Hmmmm... today *is* Judgement Day... perhaps Skynet's first target is AWS's East-Coast data center. Coincidence? I think not.
6 weeks before the AWS summit 2011 (Score:4, Interesting)

by grapeape ( 137008 ) writes: <mpope7 AT kc DOT rr DOT com> on Thursday April 21, 2011 @12:26PM (#35895676) Homepage

Gotta wonder what kind of flak Amazon is going to take for this one. I've had a couple of clients looking into cloud services, including moving to AWS, and have already had one of them call me and cancel a meeting about it. While I understand stuff happens, the entire sales pitch for AWS was redundancy and build as you grow. Redundancy has obviously not worked in this case. While I usually support cloud services, this is definitely going to be a hard example to counter when trying to sell it to potential customers.

- Re: (Score:2)
  
  by Synn ( 6288 ) writes:
  
  "Redundancy has obviously not worked in this case"
  Only 1 region is affected. If your app was set to work with multiple zones then it likely wouldn't be impacted by this outage.
  The thing with EC2 is it gives you the tools to build complex clusters. It doesn't do it for you.
  - Re: (Score:3)
    
    by Slashdot Parent ( 995749 ) writes:
    
    Only 1 region is affected. If your app was set to work with multiple zones then it likely wouldn't be impacted by this outage.
    Not true. My application works just fine in multiple Availability Zones, yet it was knocked out yesterday due to an entire Region getting knocked offline.
    And before you tell me that the application should have been multi-Region, I'm not buying it. AWS has always maintained that deploying an app across multiple AZs is HA. AZs are supposed to be considered as separate datacenters: separate power, separate uplink, etc. And yes, separate EBS infrastructure (you can't attach an EBS volume to an instance that was
- - Re:6 weeks before the AWS summit 2011 (Score:5, Informative)
    
    by TooMuchToDo ( 882796 ) writes: on Thursday April 21, 2011 @12:39PM (#35895920)
    
    It's not short sighted at all. When someone else runs your gear, all you can do is sweat until they get things back online, and they can take their time under what's known as "commercially reasonable SLAs". When you own your own gear, your own colo, etc., how much effort you use to get back up and running is up to you.
    "The Cloud" for mission critical businesses is a joke.
    
    - Re: (Score:2)
      
      by darjen ( 879890 ) writes:
      
      For a small or medium size business, it could very well take massive amounts of effort and cost to keep your servers going full time. For many people it probably makes sense to outsource that function to dedicated engineers, rather than having to hire and manage your own.
    - Re: (Score:2)
      
      by davidbrit2 ( 775091 ) writes:
      
      It's even nicer working at a place that sells used/refurb IT gear. Main file server is down? No sweat, I'll just stroll out to the warehouse, grab a new RAID controller, and be up and running again in ten minutes. (Yes, we've had that sort of thing happen - hardware failure is just about the least of our worries around here when we have a spare for nearly everything in every one of our servers.)
    - - Re: (Score:2)
        
        by TooMuchToDo ( 882796 ) writes:
        
        Dude, I used to help run a Tier-1 CMS data facility for the LHC. I've done IT for the better part of 14 years. I know exactly what the fuck is going on here. Amazon sells people on the fact that you "put everything in the cloud" and you won't have any problems. Then problems occur and it's all *shrugs, shit happens*.
        Fark. off.
  - Re: (Score:2)
    
    by grapeape ( 137008 ) writes:
    
    I understand that but it still makes it a hard sell in the short-term until this all blows over.
Soo... (Score:2)

by Syberz ( 1170343 ) writes:

Is anybody else suffering from Reddit withdrawal?
Inappropriate metaphor - the cloud (Score:2)

by NicknamesAreStupid ( 1040118 ) writes:

It means inclement weather; it rains; it pours; it delays air traffic; it's gloomy. You can look up at it and see whatever you can imagine, but it is not real. It goes away when you most need it. It is all wet.
- Re: (Score:2)
  
  by Bloodwine77 ( 913355 ) writes:
  
  My guess is "cloud" is used because networking diagrams have historically used a cloud icon for the internet to mean it was nebulous and alien to the network.
TMZ is one site. (Co worker's main news site) (Score:2)

by Bobzibub ( 20561 ) writes:

Their error page is rejected by firefox. So I wgetted it to see why.
At the bottom is a script from RUSSIA (in my best Max Headroom voice) (the src is addonrock.ru/Templatel.js)
So perhaps AWS is hacked?
to the cloud! (Score:2)

by MECC ( 8478 ) writes:

just like the ad...
Coincidence, PSN? (Score:2)

by grikdog ( 697841 ) writes:

Does Sony's PSN sublet capacity on Amazon's cloud? PSN is down for "a day or two" according to stuff on Google.
- Re:Reddit is down because of this (Score:5, Funny)
  
  by cobrausn ( 1915176 ) writes: on Thursday April 21, 2011 @11:50AM (#35894934)
  
  You're posting on Slashdot, so I believe you already found the answer.
  
  - Re: (Score:2)
    
    by MobileTatsu-NJG ( 946591 ) writes:
    
    You're posting on Slashdot, so I believe you already found the answer.
    Yeah but maybe he's hungry for news.
  - - Re: (Score:2)
      
      by lgw ( 121541 ) writes:
      
      It's not the same! I want atheists being smug about not believing in god (and refusing to capitalise), and liberal lefties telling each other that the government needs to be more liberal while Libertarians accuse them of worshipping Obama, photoshopped pictures that have been debunked dozens of times before, people claiming to be things that they aren't and answering questions, and hero worshipping of Ron Paul!
      You just don't get enough of that here.
      You said it ... we can't post pictures here (for which those of us here in the Goatse spam days were quite thankful).
- Re: (Score:2)
  
  by badran ( 973386 ) writes:
  
  Productivity in Offices will reach record levels today.
- - Re: (Score:2)
    
    by wiggles ( 30088 ) writes:
    
    They took Digg down last year and replaced it with this horrible monstrosity they called 'v4' or something. It's a shame they just took such a popular site offline and haven't provided a decent replacement.
    - Re:Reddit is down because of this (Score:5, Informative)
      
      by Richard_at_work ( 517087 ) writes: on Thursday April 21, 2011 @12:46PM (#35896044)
      
      Don't worry - Slashdot just did something similar. When I try and reply to comments through my account's comments history page, it's horribly, horribly broken. Each attempt to click in the reply box loads a new comment further up in the comment tree, and scrolls the page to the newly loaded comment. Scroll back down, click in the box again and it loads another comment and shunts me back up the page. It can get really fucking annoying when you are trying to reply to a comment that's quite a way down a long tree.
      
      - Re: (Score:2)
        
        by recoiledsnake ( 879048 ) writes:
        
        That even happens if you just click on posts. Not to mention that the comment scores are sometimes hidden randomly and you have to do all the clicking till you see them.
      - Re: (Score:2)
        
        by Ant P. ( 974313 ) writes:
        
        There's an easy fix for that: block javascript and turn on classic discussion mode. Not only will /. actually work, it'll feel 10 times faster!
  - Re: (Score:2)
    
    by jpmoney ( 323533 ) writes:
    
    People still go to digg? Oh, I see what you did there.
    I actually went to Digg this morning since Reddit is down. I haven't been in months since I removed them from my RSS reader. All I have to say is "ouch". Front page stories with a whopping 5 comments? It's pretty sad.
    - Re: (Score:2)
      
      by jafuser ( 112236 ) writes:
      
      This is the first time I've been back here in a while. I decided to try it when I realized reddit's downtime is probably going to be a while. I still feel a reverence for this place. It sort of reminds me of going back and visiting my university.
      Digg can rot in hell.
    - Re: (Score:2)
      
      by mini me ( 132455 ) writes:
      
      Digg never had much comment activity when compared to similar sites (Slashdot, Reddit, etc.). Which is a shame, because the comments are usually more entertaining than the actual links.
- - Re: (Score:2)
    
    by Scorchio ( 177053 ) writes:
    
    You'd get an upvote, but I haven't seen mod points in a long time...
- Re: (Score:2)
  
  by codepunk ( 167897 ) writes:
  
  In about the time it took you to write that message I spun up a standby deployment in another data center, smart guy.
  - Re: (Score:2)
    
    by characterZer0 ( 138196 ) writes:
    
    How long does it take you to have the IP addresses rerouted?
    - Re: (Score:2)
      
      by codepunk ( 167897 ) writes:
      
      EIPs move in seconds, but in my use case I do not need EIPs since a front end is handing off the requests to the cloud systems.
  - Re: (Score:2)
    
    by TooMuchToDo ( 882796 ) writes:
    
    Really? Wow. Perhaps you should let major sites like Reddit know. They've been down for *hours*.
    The cloud works if you don't care about having control over when your business is down.
    - Re: (Score:2)
      
      by cduffy ( 652 ) writes:
      
      Really? Wow. Perhaps you should let major sites like Reddit know. They've been down for *hours*.
      The cloud works if you don't care about having control over when your business is down.
      Last time I had a physical DC go down it was a cooling failure. Didn't have much control over that either.
      Moreover, with a cloud vendor I can have servers in multiple sites with different power, connectivity, and geographic location without massive investment in each.
- Re: (Score:2)
  
  by cduffy ( 652 ) writes:
  
  Amazon has "availability zones" for a reason, as do other cloud vendors.
  If your infrastructure isn't resilient against everything in a zone suddenly disappearing, you're Doing It Wrong.
  - Re: (Score:2)
    
    by mini me ( 132455 ) writes:
    
    I understand the need for physical availability zones, but the whole idea behind the cloud is that you, the end user, need not care about those details. It is up to the cloud provider to figure it out. The cloud represents a black box, of sorts. If they are having trouble in one zone, everything should automatically migrate to another without anyone outside of the operation knowing it.
    I'm not saying Amazon's solution is bad, but I'm not sure it is in the spirit of what I would consider real cloud hosting. R
- Re: (Score:2)
  
  by sbrown123 ( 229895 ) writes:
  
  Scalability: yes.
  Cheap: yes.
  Reliability: they don't say they are 100% fail safe. I think the figure is still in the 90's though which is pretty good.
  If anyone tries to sell you 100% they are liars.

