Dark Day In the AWS Cloud: Big Name Sites Go Down

timothy posted about 8 months ago | from the central-authority-vs-resilience dept.

An outage of one company's servers might only affect that company's customers — but when a major data center for Amazon hits kinks, sites that rely on the AWS cloud services all suffer from the downtime. That's what happened today, when several major sites or online services (like Instagram and AirBnB) were knocked temporarily offline, evidently because of problems at an Amazon data center in Northern Virginia. From TechCrunch's coverage of the outage: "The deluge of tweets that accompanied the services’ initial hiccups first started at around 4 p.m. Eastern time, and only increased in intensity as users found they couldn’t share pictures of their food or their meticulously crafted video snippets. Some further poking around on Twitter and beyond revealed that some other services known to rely on AWS — Netflix, IFTTT, Heroku and Airbnb to name a few — have been experiencing similar issues today."

182 comments

Say what you will (0, Troll)

Anonymous Coward | about 8 months ago | (#44672587)

but I'd rather have a few strategically placed servers in datacenters spread around the country (world?) than something hosted on AWS.

Re:Say what you will (5, Funny)

Anonymous Coward | about 8 months ago | (#44672637)

In Soviet Russia, company's customers go down on YOU!

Re:Say what you will (5, Funny)

Zemran (3101) | about 8 months ago | (#44673839)

"In Soviet Russia, company's customers go down on YOU!"

Now we know the truth about why Snowden went there...

Re:Say what you will (4, Interesting)

rudy_wayne (414635) | about 8 months ago | (#44672679)

One of the features of AWS was supposed to be the ability to reroute everything to a different datacenter if one goes down. I know I read that somewhere back when AWS was first starting up. You don't think they lied, do you?

Re:Say what you will (5, Insightful)

teknopurge (199509) | about 8 months ago | (#44672729)

That's expensive. "Cloud" hosting services cost about 1.5x traditional hosting. When you want multiple locations ("regions" in AWS) you need to pay for resources in each additional region, then pay another cost to provide that failover. Cloud hosting is great, but nothing it does is new or cheaper than hosting 10 years ago.

Re:Say what you will (1, Interesting)

AHuxley (892839) | about 8 months ago | (#44672985)

Re: "you need to pay for resources in each additional region."
Why the lack of power and real optical links that were regional and power-distinct?
Is this like the idea of linking to a site/city/state/regional 'ring' many times? Very safe from any local cut/drop, cheap, but still very dependent on one geographic provider?
You also have a submarine communications cable (France to the USA) on the way for that state?? ...the regional services should be good?

Re:Say what you will (5, Informative)

Anonymous Coward | about 8 months ago | (#44673137)

Either you don't speak English or you need to take your meds. No offense. So I'll try muddling a reply together for you.

There are many ways to set up remote failover systems. Most of them rely on some type of heartbeat system, where the nodes periodically send each other a "heartbeat message", and if the current Active node stops responding for too long, the others choose one to take over. So it doesn't matter if they're all in one room connected with a single switch, or spread all over the planet.
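A minimal Python sketch of the heartbeat scheme described above, assuming a deterministic "lowest live node ID wins" election rule. The class name, node names, and timeout value are hypothetical, not taken from any particular failover product. A real deployment would also need a quorum or fencing mechanism so that two partitioned nodes don't both declare themselves Active.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Tracks each node's last heartbeat and decides which node is Active.

    Hypothetical sketch: the timeout and the "lowest live ID wins" rule are
    illustrative assumptions, not any specific product's behavior.
    """

    timeout: float  # seconds of silence before a node is presumed dead
    last_seen: dict[str, float] = field(default_factory=dict)
    active: str | None = None

    def heartbeat(self, node: str, now: float) -> None:
        # Record that `node` sent a heartbeat message at time `now`.
        self.last_seen[node] = now

    def live_nodes(self, now: float) -> list[str]:
        # A node counts as live if it was heard from within the timeout window.
        return sorted(n for n, t in self.last_seen.items() if now - t <= self.timeout)

    def elect(self, now: float) -> str | None:
        live = self.live_nodes(now)
        # Keep the current Active node while it still responds; otherwise
        # every surviving node applies the same rule and picks the lowest ID.
        if self.active not in live:
            self.active = live[0] if live else None
        return self.active


mon = HeartbeatMonitor(timeout=3.0)
mon.heartbeat("dc-east", now=0.0)
mon.heartbeat("dc-west", now=0.0)
print(mon.elect(now=1.0))  # dc-east is Active
mon.heartbeat("dc-west", now=4.0)  # dc-east has gone silent
print(mon.elect(now=5.0))  # dc-west takes over
```

Keeping the incumbent while it still responds avoids needless flip-flopping when a node that was briefly slow comes back.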

The real rub for any mechanism is DNS... if the primary server your FQDN points at drops then you might have redundancy but most people won't be able to take advantage of it. With more manual mechanisms (such as telling users "If our primary site goes down, try here instead!") that's not as much of a concern, just a PITA to keep track of.

Re:Say what you will (3)

alen (225700) | about 8 months ago | (#44673053)

yeah, but cloud is sold as this super cheap way to compute and have five nines reliability

actually, no (5, Informative)

Chirs (87576) | about 8 months ago | (#44673215)

"cloud" is sold as a *convenient* way to compute, where it's quick to add resources when needed so you can start small and scale up (and down) with demand.

It is *not* generally considered a cheap or particularly reliable solution. So far at least, none of the cloud providers are offering five nines; if you want that, you should (for now at least) be looking at enterprise/telecom gear.

Realistically (3, Insightful)

corran__horn (178058) | about 8 months ago | (#44674173)

Chances are that there are no providers that offer a true 99.999% uptime. If you demand that, you need to be building your code to run in a HA cluster with nationwide dispersion. (For reference, you get 5.25 minutes of downtime across a whole year).

99.999% uptime is also completely unnecessary, but sounds really good to management until you talk cost.
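A quick check of the downtime arithmetic, as a small Python sketch assuming an average 365.25-day year: 99.999% allows about 5.26 minutes per year, in line with the roughly 5.25 minutes quoted, while 99.99% allows about 52.6 minutes and 99.9% about 8.8 hours.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # average year, counting leap days


def downtime_minutes_per_year(availability_pct: float) -> float:
    """Maximum downtime per year allowed by an availability percentage."""
    return (1 - availability_pct / 100) * MINUTES_PER_YEAR


for pct in (99.9, 99.99, 99.999):
    print(f"{pct}%: {downtime_minutes_per_year(pct):.2f} minutes/year")
```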

Re:Say what you will (2)

sribe (304414) | about 8 months ago | (#44673063)

...nothing it does is new...

Ahem, it siphons additional funds from customers ;-)

Re:Say what you will (2)

mysidia (191772) | about 8 months ago | (#44673805)

When you want multiple locations ("regions" in AWS) you need to pay for resources in each additional region, then pay another cost to provide that failover.

Or you can have storage in those regions prepped to failover, with no other resources provisioned. When failure needs to occur, you start spinning up the instances in the other region.

It does require planning: you can reroute, but you don't get that automatically; it requires work and preparation.

Re:Say what you will (1)

Skapare (16644) | about 8 months ago | (#44673877)

Configure the warm resources in the other region to constantly monitor the primary. If the primary goes down, they automatically activate the secondary.
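The monitor-and-promote idea can be sketched in Python as below. `check` and `promote` are hypothetical callables standing in for a real health probe and a real promotion step, and requiring several consecutive failures (3 here, an assumed default) is one common way to avoid failing over on a single dropped packet.

```python
from typing import Callable


def watch_primary(check: Callable[[], bool], promote: Callable[[], None],
                  failures_needed: int = 3, max_checks: int = 100) -> bool:
    """Poll the primary and promote the secondary after N consecutive failed checks.

    Hypothetical sketch: `check` would be a real health probe (e.g. an HTTP
    request to the primary) and `promote` the real activation step. The pause
    between polls is left out to keep the sketch testable.
    Returns True if the secondary was promoted.
    """
    consecutive = 0
    for _ in range(max_checks):
        if check():
            consecutive = 0  # any success resets the count, so one blip doesn't trigger failover
        else:
            consecutive += 1
            if consecutive >= failures_needed:
                promote()
                return True
    return False
```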

Re:Say what you will (4, Insightful)

Zemran (3101) | about 8 months ago | (#44673865)

"nothing it does is new or cheaper than hosting 10 years ago."

Welcome to the wonderful world of marketing. Sell people what they already have for 50% more.

Re:Say what you will (1)

Anonymous Coward | about 8 months ago | (#44672753)

That kind of redundancy is great, but not if you have a connectivity issue and your load balancers are impacted which is what happened here. Also, moving all traffic from one DC to another is a major shift; so depending on the problem and how long it may take to fix, it might not be worth it. Shifting everything over and back is a great feature to have, but it does come at a cost.

Re:Say what you will (1)

ModernGeek (601932) | about 8 months ago | (#44672787)

You just have multiple DNS records for each service, and the client should move on to the next if one is down.

Re:Say what you will (2)

whoever57 (658626) | about 8 months ago | (#44672837)

You just have multiple DNS records for each service, and the client should move on to the next if one is down.

Unfortunately, "should" is rarely "does". If a browser receives multiple IP addresses for a name, it doesn't try them in turn; it just tries one.

Re: Say what you will (2, Informative)

Anonymous Coward | about 8 months ago | (#44673189)

Most modern browsers do, indeed, try the next address. It's a browser feature, though, not an official standard.
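A sketch of the client-side behavior under discussion, in Python: try each address for a name in order, with a short per-attempt timeout. The function name and the 2-second timeout are assumptions for illustration.

```python
from __future__ import annotations

import socket
from typing import Iterable


def connect_first(candidates: Iterable[tuple[str, int]], timeout: float = 2.0) -> socket.socket:
    """Return a socket connected to the first reachable (address, port) pair.

    Sketch of client-side fallback across multiple A records. The short
    per-attempt timeout is the important part: with the OS default TCP
    connect timeout, each dead address could stall the user far longer.
    """
    last_error: OSError | None = None
    for host, port in candidates:
        try:
            return socket.create_connection((host, port), timeout=timeout)
        except OSError as exc:  # refused, unreachable, or timed out
            last_error = exc
    raise ConnectionError(f"all candidates failed; last error: {last_error}")
```

For a hostname, the standard library's `socket.create_connection` already iterates over the addresses returned by `getaddrinfo`; the explicit loop just makes the ordering and per-address timeout visible. Candidates could be built with `[ai[4][:2] for ai in socket.getaddrinfo(name, port, type=socket.SOCK_STREAM)]`.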

Re:Say what you will (5, Informative)

chrisgeleven (514645) | about 8 months ago | (#44672871)

Assuming you mean traditional round-robin A records, the timeout(s) you still have to suffer through would kill your latency.

If you're talking about DNS providers (disclaimer: I work for Dyn) with advanced features that detect a failover event occurring and will only serve healthy A records, then that is a different story.

Re:Say what you will (0)

Anonymous Coward | about 8 months ago | (#44673201)

Assuming you mean traditional round-robin A records, the timeout(s) you still have to suffer through would kill your latency.

If you're talking about DNS providers (disclaimer: I work for Dyn) with advanced features that detect a failover event occurring and will only serve healthy A records, then that is a different story.

Well in terms of a massive failure, most people would be a lot happier with a forced re-connect after a few minutes than simply being down.
As for DNS, there are all kinds of things you can do, but your solution won't help if the provider's server goes out. You can implement a similar solution if you run your own nameservers, one at each datacenter. Then, if your Active node goes dark and the TTL on your DNS records is low enough, other DNS servers will go to the secondary nameserver for a record update, since the Active node was also hosting the primary nameserver. The secondary will then start serving out the datacenter it's located in, since the primary is out of comms. You still run into slow 'failover' problems, of course, especially with third-party DNS resolvers that enforce their own TTL, clients that cache the lookup, etc.

Re:Say what you will (2)

petermgreen (876956) | about 8 months ago | (#44673087)

Most clients either won't move on to a second IP at all or will only move on once the OS times out the first TCP connection. And OS TCP connection timeouts are long enough that most users won't put up with them for interactive services.

A better strategy is to put a DNS server in each datacenter, make the TTLs short and set things up to automatically remove records if a server goes offline. This works much better because DNS fallback timeouts are much shorter than TCP connection timeouts.
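A minimal Python sketch of the "only serve healthy records, with a short TTL" logic, assuming health status per datacenter is already known. The record type, datacenter labels, 30-second default TTL, and the fail-open fallback are all assumptions; the addresses come from documentation-reserved ranges.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ARecord:
    ip: str          # address served to clients
    datacenter: str  # which datacenter hosts this address


def healthy_answer(records: list[ARecord], healthy: set[str], ttl: int = 30) -> tuple[list[str], int]:
    """Return the A-record IPs to serve, plus the TTL to attach to them.

    Hypothetical sketch: records for datacenters that failed their health
    checks are withheld, and a short TTL limits how long resolvers cache a
    stale answer. If every datacenter looks unhealthy, serve everything
    rather than an empty answer (fail open).
    """
    live = [r.ip for r in records if r.datacenter in healthy]
    return (live or [r.ip for r in records]), ttl


records = [ARecord("192.0.2.10", "east"), ARecord("198.51.100.10", "west")]
print(healthy_answer(records, {"west"}))  # only the west address is served
```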

Re:Say what you will (3, Interesting)

tnk1 (899206) | about 8 months ago | (#44673485)

Supposedly the load balancer problem did not affect LBs that have backing hosts in two availability zones according to the article. The major question is... who runs everything in one availability zone? You're not supposed to do that for high availability sites.

Re:Say what you will (5, Insightful)

Anonymous Coward | about 8 months ago | (#44672845)

No, they didn't lie. You can set things up that way: simply set up your servers in multiple data centers (AWS availability zones) and load balance between them. It's foolish to just throw things up in the cloud and think that, magically, I won't ever have to worry about downtime ever again. It's foolish, but a lot of companies act this way.

Somehow cloud hosting is taken as the silver bullet to prevent outages; it isn't. You still have to architect things the way you normally would if you're looking for things like disaster recovery, high availability, etc.
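A toy Python sketch of spreading servers across availability zones and load balancing only to healthy zones. The zone names follow AWS's naming style, but the instance names and layout are hypothetical.

```python
from __future__ import annotations

from itertools import cycle


def place(instances: list[str], zones: list[str]) -> dict[str, list[str]]:
    """Spread instances round-robin across availability zones."""
    layout: dict[str, list[str]] = {z: [] for z in zones}
    for instance, zone in zip(instances, cycle(zones)):
        layout[zone].append(instance)
    return layout


def backends(layout: dict[str, list[str]], healthy_zones: set[str]) -> list[str]:
    """Instances a load balancer should route to once unhealthy zones are dropped."""
    return [i for zone, insts in layout.items() if zone in healthy_zones for i in insts]


layout = place(["web1", "web2", "web3", "web4"], ["us-east-1a", "us-east-1b"])
# If us-east-1a has trouble, half the fleet keeps serving from us-east-1b.
print(backends(layout, {"us-east-1b"}))
```

The catch, raised elsewhere in the thread, is that the surviving zone must have enough headroom to absorb the extra traffic.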

Re:Say what you will (3, Insightful)

rudy_wayne (414635) | about 8 months ago | (#44672899)

No, they didn't lie. You can set things up that way: simply set up your servers in multiple data centers (AWS availability zones) and load balance between them. It's foolish to just throw things up in the cloud and think that, magically, I won't ever have to worry about downtime ever again. It's foolish, but a lot of companies act this way.

But that's the problem. *THEY* (i.e., AWS or whoever) are supposed to take care of all that stuff. They're supposed to worry about "uptime" and fixing things when they break and having redundant systems that kick in when something breaks so that there's no loss of service. That's the whole point of putting stuff in the "cloud".

If * I * have to worry about that stuff then I might as well just do it myself and not give my money to Amazon.

Re:Say what you will (1)

Cyberax (705495) | about 8 months ago | (#44672959)

How do you do it automatically? It's simply not possible to transparently replicate arbitrary VMs across geographically distant datacenters (lightspeed and all that...).

However, AWS provides tools for developers to do it.

Re:Say what you will (2)

silas_moeckel (234313) | about 8 months ago | (#44673059)

Oh, it's possible to do, just rather expensive to do well. Disk-level replication doesn't work, as synchronous writes beyond a region take far too long. Higher up the stack you can deal with 35-70 ms of network latency. But then it's not MySQL with any old crappy PHP code.

AWS is a PHB buzzword, like IBM a decade and a half ago: it makes the VC guys happy that you fixed your scaling issue. In reality, everybody else's scaling issues now impact you.
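A back-of-the-envelope Python sketch of why synchronous replication between distant regions hurts, assuming light travels at roughly 200,000 km/s in fiber and a hypothetical 4,000 km fiber path: the round trip alone is at least 40 ms, so strictly sequential synchronous writes top out near 25 per second before any routing, queuing, or disk time is added.

```python
FIBER_KM_PER_S = 200_000  # assumption: light in fiber travels at roughly 2/3 of c


def min_round_trip_ms(fiber_km: float) -> float:
    """Lower bound on round-trip time over a fiber path (ignores routing, queuing, disk)."""
    return 2 * fiber_km / FIBER_KM_PER_S * 1000


def max_sequential_sync_writes_per_s(fiber_km: float) -> float:
    """Upper bound on back-to-back writes that each wait for one remote acknowledgement."""
    return 1000 / min_round_trip_ms(fiber_km)


rtt = min_round_trip_ms(4_000)  # hypothetical 4,000 km path between regions
print(f"RTT >= {rtt:.0f} ms, <= {max_sequential_sync_writes_per_s(4_000):.0f} sequential sync writes/s")
```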

Re:Say what you will (5, Interesting)

Cyberax (705495) | about 8 months ago | (#44673079)

Well, right now I have 500 machines running some heavy calculations in multiple AZs. It works perfectly fine; we noticed the recent problems but simply stopped using the affected region (us-east-1) for the time being, shifting our calculations to other regions.

AWS is really great at scaling. It's better than anything else on the market, but it does require a lot of work.

Re:Say what you will (0)

Anonymous Coward | about 8 months ago | (#44673251)

Not possible you say?

http://technet.microsoft.com/en-us/library/jj134172.aspx

Re:Say what you will (1)

Anonymous Coward | about 8 months ago | (#44673051)

That has never been the "contract" with cloud. Amazon does not, and cannot, understand the architecture of every one of the multitude of applications running in their cloud. You can pay to have that kind of support from various companies (maybe even including Amazon) but it's not what "the cloud" is.

The functionality is there for you to make an extremely robust application in AWS -- if you actually take advantage of it and if your business needs justify that much effort/expense.

Re:Say what you will (4, Interesting)

Glendale2x (210533) | about 8 months ago | (#44673091)

No, you have to manage your own redundancy and failover on AWS. Look at all the effort Netflix has put into programming failover and stress testing and yet they still have frequent outages with AWS.

Re:Say what you will (4, Informative)

Anonymous Coward | about 8 months ago | (#44673207)

AWS Status Dashboard?

I know this is /., and people here don't like to read, but did anyone actually read the status dashboard posts?

This issue was limited to a single AZ, affected only a small number of machines, and was specifically an issue with added latency in EBS volumes. And Amazon completely resolved the issue in 4 hours.

So, call me crazy, but didn't they do exactly what they are supposed to do? Also, AWS quite clearly states that any given AZ *might* fail. Hence, if you want any sort of high-availability, you replicate across different AZs.

Plus, I have 10+ EC2 instances, and a number of other resources with AWS, and none of them were affected by this outage.

cloud is convenient, not reliable (2)

Chirs (87576) | about 8 months ago | (#44673227)

As a cloud customer, reliability (currently at least) is up to you. If you want the extra reliability of running instances in multiple availability zones then it's up to you to pay for it.

The point of the cloud as it stands currently is not that it's cheap or reliable, but that it's easy to scale up/down with demand.

Re:Say what you will (1)

sshir (623215) | about 8 months ago | (#44673355)

If you care that much about availability, you should do it yourself anyway.

The first rule of diversification: don't put your eggs into correlated baskets.

In this context it means that if your primary is on AWS, then your secondary must be on Rackspace or whatever - NOT other AWS.
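The "correlated baskets" point can be quantified. For two outage events with probabilities p1 and p2 and correlation rho, P(both down) = p1*p2 + rho*sqrt(p1(1-p1)p2(1-p2)). A Python sketch with illustrative numbers (0.1% downtime per site and a correlation of 0.5 are assumptions, not measured figures): independent sites are both down 0.0001% of the time, but a correlation of 0.5 raises that to about 0.05%, roughly 500 times worse.

```python
import math


def p_both_down(p1: float, p2: float, rho: float) -> float:
    """Probability that two sites are down at once, given each site's downtime
    probability and the correlation between their outages.

    Uses P(A and B) = p1*p2 + Cov(A, B), with Cov = rho * sd1 * sd2 for
    Bernoulli variables. Not every rho is achievable for a given p1, p2;
    the result must stay between 0 and min(p1, p2).
    """
    cov = rho * math.sqrt(p1 * (1 - p1) * p2 * (1 - p2))
    return p1 * p2 + cov


independent = p_both_down(0.001, 0.001, 0.0)  # assumed: unrelated providers
correlated = p_both_down(0.001, 0.001, 0.5)   # assumed: sites sharing failure modes
print(f"independent: {independent:.2e}, correlated: {correlated:.2e}")
```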

Re:Say what you will (3, Insightful)

mysidia (191772) | about 8 months ago | (#44673829)

But that's the problem. *THEY* (i.e., AWS or whoever) are supposed to take care of all that stuff. They're supposed to worry about "uptime" and fixing things when they break and having redundant systems that kick in when something breaks so that there's no loss of service. That's the whole point of putting stuff in the "cloud".

Boy, have you been fed a line. Read the SLA. If it's not in there, then you don't get it.

If you think the cloud provider is clustering your instance and giving you HA; then AWS is not for you.

Amazon provides availability zones you can provision separate instances, storage, and networks in. If your application cannot survive the failure of an instance and the failure of an entire availability zone, then you don't have HA, and Amazon won't give it to you; your app may be inappropriate for AWS if HA is required.

Re:Say what you will (0)

Anonymous Coward | about 8 months ago | (#44674135)

*THEY* (i.e., AWS or whoever) are supposed to take care of all that stuff.

Did they write your applications?

If not, how can you expect them to ensure that YOUR application stays up uninterrupted on their services, when you've architected it in a poorly thought out manner?

You seem to think that "cloud" = "no thought required." That's not the case. The value Amazon gives you is the ability to rapidly expand and contract your capacity as application loads change, the ability to achieve multiple-site redundancy with less hardware investment on your side, and the ability to manage all of your datacenter assets programmatically.

But YOU have to build that capability into your application, and YOU have to figure out how to take advantage of those services. YOU will not do it as cheaply or effectively as Amazon will, at least not at feature parity. And that's why you give money to Amazon.

Re:Say what you will (1, Insightful)

JDG1980 (2438906) | about 8 months ago | (#44673817)

No, they didn't lie. You can set things up that way: simply set up your servers in multiple data centers (AWS availability zones) and load balance between them. It's foolish to just throw things up in the cloud and think that, magically, I won't ever have to worry about downtime ever again.

But that was one of the big promises of "the cloud": that you'd never have to worry about the nitty-gritty of network administration again, your provider would handle all that for you. If that isn't the case, then you gain nothing and might as well host the data yourself.

Re:Say what you will (3, Interesting)

hawguy (1600213) | about 8 months ago | (#44673977)

No, they didn't lie. You can set things up that way: simply set up your servers in multiple data centers (AWS availability zones) and load balance between them. It's foolish to just throw things up in the cloud and think that, magically, I won't ever have to worry about downtime ever again.

But that was one of the big promises of "the cloud": that you'd never have to worry about the nitty-gritty of network administration again, your provider would handle all that for you.

There are many different flavors of "cloud" computing - if you throw your app at a cloud provider and blindly expect them to make it highly available, then you'll get what you deserve. There is no end of cloud solution providers that will be happy to help you architect your app for whatever level of redundancy you want. But it's not going to be free.

Amazon does let you get rid of your network admin and concentrate on managing the servers. No need to worry about BGP, buying bandwidth from multiple redundant providers, buying and administering your own firewalls, network switches, routers, etc.

But you still have to manage your servers. Amazon will help you with multi-AZ redundancy for things like MySQL.

If that isn't the case, then you gain nothing and might as well host the data yourself.

That depends heavily on your use case. If you have a relatively small number of servers, or have large demand spikes, Amazon can be much more cost effective than hosting your own servers. If you have hundreds of servers and keep them busy all the time, you can probably save money by doing it yourself.

But if you have dozens of servers, then it's likely that you'll save money with Amazon over buying your own servers, network gear, a SAN, backup solution, hardware service contracts, etc.

But you have to architect your application properly. We have our core servers split across multiple AZs with the database replicated across those AZs. We don't trust our failover/failback scripts enough to make it automatic, so we have a simple web interface to let anyone on the tech team do the failover. The only impact we saw in this outage was higher latency and timeouts to some of our app servers, but our database was not in the affected zone, and Amazon's load balancer correctly routed traffic to the servers in the good AZ.

Additionally, we have a warm spare running in a different region. The servers are kept up to date with data, but they run on smaller instance types than we need to run our app. To do a regional failover, we'd have to reboot them into larger instance types (our app startup scripts already tune memory parameters to take advantage of the greater amounts of RAM in the larger instances), then repoint DNS.
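The warm-spare procedure described above could be scripted roughly like this Python sketch using boto3. The clients are passed in so the flow can be exercised without AWS; the instance IDs, instance type, hosted zone ID, record name, IP, and 60-second TTL are placeholders, not details from the post. Note that EC2 requires an instance to be stopped (not just rebooted) before its type can be changed.

```python
def promote_warm_spare(ec2, route53, instance_ids, new_type, zone_id, record_name, new_ip, ttl=60):
    """Resize a warm-spare fleet, start it, and repoint DNS at it.

    `ec2` and `route53` are expected to be boto3 clients, e.g.
    boto3.client("ec2", region_name=...) and boto3.client("route53").
    Every other argument is a placeholder supplied by the caller.
    """
    # EC2 only allows changing the instance type while the instance is stopped.
    ec2.stop_instances(InstanceIds=instance_ids)
    ec2.get_waiter("instance_stopped").wait(InstanceIds=instance_ids)
    for instance_id in instance_ids:
        ec2.modify_instance_attribute(InstanceId=instance_id, InstanceType={"Value": new_type})
    ec2.start_instances(InstanceIds=instance_ids)
    ec2.get_waiter("instance_running").wait(InstanceIds=instance_ids)
    # UPSERT creates the record or replaces the existing one in place.
    route53.change_resource_record_sets(
        HostedZoneId=zone_id,
        ChangeBatch={
            "Changes": [{
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "A",
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": new_ip}],
                },
            }]
        },
    )
```

If the spares don't use Elastic IPs, their public addresses change on stop/start, so `new_ip` would have to be looked up after the restart rather than known in advance.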

Re:Say what you will (1)

Anonymous Coward | about 8 months ago | (#44673035)

That functionality is there -- I use it in my own deployments. The thing is, it's not automagic. You have to actually architect your application to take advantage of AWS features.

TANSTAAFL

Re:Say what you will (1)

Glendale2x (210533) | about 8 months ago | (#44673057)

Outages with AWS and cloudy friends are becoming so common it's almost a non-story at this point.

Re:Say what you will (0)

Anonymous Coward | about 8 months ago | (#44674141)

Probably because properly designed applications that require "high availability" weren't impacted by this outage. Because if they require "high availability," they weren't using resources from a single availability zone. Which means the resources were available in other AZs; there was perhaps some temporary loss of capacity due to the outage, but a well-built application (and a properly planned ops protocol) would detect that and spin up additional capacity in one of the other (functional) AZs until the problem was resolved.
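A small Python sketch of the "spin up additional capacity in the other AZs" step: given a target instance count and what is still running, spread the shortfall evenly across healthy zones. The function name and the even-spread policy are assumptions for illustration.

```python
from __future__ import annotations


def replacement_plan(target: int, running: dict[str, int], healthy: set[str]) -> dict[str, int]:
    """Instances to launch in each healthy AZ so total healthy capacity reaches `target`.

    Hypothetical sketch: the shortfall is spread as evenly as possible, with
    any remainder going to the alphabetically first zones.
    """
    zones = sorted(healthy)
    if not zones:
        return {}  # nowhere healthy to launch; a human needs to look at this
    have = sum(running.get(z, 0) for z in zones)
    missing = max(0, target - have)
    base, extra = divmod(missing, len(zones))
    return {z: base + (1 if i < extra else 0) for i, z in enumerate(zones)}


# 12 instances, 4 per zone; us-east-1c goes dark.
print(replacement_plan(12, {"us-east-1a": 4, "us-east-1b": 4, "us-east-1c": 4},
                       {"us-east-1a", "us-east-1b"}))
```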

If you're not testing this stuff, you deserve the outages you get. If you're just dropping your application on an EC2 instance and assuming "it'll work just fine," then you deserve the outages you get.

And you can do it with AWS (2)

Cyberax (705495) | about 8 months ago | (#44672731)

You can do it with AWS, no problem. Only one region was affected this time, other regions are OK.

Re:And you can do it with AWS (1)

sshir (623215) | about 8 months ago | (#44673629)

Saying that it's not a problem does not make it so. Besides, as soon as people learn to fail over gracefully, guess what would start to happen: other regions would begin buckling under load.

Re:And you can do it with AWS (1)

Cyberax (705495) | about 8 months ago | (#44673667)

If you really need your servers to be up, then you should buy enough reserved instances in target regions. They are not oversold and guaranteed to be available.

So yes, making resilient architecture on top of AWS is possible and is not that hard. You'll definitely have to pay extra money for it, but much less than if you tried to build it yourself.

Re:And you can do it with AWS (1)

sshir (623215) | about 8 months ago | (#44673785)

And my point was that when everybody tries to buy them, they would become either oversold or really, really expensive.

Actually, regardless, money becomes an issue really fast anyway: a few days ago Wired ran a story that for many types of loads AWS does not make much financial sense anymore, and people started to add two and two together. In other words, people are prepared to pay only so much (in a pinch, a little bit extra); ask a little bit more and they'll start to roll their own.

Lack of reliability (1)

BitZtream (692029) | about 8 months ago | (#44672601)

How is it that AWS is less reliable than the 4 Windows machines I get stuck managing? One of which has had a failed CPU for a few years now... yet it's still going.

Re:Lack of reliability (0)

Anonymous Coward | about 8 months ago | (#44672721)

perhaps because your 4 windows machines aren't getting a billion hits every day?

Re:Lack of reliability (0)

Anonymous Coward | about 8 months ago | (#44672767)

Because it's a ton more complex and if a major connectivity issue happens (which seems to be what happened here) then things break.

Re:Lack of reliability (1)

cheater512 (783349) | about 8 months ago | (#44673239)

You look after 4 servers. Amazon looks after 100,000 times that.

If every server has a 1 in 100 chance of failing each year, you have to wait over 10 years to reach a 50% chance that a server has failed.
Amazon would have about 11 dying per day. It's amazing that their systems can handle 99% of those failures seamlessly.

(My math may be way out but you get the point)
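The parent's math holds up. Assuming independent failures, the chance that at least one of n servers fails within t years is 1 - (1 - p)^(n*t); solving for 50% with n = 4 and p = 0.01 gives about 17 years, and 400,000 servers at 1% per year means about 11 failures per day. A Python sketch:

```python
import math


def years_until_half_chance(servers: int, annual_failure_p: float) -> float:
    """Years until P(at least one server has failed) reaches 50%.

    Assumes independent failures: P(no failures in t years) = (1 - p)**(servers * t).
    """
    return math.log(0.5) / (servers * math.log(1 - annual_failure_p))


def failures_per_day(servers: int, annual_failure_p: float) -> float:
    """Expected number of server failures per day across a fleet."""
    return servers * annual_failure_p / 365


print(f"4 servers: {years_until_half_chance(4, 0.01):.1f} years to a 50% chance")
print(f"400,000 servers: {failures_per_day(400_000, 0.01):.1f} failures per day")
```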

Re:Lack of reliability (2)

Mashiki (184564) | about 8 months ago | (#44673249)

You look after 4 servers. Amazon looks after 100,000 times that.

I thought there were no servers in the cloud, just people willing to take your money and piss on you.

Running List of Cloud Outages? (3, Insightful)

bill_mcgonigle (4333) | about 8 months ago | (#44672605)

I thought this might already exist, but I'm not finding it with a quick Google search. Seems like it's a thing that could get ad views from some decent IT audiences.

Re:Running List of Cloud Outages? (3, Funny)

sottitron (923868) | about 8 months ago | (#44673337)

You should totally create this. I hear AWS is the way to go to get things online quickly and at scale.

watch out for birth rates (2, Funny)

Anonymous Coward | about 8 months ago | (#44672611)

When morons can't watch TV (or equivalent) they fuck. 9 months later you'll see a birth rate spike.

Re:watch out for birth rates (0)

Anonymous Coward | about 8 months ago | (#44672645)

That's no way to talk about your parents.

Re:watch out for birth rates (0)

Anonymous Coward | about 8 months ago | (#44672745)

Mod parent up. A couple years from now preschools all over the world will be inundated with children with a heightened propensity for sniffing glue.

Re:watch out for birth rates (0)

Anonymous Coward | about 8 months ago | (#44673813)

They sound pretty clever to me.

Re:watch out for birth rates (0)

Anonymous Coward | about 8 months ago | (#44673953)

So I'm assuming your significant other is either bowlegged or pregnant

Why... (0)

Anonymous Coward | about 8 months ago | (#44672639)

Why do people even cloud? Real dedicated overpowered servers with multiple Gbps pipes are available for a few hundred bucks these days...

Re:Why... (2)

AHuxley (892839) | about 8 months ago | (#44673003)

Middle management in their luxury SUV/sedans sit in daily commutes behind buses with descriptive 'cloud' ads. The upgrade message filters back to their bosses over time?

Re:Why... (1)

LordLimecat (1103839) | about 8 months ago | (#44673841)

Now add cooling, power, generators, physical security, a SAN, a virtualization platform, and multiple failover sites.

Re:Why... (0)

Anonymous Coward | about 8 months ago | (#44674159)

Great, you gonna plunk them down in the middle of a Starbucks? Or you gonna build a massive datacenter around them, with heating, cooling, power, support staff, security, and some level of on-site power generation?

And then if you want redundancy, doing it in at least one other place far enough away from the original that the same snowstorm or earthquake won't take them both out?

That's why people cloud. The people who say "it's as simple as buying a server for a couple hundred bucks" are the ridiculous shills in this case.

Add Adobe Creative Cloud to the List too (3, Interesting)

JenovaSynthesis (528503) | about 8 months ago | (#44672699)

That went down and I think it ate some files with it. Just before the crash my client reported 103 files being removed. They weren't by me.

Re:Add Adobe Creative Cloud to the List too (0)

Anonymous Coward | about 8 months ago | (#44672843)

I would expect them to come back up when the Adobe Creative Cloud comes back up.

"removed" just means "client can't see them" because the datastore is unreliable.

It doesn't mean deletion... at least it shouldn't.

Adobe should be providing a higher level of support for creative cloud.

--Sam

Big names (0)

Anonymous Coward | about 8 months ago | (#44672773)

Yeah, sure. Maybe in the Bay Area.

Where are the NSA comments?? (2, Funny)

Guru80 (1579277) | about 8 months ago | (#44672779)

I thought for sure the first comment would be "I'm on to you, NSA... downtime for service 'upgrades'." I'm disappointed in you, my tin foil hat wearing brethren.

Re:Where are the NSA comments?? (0)

Anonymous Coward | about 8 months ago | (#44672863)

So "tin foil hat" now refers to people who don't think the NSA is competent enough to keep their monitoring systems running during an upgrade? Did I miss a memo?

Re:Where are the NSA comments?? (1)

AHuxley (892839) | about 8 months ago | (#44672891)

An interesting question is why these brands chose to stay in one part of the USA, with all the cheap power, skilled workers, and tax breaks offered by other states.
What keeps big data clinging to the eastern USA?

Re:Where are the NSA comments?? (1)

Anonymous Coward | about 8 months ago | (#44673237)

consider the densely populated eastern coast, from miami to boston.. especially from dc to new york.. dc/northern virginia has always been one of the major internet hubs in the u.s. it couldn't have anything to do with proximity to the building with j. edgar's name on it... could it?

Re:Where are the NSA comments?? (1)

i.r.id10t (595143) | about 8 months ago | (#44673655)

population density, closeness to physical infrastructure, larger pool of qualified workers (maybe), etc.

Oh we're here. (0)

Anonymous Coward | about 8 months ago | (#44672895)

Don't lose faith that easily.

Re:Where are the NSA comments?? (0)

Anonymous Coward | about 8 months ago | (#44672903)

Yeah, it's curious why it's the Northern Virginia site that gets these reported outages... Could it have something to do with proximity to the NSA?

Re:Where are the NSA comments?? (0)

Anonymous Coward | about 8 months ago | (#44672997)

I think that as we all proceed into the future of technology, we each trust it less and less. As far as we can tell (any of us), technology is as flawed as the mindset of whatever political zeitgeist runs it. Maybe there are no more tin foil hat wearing brethren on Slashdot. I see that the number of commenters has dropped here lately.

Has Rackspace had any outages in 10 years or so? (5, Interesting)

MillerHighLife21 (876240) | about 8 months ago | (#44672811)

I've run servers on both Amazon and Rackspace for several years now and I can't recall a single instance of Rackspace having an outage. On the other hand, Amazon seems to have major issues at least 2 or 3 times a year. Is this stuff tracked anywhere?

Re:Has Rackspace had any outages in 10 years or so (1)

Anonymous Coward | about 8 months ago | (#44673005)

Yes. Rackspace even had an outage on their main website that lasted *days* just a few months ago, if you wanted to access it via IPv6. Sadly, there was no easy place to report the outage. The technical contact in whois is something at netnames.com? So I just ignored it.

Anyway,

https://status.rackspace.com/

lots of reports of small issues. You should know this stuff if you are running an instance on their hardware!!

Re:Has Rackspace had any outages in 10 years or so (0)

Anonymous Coward | about 8 months ago | (#44673013)

RS has had issues

Re:Has Rackspace had any outages in 10 years or so (1)

AHuxley (892839) | about 8 months ago | (#44673109)

It would make a good site: a historical, long-term heat map of server outages. There's a lot of tech press to search back through; thankfully you can buy into digital press databases :)

Re:Has Rackspace had any outages in 10 years or so (5, Informative)

CritterNYC (190163) | about 8 months ago | (#44673289)

It depends which data center you're in. PortableApps.com has been hosted at Rackspace for years and we had multiple major outages due to ongoing power issues in the Dallas data center in 2009. The switch from grid to UPS was failing and would take the whole wing of the data center out, with every server crashing hard. It would take quite a while to come back up. Then we'd have to wait hours for the Rackspace folks to rebuild our corrupted database (fully managed account on a dedicated server). It happened two weekends in a row in June and one other time if I recall correctly, basically costing us a full day of downtime each time.

Re:Has Rackspace had any outages in 10 years or so (0)

Anonymous Coward | about 8 months ago | (#44673605)

Amazon also offers some of the cheapest prices, so you get what you pay for.

Re:Has Rackspace had any outages in 10 years or so (1)

Anonymous Coward | about 8 months ago | (#44673681)

I had an outage in IAD just two weeks ago. Connectivity failure on several aggregates affecting many customers. Rackspace shill much?

Maybe wasn't Amazon fault (0)

gmuslera (3436) | about 8 months ago | (#44672815)

Maybe the NSA screwed things up a bit when they were installing their new codenamed program there (after Snowden published all the old ones).

Amazon Storefront Problems? (2)

TechHSV (864317) | about 8 months ago | (#44672835)

Were there any problems with Amazon.com? You'd assume they use their own service.

NSA Flood Spoils The Show (-1)

Anonymous Coward | about 8 months ago | (#44672893)

NSA Director General Alexander is "Ballmer Pissed" at reports of his further ... uh ... eh ... 'Miss' behavior, shall we say.

This and Obama's secret directive to position elements of the U.S.A. Navy forces in the Eastern Mediterranean, correlated with B52 deployments from Minot AFB for a pre-emptive strike on Damascus and the Russian Naval Base (Obama calls it a Soviet Base) on the coast of Syria are not playing well in the mind of Obama who is visually befuddled.

DoD has dispatched elements of the 23d Bomb Squadron (23 BS) United States Air Force unit assigned to the 5th Bomb Wing stationed at Minot AFB, North Dakota to targets in Syria. Currently [04:50 UTC] the dispatched B52s are overflying the UK for refueling on their trajectories to targets in Syria. Information indicates these particular B52s are equipped with AGM-69 SRAM nuclear missiles in violation of antinuclear treaties with the Soviet Union and updated B28FI model hydrogen bombs in violation of all antinuclear treaties with the Soviet Union and United Nations.

Opinion: I would say that Obama will re-play the errors of President Jimmy Carter, i.e. Operation Eagle Claw aka Desert One, in his preemptive nuclear strike within Syria in the next few hours, say less than 5 EST.

"God help us. We have a Mad Man in the White House. And he does NOT know, nor can he understand, the error that he is committing this nation to."

I hereby renounce my allegiance to the United States Of America and for which it now stands.

Re:NSA Flood Spoils The Show (0)

Anonymous Coward | about 8 months ago | (#44673183)

"I hereby renounce my allegiance to the United States Of America and for which it now stands."

You are the property of the Corporation called the UNITED STATES OF AMERICA and your allegiance is not yours to renounce. You are mortgaged property belonging to the Federal Government of the Corporation called the United States of America. You must do as you are told.

Statistically unlikely # of sites going down (1)

Anonymous Coward | about 8 months ago | (#44672897)

What is going on? I don't buy it. While I get that you can't tell when the NSA has tapped the line, I would imagine that things might go down in such instances. Something has to go down before they cut the line, unless maybe there are multiple entry points. However, I wonder if this has something to do with the way things are being done now. That is, they're not tapping the line any more. Rather, they are forcing the implementation of taps that provide access to specific data types. For instance, now they can do more than just search for strings in users' email. Now they can see a user's Facebook page as the user sees it, for example, instead of just a series of texts.

I guess stuff goes down. But not at the rate at which major sites are going down. If there is a logical explanation of some kind that impacts everybody (a solar-flare radiation type of thing), then please... provide it. But they all seem to be giving vague answers about why the sites have gone down. With eBay it was 'regular maintenance' gone amok (I do believe that was scheduled, but others haven't been, I don't think).

Re:Statistically unlikely # of sites going down (1)

Anonymous Coward | about 8 months ago | (#44673621)

Data centers do crash from time to time, and if all of your "cloud" is in one data center, then it's not really a cloud at all.

I forget Amazon's exact terminology, but they have regions, and within those, they have data centers. When you set up your infrastructure, you pick a data center. You can provide some fault tolerance by failing over within the overall region, or you can choose to synchronize between (and therefore fail over to) an entirely different region. That is the cloud: when a data center failure does not cripple your operation.

From the sounds of it (and it has happened before), Amazon's entire AWS Northern VA region failed. Anyone not able to recover from that was down (which theoretically includes my own site, hosted in a single Northern VA AWS data center). Anyone who was paying the extra money, and had set up their software to be ready for it, was humming along just fine, albeit likely a little slower due to added strain on the other regions from redirected traffic that would normally have gone to the failed data center(s).

Ironically, I did not see any issues whatsoever during the failure today, which just means that it didn't completely fail or I did not spot it (my site is not doing anything fancy to handle hardware failure or to notify me of it). This leads me to believe that there was a problem, but both Instagram and Vine are probably not set up nearly as well as they should be given their scale (from my understanding, Instagram is a joke anyway and Facebook got incredibly ripped off; they should have built it themselves and burned Instagram to the ground).

Multiple regions, anyone? (1)

kriston (7886) | about 8 months ago | (#44673171)

Isn't this why AWS offers multiple regions?

Such large sites should understand that having multiple availability zones means nothing if the zones are all in the same region. Oh, and your application would need to be designed for failover.

In addition, when looking for high-availability, you don't segregate your audience to individual regions. You let the working regions take over for you.

Or spend the extra money and set up your own co-lo arrangement.
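The point about availability zones versus regions can be illustrated with a minimal sketch, assuming a client (or DNS layer) that health-checks each deployment and routes traffic to the first healthy one, preferring a home region. The `Endpoint` type, the `pick_endpoint` function, the zone names and the example.com URLs are all hypothetical; this is not an AWS API. The only real-world detail is that `us-east-1` is AWS's Northern Virginia region.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Endpoint:
    region: str  # e.g. "us-east-1" (AWS's Northern Virginia region)
    zone: str    # availability zone inside that region
    url: str     # hypothetical address of this deployment


def pick_endpoint(
    endpoints: Iterable[Endpoint],
    is_healthy: Callable[[Endpoint], bool],
    preferred_region: str,
) -> Optional[Endpoint]:
    """Return the first healthy endpoint, trying the preferred region first.

    `is_healthy` stands in for a real health check (HTTP probe, DNS health
    check, etc.); it is an assumption of this sketch.
    """
    # False sorts before True, so endpoints in the preferred region come first;
    # sorted() is stable, so the original order is kept within each group.
    ordered = sorted(endpoints, key=lambda ep: ep.region != preferred_region)
    for ep in ordered:
        if is_healthy(ep):
            return ep
    return None  # nothing healthy anywhere: the site is down


single_region = [
    Endpoint("us-east-1", "us-east-1a", "https://a.example.com"),
    Endpoint("us-east-1", "us-east-1b", "https://b.example.com"),
]
multi_region = single_region + [
    Endpoint("us-west-2", "us-west-2a", "https://c.example.com"),
]


def region_down(ep: Endpoint) -> bool:
    # Simulate the whole Northern Virginia region failing at once.
    return ep.region != "us-east-1"


print(pick_endpoint(single_region, region_down, "us-east-1"))
print(pick_endpoint(multi_region, region_down, "us-east-1"))
```

With both zones in `us-east-1`, a region-wide failure leaves nothing to fail over to and the function returns `None`; adding a single deployment in another region lets traffic move to `us-west-2a`. The application still has to be built so that the other region can actually serve the data, which is the harder part.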

internet forecast (0)

Anonymous Coward | about 8 months ago | (#44673177)

partly cloudy with a chance for server outages.

Nothing wrong here (0)

Anonymous Coward | about 8 months ago | (#44673181)

Just had to power down while the NSA live feed was plugged in.

Everybody that is surprised is stupid... (4, Insightful)

gweihir (88907) | about 8 months ago | (#44673231)

That things like this will happen with a cloud infrastructure is obvious. That the reliability claims made by the cloud providers are fantasy is also obvious. As soon as they start to do "uptime or else" (meaning you get tons of money as downtime compensation), things may be different. But they will not do that. At this time, the only thing you can do is switch to a different cloud provider, which will have the same issues. Uptime guarantees without penalties for failing to meet them are worthless.
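To put an uptime percentage in concrete terms, a small helper can convert it into the downtime it still permits over a period. This is a sketch of plain arithmetic, assuming a 30-day month; the function name and the percentages below are illustrative, not any particular provider's SLA figures.

```python
def allowed_downtime_minutes(uptime_percent: float, days: int = 30) -> float:
    """Minutes of downtime still consistent with `uptime_percent` over `days` days."""
    total_minutes = days * 24 * 60  # 43,200 minutes in a 30-day month
    return total_minutes * (1 - uptime_percent / 100)


for pct in (99.0, 99.9, 99.95, 99.99):
    print(f"{pct}% over 30 days -> {allowed_downtime_minutes(pct):.1f} min of downtime")
```

For example, a 99.95% monthly figure still allows about 21.6 minutes of downtime, and 99.9% about 43.2 minutes, before the guarantee is even breached.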

Re:Everybody that is surprised is stupid... (4, Insightful)

VortexCortex (1117377) | about 8 months ago | (#44673309)

We built a decentralized network called The Internet, even capable of withstanding global thermonuclear war -- packets rerouted moments after a city disappears from the mesh... And folks use data silos? Protip: Don't centralize services, that's daft in terms of both uptime and congestion.

This is why I laugh at tech pundits who preach... (3, Interesting)

bagboy (630125) | about 8 months ago | (#44673317)

public cloud services as "the future". I will never entrust my corporate data uptime and reliability to some "location in the cloud". I'll stick to private clouds (VMware/vCenter) where I have control of both hardware and software and reliable failsafe systems. At least then, if I have downtime, I also have accountability and predictability. The same cannot be said for cloud providers, and no matter what anyone says, once the data leaves your hardware, you have lost that control.

Re:This is why I laugh at tech pundits who preach. (3, Interesting)

l0ungeb0y (442022) | about 8 months ago | (#44673729)

Depends on which "future" you are talking about. The future where the bulk of personal data is stored on the cloud to be shared across devices and with friends, family and authorized services is one I think is bound to come to fruition.

The future where Corporations put their core infrastructure into the Cloud is not one I ever recall anyone talking about.

Instagram? Is that some kind of website? (0)

PopeRatzo (965947) | about 8 months ago | (#44673607)

The only web site that I've noticed being down in the past few weeks has been Wikifonia, the wonderful place where crowd-sourced MusicXML lead sheets for all sorts of music are available.

They're back online now, and at least from what I can see, there is great jubilation among musicians worldwide. Where else can you go and search for some old jazz standard and get an immaculate lead sheet, instantly transposable into any key, downloadable as a PDF?

I think Wikifonia has been single-handedly keeping the vast Great American Songbook alive, for which they deserve great thanks.

I thought it was just an issue where some big music publishing group that represents outfits that charge $5 for a lead sheet to a song whose composer has been dead for half a century has been hassling them, but since it's back online and faster than ever, I think it might just have been a technical glitch.

Wikifonia, salute!~

Wrong terminology? (5, Funny)

elfprince13 (1521333) | about 8 months ago | (#44673665)

Shouldn't this, technically speaking, be a "bright day" or a "sunny day"? After all, that's what I call it when the cloud-coverage breaks around here.

Serves you right (1)

Gothmolly (148874) | about 8 months ago | (#44673695)

For believing and investing in some handwavy concept called 'cloud', where you abrogate responsibility and take the iOS view (it Just Works) of technology.

quantum entanglement (0)

Anonymous Coward | about 8 months ago | (#44673763)

How I hate it. I rebuild my local LAN VM server and, on the other side of the world, AWS craps out... so either quantum entanglement or a tried frame from the NSA?
