112 comments

Um, backups? (1)

Anonymous Coward | more than 2 years ago | (#35972188)

srsly, as in your own

I am not rightly able to comprehend... (5, Insightful)

Man On Pink Corner (1089867) | more than 2 years ago | (#35972194)

... the confusion of ideas that would lead someone to treat their live web server as their primary/master data repository.

I guess I'm still stuck in Commodore 64 World, or something..

Re:I am not rightly able to comprehend... (1)

SpiralSpirit (874918) | more than 2 years ago | (#35972248)

well, the a data center run by amazon certainly has more rigorous backup and maintenance schedules than anything I could personally come up with, offhand. It took something pretty catastrophic to bring it down and cause data lass. the problem is if someone decided to only have one copy, at amazon. If they had 2 at 2 different servers, success!

Re:I am not rightly able to comprehend... (1)

Anonymous Coward | more than 2 years ago | (#35972276)

naaa, I've not heard of any meteor swarm hitting amazon servers, so there was not anything catastrophic.

just business as usual.

Re:I am not rightly able to comprehend... (2)

obarthelemy (160321) | more than 2 years ago | (#35972278)

I'm not so sure about rigorous...
1- I personnally have never lost a single byte of meaningful data
2- do amazon detail their exact procedures and commitments ?
3- do amazon backup those "commitments" with hard cash ? How much will the people whose data they lost be compensated ?

read the sig....

Re:I am not rightly able to comprehend... (2)

jc2brown (1997958) | more than 2 years ago | (#35972650)

You might want to read this [amazon.com] .

They're crediting all accounts that had any activity in the USA-East region for 10 days of usage, regardless if they were affected.

Remember that it was EC2 that was affected, which is just a virtual machine with volatile storage. Had it been S3 data that was lost one should expect restitution, but in this case downtime and data loss is ultimately the fault of the user.

Re:I am not rightly able to comprehend... (1)

digitig (1056110) | more than 2 years ago | (#35972674)

So that's basically "not much" -- only a bit more than just not charging them for the period of the outage.

Re:I am not rightly able to comprehend... (3, Interesting)

teh kurisu (701097) | more than 2 years ago | (#35972692)

That depends. Only a couple of our servers in that availability zone were actually affected, but we're apparently being compensated as though all of them were. Bonus for us.

Re:I am not rightly able to comprehend... (0)

Anonymous Coward | more than 2 years ago | (#35973862)

Posting anonymously for soon-to-be-obvious reasons.

Netflix's run-rate at Amazon is significantly higher, I'd bet, than 90% of the Amazon users out there. While Netflix experienced some internal impact, its actual business was not materially affected by the outage, and given that we're also getting the ten day credit -- and it is substantial -- I'd say that worked very well for us.

Re:I am not rightly able to comprehend... (2)

darkpixel2k (623900) | more than 2 years ago | (#35975002)

1- I personnally have never lost a single byte of meaningful data

Yep--the moment I accidentally 'rm -rf /', I simply re-classify the drive as 'not containing meaningful data' and my stats are saved.

Re:I am not rightly able to comprehend... (4, Informative)

MichaelSmith (789609) | more than 2 years ago | (#35972280)

It took something pretty catastrophic to bring it down and cause data lass

Catastrophic would be an earthquake, tsunami and meltdown, in that order. From my reading of the situation amazon stuffed up their own replication mechanism and it recursively replicated the system to fill up the available hardware. Thats just bad design. Its obvious they did no testing under realistic conditions.

Re:I am not rightly able to comprehend... (1)

SuricouRaven (1897204) | more than 2 years ago | (#35972602)

I wouldn't be too hard. Yes, they screwed up - but a bug like that could easily slip through testing, as it might only occur on extreme-sized data sets. Their real screwup was in not noticing right away and reverting to the previous config.

Re:I am not rightly able to comprehend... (0)

Anonymous Coward | more than 2 years ago | (#35976814)

If I read it correctly, they discovered the original problem (a config change on the router) quickly, but by that time the EC2 management software was already busy destroying itself and there was no "config" change that could resolve the problem. The change to prevent this, a more aggressive backoff mechanism, isn't just a config change.

Nobody expects the thundering herd.
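
For readers unfamiliar with the "backoff mechanism" mentioned above, here is a minimal sketch of capped exponential backoff with random jitter, a common way to keep a crowd of retrying clients from hammering a recovering service all at once. It is not Amazon's implementation; the function name and the `base`, `cap`, and `attempts` defaults are illustrative assumptions.

```python
import random


def backoff_delays(base=0.5, cap=60.0, attempts=6, rng=random.random):
    """Yield one randomized wait time (in seconds) per retry attempt.

    All parameter names and defaults are hypothetical, chosen for illustration.
    """
    for attempt in range(attempts):
        # The ceiling doubles on every attempt but never exceeds the cap.
        ceiling = min(cap, base * (2 ** attempt))
        # "Full jitter": wait a random fraction of the ceiling so that many
        # clients that failed together do not all retry together.
        yield rng() * ceiling


if __name__ == "__main__":
    for i, delay in enumerate(backoff_delays(), start=1):
        print(f"retry {i}: wait {delay:.2f}s")
```

The jitter is the part that addresses the thundering herd: without it, every client that failed at the same moment would retry at the same moments too.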

Re:I am not rightly able to comprehend... (0)

Anonymous Coward | more than 2 years ago | (#35973494)

Its obvious they did no testing under realistic conditions.

And how do you test under realistic conditions when those realistic conditions are an enormous, ~10 datacenter system that serves a good percentage of the internet?

Re:I am not rightly able to comprehend... (1)

Thing 1 (178996) | more than 2 years ago | (#35973522)

Its obvious they did no testing under realistic conditions.

And how do you test under realistic conditions when those realistic conditions are an enormous, ~10 datacenter system that serves a good percentage of the internet?

Like in Contact: you build two of them.

Re:I am not rightly able to comprehend... (2)

greenbird (859670) | more than 2 years ago | (#35972314)

well, the a data center run by amazon certainly has more rigorous backup and maintenance schedules than anything I could personally come up with

It's funny. Not a single place I've worked at has had as good of backups as I have for my personal stuff. And I didn't even spend 6 figures for some useless enterprise backup solution. Some scripting, cp -al, rsync, dmcrypt, ssh and a remote PC at my girlfriends house and you have an incremental backup solution more secure and more robust than any enterprise solution I've ever seen, and it only cost a couple hundred for the drives.

Re:I am not rightly able to comprehend... (1, Insightful)

zonky (1153039) | more than 2 years ago | (#35972350)

Congrats. Does your GF have a key to your house? Because your "perfect system" has a single point of failure: an insider who could damage both copies and cause loss of data. Best not get on her bad side for now, anyway....

Re:I am not rightly able to comprehend... (1)

leehwtsohg (618675) | more than 2 years ago | (#35972466)

Hmmmm? You don't store copies of your data in remote locations? What about a fire? I think a backup scheme must store data remotely. Leave one copy with your parents (upstairs), and one with a friend in australia (I guess across the street...)

Re:I am not rightly able to comprehend... (1)

Roger Lindsjo (727951) | more than 2 years ago | (#35972604)

I don't think the poster meant that it was a bad idea to store the data with the girlfriend, but that having one entity with access to all data copies is a bad idea. It is even good to protect data from yourself by keeping some copies offline to prevent accidental deletion.

Re:I am not rightly able to comprehend... (1)

renoX (11677) | more than 2 years ago | (#35973020)

He didn't claim that his backup system was perfect, just better than what many enterprises do, which is probably true.

> Best not get on her bad side for now, anyway....

Bah, if he is wise his backups are encrypted, so this shouldn't be a big issue (unless he has a bad break-up with his GF and loses data at the same time: Murphy's law in action).

Re:I am not rightly able to comprehend... (0)

Anonymous Coward | more than 2 years ago | (#35973150)

You miss his point. He's saying GGP's new ex-gf could come over and "give him an intentional heartache", but with rm -rf and/or right-angle grinder in lieu of green spraypaint. So best not to take up with a tramp, eh.

Re:I am not rightly able to comprehend... (1)

tigersha (151319) | more than 2 years ago | (#35973100)

This is definitely better than Amazon's backup plan. It backs up data and LITERALLY screws you. Amazon just screws you.

Re:I am not rightly able to comprehend... (2)

jpapon (1877296) | more than 2 years ago | (#35972358)

Until she dumps you and throws your backup drives out her window that is. Tying the security of your backup to the security of your relationship is an interesting gamble. One day you might find yourself lonely AND data-less.

Unless of course you're one of those people who refers to female friends as "girlfriends", in which case, I hate you.

Re:I am not rightly able to comprehend... (1)

brusk (135896) | more than 2 years ago | (#35973248)

Unless of course you're one of those people who refers to female friends as "girlfriends", in which case, I hate you.

Women do that too, but this is /.

Re:I am not rightly able to comprehend... (0)

Anonymous Coward | more than 2 years ago | (#35975756)

Unless of course you're one of those people who refers to female friends as "girlfriends", in which case, I hate you.

Women do that too, but this is /.

Hmm... women having other women as girlfriends. In fact, that's precisely sort of data I'm trying to back up! How'd you know? (Oh, right, this is /.!)

Re:I am not rightly able to comprehend... (1)

breakfastpirate (925130) | more than 2 years ago | (#35974392)

Well, so long as his backups at her place consist only of jpeg pictures of their relationship, you could actually kill two birds with one stone there...

Re:I am not rightly able to comprehend... (1)

greenbird (859670) | more than 2 years ago | (#35977084)

Until she dumps you and throws your backup drives out her window that is.

She'd have to come here and trash those also. That'd be a trick though. I have some 10 computers spread over several rooms here and a dozen or more external drives. I never said it was perfect. Just better than any enterprise setup I've seen. The malicious insider is always the toughest hole to cover in any data protection scheme.

Re:I am not rightly able to comprehend... (1)

jimicus (737525) | more than 2 years ago | (#35972382)

Bigger solutions are invariably more complicated. And when they're more complicated, there's more to go wrong - and when it does go wrong, there's more that can be affected.

This is why I'm quite wary of people throwing the word "Enterprise" around. IME, it's frequently a codeword meaning "A proprietary vendor has told us their product can be all things to all men - which is technically true but what we're buying needs many more man-hours of work to turn it into anything for anyone than we can hope to dedicate."

Re:I am not rightly able to comprehend... (1)

im_thatoneguy (819432) | more than 2 years ago | (#35972412)

My backup system is nowhere near as good as the one at the studio where I work. But I could deploy something even easier for one simple reason: I have less data.

Do you know how much it would cost to remote push 10TB of data once a week?

Re:I am not rightly able to comprehend... (1)

greenbird (859670) | more than 2 years ago | (#35976966)

Do you know how much it would cost to remote push 10TB of data once a week?

If you're generating 10TB of data weekly you're talking about an extremely rare situation that would require a specialized solution anyway since no backup solution out there can support that.

If you're talking about having around 10TB of data total, that's what rsync is for. You do fast incrementals on the local system over a high-bandwidth pipe (SATA, SAS). Then you have 2 options depending on your system requirements. You either run a daily rsync of one of the incrementals directly offsite over a separate network pipe, or you dump to an onsite backup server over a fat network pipe and then do your remotes from there. You're isolating your backup bandwidth from your production systems, and with rsync only modified portions of files are sent, greatly reducing bandwidth requirements. The cp -al gives you point-in-time backup images (say every four hours) while only needing disk space for modified files, which allows finely incremented recoveries (e.g. I need a file that was created Tuesday morning and deleted that afternoon). Your offsites are really only for a major disaster like a plane crashing into the building. If you use an onsite backup server, put it in a different location than your servers. That way when the server room floods you still have your incrementals.

All this can be done in a highly reliable fashion with a few hundred lines of bash scripting including the email warnings.
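
The hard-link snapshot idea behind the cp -al plus rsync approach can be sketched in a few lines. This is a rough Python stand-in rather than the poster's bash scripts: the function names are hypothetical, "unchanged" is assumed to mean same size and same whole-second modification time, and whole files are copied where rsync would transfer only the changed portions.

```python
import os
import shutil


def _same(a, b):
    """Treat two files as unchanged if size and whole-second mtime match."""
    sa, sb = os.stat(a), os.stat(b)
    return sa.st_size == sb.st_size and int(sa.st_mtime) == int(sb.st_mtime)


def snapshot(source, previous, target):
    """Create `target` as a complete-looking copy of `source`.

    Files unchanged since the `previous` snapshot are hard-linked to it, so
    they use no extra disk space. Pass previous=None for the first snapshot.
    """
    for root, _dirs, files in os.walk(source):
        rel = os.path.relpath(root, source)
        os.makedirs(os.path.normpath(os.path.join(target, rel)), exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            new = os.path.join(target, rel, name)
            old = os.path.join(previous, rel, name) if previous else None
            if old and os.path.isfile(old) and _same(src, old):
                os.link(old, new)       # unchanged: share the existing inode
            else:
                shutil.copy2(src, new)  # new or modified: make a real copy
```

Each snapshot directory then looks like a full backup, and deleting an old snapshot frees only the blocks that no newer snapshot still links to.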

Re:I am not rightly able to comprehend... (0)

Anonymous Coward | more than 2 years ago | (#35972438)

well, the a data center run by amazon certainly has more rigorous backup and maintenance schedules than anything I could personally come up with

It's funny. Not a single place I've worked at has had as good of backups as I have for my personal stuff. And I didn't even spend 6 figures for some useless enterprise backup solution. Some scripting, cp -al, rsync, dmcrypt, ssh and a remote PC at my girlfriends house and you have an incremental backup solution more secure and more robust than any enterprise solution I've ever seen, and it only cost a couple hundred for the drives.

its called rsnapshot

..at my girlfriends house (1)

Anonymous Coward | more than 2 years ago | (#35972580)

What's a girlfriend?

Re:..at my girlfriends house (0)

Anonymous Coward | more than 2 years ago | (#35973176)

What's a girlfriend?

An advanced data storage repository, from what I gather. Maybe I should consider getting one. I mean, I knew they had certain recreational uses but the backup features are new. Wonder what capacities they come in.

Re:..at my girlfriends house (1)

Anonymous Coward | more than 2 years ago | (#35973238)

"Wonder what capacities they come in."

There's a lot of variance, and bigger isn't necessarily better. Most capacities are specified in the form "x-y-z w/ nX", where x, y, z, and n are numbers, and X is an alphabetic designation that may consist of multiple letters. Many people attracted to women prefer to maximize x and z (while still having them be nearly equal) while minimizing y and n, and want a designation of "C" or "D" used for X.

Re:I am not rightly able to comprehend... (1)

Eivind (15695) | more than 2 years ago | (#35972608)

I've pondered the same thing. My workplace spends 6 figures, and as far as I'm able to tell, gets significantly less than I have at home, despite my investment being 2 orders of magnitude less.

Every 2 hours for the last day, every day for the last week, every week for the last month, every month for the last year, every year forever. Physically backed up to 2 distinct discs inhouse (one of which is pretty burglar-proof, living in safe), and 2 encrypted copies under the care of 2 distinct companies, in different jurisdictions and on different continents. (there's 3 copies of the decryption-phrase, one in my head, 1 in my safety-deposit-box in the bank and one stored with my will in a secure will-storage-system run by a respected law-firm)

While this ain't "perfect" (nothing is), it's a *hell* of a lot better than what we've got at work, despite the latter costing 100 times more. (And no, the amount of data backed up at work is not larger; both backups are approximately 5TB for a complete copy.)

Yes, a burglar could steal one copy from my house. But I'm more concerned with not losing files than with preserving privacy, there's nothing *really* secret anywhere in my data.
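
A tiered retention schedule like the one described in this comment (two-hourly, then daily, weekly, monthly, yearly) can be expressed as a small pruning function. This is only a sketch under stated assumptions: the function name is made up, "last month" is taken as 31 days, weeks are ISO weeks, and the newest snapshot in each bucket is the one kept.

```python
from datetime import datetime, timedelta


def snapshots_to_keep(timestamps, now):
    """Return the snapshot times that a tiered retention schedule retains."""
    keep = {}
    for ts in sorted(timestamps, reverse=True):  # newest first
        age = now - ts
        if age <= timedelta(days=1):
            bucket = ("2h", ts.date(), ts.hour // 2)   # one per 2-hour slot
        elif age <= timedelta(days=7):
            bucket = ("day", ts.date())                # one per day
        elif age <= timedelta(days=31):
            bucket = ("week", ts.isocalendar()[:2])    # one per ISO week
        elif age <= timedelta(days=365):
            bucket = ("month", ts.year, ts.month)      # one per month
        else:
            bucket = ("year", ts.year)                 # one per year, forever
        keep.setdefault(bucket, ts)  # first seen is newest, so it wins
    return sorted(keep.values())


if __name__ == "__main__":
    now = datetime(2011, 5, 1, 12, 0)
    hourly = [now - timedelta(hours=h) for h in range(24 * 60)]
    print(len(hourly), "snapshots ->", len(snapshots_to_keep(hourly, now)), "kept")
```

Anything not returned by the function is a candidate for deletion, which is what keeps the total number of retained snapshots small even with backups every two hours.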

Re:I am not rightly able to comprehend... (1)

mwvdlee (775178) | more than 2 years ago | (#35972688)

Just curious; what 5TB worth of personal data requires a 4-figure backup spending?

Re:I am not rightly able to comprehend... (0)

Anonymous Coward | more than 2 years ago | (#35972736)

Obviously that post was his fantasy, filled with exaggeration and with "plan to some day" presented as "is." One tip-off is the "forever" part, clearly showing his bringing the fantasy future into the present as if it were reality.

Re:I am not rightly able to comprehend... (0)

Anonymous Coward | more than 2 years ago | (#35972868)

The scarier part would be if he actually did implement that plan.

Re:I am not rightly able to comprehend... (1)

Cyrrus30 (1993140) | more than 2 years ago | (#35973322)

That's what I was wondering. What can be 5 TB of personal data? My "real" worthwhile data (meaning things that I can't get back if something goes wrong) takes like 30 GB. And 29.5 of this are pictures. But things like downloaded movies or MP3s (a certain chunk of it being illegally acquired) are not worth backing up using such a complicated scheme. My house burns and I lose some MP3s I bought on iTunes? No big deal, there would be a lot of other, more important things that would bug me (like, you know, my house).

Re:I am not rightly able to comprehend... (1)

MarkGriz (520778) | more than 2 years ago | (#35974462)

Just curious; what 5TB worth of personal data requires a 4-figure backup spending?

Porn

Re:I am not rightly able to comprehend... (1)

jimicus (737525) | more than 2 years ago | (#35972936)

Where things start to get more complicated is when the data being stored requires some massaging before you can take a copy - or for that matter if you can only take a copy under specific circumstances or your copy is only useful under specific circumstances.

For instance:

Most modern databases store their data in files on the disk. Database transactions are atomic, sure. And (hopefully, assuming a modern FS) so are disk transactions. This does not mean, however, that you can simply copy the underlying files. cp, tar et al are not atomic, so you can wind up with an unusable backup. Both MySQL and Postgres explicitly state that you need to either shut down the database, use the native backup tool (which requires a lot of free space because you're essentially dumping the DB to a text file on disk and you take a backup of that) or in the case of MySQL, lock the tables.
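
To illustrate the point about using the database's own backup mechanism rather than copying files, here is a small sketch. It deliberately swaps in SQLite (via Python's standard `sqlite3` module) as a stand-in, since MySQL and Postgres need their own dump tools and a running server; the function name and file paths are hypothetical.

```python
import sqlite3


def consistent_backup(live_path, backup_path):
    """Copy a live SQLite database through the engine's backup API."""
    src = sqlite3.connect(live_path)
    dst = sqlite3.connect(backup_path)
    try:
        # Connection.backup() produces a consistent copy even while other
        # connections are writing, which a plain cp or tar cannot promise.
        src.backup(dst)
    finally:
        dst.close()
        src.close()
```

The principle carries over to the bigger engines even though the tooling differs: let the database decide what a consistent state is, and back up the result of that.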

For instance:

Microsoft Exchange stores all the data in a honking great database. This has pros and cons. The biggest pro is with appropriate indexing (pre-cooked by Microsoft because they designed the application), it's very fast.

The biggest con is that backup and restoration of individual mailboxes are a PITA (though I understand some companies produce proprietary software to try to resolve this). It's not too difficult to backup and recover the entire server.

Something similar is true for most mail systems to a greater or lesser extent. Systems which store everything in discrete files (such as Courier IMAP) at least have the advantage that it's dead easy to recover a person's mailbox, but it's a pig to recover a particular email from it because the metadata you'd use to find the email isn't stored in the filename, it's in the file itself. About the only sensible thing you can do is create them a new mailbox, restore the entire backup into it and tell them to find the email(s) they need from in there.

These are fairly simple examples I can come up with without having to put any real effort in. It's likely that Amazon - in creating their EC2 system - created something with lots of wonderful features but "dead easy to backup and restore, very difficult to screw up the backup process" wasn't one of them.

Re:I am not rightly able to comprehend... (1)

arndawg (1468629) | more than 2 years ago | (#35978326)

Wait. Your backup target is ONLINE? I've got news for you buddy. You don't have backups!

Re:I am not rightly able to comprehend... (0)

Anonymous Coward | more than 2 years ago | (#35972384)

well, the a data center run by amazon certainly has more rigorous backup and maintenance schedules than anything I could personally come up with, offhand. It took something pretty catastrophic to bring it down and cause data lass.

That may be, but it's also much less likely that something you set up yourself is going to come with a giant target plastered on the side of it and a sign saying "hit me".

Re:I am not rightly able to comprehend... (0)

Anonymous Coward | more than 2 years ago | (#35972508)

This "data loss" is on the EBS volume.
The Amazon FAQ makes it very clear: EBS is just RAID -- they have no extra backup. You should take snapshots to S3 if you care about the data.

Re:I am not rightly able to comprehend... (1)

davidbrit2 (775091) | more than 2 years ago | (#35973404)

well, the a data center run by amazon certainly has more rigorous backup and maintenance schedules than anything I could personally come up with, offhand

And a fat lot of good that did.

Re:I am not rightly able to comprehend... (1)

CSMoran (1577071) | more than 2 years ago | (#35973670)

bring it down and cause data lass.

I'm a lad, you insensitive clod!

Re:I am not rightly able to comprehend... (1)

Khyber (864651) | more than 2 years ago | (#35974360)

"the a data center run by amazon certainly has more rigorous backup and maintenance schedules than anything I could personally come up with, offhand."

Differential backups every five minutes across three different backup systems rolling RAID-6, one local and two remote.

I don't have data loss issues, EVER.

And I'm just a low-level tech guy. If Amazon can't get it right and I can, something is wrong.

Re:I am not rightly able to comprehend... (1)

Anonymous Coward | more than 2 years ago | (#35972362)

I am not rightly able to comprehend how someone whose primary data source is people entering data via their website could have it elsewhere. Sure, they can have copies elsewhere, but those would be the backups.

Re:I am not rightly able to comprehend... (4, Informative)

wvmarle (1070040) | more than 2 years ago | (#35972576)

From a look at the linked article, it seems that one of the issues is data generated by these web sites, such as user statistics or user-uploaded content. That naturally lives primarily on the live web server and is also data that you don't want to lose. Also, as other commenters mentioned, the EC2 service is not a cloud-storage server; it's a web hosting service, and web hosts do tend to generate their own data.

This data of course needs to be actively backed up, and one would expect a web host to include that in its service. That's one of the reasons to pay for such a service, instead of doing it yourself.

Besides relying on their backups it's of course a good idea to regularly take backups yourself. But even if you do this daily, it means you may lose up to a day's worth of data. And that's (partly) what happened here. It's similar to someone who takes a photo on a digital camera, and subsequently loses that camera and the photo with it. You don't say "they shouldn't use a camera as primary data repository". It isn't. It's a temporary repository, and when the data is generated it's the one and only repository, simply pending copying to backup media.

Re:I am not rightly able to comprehend... (0)

Anonymous Coward | more than 2 years ago | (#35973446)

Even the backup plan I offer my customers only guarantees that I can get them back to midnight of the previous day. It's not perfect, and it would still be annoying in the case of a complete disaster, but short of the building burning down, they'd be able to reproduce all of the lost data in a couple of hours. If the building DID burn down, then one day of lost data would probably be the least of their concerns.

-Restil

Re:I am not rightly able to comprehend... (1)

aztracker1 (702135) | more than 2 years ago | (#35975368)

It's not even a "web host"; it's simply a virtual machine run-time environment. You set up the OS and configure it; Amazon does not. They provide storage facilities that you can back up to, and even mount on your host OS. Also, many virtual machine or virtual host providers don't necessarily provide backup solutions.

Re:I am not rightly able to comprehend... (1)

ColdWetDog (752185) | more than 2 years ago | (#35973440)

I guess I'm still stuck in Commodore 64 World, or something..

Cassette tapes? I'm so very sorry.

Re:I am not rightly able to comprehend... (0)

Anonymous Coward | more than 2 years ago | (#35974082)

For files, I totally agree, but as far as the database is concerned I would guess about 99.9% of all sites use the live web server as a primary/master data repository. Other than large corporations with custom written applications storing web data across multiple databases, how else do you expect it to be done?

Clouds are ephemeral (1)

sincewhen (640526) | more than 2 years ago | (#35972200)

Who knew?

Re:Clouds are ephemeral (5, Informative)

mini me (132455) | more than 2 years ago | (#35972236)

Cloud applications hosted on Amazon survived this incident without issue, as expected. Only the regular old hosted applications had problems with the outage. They were never "the cloud" to begin with, so I'm not sure why the term even comes up in this discussion.

The cloud represents a black box that hides the underlying network topology so that there are no single points of failure. Cloud applications are tolerant because they are spread through different datacenters across multiple points in the world. A catastrophe at one or more datacenters will have no noticeable effect on the availability of a cloud application because it continues to run in many more.

Amazon offers a few cloud applications: S3 comes to mind. But Amazon's EC2/EBS hosting service is a plain old hosting service like any other. The EC2 topology is not hidden away from you. You have to make active decisions about where you want your EC2 instance to live. That goes against the idea of the cloud. What Amazon does offer in EC2 is the tools necessary for you to build a cloud application, but not everything hosted on EC2 is a cloud application by default.

Re:Clouds are ephemeral (1)

im_thatoneguy (819432) | more than 2 years ago | (#35972422)

You're right, by default EC2 isn't a cloud solution, but Amazon doesn't help alleviate that confusion (from their website for EC2):

Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides resizable compute capacity in the cloud. It is designed to make web-scale computing easier for developers.

It doesn't help when the name of the service (EC2) has the word "Cloud" in it.

Re:Clouds are ephemeral (0)

Anonymous Coward | more than 2 years ago | (#35972638)

IMO it still is "cloud" but for computing not storing lots of critical data.

For that sort of thing if data is lost, you're supposed to resubmit the inputs and recompute it.

Re:Clouds are ephemeral (1)

SuricouRaven (1897204) | more than 2 years ago | (#35972822)

I still don't think there is any real definition for what 'cloud' means. As far as I can gather it's just fancy new marketing-speak for the very old idea of putting things on a server - the only difference is that with the cloud, you don't have to care about the server's physical location.

Re:Clouds are ephemeral (1)

jimicus (737525) | more than 2 years ago | (#35972948)

That's exactly what it is.

It's confusing a lot of people because something sold as a cloud application (ie. SaaS) may or may not be designed with HA in mind and the vendor likely won't tell you. If it is, and the underlying infrastructure is sound, you're probably OK. Hopefully.

If it's not, it's not much different to an application running on some server in a co-lo somewhere, the only real difference is that you don't lease the server directly and you're not responsible for any backups.

Then you've got virtual servers (which with modern hypervisors and a suitable SAN backend can in theory make an entire application HA regardless of whether or not that was part of the original design) which confuses the issue even further. Yes, they can in theory do all that but you often find that companies selling virtual servers haven't done it at all. All they've done is bought a few racks of servers and put together some fancy management software to make it easy to create and destroy virtual servers on the fly. The features like live migration are mostly not used at all, but you don't find that one out until your virtual server fails and their support team tell you they're "rebooting the physical server it lives on". These may also be sold as "cloud servers".

Re:Clouds are ephemeral (1)

mini me (132455) | more than 2 years ago | (#35975768)

In networking, the cloud has always represented an abstract network whose implementation details are unknown – it just magically works, thanks to the hard efforts of third parties. It is only lately that some marketing types want to exploit the term to make it mean something else.

Re:Clouds are ephemeral (0)

Anonymous Coward | more than 2 years ago | (#35973262)

Lol here we go again, bob and weave.

Oh Noooes! (0)

Anonymous Coward | more than 2 years ago | (#35972220)

OMFG Run for the hills! Social Media Data was lost. How will I know in 30 years what you said about bobby!?!??!

What is S3? (5, Informative)

badran (973386) | more than 2 years ago | (#35972224)

EC2 is not meant to be used for data storage, that is what S3 is designed for. You store data and backups on S3, and use EC2 to serve high bandwidth websites to the masses.

Re:What is S3? (1)

Big_Mamma (663104) | more than 2 years ago | (#35973252)

And that's exactly how EBS is supposed to be backed up - it saves snapshots all the time to S3. Small and cheap incremental backups stored to a 99.999999999% durable [amazon.com] storage area. But apparently, Amazon messed up the backed up copies as well - instead of producing an outdated, but valid snapshot, they replied to affected customers with:

A few days ago we sent you an email letting you know that we were working on recovering an inconsistent data snapshot of one or more of your Amazon EBS volumes. We are very sorry, but ultimately our efforts to manually recover your volume were unsuccessful.
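
For a sense of scale, the eleven-nines durability figure quoted in this comment can be turned into a back-of-the-envelope estimate. The sketch assumes the figure is a per-object, per-year probability of survival (the comment itself does not say) and that losses are independent; the function name and the 10,000-object example are illustrative.

```python
ANNUAL_LOSS_RATE = 1e-11  # 1 - 0.99999999999, assumed per object per year


def expected_annual_losses(objects, loss_rate=ANNUAL_LOSS_RATE):
    """Expected number of objects lost per year under independent failures."""
    return objects * loss_rate


if __name__ == "__main__":
    stored = 10_000  # hypothetical number of stored objects
    per_year = expected_annual_losses(stored)
    print(f"expected losses per year: {per_year:.1e}")
    print(f"roughly one lost object every {1 / per_year:,.0f} years")
```

Under those assumptions, 10,000 stored objects would lose about one object every ten million years, which is why a failed snapshot recovery like the one quoted stands out.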

Re:What is S3? (1)

Anonymous Coward | more than 2 years ago | (#35973422)

The quote is about the snapshots they took before they started recovery efforts. If you took snapshots of your EBS volumes regularly, these were not affected at all...

Re:What is S3? (1)

Slashdot Parent (995749) | more than 2 years ago | (#35978028)

EC2 is not meant to be used for data storage, that is what S3 is designed for. You store data and backups on S3, and use EC2 to serve high bandwidth websites to the masses.

I don't think this is a fair criticism of people who lost data.

S3 isn't designed as an online datastore for live applications. Sure, I can put any content in there that I want, but it can't be up-to-the-millisecond.

AWS said to consider EBS volumes to be like hard disks, with a similar failure rate to hard disks. I forget the expected failure rate that they posted, but I think it was roughly between 1:100 and 1:1000 EBS volumes should be expected to fail each year. So go ahead and make your usual solutions with RAID arrays, DB write logs on a different volume, consistent snapshots stored on S3 for backup, etc.
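
The "treat it like a hard disk" reasoning can be made concrete with a little probability. The sketch below uses the roughly-remembered failure rates from this comment (between 1 in 100 and 1 in 1000 per volume per year) and assumes volumes fail independently, which is exactly the assumption a service-wide outage breaks; the function names are hypothetical.

```python
def p_any_failure(p_single, volumes):
    """Chance that at least one of `volumes` independent volumes fails in a year."""
    return 1.0 - (1.0 - p_single) ** volumes


def p_mirror_loss(p_single):
    """Chance that both halves of a two-volume mirror fail in the same year.

    This ignores rebuild time, so it overstates the risk for a mirror that is
    repaired promptly; it is only meant to show the effect of independence.
    """
    return p_single ** 2


if __name__ == "__main__":
    for p in (0.01, 0.001):
        print(f"p={p}: any of 100 volumes fails: {p_any_failure(p, 100):.3f}, "
              f"mirror pair lost: {p_mirror_loss(p):.0e}")
```

With independent failures, mirroring turns a 1-in-100 risk into something like 1 in 10,000. When every volume sits behind the same failing service, both halves of the mirror go together and that multiplication no longer applies.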

But last week's outage was way different. That was a failure of EBS the service, not an EBS volume. This turned the whole "EBS volume as a hard disk" paradigm on its ear. That shiny RAID array you've got? Dead. Those DB write logs? Dead. Those pristine consistent snapshots sitting safe and sound on S3? Sorry, you can't access those. Those EBS-backed virtual machines that your application runs on? Sorry, you can't access those, and you can't launch any new ones, either.

So now you're left with offsite backups, which many users had, but are going to be out-of-date by nature. It didn't really matter if your offsite backups were in S3, on your local hard disk, or at some other online storage provider. Also, if your application was architected for EBS-backed instances, you couldn't launch new instances, anyway. Not without rearchitecting your application.

So "sorry, you should have used S3 for your data" isn't really the answer. It's a little hard for your application to run with no access to CreateInstance or CreateVolume!

since when is website a proper noun (0)

Anonymous Coward | more than 2 years ago | (#35972250)

^^^

Data loss (0)

Anonymous Coward | more than 2 years ago | (#35972260)

If you have no need for this snapshot, please delete it to avoid incurring storage charges.

This is completely unacceptable.

As a former sysadmin though, I think one way of keeping database backups less than 12 hours old is by writing compressed binary logs to a tape streamer. If the tape is double sided, it can continuously keep recording binary logs. But for the amount of data Amazon handles I think this can grow expensive; tapes are slow media, so even a rack full of them might not cut it. Still, there has to be a way to cache volatile data to non-volatile storage. But then, even Google can lose a file.

I like to send syslog to a remote site, so in case of an outage the investigation can begin immediately. Because even if there are several redundant servers, if they all have the same flaw they're all going down the same way.
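A minimal sketch of the remote-syslog idea, using only the Python standard library. The logger name "offsite", the function name, and the default UDP port 514 are assumptions for illustration; the remote host is whatever syslog collector you run at the other site.

```python
import logging
import logging.handlers


def make_remote_logger(host: str, port: int = 514) -> logging.Logger:
    """Return a logger that forwards records to a remote syslog collector.

    SysLogHandler sends over UDP by default, so delivery is best effort.
    """
    logger = logging.getLogger("offsite")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid stacking duplicate handlers on repeat calls
        handler = logging.handlers.SysLogHandler(address=(host, port))
        handler.setFormatter(
            logging.Formatter("%(name)s: %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger
```

Calling make_remote_logger with your collector's address and then logger.warning(...) puts a copy of the message off the box, so it is still readable if the machine that produced it is gone.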

The Cloud Is Dead (0, Informative)

Anonymous Coward | more than 2 years ago | (#35972272)

If Amazon can fuck things up as badly as this, then surely so can Google. Which means the cloud is dead.

Back to mainframes, folks. It really works. It's really secure. You have full control. And not really more expensive.

I'm not even joking.

Re:The Cloud Is Dead (0)

The End Of Days (1243248) | more than 2 years ago | (#35972336)

There's something endearing about all the Slashdot luddites wishing things would move to the past.

Re:The Cloud Is Dead (0)

Anonymous Coward | more than 2 years ago | (#35972424)

There's something endearing about all the Slashdot luddites wishing things would move to the past.

Except the public cloud really is worthless - not just for the fact it brings people that would otherwise have nothing against you trying to take down your server - but for the security factor (which partially relates to the first point).

Private cloud computing with data connections allowing fluid sharing of data between individual private clouds is the way of the future, but the public cloud was a horrible idea designed to steal corporate secrets from the start.

Re:The Cloud Is Dead (2)

inputdev (1252080) | more than 2 years ago | (#35972486)

I think people miss the point of the cloud - saying the cloud is worthless because it "brings people that would otherwise have nothing against you trying to take down your server" is like saying that the internet is worthless because it opens up security risks.
I for one am glad to be connected, and obviously so are many others. Don't use services that aren't good for you - there are some cloud based services that are great, and some that aren't. It's pretty clear that in the future, things will be more connected, not less - adapt and take advantage of the good parts, the rest will fade anyway.

Re:The Cloud Is Dead (1)

Hazel Bergeron (2015538) | more than 2 years ago | (#35972574)

There's something simplistically technocratic about assuming that what is now is better than what has been.

Buy X! It's newer, thus better, than Y!

Because the economy's like a religion and set up so people lose their jobs and their homes if you don't needlessly produce and consume nothing of value.

Re:The Cloud Is Dead (1)

sarhjinian (94086) | more than 2 years ago | (#35974050)

Yes, because building your own datacentre, or paying hosting fees to a five-nines-plus facility, costs nothing. Air conditioning, batteries, generators, fire suppression, multiple, redundant network connectivity: that stuff's all free. A mainframe solves it all!!

Look, a quality DC costs millions to build or tens of thousands to rent space in. Servers and mainframes cost money to manage, support and spare out. If you're starved for capital, why wouldn't you use EC2+EBS+S3 for a few bucks a month, rather than tie up dollars that could be spent on developers, marketing or suchlike in hardware and facilities that you're not really benefitting from. To build something like EC2 and the like is seriously expensive. Can your average startup with a server or two claim five nines? Really?

All these people who chant "Don't use the cloud, there could be an outage/breach!!" are just one screw-up away from the same, and it's often pure luck that they haven't been whacked yet.

Did this save Wikileaks? (2, Funny)

kulnor (856639) | more than 2 years ago | (#35972274)

Guess Wikileaks feels good about not being hosted there anymore.... their critical information could have been "lost" as well....

Availability zones (1)

nereid666 (533498) | more than 2 years ago | (#35972312)

What scares me more is that Amazon tells you they have multiple availability zones in each region, and recommends that you distribute replicated servers across those zones. For example, I have a project with the master database in one zone and the replica in another. Why did both zones fail? Aren't they isolated/independent? Amazon charges you for data transfer between zones. As others have said, servers fail; everyone should have had backups somewhere else (S3, or outside Amazon).

Re:Availability zones (1)

Anonymous Coward | more than 2 years ago | (#35972468)

Availability zones are just one part of the picture for a good software design in data centers. While they are isolated from each other, they may still be in the same location (or within the same general area), and could more easily be hit with a network partition, floods, tornadoes, etc. Instead, as most big businesses know (like Netflix, which didn't suffer from the outage), you need regional separation in addition to availability zones. Amazon does provide this functionality, it just costs more, and most of the businesses that went out did not pay for regional separation.

Re:Availability zones (2)

nereid666 (533498) | more than 2 years ago | (#35972562)

From: http://aws.amazon.com/es/ec2/ [amazon.com]
Availability Zones are distinct locations that are engineered to be insulated from failures in other Availability Zones and provide inexpensive, low latency network connectivity to other Availability Zones in the same Region. By launching instances in separate Availability Zones, you can protect your applications from failure of a single location.

Rather than using a different region, I think it is better to have multiple cloud providers...

EBS volumes can be lost at any time (0)

Anonymous Coward | more than 2 years ago | (#35972478)

Amazon never pretended that data on an EBS volume is completely safe. If you want that, store it on S3.
Amazon always advocated backing up EBS volumes. Actually I'm surprised that only this little data was lost.

Post morten Amazon explanation (5, Informative)

nereid666 (533498) | more than 2 years ago | (#35972734)

Post morten Amazon explanation:
http://aws.amazon.com/message/65648/ [amazon.com]

Re:Post morten Amazon explanation (0)

Anonymous Coward | more than 2 years ago | (#35972990)

In unrelated news - until recently I worked in networking at Amazon and am looking for a new job.....

Re:Post morten Amazon explanation (0)

Anonymous Coward | more than 2 years ago | (#35973424)

Who's morten, faggot?

Re:Post morten Amazon explanation (0)

Anonymous Coward | more than 2 years ago | (#35976126)

mod parent up

any data loss can be bad to a Website operator. (1)

jamesh (87723) | more than 2 years ago | (#35972856)

any data loss can be bad to a Website operator.

any data loss is catastrophic, if it's your data. They claim "a small percentage" of data was lost... 1% is a small percentage... 10% is also a small percentage, but it's a huge amount of data.

Fortunately where I live and work there isn't really sufficient and reliable connectivity to "the cloud" to make it a worthwhile endeavor, so hopefully all the mistakes are learnt from before I have to worry about it.

Re:any data loss can be bad to a Website operator. (1)

Rakishi (759894) | more than 2 years ago | (#35973060)

any data loss is catastrophic, if it's your data.

No, it's only catastrophic if you're an idiot. Then again many website operators seem to be just that given how many need to use google cache to recover data after their web provider's server croaks.

Anyway, having your data in any single unreliable location is a recipe for disaster. And yes, with a 0.5-1% annual failure rate EBS is unreliable and no one claims otherwise. If you want reliable you use S3 and off-site backups.

Re:any data loss can be bad to a Website operator. (1)

Slashdot Parent (995749) | more than 2 years ago | (#35978070)

No, it's only catastrophic if you're an idiot. Then again many website operators seem to be just that given how many need to use google cache to recover data after their web provider's server croaks.

Anyway, having your data in any single unreliable location is a recipe for disaster. And yes, with a 0.5-1% annual failure rate EBS is unreliable and no one claims otherwise. If you want reliable you use S3 and off-site backups.

Please explain to me how I can keep my data in S3 and/or offsite backups up-to-the-millisecond.

I'll wait.

Amazon is pretty up-front about expected data loss (1)

molotov303 (182638) | more than 2 years ago | (#35972922)

Unless you pay extra, they say you can expect to lose data stored in S3 on a regular basis. There's nothing wrong with that per se, but it's something you need to plan for.

S3:

Designed to provide 99.99% durability and 99.99% availability of objects over a given year. This durability level corresponds to an average annual expected loss of 0.01% of objects.

http://aws.amazon.com/s3/ [amazon.com]

EBS:

...Amazon EBS snapshot can expect an annual failure rate (AFR) of between 0.1% - 0.5%, where failure refers to a complete loss of the volume.

http://aws.amazon.com/ebs/ [amazon.com]

Clarification (2)

Mascot (120795) | more than 2 years ago | (#35972966)

The durability you quote for S3 (99.99%) is for the reduced redundancy option. The standard storage lists 99.999999999% durability.
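For a sense of scale, a quick worked calculation of what the two quoted durability figures mean as expected object loss per year. The function name and the ten-million-object bucket are hypothetical; Decimal is used so that the eleven-nines figure is not blurred by floating-point rounding.

```python
from decimal import Decimal


def expected_annual_loss(num_objects: int, durability_percent: str) -> Decimal:
    """Average number of objects expected to be lost per year, given a
    durability percentage passed as a string, e.g. "99.99"."""
    loss_fraction = (Decimal(100) - Decimal(durability_percent)) / Decimal(100)
    return num_objects * loss_fraction


objects = 10_000_000  # hypothetical bucket size
for label, durability in (("reduced redundancy", "99.99"),
                          ("standard", "99.999999999")):
    print(label, expected_annual_loss(objects, durability))
```

The gap between the two tiers is seven orders of magnitude, which is why it matters which figure gets quoted.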

Store a backup yourself (2)

olau (314197) | more than 2 years ago | (#35972952)

This is not the first time I've heard about a big hosting centre losing data even though it never happens, and they are keeping backups, etc.

If it's at all manageable, keep one copy safe at your own place in addition to the replication at the hosting centre. You can set up a cheap box at the office with a couple of terabytes of disk space and pull down the data periodically with something like rsync and rdiff-backup. It's not a whole lot of work and can make the difference between having a big problem and a total disaster.
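A rough sketch of the "cheap box that sucks down the data" idea, wrapping rsync from Python. The remote path, local directory, and function names are placeholders; it assumes rsync is installed and that SSH access to the hosted machine is already set up. It only mirrors the current state, so something like rdiff-backup would still be needed on top to keep history.

```python
import subprocess
from typing import List


def build_rsync_command(remote: str, local_dir: str) -> List[str]:
    """Build an rsync invocation that pulls `remote` into `local_dir`.

    -a preserves permissions, times and symlinks; -z compresses in transit.
    """
    return ["rsync", "-az", remote, local_dir]


def pull_backup(remote: str, local_dir: str) -> None:
    """Run the pull and raise CalledProcessError if rsync reports failure."""
    subprocess.run(build_rsync_command(remote, local_dir), check=True)
```

Running pull_backup("user@host:/srv/data/", "/backup/data") from cron or a similar scheduler gives the periodic pull; both paths here are made up.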

It would help if hosting centres actually told you how exactly they store and backup your data and what they do in case of emergency instead of throwing meaningless phrases like "99.999% uptime!" and "fully redundant storage backbone!" at you. Fully redundant storage backbone is nothing if it means it's built with some big arse proprietary SAN stuff where the whole array goes down if the main controller goes down. Which it of course does because it's a flaky embedded thing with 2k memory that has to be programmed in assembler and C with dangling memory pointers all over the place.

Re:Store a backup yourself (1)

Junta (36770) | more than 2 years ago | (#35973246)

Very good advice. One issue is a *lot* of their users are commercial companies that viewed this as a way not to sweat the details at all. For many of those, if they have to sweat backup and all that, they might as well do the hosting themselves because the cost delta for them is not particularly large.

Re:Store a backup yourself (1)

dsouza42 (1151071) | more than 2 years ago | (#35974860)

I don't know if it would help if they told you exactly how everything works. I'm sure any company with an infrastructure like Amazon's takes backups and safety very seriously. The availability numbers tell you what you can expect statistically from their services. The service that caused data loss is called EBS and according to Amazon: "Amazon EBS snapshot can expect an annual failure rate (AFR) of between 0.1% – 0.5%, where failure refers to a complete loss of the volume". So if you have your data there you have to know that it can fail and it probably will fail eventually, so you're right, it's a really good idea to back up the data yourself.

My business runs on Amazon's infrastructure and that was one of my main concerns before hiring their service. Because of this chance of failure I take hourly snapshots of my EBS volumes (which is enough for me.. I could even do it every 5 minutes) and copy the data back to my own servers periodically as you suggest. It's just common sense for anyone who deals with this type of thing. In my case, when the outage happened I just restored the latest snapshots and was up and running in a few minutes.
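For anyone wondering what the hourly snapshot job might look like, here is a sketch, not the actual script described above. It assumes a boto3-style EC2 client (boto3 is a third-party library; boto3.client("ec2") provides create_snapshot with VolumeId and Description keyword arguments). The function name and description text are invented, and it does nothing to quiesce the filesystem or database first, so consistency of the snapshot is left to the caller.

```python
from datetime import datetime, timezone
from typing import Iterable, List


def snapshot_volumes(ec2_client, volume_ids: Iterable[str]) -> List[str]:
    """Request one snapshot per volume and return the new snapshot IDs.

    `ec2_client` is assumed to behave like boto3.client("ec2").
    """
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%MZ")
    snapshot_ids = []
    for volume_id in volume_ids:
        response = ec2_client.create_snapshot(
            VolumeId=volume_id,
            Description=f"hourly backup {stamp}",  # hypothetical label
        )
        snapshot_ids.append(response["SnapshotId"])
    return snapshot_ids
```

Passing the client in as a parameter keeps the function easy to dry-run with a stand-in object before pointing it at a real account; scheduling it hourly is left to cron or whatever scheduler is already in use.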

Now even with "proprietary SAN stuff" and "flaky things with 2k memory programmed in assembler and C with dangling memory pointers all over the place" it's still many times more reliable and many times cheaper than hosting it myself. After moving to Amazon I greatly increased my uptime and reduced IT costs by 90%. That doesn't mean I trust they'll work 100% of the time, and that's why I do my due diligence and make backups.

Re:Store a backup yourself (1)

Slashdot Parent (995749) | more than 2 years ago | (#35978216)

Be careful with your quoting from the middle of a sentence. When you quoted, "Amazon EBS snapshot can expect an annual failure rate (AFR) of between 0.1% – 0.5%, where failure refers to a complete loss of the volume," that gave the impression the EBS snapshots had an AFR of 0.1% – 0.5%. Actually, EBS snapshots are stored on S3, so they have a durability rate of 99.999999999% each year.

It's EBS volumes, themselves, that have 0.1% – 0.5% annual failure rates. I'm sure you already knew this, but others might not.

Naturally, offsite backups are still a good idea, but if snapshots failed as often as implied, offsite backups would be a total necessity, rather than just a mere "good idea". :)

Do I understand properly? (0)

Anonymous Coward | more than 2 years ago | (#35972964)

As I understand the article, an inferno-like situation destroyed servers at AWS. That can happen. But why should a customer ever get notice of such a situation? Apart from public notice about a disaster somewhere within a company?
Several years back I learned about techniques, where a second live system would jump in immediately without the customer even noticing the switch to another physical machine. And I am not an expert in high-availability...

The second thing which makes me wonder is the notice about snapshots. Why does AWS talk of data that is likely not useful and charge its customers for storing it at the same time?

cb

Amazon Follow Up (0)

Anonymous Coward | more than 2 years ago | (#35973682)

You can check out the full disclosure of the event details here: http://aws.amazon.com/message/65648/
