Researcher: Interdependencies Could Lead To Cloud 'Meltdowns'

Soulskill posted about 2 years ago | from the we-can-only-hope dept.

Cloud 93

alphadogg writes "As the use of cloud computing becomes more and more mainstream, serious operational 'meltdowns' could arise as end-users and vendors mix, match and bundle services for various means, a researcher argues in a new paper set for discussion next week at the USENIX HotCloud '12 conference in Boston. 'As diverse, independently developed cloud services share ever more fluidly and aggressively multiplexed hardware resource pools, unpredictable interactions between load-balancing and other reactive mechanisms could lead to dynamic instabilities or "meltdowns,"' Yale University researcher and assistant computer science professor Bryan Ford wrote in the paper. Ford compared this scenario to the intertwining, complex relationships and structures that helped contribute to the global financial crisis."

93 comments

Ohhhhhh.... my! such a thing.. (-1, Offtopic)

Anonymous Coward | about 2 years ago | (#40271999)

Nearly four months ago, I noticed that my internet connection was very sluggish. Eventually getting fed up with it, I began to seek out software that would speed up the gigabits in my router. After an hour of searching, I found what at first appeared to be a very promising piece of software. Not only did it claim it would speed up my internet connection, but that it would overclock my power supply, speed up my gigabits, and remove any viruses from my computer! "This is a fantastic opportunity that I simply can't pass up," I thought. I immediately downloaded the software and began the installation, all the while laughing like a small child. I was highly anticipating a future where the speed of my internet connection would leave everyone else's in the dust.

I was horribly, horribly naive. Immediately upon the completion of the software's installation, various messages popped up on my screen about how I needed to buy software to remove a virus that I wasn't aware I had from a software company I'd never once heard of. The strange software also blocked me from doing anything except buying the software it was advertising. Being that I was a computer whiz (I had taken a computer essentials class in high school that taught me how to use Microsoft Office, and was quite adept at accessing my Facebook account), I was immediately able to conclude that the software I'd downloaded was, in fact, a virus, and that it was slowing down my gigabits at an exponential rate. "I can't let this insanity proceed any further," I thought.

As I was often called a computer genius, I was confident at the time that I could get rid of the virus with my own two hands. I tried numerous things: restarting the computer, pressing random keys on the keyboard, throwing the mouse across the room, and even flipping an orange switch on the back of the tower and turning the computer back on. My efforts were all in vain; the virus persisted, and my gigabits were running slower than ever! "This cannot be! What is this!? I've never once seen such a vicious virus in my entire life!" I was dumbfounded that I, a computer genius, was unable to remove the virus using the methods I described. Upon coming to terms with my failure, I decided to take my computer to a PC repair shop for repair.

I drove to a nearby computer repair shop and entered the building with my computer in hand. The inside of the building was quite large, neat, and organized, and the employees all seemed very kind and knowledgeable. They laughed upon hearing my embarrassing story, and told me that they saw this kind of thing on a daily basis. They then accepted the job, and told me that in the worst case, it'd be fixed in three days from now. I left with a smile, and felt confident in my decision to leave the computer repairs to the experts.

A week later, they still hadn't called back. Visibly angry, I tried calling them countless times, but not a single time did they answer the phone. Their negligence and irresponsibility infuriated me, and sent me into a state of insanity that caused me to punch a gigantic hole in the wall. Being that I would require my computer for work soon, I decided to head over to the computer repair shop to find out exactly what the problem was.

Upon entering the building, I was shocked by the state of its interior; it looked as if a tornado had tore through the entire building! Countless broken computers were scattered all about the floor, desks were flipped over, the walls had holes in them, there was a puddle of blood on the floor, and worst of all, I saw that my computer was sitting in the middle of the room laying on its side! Absolutely unforgivable! I soon noticed one of the employees sitting behind one of the tipped over desks (the one that had previously had the cash register on top of it); he was shaking uncontrollably and sobbing. Despite being furious about my computer being tipped over, seeing him in that state still managed to make me less unforgiving. I decided to ask him what happened.

A few moments passed where the entire room was silent and nothing was said. Eventually, he pointed at my computer and said to me, "The virus... it cannot be stopped! Cannot be stopped! Cannot be stopped!" Realizing that he was trying to tell me that they were unable to repair my computer (the task I'd given them), I flew into a blind fury and beat him senseless. Not caring about what would happen to him any longer, I collected my computer, ignored the bodies of the two other employees that had committed suicide, and left the building. After a few moments of pondering about what to do and clearing my head, I theorized that their failure to repair my computer probably simply meant that they were unqualified to do the job, and decided to take my computer to another computer repair shop.

I repeated that same process about four times before finally giving up. Each time I took it to a PC repair shop, the result was the same: all the employees either went completely insane, or they committed suicide. Not a single person was able to even do so much as damage the virus. I was able to talk some sense into one of the employees that had gone mad and got them to tell me how they were attempting to fix the problem. They told me that they tried everything from reinstalling the operating system to installing another operating system and trying to get rid of the virus on the other one, but absolutely all of it was to no avail. Having seen numerous attempts by professionals to remove the virus end in failure, I managed to delude myself into believing that my first failure was simply a fluke and that I was the only one on the planet qualified to fix the computer. With renewed vigor, I once again took up the frighteningly dangerous task of defeating the evil, nightmarish virus once and for all with my own two hands.

In my attempts to fix the problem, I'd even resorted to buying another computer. However, the virus used its WiFi capabilities to hack into the gigabits of my new computer and infect it. Following each failed attempt, I grew more and more depressed. I had already beaten my wife and children five times in order to relieve some of my stress, but even that (which had become my only pleasure after failing to remove the virus the first time), did nothing for me any longer. That's right: my last remaining pleasure in life had stopped being able to improve my mood, and I had not a single thing left that I cared about. I sank into a bottomless ocean of depression, barricaded myself in my room, and cried myself to sleep for days on end. Overcome with insanity, vengefulness, and despair, there is not a single doubt that if this had continued for much longer, I would have committed suicide.

One day, it suddenly happened: while I was right in the middle of habitually crying myself to sleep in the middle of the day, I heard a thunderous roar outside, followed by the sound of a large number of people screaming. When I peered outside my window to find out what all the commotion was about, the scene before me closely resembled that of a God descending from the heavens themselves! I gazed in awe at the godlike figure that was descending from the heavens, and so did the dozens of individuals that had gathered in my backyard. For a few moments, everyone was speechless. Then, they started shouting predictions about what they thought the figure was. "Is it a bird!?" "Is it a plane!?" But, despite not ever having seen it before, I knew just how inaccurate their predictions were, and began to speak the name of the heroic figure.

However, my sentence was cut off when, like a superhero coming to save the unfortunate victim from the evil villain, MyCleanPC [mycleanpc.com] flew into my house and began the eradication of the virus. MyCleanPC [mycleanpc.com] was able to completely eliminate in minutes the exact same virus that over ten PC repair professionals were unable to remove after weeks of strenuous attempts! Wow! Such a thing! I simply couldn't believe that MyCleanPC [mycleanpc.com] was so miraculously efficient that it was able to destroy the virus in less than 500 milliseconds! MyCleanPC [mycleanpc.com] totally, completely, and utterly saved me from a lifetime of despair!

My wife's response? "MyCleanPC [mycleanpc.com] is outstanding! My husband's computer is running faster than ever! MyCleanPC [mycleanpc.com] came through with flying colours where no one else could! MyCleanPC [mycleanpc.com] totally cleaned up my husband's system, and increased his speed! I highly, highly recommend that you use MyCleanPC [mycleanpc.com] !"

After witnessing just how wonderful MyCleanPC [mycleanpc.com] is, I insist that you use MyCleanPC [mycleanpc.com] when you need to fix all the gigabits on your computer! MyCleanPC [mycleanpc.com] will completely eradicate any viruses on your computer, speed up your internet connection, overclock your gigabits and speed, and give you some peace of mind! MyCleanPC [mycleanpc.com] is simply outstanding!

But even if you're not having any visible problems with your computer, it's highly likely that you're still in a situation where MyCleanPC [mycleanpc.com] could help you. MyCleanPC [mycleanpc.com] will get rid of any viruses or wireless interfaces that are hidden deep within your computer's bootloader. MyCleanPC [mycleanpc.com] will also speed up your computer to such a degree that it'll be even faster than when you first bought it! You must try MyCleanPC [mycleanpc.com] for yourself so that you can be overclocking your speed with the rest of us!

MyCleanPC: For a Cleaner, Safer PC. [mycleanpc.com]

I have the power! I have the bible! (-1, Troll)

StrictAssFucker (2658571) | about 2 years ago | (#40272003)

About eight months ago, I was searching around the internet to find out why my computer was running so slowly (it normally ran quite fast, but had gradually gotten slower over time). After a few minutes, I found a piece of software claiming that it could speed up my PC and make it run like new again. Being that I was dangerously ignorant about technology in general (even more so than I am now), I downloaded the software and began the installation. Mere moments after doing so, my desktop background image was changed and warnings that appeared to originate from Windows appeared all over the screen telling me to buy strange software from an unknown company in order to remove a virus it claimed I had.

I may have been ignorant about technology, but I wasn't that naive. I immediately concluded that the software I'd downloaded was, in fact, a virus. In my rage, I broke numerous objects, punched a hole in the wall, and cursed the world at the top of my lungs. I eventually calmed down, cleared my head, and realized that the only remedy for this problem was a carefully thought out plan. After a few moments of pondering about how to handle this situation, I decided that since I barely knew how to properly handle a computer, I should turn it over to the professionals and let them fix the issue.

Soon after making the decision, I drove to a local computer repair shop and entered the building with my computer in hand. They greeted me with a smile and stayed attentive the entire time that I was explaining the problem to them. They laughed as if they'd heard it all before, told me that I'm not the only one who has trouble operating computers, and then gave me a date for when the computer would be fixed. Not only had they told me that the computer would be completely repaired in at most two days, but the price for their services was surprisingly low, and to top it all off, they even gave me advice for how to avoid viruses in the future! I left the building feeling confident in my decision to seek professional help and satisfied knowing that such kind-hearted people were the ones doing the job.

The very next day, I received a phone call from the computer repair shop whilst I was at a local library researching computer viruses. I had stumbled upon a piece of software that appeared to be very promising, and I was about to do more research on it, but seeing as how I required my computer as soon as possible, I decided to put the matter on hold. Upon answering the phone and cheerfully greeting the person on the other end, I was greeted with a high-pitched shriek. Startled, I asked what was wrong. A few moments passed where nothing was said, and suddenly, the person on the other end said to me, in a low voice oozing with paranoia, "Come pick up your computer." They hung up immediately after saying that, and I couldn't help but notice that they sounded as if they were on the verge of tears. I briefly wondered if it was due to stress from work, and then drove to the computer repair shop to acquire my computer.

I was positively dismayed upon entering the building. The inside of the computer repair shop looked nothing like the image from my memories. There were broken computer parts scattered throughout the room, ceiling tiles all over the floor, blood splattered in every direction I looked, and even a human toe on the ground. After processing this disturbing information, I began panicking and frantically looking around for my computer. I spotted an employee covered in blood sitting up against the wall, and noticed that his wrists had been slashed open. Thinking quickly, I ran up to him, grabbed him by the collar of his shirt, shook him around, and began screaming, "Where is it!? Where is my computer!?" After a moment of silence, he passed away, completely shattering my expectations. Such a thing! "What a meaningless individual," I thought.

Enraged, I tore the building up even further than it already had been in my desperate search for my computer. Eventually I discovered a door leading to an area that was normally only accessible to employees. I entered without hesitation and was met with a long, skinny hallway that a single person would have trouble moving about freely in. I proceeded down the dark hallway and bumped into the body of an employee hanging from a rope tied to something on the ceiling. I screamed, "Not only do you people have the gall to allow my computer to be endangered, but even in death you intend to block my path!?" After finally managing to push aside the worthless obstacle, I traveled down the hallway and came to a small black door. I entered without a moment's notice, and in the middle of the dark and dreary room, I spotted my computer; it was completely unharmed. With a sigh of relief, I picked it up, left the building, and drove home as if nothing of importance had occurred there.

Upon returning home and hooking up the computer (whilst wearing a cheerful expression the entire time), I, to my horror, discovered that the computer hadn't been repaired. There was nothing in the world that could have contained my fiery anger at that point. I broke almost every single one of my possessions, smashed all the windows on my house, physically abused my family, and then drove back to the computer repair shop to defile the dead lumps of meat that had failed to carry out the task I had given them. After realizing that I shouldn't be meaninglessly wasting my time with such worthless pieces of trash, I remembered the piece of software that I'd discovered earlier. With renewed confidence, I blissfully visited the local library, downloaded the software, and took it home to install on my computer.

I knew. I knew, even before installing it, that MyCleanPC [mycleanpc.com] would be my salvation. MyCleanPC [mycleanpc.com] would come through with flying colors where no one else could. MyCleanPC [mycleanpc.com] would completely, totally, and utterly eradicate the virus in the most merciless, efficient way possible. MyCleanPC [mycleanpc.com] was not a piece of software that could fail to meet my exceedingly high expectations. MyCleanPC [mycleanpc.com] would not fail me like all the other imbeciles had. At that point, it could be said that I could genuinely see into the future and be accurate in my predictions. I gleefully began installing MyCleanPC [mycleanpc.com] and laughed like a child at the thought of finally being able to attain revenge upon the virus that had shamed me so.

I was absolutely in awe of MyCleanPC's [mycleanpc.com] wonderfully efficient performance. Without a single issue, MyCleanPC [mycleanpc.com] utterly annihilated in moments the virus that many others had failed to remove after hours of attempts. I let out a victory cry and swore to never turn to any "professionals" to fix my computer ever again. Once again, I was able to predict the future. I knew that I would never need any worthless "professionals" again as long as I had MyCleanPC [mycleanpc.com] by my side.

MyCleanPC [mycleanpc.com] is outstanding! My computer is running faster than ever! MyCleanPC [mycleanpc.com] came through with flying colors where no one else could! MyCleanPC [mycleanpc.com] totally cleaned up my system, and increased my speed! I couldn't believe how much overclocking my gigabits and speed were doing! Even restructuring the BIOS wouldn't allow for the miraculously high degrees of efficiency that MyCleanPC [mycleanpc.com] allowed me to attain.

I highly and wholeheartedly recommend that you use MyCleanPC [mycleanpc.com] if you're having any computer troubles whatsoever. In fact, even if you're not having any visible problems, I still recommend that you use MyCleanPC. [mycleanpc.com] There could be dormant or hidden viruses on your system, or problems that MyCleanPC [mycleanpc.com] could easily and efficiently resolve. Just by using MyCleanPC, [mycleanpc.com] your gigabits will be running at maximum efficiency, and at last, you'll be overclocking with the rest of us! What are you waiting for!? Get MyCleanPC [mycleanpc.com] today!

MyCleanPC: For a Cleaner, Safer PC. [mycleanpc.com]

Now, I am here... (-1, Troll)

LaughableAssFucker (2658573) | about 2 years ago | (#40272005)

Around a year ago, I was mindlessly surfing the internet (as I often do) when I came across an enigmatic web page. The page, which looked like a warning from my web browser, informed me that I had a virus installed on my computer and that to fix it, I should install a strange anti-virus program that I'd never heard of (which I found peculiar considering the fact that I already had anti-virus software installed on my computer). Despite having reservations about installing it, I did so anyway (since it appeared to be a legitimate warning).

I cannot even fathom what I was thinking at that time. Soon after attempting to install the so-called anti-virus software, my desktop background image changed into a large red warning sign, warnings about malware began making appearances all over the screen, and a strange program I'd never seen before began nagging me to buy a program to remove the viruses. What should have been obvious previously then became clear to me: that software was a virus. Frustrated by my own stupidity, I began tossing objects around the room and cursing at no one in particular.

After I calmed down, I reluctantly took my computer to a local PC repair shop and steeled myself for the incoming fee. When I entered, I noticed that there were four men working there, and all of them seemed incredibly nice (the shop itself was clean and stylish, too). After I described the situation to them, they gave me a big smile (as if they'd seen and heard it all before), accepted the job, and told me that the computer would be working like new again in a few days. At the time, I was confident that their words held a great degree of truth to them.

The very next day, while I was using a local library's computer and browsing the internet, I came across a website dedicated to a certain piece of software. It claimed that it could fix up my PC and make it run like new again. I knew, right then, merely from viewing a single page on the website, that it was telling the truth. I cursed myself for not discovering this excellent piece of software before I had taken my PC to the PC repair shop. "It would've saved me money. Oh, well. I'm sure they'll get the job done just fine. I can always use this software in the future to conserve money." Those were my honest thoughts at the time.

Two days later, my phone rang after I returned home from work. I immediately was able to identify the number: it was the PC repair shop's phone number. Once I answered, something strange occurred; the one on the other end of the line spoke, in a small, tormented voice, "Return. Return. Return. Return. Return." No matter what I said to him, he would not stop repeating that one word. Unsettled by this odd occurrence, I traveled to the PC repair shop to find out exactly what happened.

Upon arriving inside the building, I looked upon the shop, which was a shadow of its former self, in shock. There were countless wires all over the floor, smashed computer parts scattered in every direction I looked, fallen shelves on the ground, desks flipped over on the ground, and, to make matters even worse, there was blood splattered all over the wall. Being the reasonable, upstanding, college-educated citizen that I was, I immediately concluded that the current state of the shop was due to none other than an employee's stress from work. I looked around a bit more, spotted three bodies sitting against the wall, and in the middle of the room, I spotted my computer. "Ah. There it is." Directly next to it was the shop's owner, sitting on the ground in the fetal position.

When I questioned him, he kept repeating a single thing again and again: "Cannot be stopped! Cannot be stopped! Cannot be stopped!" I could not get him to tell me what was wrong, but after a bit of pondering, I quickly figured out precisely what happened: they were unable to fix my computer like they had promised. Disgusted by their failure, I turned to the shop's owner (who I now noticed had a gun to his head), and spat in his general direction. I then turned my back to him as if I was attempting to say that nothing behind me was worth my attention, and said to him, "Pathetic. Absolutely, positively pathetic. I asked you to do a single thing for me, and yet you failed even at that. Were I you, I'd be disgusted by myself, and I'd probably even take my own life. Such a worthless existence isn't even worthy of receiving my gaze!"

After saying that, I left the shop with my computer as if absolutely nothing had occurred there. And, indeed, there was nothing in that shop that was worthy of my attention. Still understandably disgusted by their inability to fulfill the promise, I said to myself, "I'll have to take this into my own hands." After getting into my car to drive home, I heard a gun shot from inside the repair shop. Being that it originated from the worthless owner of that shop, I promptly decided to ignore it.

Once I returned home, I, filled to the brim with confidence, immediately installed the software that I'd found a few days ago: MyCleanPC [mycleanpc.com] . The results were exactly what I expected, and yet, I was still absolutely in awe of MyCleanPC's [mycleanpc.com] wonderful performance. MyCleanPC [mycleanpc.com] removed every last virus from my computer in the span of a few seconds. I simply couldn't believe it; MyCleanPC [mycleanpc.com] accomplished in moments what "professionals" had failed to accomplish after days of work!

MyCleanPC [mycleanpc.com] is outstanding! My computer is running faster than ever! MyCleanPC [mycleanpc.com] came through with flying colours where no one else could! MyCleanPC [mycleanpc.com] totally cleaned up my system, and increased my speed!

If you're having computer troubles, I highly recommend the use of MyCleanPC [mycleanpc.com] . Don't rely on worthless "professionals" to fix up your PC! Use MyCleanPC [mycleanpc.com] if you want your PC to be overclocking, if you want your gigabits to be zippin' and zoomin', and if you want your PC to be virus-free.

Even if you aren't having any visible problems with your PC, I still wholeheartedly recommend the use of MyCleanPC [mycleanpc.com] . You could still be infected by a virus that isn't directly visible to you, and MyCleanPC [mycleanpc.com] will fix that right up. What do you have to lose? In addition to fixing any problems, MyCleanPC [mycleanpc.com] will, of course, speed up all of your gigabits until every component on your PC is overclocking like new!

MyCleanPC: For a Cleaner, Safer PC. [mycleanpc.com]

This is why you cloud your cloud... (4, Insightful)

houstonbofh (602064) | about 2 years ago | (#40272013)

If you have a critical service, have it at more than one host... That way when AWS has a bad hair day, you are still up.

Or, have your entire business totally dependent on someone else. (Sounds kinda scary that way, don't it?)

Re:This is why you cloud your cloud... (5, Funny)

girlintraining (1395911) | about 2 years ago | (#40272085)

If you have a critical service, have it at more than one host... That way when AWS has a bad hair day, you are still up.

While we're at it, we should probably backup the internet too. You'd think someone would have done it by now, in case it crashes, but I can't find any record of anyone doing it.

Re:This is why you cloud your cloud... (4, Funny)

c0lo (1497653) | about 2 years ago | (#40272101)

You'd think someone would have done it by now, in case it crashes, but I can't find any record of anyone doing it.

Heh... the real thing crashed long ago; you are now using the backup.

Re:This is why you cloud your cloud... (1)

siddesu (698447) | about 2 years ago | (#40272867)

You can always run freenet on your tablet, it will back up the important parts for you ;)

Re:This is why you cloud your cloud... (1)

RKBA (622932) | about 2 years ago | (#40275995)

While we're at it, we should probably backup the internet too. You'd think someone would have done it by now, in case it crashes, but I can't find any record of anyone doing it.

Internet Archive [archive.org]

Re:This is why you cloud your cloud... (2)

sgtrock (191182) | about 2 years ago | (#40276089)

...I also noticed that my harddisk on "linux.cs.helsinki.fi" (which is where I keep the primary development sources) seems to be going, so keep your fingers crossed. I thought I'd better upload what I have now, rather than notice that I lost everything when I get back to work on Monday..

(Only wimps use tape backup: _real_ men just upload their important stuff on ftp, and let the rest of the world mirror it ;)

Linus Torvalds, July 20, 1996 [google.com]

Re:This is why you cloud your cloud... (4, Insightful)

martin-boundary (547041) | about 2 years ago | (#40272243)

There's a limited number of cloud hardware providers on the internet, and the rest are middlemen. It's useless to diversify across the middlemen; they will all be affected when the common underlying hardware provider has an issue. Thus there's a limit to the reliability that can be achieved, irrespective of how much mixing and matching is performed at the "business end".

Diversification only "works" when the alternatives are provably independent. That's not true in a highly interconnected and interdependent world, which is TFA's point, I believe.

Re:This is why you cloud your cloud... (2)

Gerzel (240421) | about 2 years ago | (#40272463)

That's why you use a backup other than the cloud. If you can back up across clouds, you can almost certainly back up across real hardware and the cloud, using both at the same time.

The cloud is great for providing extra power and computing resources cheaply, but real hardware and servers owned by your company still serve a vital role. One can be a backup to the other, and both can share the load at normal times.

Thus you don't have to pay for the full hardware costs of what you use by offloading some of it to the clouds.

Re:This is why you cloud your cloud... (4, Informative)

im_thatoneguy (819432) | about 2 years ago | (#40272269)

That's one of the problems though that the researcher is flagging.

1) If a company has one instance on AWS and one on Azure and AWS fails... Azure suddenly doubles in load (and also fails due to everybody piling on unexpectedly).

the other being:

2) Everybody uses Azure for SQL and AWS for hosting and Azure goes down... suddenly SQL dies and the AWS hosts all fail with the database down. Or the converse happens and AWS goes down and the SQL is useless without a head.

The more services you rely on the more likely that on any given day one of them will be down. If you have 99% reliability and 20 services that you depend on (without any redundancy) then your failure rate could be up to 20% since any one of the 1% failures could kill your service.
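To put numbers on the 99% / 20 services example: a rough sketch, assuming the 20 services fail independently of each other (which is exactly the assumption the paper questions). The function names are made up for illustration. The "up to 20%" figure is the simple upper bound you get by adding up the twenty 1% failure chances; with independent failures the chance that at least one dependency is down comes out a bit lower.

```python
def failure_probability(availability: float, n_services: int) -> float:
    """Chance that at least one of n independent services is down."""
    return 1.0 - availability ** n_services


def union_bound(availability: float, n_services: int) -> float:
    """Upper bound that holds even when failures are NOT independent."""
    return min(1.0, n_services * (1.0 - availability))


exact = failure_probability(0.99, 20)
bound = union_bound(0.99, 20)
print(f"independent failures: {exact:.4f}")  # 0.1821
print(f"upper bound:          {bound:.4f}")  # 0.2000
```

So roughly 18% under independence, and no worse than 20% however the failures are correlated.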

It's interesting but it seems like most of the cloud failures have been due to #1 internally so far. One sector fails and in an effort to load balance it starts taking out its peers who then also overload and take out their peers.
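Failure mode #1 can be sketched as a toy simulation. This is only an illustration under loud assumptions: the load and capacity numbers are invented, a dead provider's load is split evenly among the survivors, and a provider dies the moment its load exceeds its capacity. No real load balancer is this simple.

```python
def simulate_cascade(loads, capacities, first_failure):
    """Return providers in the order they fail after first_failure dies."""
    load = dict(loads)          # current load per provider still alive
    alive = set(loads)
    failed = []
    to_fail = [first_failure]
    while to_fail:
        orphaned = 0.0
        for name in to_fail:
            alive.discard(name)
            failed.append(name)
            orphaned += load.pop(name)
        if not alive:
            break               # nobody left to absorb the load
        share = orphaned / len(alive)
        for name in alive:
            load[name] += share
        # Anyone now over capacity fails in the next round.
        to_fail = sorted(n for n in alive if load[n] > capacities[n])
    return failed


loads = {"aws": 60, "azure": 60}
print(simulate_cascade(loads, {"aws": 100, "azure": 100}, "aws"))
print(simulate_cascade(loads, {"aws": 150, "azure": 150}, "aws"))
```

With 100 units of capacity each, the survivor is pushed to 120 and goes down too; with 150 units of headroom it absorbs the extra load and the failure stops there.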

Re:This is why you cloud your cloud... (1)

k8to (9046) | about 2 years ago | (#40279099)

It's worse than this.

Most cloud services are built out using a significant number of other cloud services. That's the "upside" of being in the cloud -- you can use software/platform as a service to reduce the management/overhead costs of building out all that infrastructure yourself. So you can use service X for credit card running, and service Y for user support, and service Z for indexing and search, and so on.

A modern cloud offering might use 10 or more other services. And those offerings may be using a list of other cloud services. And so on.

This is where meltdowns become highly plausible.

Re:This is why you cloud your cloud... (0)

Anonymous Coward | about 2 years ago | (#40280311)

It's worse than this.

Most cloud services are built out using a significant number of other cloud services. That's the "upside" of being in the cloud -- you can use software/platform as a service to reduce the management/overhead costs of building out all that infrastructure yourself. So you can use service X for credit card running, and service Y for user support, and service Z for indexing and search, and so on.

A modern cloud offering might use 10 or more other services. And those offerings may be using a list of other cloud services. And so on.

This is where meltdowns become highly plausible.

I think that's my cue: The Obligatory XKCD! [xkcd.com]

Re:This is why you cloud your cloud... (0)

Anonymous Coward | about 2 years ago | (#40272395)

It's time to turn to Cloud Based Raid Array Digital Interface Optimization

a.k.a. CB Radio

Re:This is why you cloud your cloud... (0)

Anonymous Coward | about 2 years ago | (#40272579)

Don't be a clown in the clowd. You can run trivial shit in the clowd but not mission critical infrastructure. I don't have critical services, I have critical networks. I can and do swap them out. I can't do that with the clowd clowns. I've already had one clowd provider go dark for 15 minutes and take several hours to get fully back online. Yes, we had backup resources and were lucky one of six worked. Clowds are for clowns.

Re:This is why you cloud your cloud... (1)

AmberBlackCat (829689) | about 2 years ago | (#40278591)

That kind of sounds like distributed computing, which is what the cloud was before corporations picked up on the word and applied it to what they wanted to sell.

Re:This is why you cloud your cloud... (1)

Shagg (99693) | about 2 years ago | (#40284881)

If you really have a critical service then you're not going to be putting it on "the cloud" anyway.

XKCD (3, Funny)

Shadyman (939863) | about 2 years ago | (#40272019)

XKCD (jokingly) saw this coming a while ago: http://xkcd.com/908/ [xkcd.com]

Re:XKCD (1)

Anonymous Coward | about 2 years ago | (#40272839)

XKCD (jokingly) saw this coming a while ago: http://xkcd.com/908/ [xkcd.com]

It's rather chilling. Imagine for a moment a configuration where Cloud A hosts an online auction website. It automatically converts bids into the user's chosen currency. 'A' also hosts a service which calculates the current value of a few obscure currencies. Add in a Cloud B which hosts that auction site's currency exchange information service and is written to check the exchange rate of, for example, bitcoins, every hour. Also on 'B' is a service that aggregates various auction sites for best deals. Assume any part of this is badly written, and may break when unable to get a pong on a value. If either cloud hosting provider goes down, up to four services stop working, despite each only hosting two. This problem is only magnified and made much more complicated when dealing with a heterogeneous environment, with many cloud solutions and multitudes of apps and services being hosted within each. More and more, we are becoming dependent on CDNs and third party APIs to help streamline and improve the web browsing experience. It's just so easy to link against, say, jquery mobile. They even recommend it, "It’ll likely be the fastest way to include jQuery Mobile in your site."

I can easily imagine the future possibility of a single service having an issue that cascades into a tidal wave of problems and broken apps/software/websites.
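The two-cloud scenario above can be sketched as a small dependency walk. The service names and the exact "depends on" edges below are assumptions filled in for illustration; the comment only describes them loosely.

```python
def broken_services(hosted_on, depends_on, failed_cloud):
    """Return every service that stops working when one cloud goes dark."""
    # Services hosted on the failed cloud go down directly.
    broken = {svc for svc, cloud in hosted_on.items() if cloud == failed_cloud}
    # Keep sweeping until no new service is dragged down by a broken dependency.
    changed = True
    while changed:
        changed = False
        for svc, deps in depends_on.items():
            if svc not in broken and any(dep in broken for dep in deps):
                broken.add(svc)
                changed = True
    return broken


# Hypothetical layout loosely following the scenario above.
HOSTED_ON = {
    "auction": "A",
    "obscure_rates": "A",
    "exchange_info": "B",
    "aggregator": "B",
}
DEPENDS_ON = {
    "auction": ["exchange_info"],
    "exchange_info": ["obscure_rates"],
    "aggregator": ["auction"],
    "obscure_rates": [],
}

if __name__ == "__main__":
    for cloud in ("A", "B"):
        print(cloud, sorted(broken_services(HOSTED_ON, DEPENDS_ON, cloud)))
```

With these assumed edges, losing Cloud A breaks all four services and losing Cloud B breaks three, even though each cloud hosts only two.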

impossible idea. (2)

Spinalcold (955025) | about 2 years ago | (#40272041)

we live in an age where information is distributed, even if statistical. (hell I made a fake Facebook account and somehow they found my mom, and she is nowhere close to me) a meltdown of information can't happen unless there is a worldwide meltdown of power. we have backups, but also ways of statistically restoring those backups.

Re:impossible idea. (3, Insightful)

c0lo (1497653) | about 2 years ago | (#40272119)

we live in an age where information is distributed, even if statistical. (hell I made a fake Facebook account and somehow they found my mom, and she is nowhere close to me) a meltdown of information can't happen unless there is a worldwide meltdown of power. we have backups, but also ways of statistically restoring those backups.

Redundancy helps, but it is not bulletproof. Much depends on the "topology" in which that redundancy is engaged when something fails (e.g. we have had cascading blackouts in the past even when the power network had enough total capacity to serve all consumers).

Have a look at cascading failures [wikipedia.org].

Re:impossible idea. (1)

Gerzel (240421) | about 2 years ago | (#40272469)

No, no, it CAN happen. It might not be likely at any given moment, but it is well within the range of possibility, and that possibility grows larger every day steps are not taken to minimize it.

Re:impossible idea. (1)

Spinalcold (955025) | about 2 years ago | (#40272589)

ok, I agree, it can happen, but the chance is on a logarithmic scale. So, a huge failure is unlikely but it would be...well huge. Data loss is inevitable, why not put things in many areas at once? The 'earthquakes' are easier to deal with for the 'country' but for the individual, they should invest in long term 'earthquake control'. That's probably a HORRIBLE analogy, but it's all I could come up with.

Re:impossible idea. (0)

Gerzel (240421) | about 2 years ago | (#40273251)

I don't think it is even that small of a chance, certainly not on the logarithmic scale. Remember while the chance of failure for any one instance may be very low you have to take the chances over the whole range of instances.

Re:impossible idea. (1)

Spinalcold (955025) | about 2 years ago | (#40274691)

But that's the point, it's layers of redundancy. You'd have to have failures across not just one center but ALL centers simultaneously. The chance of that is the chance of each one going down multiplied together (0.1 × 0.1 × 0.1 ... once per center).
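That multiplication only holds if the centers fail independently, which is the very point under dispute in this thread. A minimal sketch of both cases follows; the `common_cause` parameter is a hypothetical probability of one shared event (a bad config push, a correlated overload) taking every center out at once.

```python
def all_centers_down(per_center_failure, centers):
    """Chance that every center is down at once, assuming independent failures."""
    return per_center_failure ** centers


def all_centers_down_with_common_cause(per_center_failure, centers, common_cause):
    """Same, but with a shared event that takes every center down together.

    common_cause is an illustrative assumption, not a figure from the thread.
    """
    independent = per_center_failure ** centers
    return common_cause + (1.0 - common_cause) * independent


if __name__ == "__main__":
    print(all_centers_down(0.1, 3))
    print(all_centers_down_with_common_cause(0.1, 3, 0.01))
```

Even a 1% shared-cause probability dominates the independent term here, which is why the layers-of-redundancy argument and the cascading-failure argument reach such different conclusions.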

The analogy the author uses doesn't work. (4, Insightful)

stephanruby (542433) | about 2 years ago | (#40272045)

The analogy the author uses doesn't work.

A better analogy would be the airline industry. The airline industry likes to over-book airplane seats it may not have because it's always trying to optimize its profit-margin.

The same will happen with cloud-services. Cloud-services will always try to optimize their own profit-margins, at the risk of triggering significant outages.

And I don't see what this has to do with the financial crisis at all.

Re:The analogy the author uses doesn't work. (4, Insightful)

pitchpipe (708843) | about 2 years ago | (#40272137)

A better analogy would be the airline industry.

I think a better analogy is the power grid. System hits a peak, one line goes down, others try to compensate becoming overloaded, another can't handle the load and goes down, and behold: cascading failures.

Re:The analogy the author uses doesn't work. (1)

Anonymous Coward | about 2 years ago | (#40273739)

Yep. The power grid is a good example -- especially the wet dreams of a 'smart grid'. Particularly since one of the conclusions from the 2004 blackout was that the management complexity of the power grid was a principal cause of the cascade -- i.e. the potential interactions of all the interconnects transcended our understanding. KISS is still the watchword of the day -- as Scotty once said 'the more you complicate the plumbing the easier it is to plug up the works'. Do the math sometime on the likelihood that a connected system, with each part having a low failure rate, will be up -- it's pretty depressing. The miracle, actually, is that anything works at all.

Re:The analogy the author uses doesn't work. (2)

c0lo (1497653) | about 2 years ago | (#40272143)

And I don't see what this has to do with the financial crisis at all.

The insurance/reinsurance and CDO [wikipedia.org] schemes in finance resemble "fail-over redundancy activation" in the cloud. Enough complexity and nobody can predict what can happen until it actually happens - see cascading failure [wikipedia.org].

Re:The analogy the author uses doesn't work. (3, Informative)

TubeSteak (669689) | about 2 years ago | (#40272159)

And I don't see what this has to do with the financial crisis at all.

FTFA

New cloud services may arise that essentially "resell, trade, or speculate on complex cocktails or 'derivatives' of more basic cloud resources and services, much like the modern financial and energy trading industries operate," he wrote.

Each of these various cloud components are often maintained and deployed "by a single company that, for reasons of competition, shares as few details as possible about the internal operation of its services," Ford added.

As a result, the cloud industry could find itself "yielding speculative bubbles and occasional large-scale failures, due to 'overly leveraged' composite cloud services" with weaknesses that don't become known "until the bubble bursts," Ford wrote.

The metaphor more or less fits, except for the part that ignores how a lot of what happened during the financial crisis was outright fraud perpetrated by lenders.

The potential mess with the cloud is not about fraud, just about excessive dependencies.

Re:The analogy the author uses doesn't work. (1)

Gerzel (240421) | about 2 years ago | (#40272489)

The biggest difference is that it is still somewhat easy for companies to balance themselves against the cloud by having their own hardware running.

They don't need full capacity capabilities, but even a small amount of capability can keep their services up, if slowed, rather than a full crash, mitigating costs when things do go wrong.

Physical hardware also provides a way out of a service provider.

Though physical hardware also requires physical staff, that is the downside of in-sourcing. The downside of outsourcing (which is what the clouds are) is that you don't have the capabilities, and when your outsourced provider craps out you are eventually SOL.

Re:The analogy the author uses doesn't work. (2)

turbidostato (878842) | about 2 years ago | (#40272703)

"The biggest difference is that it is still somewhat easy for companies to balance themselves against the cloud by having their own hardware running."

Regarding services, what's the real difference when using my own hardware? I think Amazon owns its own hardware too.

"They don't need full capacity capabilities, but even a small amount of capability can keep their services up, if slowed"

Slashdot effect? For so many services, if you can't go full capacity, you don't serve at lower speed, you just don't work, full stop.

And it is not even a cloud issue. It's not the first time I advised a customer not to go with an active/active scenario for high availability but active/hot standby instead, only to have my advice rejected because they didn't want to invest in a spare "doing nothing". Of course, the first time a failure pushed the full load onto the remaining node, which couldn't stand the overload and failed too, they started thinking otherwise.

Re:The analogy the author uses doesn't work. (1)

houstonbofh (602064) | about 2 years ago | (#40275463)

"The biggest difference is that it is still somewhat easy for companies to balance themselves against the cloud by having their own hardware running."

Regarding services, what's the real difference when using my own hardware? I think Amazon owns its own hardware too.

Mega-upload also owned its own servers. And they are not the only cloud provider to have hardware inappropriately seized. You do not have control of someone else's hardware.

Re:The analogy the author uses doesn't work. (1)

turbidostato (878842) | about 2 years ago | (#40278525)

"Mega-upload also owned it's own servers. And they are not the only cloud provider to have hardware inappropriately seized. You do not have control of someone else's hardware."

Maybe, but that's not the point you are making with your example.

As you already said, megaupload *owned* their servers. If this case has to show anything is that you can't control your own hardware either.

Re:The analogy the author uses doesn't work. (1)

CimmerianX (2478270) | about 2 years ago | (#40283039)

No - you can't control when the government seizes your hardware. But the point here is that if you use the 'cloud' as your storage or sole backup, you're trusting your data to another service.

Unlike Megaupload, which stored other people's data and allowed 'sharing', if you 'own' your own servers, you control the data, the backups, can write backups to out of state servers you also own....

Re:The analogy the author uses doesn't work. (0)

Anonymous Coward | about 2 years ago | (#40272981)

also, in the wake of the economic crisis, they just discovered that the money they thought was there simply wasn't

you can always call dell for some more cpu, if the cloud goes bankrupt and can't lend any more cpu power.

Re:The analogy the author uses doesn't work. (3, Insightful)

plover (150551) | about 2 years ago | (#40272227)

I think by "financial crisis" he meant "a minor market crash due to autotrading algorithms", and not the real crisis being caused by thieves running trillion dollar banking, mortgage, and insurance scams.

The point is "if you use similar automated response strategies as a large set of other similar entities, you could all suffer the same fate from a common cause."

Supposedly a market crash was triggered by autotrading algorithms that all tended to do exactly the same thing in the same situations. So when the price of oil shot up (or whatever the trigger was) then all those algorithms said "sell". As all the sell orders came in, the market average dropped, and the next set of algorithms said "sell moar". So there was a cascade because so many systems had identical responses to the same negative stimulus. Think of those automated trades as being akin to a "failover" IT system: if host X is failing, automatically shift my service load this way.

So that's the analogy the author is trying to make with respect to systems that depend on automated recovery machinery like load balancers: if response time is too high at hosting vendor X, my automated strategy is to failover to hosting vendor Y. And perhaps 500 large sites all have a similar strategy. Now let's say that vendor X suffers a DDoS attack because they host some site that pissed off Anonymous. So now all these customer load balancers see the traffic slowing down at X, and they simultaneously reroute all app traffic to vendor Y in response. Vendor Y then gets hammered due to the new load, and the load balancers shift the traffic elsewhere. Now two main hosting providers are down while they try to clean up the messes, and the several smaller providers are seeing much bigger customers than usual using them as tertiary providers, and they start straining under the load as well, causing their other clients to automatically shift.

And if that isn't exactly what plays out next year, might not something similar happen with payment gateways, or edge content delivery systems, or advertising providers?

It's a cascade of failures due to automated responses that's remarkably similar to the electrical grid overloads that caused the northeast coast blackout in 2003. The author's point is "we don't know precisely what bad thing might happen within this particular ecosystem, but there is significant risk because we've seen complex interdependent systems have similar failures before."
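The failover cascade described above can be sketched as a toy simulation. Everything here is an illustrative assumption: the provider names, the capacities and loads, and the rule that a failed provider's load is split evenly among whoever is still running.

```python
def simulate_cascade(capacity, load, first_failure):
    """Return providers in the order they fail.

    Toy model: when a provider fails, its load is split evenly among the
    providers still running; any provider pushed past capacity fails next.
    """
    load = dict(load)  # work on a copy so the caller's data is untouched
    failed = [first_failure]
    pending = [first_failure]
    while pending:
        down = pending.pop(0)
        survivors = [p for p in capacity if p not in failed]
        if not survivors:
            break  # nobody left to absorb the load
        share = load[down] / len(survivors)
        load[down] = 0.0
        for p in survivors:
            load[p] += share
        for p in survivors:
            if load[p] > capacity[p]:
                failed.append(p)
                pending.append(p)
    return failed


if __name__ == "__main__":
    capacity = {"X": 100, "Y": 100, "Z": 60, "W": 60}
    tight = {"X": 70, "Y": 60, "Z": 40, "W": 40}
    roomy = {"X": 30, "Y": 30, "Z": 20, "W": 20}
    print(simulate_cascade(capacity, tight, "X"))
    print(simulate_cascade(capacity, roomy, "X"))
```

With little headroom the first outage takes everyone down in turn; with plenty of headroom the same outage stays contained. The model ignores retries, timeouts and recovery, so it only illustrates the shape of the problem.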

Re:The analogy the author uses doesn't work. (2)

khallow (566160) | about 2 years ago | (#40272347)

The point is "if you use similar automated response strategies as a large set of other similar entities, you could all suffer the same fate from a common cause."

Systemic risks like this were also present in the real crisis. I think the primary problem here is simply that the risks aren't well understood and that users and suppliers of cloud services are likely to make unwarranted assumptions (or even warranted assumptions that get invalidated when the infrastructure gets stressed in certain ways). There's also the possibility of tragedy of the commons problems on a global scale. For example, if requests for DNS (it's a bit low level for a cloud service, just using a concrete service for this example) occasionally get bounced, one might be tempted to issue several such requests in order that one gets through. If everyone does that, then the number of requests to DNS has just gone up by an unknown but hefty factor. If DNS was already stressed, then this collective behavior might push it into failing completely.
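A rough sketch of that amplification, under deliberately simple assumptions: a fixed server capacity, every client sending the same number of copies of each lookup, and excess requests simply dropped. The numbers are invented.

```python
def fraction_served(capacity_rps, clients_rps, copies_per_lookup):
    """Fraction of offered requests a server with fixed capacity can answer.

    Assumes every client sends `copies_per_lookup` copies of each request
    and that anything beyond capacity is dropped.
    """
    offered = clients_rps * copies_per_lookup
    if offered <= capacity_rps:
        return 1.0
    return capacity_rps / offered


if __name__ == "__main__":
    # A server running at 90% of capacity copes fine with single requests...
    print(fraction_served(1000, 900, 1))
    # ...but once everyone triples their requests "to be safe", most are dropped.
    print(round(fraction_served(1000, 900, 3), 3))
```

Each client is individually better off sending extra copies, yet collectively the service goes from answering everything to answering roughly a third of what it is sent.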

Re:The analogy the author uses doesn't work. (2)

im_thatoneguy (819432) | about 2 years ago | (#40272845)

I think by "financial crisis" he meant "a minor market crash due to autotrading algorithms", and not the real crisis being caused by thieves running trillion dollar banking, mortgage, and insurance scams.

You're right about the cascading failure but wrong about the financial crisis. The larger financial crisis was a crisis because banks had circular loans and insurance on one another. So if one bank failed it would suddenly stress the next bank to the point of failure and bankruptcy which would trigger another bank to fail and so on and so forth. What we had in 2008 was a cascading financial failure because everybody was insuring everybody else assuming that everybody wouldn't fail simultaneously.

Re:The analogy the author uses doesn't work. (1)

TubeSteak (669689) | about 2 years ago | (#40279265)

You and the GP are both incorrect.
TFA did not mean "a minor market crash due to autotrading algorithms," which the GP would know if they had RTFA.

And the larger financial crisis was not about circular loans. It was about 5 banks that were wildly overleveraged* when the housing bubble popped and their losses were magnified between 30:1 and 40:1 instead of the industry standard 12:1.
This disaster devalued their housing holdings, which devalued everyone else's housing holdings, which fucked everything else, everywhere else, all at once.

*which TFA mentions

Re:The analogy the author uses doesn't work. (1)

plover (150551) | about 2 years ago | (#40292187)

The autotrading event was the trigger, but not the cause of the disaster. By itself, the autotrading crash would have been a minor event. The root cause of the disaster was the thieving bank scams, all of them together, including the housing market, the overextended banks, the deregulated investments made by the insurance companies, the insanely complex derivatives that spat out profits but had ultra high risks built in, all of that together was the real cause.

If you have a barrel full of gunpowder, and you're examining it closely with a lighted candle, it's hardly the candle's fault if it explodes.

What this has to do with the financial crisis? (0)

Anonymous Coward | about 2 years ago | (#40277417)

Nothing, it's just another attack by bad analogy man. The global financial meltdown was caused by a small number of financial houses who bet against their own CDO debt bubble and then shorted the entire economy. The same people who are currently going through Europe and bankrupting whole countries one-by-one.

Re:The analogy the author uses doesn't work. (0)

Anonymous Coward | about 2 years ago | (#40285253)

And as usual at the source of every problem is an MBA. That whole department needs an overhaul, e.g. more ethics and more long term planning.

Low hanging fruit of a research piece (4, Interesting)

mcrbids (148650) | about 2 years ago | (#40272069)

Efficiency normally comes with economies of scale. As a partner in an outsourced vertical software company, we have hundreds of clients running in our highly tuned hosting cluster, and are able to bring economies of scale to an otherwise ridiculously expensive software niche. Yes, that means that if we have an outage, all of our clients experience an outage as well.

However, we have carefully laid plans for multiple recovery points in a disaster scenario, (Plan B, Plan C, Plan D, etc) and have maintained an uptime significantly better than our clients would typically attain if left to their own devices. We easily manage close to 4 nines of uptime in an industry where the average is realistically around 2 nines. (having "the computer is down" a day or two every year or so is typical)

Although the Internet is a "network of ends" the truth is that not all ends are created equal. Having a high quality, high speed (100 Mb), reliable (99.99%+) Internet feed in my small-ish hometown of around 80,000 people is ridiculously expensive. But in a nearby city (500,000 people, a 2-hour drive away) we host our servers in a tier 1 colo at 1/10th the cost of running it all ourselves, with dramatically improved reliability and network performance.

Yes, putting all your eggs in one basket means that if that basket fails, you lose all your eggs. But it also makes it easy to buy just one, really nice basket that won't break and lose your eggs.
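For a sense of scale on the "nines" mentioned above, here is a quick conversion from a number of nines to allowed downtime per year. It assumes a 365-day year and counts every hour equally.

```python
HOURS_PER_YEAR = 365 * 24


def downtime_hours_per_year(nines):
    """Allowed downtime per year for an availability of N nines (2 -> 99%)."""
    unavailability = 10 ** (-nines)
    return unavailability * HOURS_PER_YEAR


if __name__ == "__main__":
    for n in (2, 3, 4):
        hours = downtime_hours_per_year(n)
        print(f"{n} nines: {hours:.2f} hours/year ({hours * 60:.0f} minutes)")
```

Two nines works out to about 87.6 hours a year (a bit over three and a half days), while four nines allows under an hour.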

Re:Low hanging fruit of a research piece (2)

jtownatpunk.net (245670) | about 2 years ago | (#40272667)

Yes, putting all your eggs in one basket means that if that basket fails, you lose all your eggs. But it also makes it easy to buy just one, really nice basket that won't break and lose your eggs.

That's a great analogy until someone throws a weasel in your well-crafted basket.

Re:Low hanging fruit of a research piece (1)

blind biker (1066130) | about 2 years ago | (#40272945)

Efficiency normally comes with economies of scale. As a partner in an outsourced vertical software company, we have hundreds of clients running in our highly tuned hosting cluster, and are able to bring economies of scale to an otherwise ridiculously expensive software niche. Yes, that means that if we have an outage, all of our clients experience an outage as well.

Your post implies that you have a single location. How do your customers feel about that - if they're even aware of it?

Technology.... (1)

Anonymous Coward | about 2 years ago | (#40272105)

Never turns on its makers. Never. This story is bullshit. Technology is a tool. I treat it like a tool. I control it.

Now, who's up for another drink?

A cloud meltdown? (0)

Anonymous Coward | about 2 years ago | (#40272133)

Sounds to me that you have a mushroom cloud.

if they actually do this - they're stupid (3, Interesting)

Karmashock (2415832) | about 2 years ago | (#40272135)

Systems need to be compartmentalized or have redundancies built into them.

For example, I have several systems that send automated emails. I've had a problem in the past of given email servers not accepting or sending messages. It's uncommon but it happens and it's not acceptable. These are mission critical systems. They can't fail.

Solution? Redundancy up the wazoo. The way it's set up now so many things would all have to happen at the exact same moment that the only way the system is likely to fail is if we fight world war 3... and lose.

That is how you solve this problem. Don't rely on any one system. Rely on all of them. Once you figure out how to integrate one of them it's typically easier to integrate the rest. The virtues of this approach are manifest. Not just stability, but if the services do processing or data retrieval you can cross reference them to find errors in databases or get a more complete data set than exists in any one source.

I mean is google or bing the best search engine? What about both at the same time?

Re:if they actually do this - they're stupid (1)

Anonymous Coward | about 2 years ago | (#40272471)

Umm, no. Doing what you say will result in the catastrophic failure. You and X percentage of companies are on CloudA. One of those companies gets a massive DNS attack and CloudA can't handle it along with their normal loads (they oversell just like net providers, both landline and wireless telephone services, and airlines). CloudA goes down (or it forwards to CloudB). You and some of X move your loads to another cloud, CloudB. CloudB now has way more load than it expected. They've never had so much of CloudA's load before. The other companies on CloudA move to other Clouds, increasing those loads somewhat. One of these smaller Clouds can't take the extra load and goes down. Again the companies redistribute to the remaining Clouds. These Clouds again see their loads increase and again the weakest Cloud dies. This pattern will repeat until every cloud service is offline.

Can the largest cloud server handle all the cloud computing in the world? Why would the MBAs let so much hardware sit idle (and thus wasting money)? They would require services to be oversold to keep loads at an acceptable profit level. Peak demand be damned. It'll be cheaper to offload our peak demands to another cloud instead of maintaining hardware that only gets used once a year (the exact reason to go to the cloud in the first place).

Everything will happen much faster if a larger cloud goes down. The smaller ones will be clobbered with massive loads and will all go down, especially the ones that had planned on offloading their extra work onto the downed larger cloud. There's no way to know what cycles the cloud companies have gotten themselves into. CloudA might offload to CloudB, CloudB uses CloudC and CloudD, CloudC uses CloudA, and CloudD uses CloudC (A->B->C/D->C->A...)
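If the offload relationships were known (in practice they usually are not, which is the commenter's point), a cycle like the one above could be found with a plain depth-first search. The graph below is just the A/B/C/D example from the comment written out as a dictionary.

```python
def find_cycle(offloads_to):
    """Return one offload cycle as a list of providers, or None if there is none."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    state = {node: WHITE for node in offloads_to}
    path = []

    def visit(node):
        state[node] = GREY
        path.append(node)
        for nxt in offloads_to.get(node, []):
            if state.get(nxt, WHITE) == GREY:
                # nxt is already on the current path, so we have looped back.
                return path[path.index(nxt):] + [nxt]
            if state.get(nxt, WHITE) == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        state[node] = BLACK
        return None

    for node in offloads_to:
        if state[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


OFFLOADS = {
    "CloudA": ["CloudB"],
    "CloudB": ["CloudC", "CloudD"],
    "CloudC": ["CloudA"],
    "CloudD": ["CloudC"],
}

if __name__ == "__main__":
    print(find_cycle(OFFLOADS))
```

The search reports the first loop it meets (A to B to C and back to A); removing CloudC's dependence on CloudA makes the example graph acyclic.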

Keep your data local and be able to maintain minimal operating levels with that data. Feel free to use the cloud for everything above that. The cloud is a silver bullet; aimed right at your company.

err, I misunderstood some of your post but I already wrote all of the above. I'll post it anyway because it's still correct, just not a great reply. Sorry.

This! (0)

Anonymous Coward | about 2 years ago | (#40276329)

This is a perfect description of what will be "the perfect storm" (cloudy pun intended). And when, not if, it happens there will be a massive exodus from the cloud. The question is, where will the exodus go? Will they bring their data centers back in house? Will they colo and build their own private clouds?

As soon as I can figure out where they will go, I'll be putting my money there.

Re:if they actually do this - they're stupid (1)

Karmashock (2415832) | about 2 years ago | (#40279135)

Why would a DDoS attack cause a chain reaction?

First off, the cloud is especially resistant to DDoS attacks. Ask Amazon. They've designed their systems specifically to reduce the effectiveness of that sort of attack. And as systems become larger they become harder to hit with DDoS attacks. You might as well try to DDoS a root DNS server. Have fun with that.

Furthermore, why would ALL systems route to the same alternate cloud provider? Rather than everyone going from A to B, what you'd actually see is some going to B, some going to C, some going to D-Z. You're not going to see everyone go from one to the next. Which means that rather than load doubling on B you'll see a 5 percent increase in load on B-Z. That's an exaggeration. There will be favorites so some service might get a 15 percent increase while some would see a 1 percent increase. But you're not going to get some perfect domino effect.
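The difference between "everyone fails over to B" and "failover spread across B-Z" is easy to put numbers on. The sketch below assumes 26 providers of equal size and a `weights` mapping, made up for illustration, that says how customers choose their fallback.

```python
def relative_increase(failed_load, survivor_loads, weights):
    """Fractional load increase on each survivor after one provider fails.

    weights says how the failed provider's customers split their failover
    traffic; providers missing from it receive nothing.
    """
    total_weight = sum(weights.values())
    return {
        name: failed_load * weights.get(name, 0.0) / total_weight / survivor_loads[name]
        for name in survivor_loads
    }


# Providers B through Z, all assumed to carry the same load as failed provider A.
SURVIVORS = [chr(c) for c in range(ord("B"), ord("Z") + 1)]
LOADS = {name: 100.0 for name in SURVIVORS}

if __name__ == "__main__":
    spread = relative_increase(100.0, LOADS, {name: 1.0 for name in SURVIVORS})
    herd = relative_increase(100.0, LOADS, {"B": 1.0})
    print(f"even spread: B sees +{spread['B']:.0%}")
    print(f"everyone picks B: B sees +{herd['B']:.0%}")
```

An even spread adds about 4% to each survivor, while a herd onto B doubles its load. Which of the two happens depends on how similar everyone's failover configuration is, and that is the open question in this thread.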

It all boils down to how many reasonable IF/THEN subroutines you've built into the system. In the emailing system I have, every time an email is sent a check is done on the email server to make sure it is working. If it isn't it routes to another server in a list. Then there is a check to make sure the email got through. If it didn't there are additional subroutines that go through a trial and error process working through most likely reasons for a failure.

The result is that emails get through. Always. The only thing I don't have control over is the receiver's server. If that stops working there isn't anything I can do about that. But short of that, the email gets through.
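The check-then-fall-through logic described above might look roughly like the sketch below. It is not the poster's actual system: `send` and `is_healthy` are hypothetical callables standing in for whatever really talks to the mail servers, and delivery confirmation is reduced to "send did not raise".

```python
def deliver(message, servers, send, is_healthy):
    """Try each server in order; return the name of the one that accepted.

    send(server, message) and is_healthy(server) are caller-supplied,
    hypothetical hooks. Raises RuntimeError if every server fails.
    """
    errors = []
    for server in servers:
        # Check the server is working before handing it the message.
        if not is_healthy(server):
            errors.append((server, "health check failed"))
            continue
        try:
            send(server, message)
        except Exception as exc:  # any failure means: log it, try the next one
            errors.append((server, repr(exc)))
            continue
        return server
    raise RuntimeError(f"all servers failed: {errors}")


if __name__ == "__main__":
    outbox = []
    used = deliver(
        "status report",
        ["mail1", "mail2", "mail3"],
        send=lambda server, msg: outbox.append((server, msg)),
        is_healthy=lambda server: server != "mail1",  # pretend mail1 is down
    )
    print(used, outbox)
```

Note that this only protects against a single server misbehaving. If all the servers in the list share a provider, a mail daemon, or a bug, they can still fail together.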

Everything is logged. Everything is cross referenced. Everything is added to spreadsheets and turned into lovely little graphs.

The mail gets delivered. Period. No excuses.

And with this cloud computing there are ways to do the same thing. Redundancy and compartmentalization.

You want to make sure that a failure in one system can't cause a failure anywhere else. And you make sure that if any system fails there are backups upon backups upon backups.

The system fails if the whole network drops. If any portion of the network starts working again, all jobs can be routed to that portion of the network and everything continues.

I've seen multiple failures before. We had an issue not long ago where many companies decided to do maintenance at the exact same time. Their systems all went off line for a couple hours. The system automatically shifted from unresponsive systems to responsive systems and there was no disruption even though upwards of 40 percent of the systems were not responding.

This is how you do it.

Re:if they actually do this - they're stupid (1)

happyhamster (134378) | about 2 years ago | (#40272499)

Redundancy in common sense would only solve hardware issues. You can have multiple servers, multiple network connections, multiple power solutions. What about software? If you run the same mailing software on all servers, a bug or vulnerability in the mailer would happily bring down all your servers. Same with bugs and vulnerabilities in operating systems running on the servers, and other software. Unless you run several different mailer systems, operating systems, etc, and they all synchronize between themselves (e.g. no emails get lost, no emails sent more than once etc.), redundancy as it's commonly practiced is not the solution to cloud problems.

Re:if they actually do this - they're stupid (1)

Karmashock (2415832) | about 2 years ago | (#40273113)

Why can't I use several competing cloud systems that do the same thing? What they're talking about here is powerful cloud systems that depend on each other. So if one cloud goes down it causes a chain reaction of failure.

But if every system can use two or three different sources for everything then it doesn't need any specific cloud to be running so long as most of them are running.

Re:if they actually do this - they're stupid (1)

NeutronCowboy (896098) | about 2 years ago | (#40274947)

Because at some point the ROI isn't there. It's a common problem actually. Everybody knows how to make things redundant - triply, quadruply, etc. The problem is that no one is willing to pay for that kind of redundancy. The business doesn't, the clients don't, and you sure as hell aren't paying for it out of your own pocket. So you rely on failover mechanisms that are generally doubly redundant, or at least that rely on a large number of inexpensive machines. On top of that, you craft as clever a process as you can.

And then you discover that there is a cascade effect you didn't consider, or that you did consider but didn't have the money to build for. And that's when things go to hell..

Re:if they actually do this - they're stupid (1)

Karmashock (2415832) | about 2 years ago | (#40279229)

It's all about how you design it. It isn't just redundancy. It's compartmentalization. Given systems are going to fail. You need to set it up so they can fail totally without it affecting anything else. The redundancy, especially in an enterprise organization, is a requirement.

A lot of people looked at the cloud as a way to save a lot of money on computers etc. For mission critical applications that's the wrong attitude. Instead, you should look at it as an opportunity to make the system more robust. Take the per system cost savings and put them towards redundancy. A cloud system is a tenth the cost of an in-house system by our calculations. The cost of the system isn't really important to us since it's nominal in the scheme of things. So we took the savings and put them towards making the system more robust.

We joke that it's military grade at this point though honestly it's probably well beyond anything you see out of the military. We built everything with the assumption that Murphy was not our friend. A cascade failure is only going to happen if you don't have any redundancy. And in that case you deserve the result.

No pity.

Re:if they actually do this - they're stupid (1)

ultranova (717540) | about 2 years ago | (#40275559)

For example, I have several systems that send automated emails.

Didn't those switch to globally distributed clouds years ago? Ones composed mostly of unpatched Windows machines, if I understood correctly.

Re:if they actually do this - they're stupid (1)

Karmashock (2415832) | about 2 years ago | (#40279273)

I can't speak to what remote systems outside my control are doing. But we could tolerate 95 percent failure and still operate at 100 percent efficiency.

As I said, we have a great deal of redundancy built into it.

We've seen failure rates as high as 40 percent. Only for an hour or so... It had no effect on us.
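
As a rough sanity check on figures like these: tolerating a given failure rate means the surviving nodes alone must cover the full load. A minimal sketch of that arithmetic, assuming identical nodes and evenly spread load (the function name and node counts are hypothetical):

```python
def nodes_to_provision(nodes_for_full_load: int, tolerated_failure_percent: int) -> int:
    """Nodes to deploy so the survivors still cover the full load.

    Assumes identical nodes; the percentage is a whole number from 0 to 99.
    """
    if not 0 <= tolerated_failure_percent < 100:
        raise ValueError("tolerated_failure_percent must be in 0..99")
    surviving_percent = 100 - tolerated_failure_percent
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-nodes_for_full_load * 100 // surviving_percent)


print(nodes_to_provision(5, 95))   # 20x over-provisioning
print(nodes_to_provision(10, 40))
```

Under these assumptions, riding out a 95 percent failure rate means deploying twenty times what the load itself needs, which only makes sense when the per-node cost is low.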

just like mainframes (4, Insightful)

Dan667 (564390) | about 2 years ago | (#40272147)

I think it is funny that lessons learned years ago with mainframes are being presented as new by just changing the word mainframe to cloud.

Re:just like mainframes (1)

StormReaver (59959) | about 2 years ago | (#40275041)

I think it is funny that lessons learned years ago with mainframes are being presented as new by just changing the word mainframe to cloud.

I wish I could moderate you higher than +5. I got into computing in the mid 80's, when home computers were popular, and never had to deal with mainframes. However, I know enough about computing history to see exactly how absurd this entire "cloud" computing fiasco is becoming. And it is going to exactly follow the same curve that mainframes followed, until "suddenly" the concept of having your own computing resources is going to be "new and exciting" again.

P.T. Barnum would be proud.

Re:just like mainframes (1)

houstonbofh (602064) | about 2 years ago | (#40275541)

So you worked at an Application Service Provider (ASP) at the turn of the millennium too?

I think this is why the 40+ crowd has trouble getting work in some of the "New Internet" businesses. We were at the old Internet businesses last time and remember how the Kool-Aid tasted then.

In other words (2, Insightful)

Anonymous Coward | about 2 years ago | (#40272197)

Unmanaged systems are hard to manage.

So (1)

mikes.song (830361) | about 2 years ago | (#40272199)

Ford compared this scenario to the intertwining, complex relationships and structures that helped contribute to the global financial crisis.

Cloud computing is like fractional reserve accounting, with artificially low interest rates?

UNISEX HotCloud? (0)

Anonymous Coward | about 2 years ago | (#40272247)

Sounds like a hair salon.

Nightmare scenario has already happened (3, Insightful)

dbIII (701233) | about 2 years ago | (#40272259)

It's a leap year, February 28, and all over the world, completely out of the blue (or azure if you prefer) cloud clusters crash as the local clocks swing around to midnight, then stay down all day.
Still, it's three nines of uptime when it's spread out over a few years :)

A highly interdependent system is only as reliable as the QC on the weakest link. Who would have thought that somebody from a company that had a lot of embarrassing press about a leap year stuffup would make such a stupid and obvious mistake four years later? That's the cloud, where even the biggest names still don't care anywhere near as much as you would about your own systems and so don't pay enough attention to detail.
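
One common way this class of leap-day bug arises is date arithmetic that assumes the same month and day exists in every year. A minimal sketch in Python (not the code from any actual outage; the helper names are hypothetical), followed by the arithmetic behind the "three nines" remark, assuming one full day of downtime over three years:

```python
from datetime import date


def one_year_later(d: date) -> date:
    """Add one year, falling back to Feb 28 when the start date is Feb 29."""
    try:
        return d.replace(year=d.year + 1)
    except ValueError:
        # Feb 29 has no counterpart in the following (non-leap) year.
        return d.replace(year=d.year + 1, day=28)


def availability_percent(days_down: float, total_days: float) -> float:
    """Availability over a period, as a percentage."""
    return 100.0 * (1 - days_down / total_days)


print(one_year_later(date(2012, 2, 29)))
# Three years including one leap year is 1096 days.
print(round(availability_percent(1, 1096), 3))
```

The naive version, `d.replace(year=d.year + 1)` with no fallback, raises `ValueError` on a leap day and works on every other day of the year, which is why this kind of bug can sit unnoticed for years.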

Re:Nightmare scenario has already happened (0)

Anonymous Coward | about 2 years ago | (#40274921)

That systemic failure is a bitch.

However, I believe that demonstrated the system wasn't highly independent, and therein lies the issue. I really doubt it's a matter of caring or not. Often, you'll find these guys really do care about what they do, or else it would just fall apart. No one gets up at 4am to respond to some godforsaken issue because they like the early-morning pager-based wake-up call. More likely, mistakes are made because there are usually not enough hands in the cookie jar. That's typically what I see businesses do: run with a horrible level of safety in the guise of saving a few dollars. Never mind that any reasonably horrible failure will actually cost more than the equipment or software that could have mitigated the disaster. After a few of those incidents, companies tend to either go the way of the dodo or get smart. Suddenly, it's not worth risking issue X, and Y is created to handle it. (Maybe they get smart and start asking the tough questions on mitigating all the Zs too.)

Those are my entirely random and likely inapplicable thoughts.

with a difference (1)

Ralph Spoilsport (673134) | about 2 years ago | (#40272393)

Ford compared this scenario to the intertwining, complex relationships and structures that helped contribute to the global financial crisis.

The difference being, of course, that the global financial crisis was the product of the abyssal greed of speculators and the stupidity of venal governments borrowing from private banks instead of doing the right thing and being directly responsible for the creation of money.

But other than that, sure it's just like it.

(/snark)

Who uses the cloud for serious uptime? No one sane (1)

Anonymous Coward | about 2 years ago | (#40272443)

Using a public cloud seems sensible for low-risk projects, or one-off, large-scale computations. The security and availability risk would suggest that anyone using the cloud for their entire infrastructure has either read too many brochures, or is about to do something else crazy, like divest their entire original business, and then hike service charges.

Re:Who uses the cloud for serious uptime? No one s (0)

Anonymous Coward | about 2 years ago | (#40274515)

You mean like hotmail, ebay, amazon, salesforce or.... Apple? (icloud runs on azure).

The Octel Clusterfuck (0)

Anonymous Coward | about 2 years ago | (#40272761)

I was a sysadmin at Octel Communications back in the day. Octel invented voice mail; perhaps you've heard of it.

When I hired on we had three Sun 3/280 servers. I think these were 68030 boxen, but they might have been '020s. They were primarily used for cross-compiling the homebrew RTOS that Octel's voice mail machines ran, but they were also used for Electronic Design Automation.

There was a mysterious problem that from time to time would cause one of the servers to go to its knees for an hour or two, but not actually crash. Because all three machines were NFS hard-mounted on each other, as soon as one machine got stuck, they were all stuck. 250 engineers all got to sit on their hands while I contemplated whether I'd be a few inches short of a head by the end of the workday.

I asked a colleague why we didn't soft-mount the NFS shares. That would allow a client of a hung server to timeout. My colleague's reply was that, at the time at least, we couldn't count on our development tools to do the right thing if they got read or write errors during a build. It was felt that soft-mounting might lead to bad machine code generation.

In the end it turned out that the hung servers were caused by high-capacitance serial cables. When a machine would emit "SunOS Login:", it would receive a capacitively coupled bunch of garbage back, which login would take as the username. Login would then prompt "Password:", and again receive garbage for the password "attempt". Each machine had 32 serial lines, some of them going hundreds of feet. Good thing I studied Physics and not Computer Science!

The solution was to buy a big, long, expensive spool of serial cable that had lower capacitance per foot, as well as a bunch of RS-232 plug kits, and then to tear out and replace all the cable. That took some convincing to get the management to give me the budget and the time to do the work, but in the end all I required to convince my manager Karen Coates was to hook a glass TTY up to a scope.

In Other News: I have been doing some study of security, and will have results to announce soon. These results will be digitally signed. Please use a keyserver to download my Public Key [mit.edu] into your keyring. Please use nothing other than my key fingerprint; key emails and Key IDs can be spoofed:

Fingerprint=9B9F 2D03 9996 AF83 9A4F CB26 20E8 0D0B F760 5786

How is this different from the internet itself? (1)

WOOFYGOOFY (1334993) | about 2 years ago | (#40273265)

Seriously. I don't get why this same description doesn't apply to the internet itself, a thing known to work reliably?

Don't MAKE me RTFA.

How to solve the debt crisis (0)

Anonymous Coward | about 2 years ago | (#40273607)

Host all the debt on the cloud, then pfft, gone!

The focus on infrastructure as a service is flawed (0)

Anonymous Coward | about 2 years ago | (#40273863)

As long as you are focusing on infrastructure, or dealing with IaaS providers, you will be stuck thinking of all of the typical IT failure scenarios (systems, not people) but at a much larger scale. The future of cloud computing lies in two areas: Platform as a Service (PaaS), and changing how we write software (in that order).
I don't work for Microsoft, I am talking about Azure specifically because this is our first implementation, but we plan on using other cloud providers as they mature to catch up with Azure.

PaaS. http://en.wikipedia.org/wiki/Platform_as_a_service

Systems like Azure (and, to a much smaller extent, AWS, although nobody uses it that way) are abstracted away from the 'myapp==this host' thinking and move towards treating the cloud as if it is an OS overlaid on top of a very large compute fabric. In our deployments we have started rewriting all of our critical functions as worker roles within Azure. The worker roles are dispatched using cloud-native functions. We have roles for SQL, processing, BLOB (data) stores, etc. We have some fairly generic communication libraries we use to get them to work together along with the native Azure functionality. We have several backend management instances which act as coordinating hubs to deploy, monitor, and manage all of the worker roles. This allows us to do several things. One is that any bottleneck can typically be isolated at a much finer level than you would get running a monolithic application stack, which allows us to duplicate roles that are getting overworked. This in turn gives us much finer control over scale, as we can run multiple roles on the same system, on different systems, whatever makes sense. For purposes of backup we have a very small, almost idle mirror setup in each of the different Azure data centers, with only the database being actively migrated (synced). If one data center were to go down, we could basically pick up in another data center and 'right size' the entire thing in a matter of minutes (at worst). All of this is routed to the users through two different CDNs, so there is no direct client-to-process connection.

Anyhow, that is the route we are taking. Yes, it was a bit of an undertaking to get going, and we have been doing it piece by piece with a long way to go, but we are very satisfied with what we have achieved so far.
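
The idea of duplicating only the roles that are overworked can be sketched roughly as follows. This is a generic illustration and not Azure's API: the role names, the queue-depth metric, and the per-instance capacity are all assumptions made up for the example.

```python
def instances_wanted(queue_depths: dict, per_instance_capacity: int, minimum: int = 1) -> dict:
    """Decide how many copies of each role to run, based on its backlog.

    queue_depths maps a (hypothetical) role name to its pending work items.
    Each role is scaled on its own, so one busy role does not force the
    whole stack to be duplicated.
    """
    wanted = {}
    for role, depth in queue_depths.items():
        # Integer ceiling division: enough instances to cover the backlog.
        needed = -(-depth // per_instance_capacity)
        wanted[role] = max(minimum, needed)
    return wanted


backlog = {"sql": 40, "processing": 950, "blob": 0}
print(instances_wanted(backlog, per_instance_capacity=100))
```

With a monolithic stack the only option would be to clone everything ten times over; here only the busy role is duplicated while the others stay at the minimum.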

Server gone! (0)

Anonymous Coward | about 2 years ago | (#40274363)

This is nothing compared to the harm that will be done when governments confiscate cloud servers in the name of gathering terrorist information.

Re:Server gone! (1)

houstonbofh (602064) | about 2 years ago | (#40275631)

Why post AC? This has happened already, many times, and no "terrorist" buzzword needed. MegaUpload was just the most recent high-profile one.

uhm where is the fraud? (1)

decora (1710862) | about 2 years ago | (#40275633)

the 'global financial crisis' was caused directly by massive fraud and profiteering. is there any incentive for cloud companies to create massive quantities of products that are completely worthless and sell them to sucker investors?

Re:uhm where is the fraud? (1)

lennier (44736) | about 2 years ago | (#40277799)

the 'global financial crisis' was caused directly by massive fraud and profiteering. is there any incentive for cloud companies to create massive quantities of products that are completely worthless and sell them to sucker investors?

Um, is that a trick question?

If you're selling something - whether it's investment, insurance, public key certificates or data backup - where the buyer can't directly measure the quality of the product, of course there's incentive for fraud.

Here, just upload your data to Dev Null Industries quad-cached completely tamperproof server. No, your data isn't encrypted. No, you can't have it back all at once. No, we won't peek, honest, and of course we'd never sell your spreadsheets to your competitors. Seriously. Would this stock photo of a face lie to you?

Not just technology (1)

Sqreater (895148) | about 2 years ago | (#40282369)

Complexity is rising in all things at a frightening rate, not just technology. Over my lifetime the amount of information required to make any decision has become massive. For instance, can you select the "best" cellphone for you today? Which credit card? Car? Checking account? There is a coming "complexity collapse." What it will look like, or what the consequences will be, is hard to project, but there cannot be an infinite rise in complexity in our lives without something painful happening eventually. Will people retreat from complexity? Will they just start to chuck technology and pull back from activities we now take as normal? Put their money in a mattress at home and use tin cans to communicate? Probably not. But what will they do to protect their sanity when bombarded by too many unmakeable decisions?

Pretty vague (0)

Anonymous Coward | about 2 years ago | (#40283765)

Would be nice to read the paper rather than some nearly meaningless story about it.
