Scalability In the Cloud Era Isn't What You Think

kdawson posted more than 3 years ago | from the partly-cloudy-with-a-chance-of-data dept.

Databases 75

Esther Schindler writes "'Scalability' isn't a checkbox on a vendor's feature chart — though plenty of them speak of it that way. In this IT Expert Voice article, Scott Fulton examines how we define 'scalability,' why it's data that has to scale more than servers, and how old architectural models don't always apply. He writes, 'If you believe that a scalable architecture for an information system, by definition, gives you more output in proportion to the resources you throw at it, then you may be thinking a cloud-based deployment could give your existing system "infinite scalability." Companies that are trying out that theory for the first time are discovering not just that the theory is flawed, but that their systems are flawed and now they're calling out for help.'"

I read the article (4, Interesting)

Saint Stephen (19450) | more than 3 years ago | (#32173082)

and learned not a damned thing. Classic marketecture speak.

Re:I read the article (1)

DeadDecoy (877617) | more than 3 years ago | (#32173160)

Damn, I was hoping for some technical discussion on moving from small databases of a few hundred MB to largish ones of a few petabytes while maintaining some kind of low latency. (Side note: EVE Online's server model is an interesting example of this.)

Re:I read the article (2, Insightful)

Anonymous Coward | more than 3 years ago | (#32173180)

Given "marketecture" speak is what got us into this cloud mess in the first place, perhaps fighting back with "marketecture" is appropriate.

Re:I read the article (4, Insightful)

timeOday (582209) | more than 3 years ago | (#32174346)

I'll bite, what's the "cloud mess"? In the olden days, we mocked Slashdot story submitters who linked to videos because their ISP account, or university account, could never handle it. There wasn't really a way for an individual to share a video with thousands of people. Now we just upload to YouTube, and voilà, it works. Scalability issue solved. How many computers does it take to accomplish that? Where are they? Are they all in one place? It's a cloud, most of us don't know and don't care. It's good.

Re:I read the article (1, Insightful)

Anonymous Coward | more than 3 years ago | (#32177954)

So... you think that Youtube just popped into existence one day, perfectly scalable? People had to design a horizontally scalable video distribution platform. True, it works very well.

But that's irrelevant. Companies are coming online with products and thinking "I'll just host it in The Cloud(tm)!" Then they start looking at "Cloud services". And they think that their application will Just Work(tm) in The Cloud(tm).

Technology people know it doesn't work like this. Products, applications, and architectures need to be designed to be horizontally scalable. The "cloud mess" is people thinking that the cloud can actually solve any of their problems in application design. "The Cloud" is a term that is hugely misunderstood, and while this article screams marketspeak, it actually addresses some of that misunderstanding.

Re:I read the article (1)

joelsanda (619660) | more than 3 years ago | (#32173206)

Yeah, FTA: “Take the time to sit down up front and ask, ‘What would we look like if we got really busy?’ and then plan to that.” I remember yawning through microeconomics in college. Then it was “Take the time to sit down up front and ask, ‘What would we look like if we got really *expensive*?’ and then plan to that.” Same problem, different charlatan with a marketing budget.

Re:I read the article (0, Redundant)

Meshach (578918) | more than 3 years ago | (#32173274)

and learned not a damned thing. Classic marketecture speak.

You must be new here.

Re:I read the article (2, Informative)

c0d3g33k (102699) | more than 3 years ago | (#32173576)

You must be out of good ideas to add to the discussion.

Re:I read the article (1)

mehemiah (971799) | more than 3 years ago | (#32174092)

AWESOME comeback, just because our article selection process is susceptible to social engineering doesn't mean we shouldn't do anything about it.

Re:I read the article (0)

Anonymous Coward | more than 3 years ago | (#32174908)

I cringe when I see Schindler as the submitter. People will catch on eventually. Everything that I have seen from this submitter has been fluff.

Re:I read the article (1, Funny)

Anonymous Coward | more than 3 years ago | (#32175928)

OK, I'll bite.

No, really, get the fuck out of here or I will fucking bite you.

Re:I read the article (2, Interesting)

gstoddart (321705) | more than 3 years ago | (#32173296)

and learned not a damned thing. Classic marketecture speak.

I don't think it's marketecture -- I think it's trying to point out some issues which most of us have never really thought about in terms of cloud computing.

Admittedly, I couldn't read through the entire article in one go, but I am going to go back and try to finish it.

The thesis seems to be something along the lines of: everyone thinks that with cloud computing if you keep throwing resources at the problem, scalability is something which sorts itself out.

The reality seems to be that as companies do that, they find themselves using proportionally more resources, and then they hit a wall where scalability tips over, and they get less additional benefit per additional resource. Which seems to be contrary to what everyone believes about cloud computing.

That seems to happen because the traditional apps everyone is trying to scale with cloud computing don't necessarily benefit from that kind of scaling. It looks like that for a while, and then it falls apart, and people need to figure out how to fix it -- you end up with a hot mess that doesn't even remotely do what you need it to.

At least, that's my best high-level grokking. But, I don't claim to grok with fullness. :-P

Re:I read the article (3, Insightful)

c0d3g33k (102699) | more than 3 years ago | (#32174118)

Both. (Marketecture and not grokking with fullness, that is.)

Marketecture part: The delusional fantasy that because one is able to talk about things in a new way, old problems affecting scalability no longer apply. Very true. The marketers believe it. The foolish customers believe it. Anyone who has a clue runs for the hills.

Not grokking with fullness part: You've accurately grokked the "every (idiot) thinks that if ..." part. What you haven't grokked is the details. In place of your speculation, just substitute that those who do not learn from history are doomed to repeat it.

The fantasy I see over and over again whenever a "new" paradigm changing technology comes along is that problems which were hard using the 'old' approach are suddenly eliminated merely by virtue of doing things in a "new" way. The fantasy is that having the 'insight' to recognize the awesome potential of the magical new approach is somehow superior to having the discipline to *fully* understand the problem and solving it decisively and intelligently. The latter is often viewed as not worth the effort or offering a "poor return on investment". The delusion is that effort is better spent on looking for a loophole that doesn't require any understanding because the new approach will magically make the hard problem go away so nobody has to expend any real effort. Doing things 'in the cloud' is one of those magic new approaches that substitutes for actually engineering a solution in an informed way.

Even if a new approach reduces the effort previously required for certain tasks, it invariably brings with it new problems that have to be understood in order to avoid being bogged down.

History shows that folks who solve the hard problem wipe the floor with those who are looking for shortcuts. FedEx (solved the logistics problems associated with rapid delivery to anywhere), Southwest Airlines (solved the logistics problems associated with low cost regional air travel), Walmart (developed a satellite network to track inventory and sales chain-wide). Google (a better algorithm for search). Etc.

Re:I read the article (0)

Anonymous Coward | more than 3 years ago | (#32183156)

Most marketeers work in the sales mindset of, "The customer doesn't want to understand the problem. They just want the problem to go away *without* having to understand it. It will cost $xxxx for the problem to go away is all the customer cares about."

Re:I read the article (1)

raddan (519638) | more than 3 years ago | (#32173612)

Scalability is a real property, though. But hardware resources are only a single aspect of scalability. Take the IPv4 address depletion problem. You can throw all the hardware you want to at the problem, but it's not going to budge. That's a problem with the addressing architecture. When IPv6 happens, we still have the router-table growth problem, and that you can throw hardware at, to a degree, although for how much longer, nobody really knows. Moore's Law has kept us ahead of that particular issue.

Essentially, scalability is how well a piece of software performs given an increase in some workload. More users, more nodes, more addresses, more work. The answer to that question is highly application-dependent, and as every experienced programmer knows, not always something that can be solved with more CPU cycles. Being able to solve the problem with more CPU cycles is typically something you have to plan for, meaning that your application is already scalable if you can solve your problem with a cluster. As usual, of course, the business types rarely understand this.

Nice URL (3, Insightful)

Anonymous Coward | more than 3 years ago | (#32173100)

It says ad right there so there isn't any question.

Infinite scalability? (2, Interesting)

Locke2005 (849178) | more than 3 years ago | (#32173144)

Unlike stupidity, computing resources are inherently limited. Which is a good thing... imagine, if it were really unlimited, the huge bill you would get at the end of the month for a runaway task attempting to use every node?

Re:Infinite scalability? (0)

Anonymous Coward | more than 3 years ago | (#32174086)

If both were scalable I'd ask it to compute Pi to the last digit and wait for the bill.

Re:Infinite scalability? (1)

jd (1658) | more than 3 years ago | (#32175446)

I dunno - I remember a Director of Architecture who could produce infinite fluff. From this, one can extrapolate that you could build a machine that did an infinite amount of nothing useful. It would need to be a quantum computer that existed in every possible state simultaneously, much like said Director in fact.

Re:Infinite scalability? (1)

Locke2005 (849178) | more than 3 years ago | (#32175680)

Nice to hear from you jd. I think I know whom you are talking about. (Isn't he now listed as "Senior Technologist"?) I apologize for getting you thinking about old jobs when we both should just be focusing on moving on. Hope you're doing well and have found a much easier commute.

Re:Infinite scalability? (2, Interesting)

jd (1658) | more than 3 years ago | (#32175876)

Sometimes an old thought can trigger a new line of thinking. For example, it would be difficult to make a 3-CCD camera that's as flat as a modern digital camera, because a decent-sized CCD placed sideways will widen the camera by that amount. The prism would normally be bulky, too. Far as I know, that's the main reason you see this sort of camera on high-end video equipment, not cheap digital cameras. However, I don't see anything there that can't be solved by using a few lenses and mirrors. Since CCDs can do 16bpp, this would not only let you triple the number of pixels but also produce high dynamic range. I've no doubt it's been done, it's too obvious not to have been, but I don't see anything like this in the regular marketplace although I can see no obvious objection. The only place I see anything like it is in the high-end with much larger - and much more expensive - gear.

Re:Infinite scalability? (1)

Locke2005 (849178) | more than 3 years ago | (#32176040)

The only devices I've seen using 3 CCDs are $4000 Sony videocameras. High-quality optics make each lens and mirror very expensive, so most high-end cameras have very simple optical paths (old Hasselblads are now being repurposed as digital cameras [google.com] ). Dealing with RGB is different from directing a laser beam, which has a single frequency. For a laser, I suspect even a hologram could be used as a lensing system. Not so for multi-megapixel RGB cameras.

Re:Infinite scalability? (1)

jd (1658) | more than 3 years ago | (#32176320)

You're right about the optics being the challenging part. It would depend on whether it's cheaper to make errors smaller or lenses/mirrors larger (either will let you reduce the visibility of defects, up to a point), and on how large an angle any defect can be allowed to cover when the image reaches the CCD. To be honest, I haven't the foggiest. And, yes, the very earliest (late 1800s, early 1900s) "colour" photography was done by photographing through three distinct filters and you can therefore do the same today to produce the extra resolution and definition without the enhanced optics. It merely requires you take three photographs (well, six if you want to go all the way and subtract the light with the filter present but shutter closed) and overlay them. Actually, this has the benefit that you can add as many colour planes as you like, so long as the CCD is sensitive to the frequency and you use a monochromatic filter.

(RGB is ok, but the three types of cone don't behave in quite the same way, the visual cortex processes colours by subtracting corresponding pairs, etc. If you capture in more than three colours, perhaps you could tweak the values to eliminate some of the differences between how RGB displays display and how the eye expects things to be.)

tl;dr (3, Funny)

adeft (1805910) | more than 3 years ago | (#32173168)

can someone scale that article down a bit?

"Cloud computing is a shitty lie." (0)

Anonymous Coward | more than 3 years ago | (#32173528)

"Cloud computing is a shitty lie."

Confirmed: The Cloud IS (0)

Anonymous Coward | more than 3 years ago | (#32173172)

a botnet !

Yours In Perm,
Kilgore T.

Of-course it is a checkbox (4, Interesting)

roman_mir (125474) | more than 3 years ago | (#32173190)

Scalability is a buzzword that equipment, databases and servers (hardware/software) are sold on. It is as if adding more WebLogic servers to a cluster really makes your application scalable, as if throwing more processors onto a RAID system gives you more parallel ways to read / write the same data, etc.

It is all true to an extent and it is all false where it really matters. Applications need to be designed to be scalable, and if I have learned anything over the past 16 years, it is that people do not even begin to understand what that means.

The managers and even many 'architects' really think that throwing some stupid app on a cluster will solve the scalability issues and so on. But only a very specific kind of problem can be solved by simply adding cluster nodes without actually properly designing the app. I blame various silver bullets like EJBs, CORBA, RMI, JNDI, BEA, Oracle, IBM and such for promoting this view among the top brass and pulling attention away from working out the correct architecture to solve the specific problems that appear in building truly scalable applications.

Application servers and databases are the worst at this. They certainly provide some specific type of scalability solution, but because of that, it is almost expected that it does not matter how an app is designed to interact with them, and the design ends up in a distant third, fourth, fifth or further place, way behind the deadlines, the politics, the hiring practices, etc.

Scalability is like security: it is not one specific thing, it is a way to approach many different issues and problems, and even when you think your app is secure in 5 different ways, there is a sixth way in which it is not. Same with scalability: it is not only about multi-threading requests, it is not only about multiple processors for a RAID system, it is about total understanding of how the application is and will be used and adjusting it for various types of usage. Proper design for scalability mixes various approaches: there could be intermediate steps added, background processing added, intermediary storage, separate storage for reading and for saving, various caching mechanisms and synchronization between nodes in a cluster for different caching questions. This could mean redefining an algorithm to be less dependent on reading data from slow media. Some things are not supposed to be done in parallel, so certain bottlenecks due to synchronization need to be looked at and solved early on, because these become the Achilles heel - synchronizing on anything at all can defeat a super-fast cluster and make it no better than a single laptop.
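One way to put rough numbers on the synchronization point is Amdahl's law, which the comment does not name; it is brought in here only as an illustration. The sketch below assumes a hypothetical workload where 5% of the work is serialized (the 5% figure and the function name are made up), and shows how little extra nodes help once that serial part dominates.

```python
def amdahl_speedup(serial_fraction: float, nodes: int) -> float:
    """Upper bound on speedup when `serial_fraction` of the work cannot run in parallel."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    # The serial part takes the same time no matter how many nodes exist;
    # only the remaining (1 - serial_fraction) is divided across the nodes.
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / nodes)


if __name__ == "__main__":
    # Assumed example: 5% of each request is spent holding a shared lock.
    for nodes in (1, 10, 100, 1000):
        print(nodes, round(amdahl_speedup(0.05, nodes), 2))
```

Under that assumption the speedup can never reach 20x, however large the cluster gets, and a fully serialized workload (serial fraction 1.0) gains nothing at all from extra nodes.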

It is a design issue.

Re:Of-course it is a checkbox (0)

Anonymous Coward | more than 3 years ago | (#32174264)

A friend of mine who is in video would make the same point: anything requiring sequencing -- video processing goes in one direction -- can't be assumed to scale. I guess the same applies to financial stuff where the system is regularly seeking authentications/inputs.

Re:Of-course it is a checkbox (2, Informative)

lmckayjo (532783) | more than 3 years ago | (#32174508)

But many types of video processing DO scale very nicely, as racks and racks of SGI machines proved years ago ("rendering farm" is a beautiful name for computers...). The "flow of time" argument against scaling, which is basically an argument against easy parallelization, works for some things but not others.

Even when the analysis or manipulation of one frame depends heavily on those before it, most video (or audio) work is broken nicely into scenes (or tracks/movements) which can be easily scaled - damn near linearly.

Financial markets work similarly. Yes, there is a very important interdependence, sequentially significant, but only between certain transactions. There may need to be "traffic cops" that don't scale linearly, but other parts of the transactions will scale nicely.

In the limit, nothing that we do will scale efficiently forever (to extremely large OR small), but video processing and financial systems are two examples which seem to scale quite well.

Re:Of-course it is a checkbox (1)

cybrthng (22291) | more than 3 years ago | (#32177658)

My experience with Oracle Grid Computing tells me you don't quite understand the capabilities of their RDBMS/Grid Platforms.

Re:Of-course it is a checkbox (1)

roman_mir (125474) | more than 3 years ago | (#32177852)

My experience with Oracle shills is that they tout Oracle as the only true way. Luckily I am not susceptible to advertising, I look at facts. Fact is that Oracle's grid computing will add no more scalability to any particular application than their earlier clustering approach, though it may help with cutting some costs on probably some hardware and energy, good, that should help to offset the crazy licensing costs. I am setting PostgreSQL everywhere I can, and I use more of an app design approach to solve my scalability concerns.

Here is a fact for you: virtualization will not add more processing power, it is just another silver bullet, an easy way out of designing the applications to scale within their context and relying on someone else to do your job.

I avoid Oracle like the plague that it is, an overpriced plague.

I dunno... (4, Funny)

thewils (463314) | more than 3 years ago | (#32173204)

That ash cloud from Eyjafjallajokull seems to be scaling pretty good.

Eyjafjallajokull is the ice cap or glacier (1)

Chirs (87576) | more than 3 years ago | (#32173730)

Eyjafjalla is the volcano

Re:Eyjafjallajokull is the ice cap or glacier (1)

skastrik (971221) | more than 3 years ago | (#32179542)

Eyjafjalla is the volcano

Since we're being pedantic, Eyja*fjöll* is the mountain (welcome to Icelandic). That said, the friendly natives as well speak of the cloud coming from the glacier/jökull.

Re:Eyjafjallajokull is the ice cap or glacier (0)

Anonymous Coward | more than 3 years ago | (#32244286)

Eyjafjalla is the volcano

But the jökull (ice cap) probably contributes to the cloud...

the "Cloud" (1)

AnonymousClown (1788472) | more than 3 years ago | (#32173322)

Reading TFA, the author kept making references to scaling using the "cloud" without mentioning any particular vendor. I'm thinking Microsoft's Sharepoint was alluded to, but as far as FlightCaster goes - what are they using? How would they use Sharepoint for that? Or is there a Hardware as a Service company they're using?

Re:the "Cloud" (1)

RingDev (879105) | more than 3 years ago | (#32173510)

Sharepoint isn't a cloud, it's a CMS with a whole lot of crap mixed in.

Microsoft's cloud service is called Azure. One of my coworkers was looking at it to host his company's web site and services. The scalability there was actually quite impressive for simple hosting and heavy loads. I don't know the details, but he seemed pretty impressed by it, just not by the cost. It was right on par cost wise as having a dedicated VM with decent resources. The only real difference he was looking at going from a dedicated VM to their cloud was that he could instantly spool up a second (or third, fourth, etc...) instance of his system on the cloud, although he would be getting charged for each of them.

Until they can get the cost to be lower than the TCO of a cheap server, UPS, and business cable line though, I can't see making the jump for small businesses.


Re:the "Cloud" (1)

Lord Ender (156273) | more than 3 years ago | (#32173618)

If you cloudsource everything, you can lay off all your datacenter operations staff. You still need sysadmins, security guys, and coders; but the people who run wires, rack servers, replace faulty disks, manage the SAN, etc. etc. are no longer relevant. You must factor the cost of this staff when comparing TCO.

Re:the "Cloud" (1)

TooMuchToDo (882796) | more than 3 years ago | (#32173996)

What's the cost of having your entire physical infrastructure under someone else's control?

Re:the "Cloud" (1, Insightful)

Anonymous Coward | more than 3 years ago | (#32174714)

A cost that isn't visible in the short term. Thus, it's invisible to the people making the decisions.

Re:the "Cloud" (1)

kgwilliam (998911) | more than 3 years ago | (#32174340)

Until they can get the cost to be lower than the TCO of a cheap server, UPS, and business cable line though, I can't see making the jump for small businesses.

Remember that TCO isn't only hardware (server, UPS, cable). You also have to factor in software licenses, physical building, physical building security, network security, HVAC costs, etc. And these are just the easy to calculate costs.

You also have to think about other costs such as procurement (someone has to order the hardware from Dell, receive it at the shipping dock, unbox it, install the server OS on it, handle warranty repairs, etc), network administration, management overhead, load balancing, etc.

And then what about the 'costs' in terms of business opportunity lost when your service is not able to quickly scale to customer demand. There is a time lag between realizing you need more capacity to actually getting that capacity online.

Also what about the costs of having to build your datacenter scale to the max usage? If your site experiences heavy usage 9-5 M-F, but very little usage during evenings and weekends, then you still have to build your datacenter to that peak usage. With cloud computing you can scale it up and down daily depending on your needs.

Looking at the cost of moving a service to the "cloud" is a lot more than just looking at how much a server costs. It won't make sense for all businesses and scenarios, but the more you research it the more it seems to make sense. And noting your use of the term "small businesses", cloud computing really can pay off for a lot of scenarios. Small businesses usually don't have the budget to properly implement all of the functionality that cloud computing offers (security, network management, redundancy, fault domains, load balancing, scalability, etc) so, while a small business can get a cheap server up and running, it may not actually be cheaper in the long run.

Slight difference in scope (1)

RingDev (879105) | more than 3 years ago | (#32174832)

I agree completely. I should have spoken more clearly.

My co-worker's personal small business has 1 employee: him. He is a skilled tech guy who can handle most of the work himself. He had 3 options:

1) Move to the cloud for ~$150/month. supreme uptime, no hassle scaling, everything is managed for him
2) Move to a more robust dedicated virtual machine for ~$150/month. solid up time, scaling available, all network stuff is managed for him
3) Buy a server and business cable line to his residence for ~$750 one time purchase, but $20 less a month than he is currently spending on residential cable. He has to manage everything himself.

If he keeps the server/cable running for 3 years, it pays for itself, assuming he pays himself $150 a month for his own work. But really, his own work is volunteer, so against the opportunity cost, he's in the green after 5 months. Even against his current costs he's in the green in under a year.
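The break-even arithmetic can be checked with a few lines. This is only a sketch under stated assumptions: the saving against the cloud or VM options is taken to be just the ~$150 monthly fee, the saving against the current setup is the $20 cable difference, and the value of his own time is ignored. The function name is made up for the example.

```python
import math


def break_even_months(upfront_cost: float, monthly_saving: float) -> int:
    """Whole months until the cumulative monthly saving covers the one-time cost."""
    if monthly_saving <= 0:
        raise ValueError("no break-even without a positive monthly saving")
    return math.ceil(upfront_cost / monthly_saving)


# Own server versus the ~$150/month cloud or VM option (assumed saving: the full fee).
print(break_even_months(750, 150))  # 5
# Own server versus the current residential line (assumed saving: $20/month).
print(break_even_months(750, 20))   # 38
```

The first figure lines up with the "5 months" above and the second with the roughly three-year payback.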

If his small business grows to the point that his cable line can't handle the bandwidth, or his cheap server needs more oomph, or if his time becomes more valuable, the advantage will quickly switch back to the cloud.


Re:the "Cloud" (1)

spatley (191233) | more than 3 years ago | (#32175148)

That is because the essence of his article is that it does not matter what segment of cloud computing you use, if your application is not *designed* to scale, it will not scale. No matter if it was sold to do so or not.

This is the same idea: if you take a single-threaded app and put it on an 8-core processor, you will not get any performance boost over a single core. If your data set has to join a trillion rows to a billion rows, you can throw all the parallelism you want at it and you will just have a thousand boxen trying to perform the same join a thousand times and performance will not improve.

On the other hand, if you have a single table of name-value pairs, you can split work among many, many machines and have what used to take a week happen in minutes. But not all problems can be modeled to fit that kind of process, and even fewer of them actually are.

Re:the "Cloud" (1)

Timothy Brownawell (627747) | more than 3 years ago | (#32175556)

If your data set has to join a trillion rows to a billion rows, you can throw all the parallelism you want at it and you will just have a thousand boxen trying to perform the same join a thousand times and performance will not improve.

No, you split your billion row table into a thousand pieces so each piece fits in memory on one of your thousand machines, and then multicast your trillion row table to all thousand machines and have them match the stream against the million rows they have in memory.
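The scheme described here can be sketched in a single process, with each loop iteration standing in for one machine. Everything named below (the row shape, `partition_small_table`, `probe`, `distributed_join`) is hypothetical, and a real cluster would multicast the stream over the network rather than re-read a list.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Sequence, Tuple

Row = Tuple[int, str]  # (join key, payload); a made-up row shape for illustration


def partition_small_table(rows: Iterable[Row], workers: int) -> List[Dict[int, List[str]]]:
    """Hash-partition the smaller table so each worker holds one in-memory slice."""
    slices: List[Dict[int, List[str]]] = [defaultdict(list) for _ in range(workers)]
    for key, payload in rows:
        slices[hash(key) % workers][key].append(payload)
    return slices


def probe(slice_: Dict[int, List[str]], stream: Iterable[Row]) -> Iterator[Tuple[int, str, str]]:
    """Match the streamed big-table rows against one worker's in-memory slice."""
    for key, big_payload in stream:
        for small_payload in slice_.get(key, ()):
            yield key, small_payload, big_payload


def distributed_join(small: Sequence[Row], big: Sequence[Row], workers: int) -> List[Tuple[int, str, str]]:
    """Simulate the cluster in one process: each loop iteration stands in for one machine."""
    results: List[Tuple[int, str, str]] = []
    for slice_ in partition_small_table(small, workers):
        # Every "machine" sees the whole big-table stream (the multicast step).
        results.extend(probe(slice_, big))
    return results


if __name__ == "__main__":
    small = [(1, "a"), (2, "b"), (3, "c")]
    big = [(2, "x"), (3, "y"), (3, "z"), (4, "w")]
    print(sorted(distributed_join(small, big, workers=2)))
```

Each worker does only its own share of the matching, so the join is not repeated a thousand times; the price is that the large table has to be streamed to every worker.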

cloud computing only scales horizontally (3, Interesting)

Lord Ender (156273) | more than 3 years ago | (#32173330)

The Google App Engine cloud computing offering plans to (eventually) automatically scale your application as much as you need. But that scalability comes at a cost: only key-value stores may be used. Sorry, no relational databases available. JOINs just don't scale. You can distribute data across any number of nodes, but JOINing data which lives on separate computers is not gonna happen.

If you need JOIN-like behavior, your app has to request all the data, then compute the result itself. Trying to write an app for such a system means rearchitecting the data in ways to minimize the need for such operations, even if that means having duplicate data.
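A minimal sketch of the two options, using plain Python dicts to stand in for a key-value store. This is not the App Engine API; the data, field names and function names are all invented for illustration.

```python
# Two "tables" modelled as key-value maps, the way a key-value store exposes them.
users = {
    "u1": {"name": "alice"},
    "u2": {"name": "bob"},
}
posts = {
    "p1": {"author_id": "u1", "title": "Hello"},
    "p2": {"author_id": "u2", "title": "Scaling"},
    "p3": {"author_id": "u1", "title": "Again"},
}


def posts_with_authors(posts: dict, users: dict) -> list:
    """JOIN-like behaviour computed by the app: one extra lookup per post at read time."""
    return [
        {"title": post["title"], "author": users[post["author_id"]]["name"]}
        for post in posts.values()
    ]


def denormalize(posts: dict, users: dict) -> dict:
    """Write-time alternative: copy the author's name into every post (duplicate data)."""
    return {
        post_id: {**post, "author_name": users[post["author_id"]]["name"]}
        for post_id, post in posts.items()
    }


if __name__ == "__main__":
    print(posts_with_authors(posts, users))
    print(denormalize(posts, users)["p3"])
```

The denormalized form answers the read with a single key lookup, at the cost of having to rewrite every copied name when a user record changes.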

It's quite an exercise to unlearn what you have learned about SQL and relational databases, but the use of object mappers can help a lot.

Re:cloud computing only scales horizontally (1)

ducomputergeek (595742) | more than 3 years ago | (#32173478)

I've seen joins scale decently with Teradata. Might not be the best OLTP oriented database, but a great analytical database when you need to do very complex BI Logic searches across large datasets.

Re:cloud computing only scales horizontally (2, Insightful)

LWATCDR (28044) | more than 3 years ago | (#32173642)

So one is going to have to learn a totally different way to do everything and then deal with a new set of problems.
Which is why IBM is still selling ZSystems running DB2 :)
That being said I have not used much in the way of key-value database in a complex application. Frankly it sounds like a real pain.

Re:cloud computing only scales horizontally (0)

Anonymous Coward | more than 3 years ago | (#32174428)

If your JOINs don't scale, your application architect, cluster administrator, and DBA have all failed. Fire them and find ones that know what they are doing.

Either that or you're joining two billion-row tables repeatedly without an index. In either case, something stupid is going on.

Re:cloud computing only scales horizontally (1)

lennier (44736) | more than 3 years ago | (#32178198)

JOINs just don't scale. You can distribute data across any number of nodes, but JOINing data which lives on separate computers is not gonna happen.

If that's the case, then surely we're Doing Something Really Wrong with our implementation of relational theory. Should we perhaps be looking at things like Extended Set Theory [xprogramming.com] instead?

Relational - (and more specifically, SQL, which as Chris Date is at pains to tell everyone is NOT even a correct let alone good implementation of the relational model - but even Codd's original paper shows signs of this) - came out of a timesharing environment, where it was just assumed as a matter of course that you'd have very large datasets sitting on single giant machines. And out of that environment came the 'database server' concept we're familiar with today.

The problem is that in the age of the Web, more and more our data IS distributed among thousands if not millions of nodes. This is especially obvious in social networking - at the moment, we're very obviously Doing It Wrong by relying on huge centralised mega-stores like Facebook. We should be asking why this is, and how our tools can be improved to stop forcing us toward a centralised model when decentralised is what we clearly need.

It seems to me that in a massively decentralised model, it ought to be easy enough to have multi-dataset JOINs or some equivalent. We just have to implement a way of caching the results and transmitting only the changes, in a store-forward-publish model.

I.e., a decentralised JOIN (or any other dataset filter) shouldn't be an 'operation' which runs, connects to lots of datasources, transmits/receives huge amounts of information, processes it into a subset, hands it on and then disconnects, throwing away all that data. That model is still harking back to the old batch processing days of mainframes. Instead, a JOIN should be a sort of object: you create a new derived dataset entity which connects ONCE to all its sources, grabs what it needs, then CACHES its data locally so it doesn't consume bandwidth.

This would let us create a truly decentralised 'web of data' - a bit like the old Usenet model, but for data not text - where you subscribe to datasets, receive just the updates, do whatever filtering or processing you want on the updates as they come in, and present your derived view - a bit like a SQL 'View' but far more general and able to do arbitrary Turing-complete processing.
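To make the idea concrete, here is a minimal sketch of such a derived dataset, assuming a simple equi-join on a shared key and sources that emit 'add' and 'remove' change events. The class name `JoinView`, its methods, and the sample data are all hypothetical, invented for illustration; this is not any existing system's API.

```python
from collections import defaultdict


class JoinView:
    """Hypothetical derived dataset: an equi-join of two sources on a key,
    maintained from change events rather than recomputed on every query."""

    def __init__(self):
        # Local cache of each source's rows, grouped by join key.
        self.left = defaultdict(set)
        self.right = defaultdict(set)
        # Downstream subscribers only ever see changes to the joined result.
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def _apply(self, side, op, key, value):
        """Update the local cache; return True if anything actually changed."""
        if op not in ("add", "remove"):
            raise ValueError(f"unknown operation: {op}")
        present = value in side.get(key, ())
        if op == "add" and not present:
            side[key].add(value)
            return True
        if op == "remove" and present:
            side[key].discard(value)
            return True
        return False

    def _publish(self, op, rows):
        for row in rows:
            for callback in self.subscribers:
                callback(op, row)

    def on_left(self, op, key, value):
        """Handle one change from the left source."""
        if self._apply(self.left, op, key, value):
            matches = sorted(self.right.get(key, ()))
            self._publish(op, [(key, value, r) for r in matches])

    def on_right(self, op, key, value):
        """Handle one change from the right source."""
        if self._apply(self.right, op, key, value):
            matches = sorted(self.left.get(key, ()))
            self._publish(op, [(key, l, value) for l in matches])

    def rows(self):
        """Current joined result, served entirely from the local cache."""
        return sorted(
            (key, l, r)
            for key, lefts in self.left.items()
            for l in lefts
            for r in self.right.get(key, ())
        )


# Made-up sample data: profiles on the left, posts on the right.
view = JoinView()
changes = []
view.subscribe(lambda op, row: changes.append((op, row)))
view.on_left("add", "alice", "alice@example.org")
view.on_right("add", "alice", "post-1")
view.on_right("add", "bob", "post-2")  # no matching profile: nothing published
# changes == [("add", ("alice", "alice@example.org", "post-1"))]
```

Each incoming change touches only the rows that share its key, so the traffic between the view and its subscribers is proportional to what changed, not to the size of the sources.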

I dunno who's working on this but it seems obvious that we need something like that. Huge centralised 'databases' are just so foreign to the Web model that I don't know why we still force people to use them.

Re:cloud computing only scales horizontally (1)

lonecrow (931585) | more than 3 years ago | (#32178690)

I wasn't thinking that ease of provisioning more resources was the same as scalability.

If every additional 10 tasks a system is required to do take an additional 10 units of computing resources, that is not "scalability", regardless of how easy it is to procure those additional resources.

Or perhaps that is an example of an app that scales linearly, and what people really want when they want scalability is a system that scales geometrically?
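The distinction can be put in numbers with a small sketch. The two cost curves below are invented assumptions, not measurements of any real system: one where resources grow in direct proportion to tasks, and one where the marginal cost of extra work shrinks as the system grows.

```python
import math


def marginal_cost(cost, tasks, step=10):
    """Extra resource units needed to handle `step` more tasks at a given load."""
    return cost(tasks + step) - cost(tasks)


def linear(tasks):
    # Assumed model: every task costs exactly one unit, at any load.
    return 1.0 * tasks


def sublinear(tasks):
    # Assumed model: cost grows with the square root of the load,
    # e.g. because fixed overheads are shared across more work.
    return 10.0 * math.sqrt(tasks)


for load in (100, 10_000):
    print(load, marginal_cost(linear, load), round(marginal_cost(sublinear, load), 2))
```

Under the linear model the next 10 tasks always cost 10 units, however large the system already is. Under the sublinear model the same 10 tasks get cheaper as load rises, which is closer to what people seem to mean when they ask for a system that "scales".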

Re:cloud computing only scales horizontally (1)

Lord Ender (156273) | more than 3 years ago | (#32178988)

Growing your cluster to handle more traffic is certainly considered "scaling" by most, and this is the way most cloud-computing services do things. I refer to this sort of scaling as horizontal, whereas adding RAM or CPU power to a single machine would be vertical scaling. Please correct me if you know of a better term...

Re:cloud computing only scales horizontally (1)

lonecrow (931585) | more than 3 years ago | (#32179192)

OK, sure. I guess my point was that you may be able to get more output from a crappy system simply by throwing more resources at it. But I wouldn't call that a system that was "designed to scale". There are so many things a developer can do to identify key bottlenecks so that only portions of the app need to scale linearly, while the overall app scales non-linearly.

I am in the process of migrating my servers to EC2, so I am a big fan of cloud computing. I am intrigued by App Engine, but it would mean rewriting apps from scratch, so it's a non-starter for the time being.

Sidebar to cloud computing only scales horizontally (2, Interesting)

davecb (6526) | more than 3 years ago | (#32180796)

A minor niggle to a correct thesis: clouds are indeed horizontal creatures, like lichens (:-)) Joins, however, can be decomposed into a horizontally scalable component that runs on many nodes to return a small candidate set and a vertical component that puts together the candidates and returns the valid ones as a join. This is what the Oracle Teradata (sp?) machine does, making TP substantially more scalable. The bottleneck in this scheme is the backplane: it requires Linux hyperchannel to achieve the expected performance boost. --dave
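A rough sketch of that decomposition, assuming orders are partitioned across nodes and the customer table sits with a coordinator. The function names, row layout, and sample data are hypothetical, and the "nodes" here are plain lists in one process rather than separate machines.

```python
def node_candidates(partition, predicate):
    """Horizontal step, run independently on each node: apply a cheap local
    filter and return only the rows that might take part in the join."""
    return [row for row in partition if predicate(row)]


def coordinator_join(customers, candidate_lists):
    """Vertical step, run in one place: match candidates against customers
    and keep only the valid pairs."""
    by_id = {customer["id"]: customer for customer in customers}
    joined = []
    for candidates in candidate_lists:
        for row in candidates:
            customer = by_id.get(row["customer_id"])
            if customer is not None:  # drop candidates with no partner
                joined.append((customer["name"], row["total"]))
    return sorted(joined)


# Made-up data: two customers, orders split across two nodes.
customers = [{"id": 1, "name": "ann"}, {"id": 2, "name": "bob"}]
partitions = [
    [{"customer_id": 1, "total": 250}, {"customer_id": 2, "total": 40}],
    [{"customer_id": 3, "total": 900}, {"customer_id": 2, "total": 120}],
]

# In a real cluster each call below would run on its own node in parallel.
candidates = [node_candidates(p, lambda row: row["total"] >= 100) for p in partitions]
result = coordinator_join(customers, candidates)
# result == [("ann", 250), ("bob", 120)]
```

Only the candidate rows cross the interconnect, so the cost of the vertical step depends on how selective the local filter is rather than on the total size of the partitions.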

Hand wave (5, Funny)

Itninja (937614) | more than 3 years ago | (#32173560)

I find that when I speak in my "IT Expert Voice" I get all kinds of things. Even if I am saying gibberish:

"Linda. The malware infecting your CRT is several beta tests behind the best practice of current IPv6 drives. I will need your password to defrag the driver and upload the taskbar to your certification path...Thank you Linda."

Re:Hand wave (1, Funny)

Anonymous Coward | more than 3 years ago | (#32174306)

God, only a true nerd would say that. Here, let me show you how this is done.

"Linda. ... I will need you to drink this bottle of Scotch and hop in the hot tub while I defrag the driver and upload the taskbar to your certification path. I will come there when it is done...Thank you Linda."

Re:Hand wave (1)

BitZtream (692029) | more than 3 years ago | (#32179408)

Have you seen Linda? I think you'd be better off drinking the bottle of Scotch yourself if you plan on joining her in the hot tub.

well .. (1)

satsuke (263225) | more than 3 years ago | (#32173608)

Well, expecting to get more output from the same input is of course illogical and impossible, but if a company puts up the planning, development and engineering resources to make it happen up front, then the scalability claims in the marketing copy can be achieved to some extent.

But the way some (most?) deployments seem to go makes it cost-prohibitive to put in the distributed database, distributed applications and fault-tolerant components in the first place.

Scalability makes no sense on Hardware.... (1)

jameskojiro (705701) | more than 3 years ago | (#32173614)

By the time you need to expand, a complete and less expensive system has already superseded whatever iron you were originally running it on.

In many cases it is cheaper to replace the hardware than to add more "modules" to your scalable hardware.

Re:Scalability makes no sense on Hardware.... (1)

raddan (519638) | more than 3 years ago | (#32174500)

You're kidding, right? If not, your applications aren't big enough. Do you think Google runs on a computer under somebody's desk?

Re:Scalability makes no sense on Hardware.... (1)

fusiongyro (55524) | more than 3 years ago | (#32175504)

How many businesses are there which have Google's needs? Ten? Twenty?

Re:Scalability makes no sense on Hardware.... (1)

raddan (519638) | more than 3 years ago | (#32178046)

Kind of a lame argument. You don't have to be Google to require scalability. All you need is a workload that may grow or diminish faster than you can throw a fast computer at it. Don't know if you've ever heard of a website... turns out these things often run on load-balanced clusters nowadays...

The point is not how many (and the answer to your question is actually "thousands"), but that there is a legitimate need for scalable computer architectures. Scientists need them, design firms need them, video production houses need them, high-density data centers need them, web hosts need them, and so on. Low-cost, commodity hardware makes sense in a lot of places, but when performance and high availability are really important, you need architectures to support it. There is indeed a point at which the benefits of unusual configurations outweigh the cost.

Synopsis: ZOMG BOTTALNEXZ!!!!1!! (1)

Rogerborg (306625) | more than 3 years ago | (#32173724)

You may now all skip even pretending to read the article and do what you do best: use a car analogy to explain why duplicating kiddie porn isn't theft, unless the Government does it.

Re:Synopsis: ZOMG BOTTALNEXZ!!!!1!! (0)

Anonymous Coward | more than 3 years ago | (#32174584)

Scalability is like when an inexperienced user works on a car with a hood welded shut that shares the same lock as the house (or the same thumb print) with the window blinds pulled up, letting a stranger from a public street take pictures of your home while his friend walks through your yard turning on all the exterior taps and tripping your GFI by splashing water on an exterior plug.

MS Word is infinitely scalable! (1)

Old97 (1341297) | more than 3 years ago | (#32174506)

Yes, indeed! I can run copies on as many desktops as I care to. Just add monkeys and ta dah - Shakespeare!

Re:MS Word is infinitely scalable! (1)

gstoddart (321705) | more than 3 years ago | (#32185524)

Yes, indeed! I can run copies on as many desktops as I care to. Just add monkeys and ta dah - Shakespeare!

So far, mostly just Slashdot. Shakespeare seems to be in the offing yet.
