Horizontal Scaling of SQL Databases?

timothy posted more than 3 years ago | from the side-to-side dept.

Databases 222

still_sick writes "I'm currently responsible for operations at a software-as-a-service startup, and we're increasingly hitting limitations in what we can do with relational databases. We've been looking at various NoSQL stores and I've been following Adrian Cockcroft's blog at Netflix which compares the various options. I was intrigued by the most recent entry, about Translattice, which purports to provide many of the same scaling advantages for SQL databases. Is this even possible given the CAP theorem? Is anyone using a system like this in production?"

222 comments

XML (2, Funny)

Anonymous Coward | more than 3 years ago | (#34273440)

Just store everything in a big XML file.

Re:XML (5, Funny)

Anonymous Coward | more than 3 years ago | (#34273496)

XXXML

Re:XML (1)

drouil11 (1941690) | more than 3 years ago | (#34273656)

Have you tried big and tall?

Re:XML (0)

icebraining (1313345) | more than 3 years ago | (#34273502)

Accessed by a Samba share.

Re:XML (1)

word_virus (838778) | more than 3 years ago | (#34273618)

Sounds like someone's been watching my screencasts.

Re:XML Go Diagonal (0)

Anonymous Coward | more than 3 years ago | (#34273530)

After that go diagonal, you get a preemptive database which can guess your sql needs.

Re:XML Go Diagonal (1)

JustOK (667959) | more than 3 years ago | (#34273928)

If it was really good, it would create itself, if it hasn't already.

Have you tried Perl? (1)

goombah99 (560566) | more than 3 years ago | (#34273972)

Perl seems to work well for me. You may want to try it.

First! (-1, Troll)

Anonymous Coward | more than 3 years ago | (#34273474)

First!

What limitations are you running into? (5, Insightful)

Anonymous Coward | more than 3 years ago | (#34273524)

It would be a lot easier to talk about solutions if you said which limitations you run into.

Is your dataset to large (large tables), are you having to much joins, too many transactions per second? In short, what is the problem we're trying to solve here?

Re:What limitations are you running into? (5, Interesting)

Anonymous Coward | more than 3 years ago | (#34273660)

It would be a lot easier to talk about solutions if you said which limitations you run into.

Is your dataset to large (large tables), are you having to much joins, too many transactions per second? In short, what is the problem we're trying to solve here?

My money is on "No one here likes SQL" and "There aren't any experts on RDBMSs to help us get things set up properly".

Re:What limitations are you running into? (4, Insightful)

DarkOx (621550) | more than 3 years ago | (#34274878)

I would have to agree; it's really hard to imagine a "start-up" that can't make anything work on a traditional SQL RDBMS. If you put the right hardware underneath it, even SQL Server 2000 (64-bit, anyway) will scale just fine to terabyte-size databases at thousands of transactions per second. That is not impossible hardware for a successful start-up to buy either; we are talking about a dedicated storage controller with a gigabyte or so of cache and a few dozen SAS drives. I know, because I have worked on such projects.

You need to get the schema right, and if it's more reads than writes you might even denormalize a little, and you will need to partition the data appropriately, but it can be done. This is why real DBAs still make the big bucks. There is a lot to know in that domain. You probably should hire someone who is an expert on whatever stuff you are using now to consult before you go down the path of NoSQL. All you told us is that you are a growing start-up, which is not much to go on, but without knowing what you are doing it's hard for me to believe you are doing anything on a scale that can't be done well with a relational database; maybe I am wrong and you are doing something huge. Remember, as soon as you go down the NoSQL path you are going to have to do a great deal of heavy lifting, because the quantity of libraries and off-the-shelf stuff out there is not great.

Re:What limitations are you running into? (-1, Flamebait)

Anonymous Coward | more than 3 years ago | (#34273684)

As far as I can see, the main problem you have to solve is your inability to correctly choose between "to" and "too". Judging by how often you got it right, I guess you relied on a coin-toss each time it came up.

Relational stuff scales (5, Insightful)

Anonymous Coward | more than 3 years ago | (#34273586)

Learn partitioning principles, get a database product that does partitioning properly, learn normalization, never worry again about not being able to scale with relational databases. It just requires some real skills but relational databases really do scale all the way up.

Re:Relational stuff scales (2, Interesting)

ani23 (899493) | more than 3 years ago | (#34273712)

Partitioning does complicate backups and HA/DR scenarios as the entire system is dependent on all machines being up and running. Also in most commercial db's (I know about db2) this feature takes you to the enterprise tier of software which is usually very expensive.

Re:Relational stuff scales (4, Informative)

h4rr4r (612664) | more than 3 years ago | (#34273782)

Postgres seems to not charge extra for that.

Re:Relational stuff scales (1)

TooMuchToDo (882796) | more than 3 years ago | (#34275312)

OH SNAP

Re:Relational stuff scales (-1)

Anonymous Coward | more than 3 years ago | (#34274026)

Yeah, only hipsters use NoSQL, while in fact SQL IS the silver bullet! Why do we need new solutions, when EVERYTHING was solved decades ago!! Sheesh!

Consider scaling via other layers? (3, Interesting)

mlts (1038732) | more than 3 years ago | (#34273608)

Another idea is to scale using other layers, if there are problems at the SQL server level.

At the lower areas, one can go with a mainframe (parallel sysplex) and have geographically separate pieces of hardware acting coherently.

At the higher layers, have the app use multiple SQL servers and handle the redundancy in this layer.

Re:Consider scaling via other layers? (0)

Anonymous Coward | more than 3 years ago | (#34273676)

Also, I assume that if you're looking at stuff like NoSQL you already did the obvious, like using an in-memory caching system with tons of memory.

Re:Consider scaling via other layers? (1)

mlts (1038732) | more than 3 years ago | (#34273958)

That is what mainframes are for. Yes, the technology is old and not exciting, but one of the strong points of mainframes is I/O, which is critical to most database architectures.

Call me skeptical (5, Insightful)

Kjella (173770) | more than 3 years ago | (#34273674)

Call me skeptical, but there are companies out there with massive amounts of data in relational databases. If you as a startup are "constantly hitting limitations", you're either a very odd startup or using it very wrong. As long as the volume is small you can make almost anything happen on SQL. Hell, most small businesses I've known run mostly on Excel. Somehow I don't see a startup needing NoSQL unless they specialize in processing huge amounts of data, in which case trying to make Slashdot work on your core business seems stupid. But maybe I missed something...

Re:Call me skeptical (1)

ani23 (899493) | more than 3 years ago | (#34273794)

We have a winner. No more discussions on this topic. mmmkay sigh . . . . .

Re:Call me skeptical (1)

doroshjt (1044472) | more than 3 years ago | (#34273806)

MySpace uses SQL Server, so unless this start-up is bigger than MySpace, I think they are just doing it wrong.

Re:Call me skeptical (2, Funny)

PRMan (959735) | more than 3 years ago | (#34274028)

MySpace is also slower than maple syrup in January.

Re:Call me skeptical (2, Insightful)

nschubach (922175) | more than 3 years ago | (#34274256)

It's rather fast now that nobody uses it anymore.

(sorry, I couldn't resist.)

Re:Call me skeptical (1)

thetoadwarrior (1268702) | more than 3 years ago | (#34274610)

Have you seen how "fast" MySpace is? It's certainly no Google.

Re:Call me skeptical (5, Funny)

Squeebee (719115) | more than 3 years ago | (#34273828)

Agreed, we have massive sites serving millions of requests a day using Open Source relational databases and yet it seems everyone wants to use NoSQL because it's the hip new thing.

Naturally I start thinking of this: http://xtranormal.com/watch/6995033 [xtranormal.com]

Re:Call me skeptical (1)

suso (153703) | more than 3 years ago | (#34274094)

Naturally I start thinking of this: http://xtranormal.com/watch/6995033 [xtranormal.com]

Thank you for posting that. I'm so sick of the NoSQL shit. Learn to design schemas.

Re:Call me skeptical (1)

nschubach (922175) | more than 3 years ago | (#34274350)

Are you sick of the NoSQL talk because you specialize in SQL and feel as if it's a competitor, because it's gained a lot of attention recently and happens to be talked about more than SQL, or is there some other reason for the sick feeling?

(I do very light SQL development and have not touched a "NoSQL" solution, but I do not find myself sickened by people investigating alternatives.)

Re:Call me skeptical (1)

gfody (514448) | more than 3 years ago | (#34274972)

It's a sickening display of ignorance coming from people who are supposed to be professionals. Nobody takes issue with people investigating alternatives to SQL but SQL has come under heavy fire by NoSQL proponents and yes one can become very sick of hearing the same old fallacious arguments again and again.

Re:Call me skeptical (1)

cratermoon (765155) | more than 3 years ago | (#34275192)

Could you be specific about which fallacious arguments you have in mind? Preferably, cite 3 different fallacies with multiple sources for each one.

Re:Call me skeptical (0)

Anonymous Coward | more than 3 years ago | (#34275014)

I think it's more that there's so much hype for something that mostly just compensates for user incompetence at the expense of real world usage. I know I'm sick of hearing about it. If I had a legitimate use for it, I'd use it, but I certainly don't want to hear any more blind evangelism for it. NoSQL is a tool. Use it if it works for you and shut up about it, 'cause I don't want to hear you brag about your lack of RDBMS experience.

Re:Call me skeptical (4, Informative)

vadim_t (324782) | more than 3 years ago | (#34275154)

A lot of people don't understand how a database really works, so they do it horribly wrong. As a result, it's dreadfully slow. So they go and use some key/value lookup system because "they're fast". There you often get one of two things:

They still don't understand the problem, so they recreate it yet again. If you don't understand what's wrong with reading an entire table with a million records, and discarding all but 5 of them client-side, then replacing the SQL DB with a key/value system just isn't going to make things better.

Or, they improve performance, but since they don't understand what ACID is for, they eventually end up with weird inconsistencies. In some cases this might be acceptable, but you really don't want to see it happening in an order tracking system.

The sickening feeling people get is not because it's a competitor. In a large part it isn't a competitor, but a different class of system with different tradeoffs. The sickening feeling comes from seeing people not understand what they're doing, and then run towards the latest technology because it's what $BIG_COMPANY uses without understanding it any better, and generally making an even bigger mess.

The performance of specialized solutions like key/value systems doesn't come from magic. They're not really new, and don't use anything very groundbreaking. They simply use different tradeoffs at the cost of sacrificing quite a lot of what is present in a RDBMS. It's important to understand first whether you can really afford to discard those things, because if you can't, it's either not going to work right, or you'll have to graft all that you removed on top of it anyway.
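To make the "read a million records and discard all but 5 client-side" point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The orders table, its columns, and the index name are made up for illustration; the same contrast applies to any SQL database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER)")
conn.executemany(
    "INSERT INTO orders (customer_id) VALUES (?)",
    [(i % 1000,) for i in range(100_000)],
)
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")

# Anti-pattern: ship every row to the client, then throw most of them away.
all_rows = conn.execute("SELECT id, customer_id FROM orders").fetchall()
client_side = [row for row in all_rows if row[1] == 42]

# Better: let the database filter with a WHERE clause (and use the index).
server_side = conn.execute(
    "SELECT id, customer_id FROM orders WHERE customer_id = ?", (42,)
).fetchall()

# Same answer either way, but the first approach moved 100,000 rows to get it.
print(len(all_rows), len(client_side), len(server_side))
```

Swapping the storage engine does not fix the first pattern; a key/value store that is scanned in full and filtered in the application has exactly the same problem.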

Re:Call me skeptical (0)

mini me (132455) | more than 3 years ago | (#34274388)

The big boys are using NoSQL databases because it is easier (read: cheaper) to scale than relational (SQL) databases. That does not mean relational databases cannot scale.

The small startups are using NoSQL because there is, more and more, a push in the web app market to store data which does not fit into any schema. Several of the NoSQL options are very good at handling that kind of data, SQL, not so much.

Yes, there are a handful of people that think they need a Facebook-scale database before they have even released their project into the world, but they are the exception. Most people are using NoSQL databases because they are a better fit for the job than a traditional SQL database. Those people will not argue that SQL cannot do the job, just that a NoSQL database can do the job better.

Re:Call me skeptical (0)

Anonymous Coward | more than 3 years ago | (#34274950)

I love it when "big boys" try to pinch pennies by using NoSQL. As soon as they try to do more than small scale testing (megabyte sized dbs) they quickly find performance and concurrency going down the drain. Then they spend 10x the amount of money they thought they saved trying to get their cheapo system up to a usable speed running giga-terabyte dbs but it is always poor (if it works at all). They end up scrapping the whole mess and buying an RDB anyway.

Re:Call me skeptical (1)

ADRA (37398) | more than 3 years ago | (#34274494)

Wow, that was a great video. Thanks for the link.

Re:Call me skeptical (1)

Brian Quinlan (252202) | more than 3 years ago | (#34274954)

Agreed, we have massive sites serving millions of requests a day using Open Source relational databases and yet it seems everyone wants to use NoSQL because it's the hip new thing.

Naturally I start thinking of this: http://xtranormal.com/watch/6995033 [xtranormal.com]

A million requests per day translates to roughly 11.6 requests per second. That's a pretty trivial amount of traffic. A massive site like Facebook is probably serving about 4 orders of magnitude more requests than that.
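The conversion from a daily total to an average per-second rate is simple arithmetic. A minimal Python sketch, assuming traffic is spread evenly over the day (real traffic has peaks well above the average):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def requests_per_second(requests_per_day: float) -> float:
    """Average request rate, assuming an even spread over 24 hours."""
    return requests_per_day / SECONDS_PER_DAY

one_million = requests_per_second(1_000_000)
print(f"{one_million:.1f} requests/second")  # prints "11.6 requests/second"
```

Four orders of magnitude on top of that would be on the order of a hundred thousand requests per second, which is a very different capacity problem.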

Re:Call me skeptical (2, Funny)

Squeebee (719115) | more than 3 years ago | (#34275166)

Would you have preferred I have said bazillions?

Re:Call me skeptical (1, Insightful)

MatthiasF (1853064) | more than 3 years ago | (#34273832)

In Cloud scenarios, a distributed relational database is cumbersome or even impossible to maintain. Hence why lots of web companies have moved over to NoSQL solutions tailored to their processes.

So, you're describing centralized, local databases whereas the OP is focusing on decentralized, cloud databases.

Re:Call me skeptical (3, Interesting)

craftycoder (1851452) | more than 3 years ago | (#34273862)

My thoughts exactly. I have a couple 100 GB in a MsSQL database with extensive normalization and it is lightning fast. It's all about indexes and appropriate design.

Re:Call me skeptical (0)

Anonymous Coward | more than 3 years ago | (#34275082)

"lightning fast"? How many Libraries of Congress per Jigawatt is that?

Re:Call me skeptical (2, Insightful)

Ruke (857276) | more than 3 years ago | (#34273932)

I think the real problem is that people are seeing inconsistencies in their growing systems, and looking to grow to a system that doesn't have inconsistencies. Which is basically impossible. It's not that the big players don't ever have inconsistent data - Amazon's Dynamo relies on reaching a quorum, rather than a totally consistent state. Rather, the big players have a much better idea of exactly how inconsistent their data can be, while still giving their system good performance.
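For anyone unfamiliar with quorums: the usual rule of thumb is that with N replicas, a read that contacts R of them is guaranteed to see the latest write acknowledged by W of them only when R + W > N. The sketch below brute-forces that overlap condition for small clusters. It illustrates the general quorum rule only and is not a description of how Dynamo itself is implemented.

```python
from itertools import combinations

def quorums_overlap(n: int, r: int, w: int) -> bool:
    """True if every read set of size r shares a replica with every write set of size w."""
    replicas = range(n)
    return all(
        set(read) & set(write)
        for read in combinations(replicas, r)
        for write in combinations(replicas, w)
    )

print(quorums_overlap(3, 2, 2))  # R + W > N: prints True
print(quorums_overlap(3, 1, 1))  # R + W <= N: prints False
```

Lowering R or W buys latency and availability at the price of possibly reading stale data, which is the "how inconsistent can we afford to be" knob.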

Re:Call me skeptical (4, Insightful)

RobertM1968 (951074) | more than 3 years ago | (#34273954)

Agreed... the biggest limitation I see with SQL (My, DB2, Postgres anyway... found plenty in MS) are people who don't know how to lay out a database, people who don't know how to install and configure the server daemon(s), people who have no idea how to properly select appropriate hardware, and people who don't know how the heck to do a query (as a for instance, I worked on some code done by someone else, where on massive records, they were always selecting "*" instead of the needed or anticipated values. Big waste when one needs (by ID#) last and first name and selects a whole row instead - then wonders why it's not scaling upwards).
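A small sketch of the SELECT * complaint, using Python's built-in sqlite3 module. The people table and its bio column are invented for illustration; the bio stands in for whatever wide columns the caller never reads.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people ("
    " id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT, bio TEXT)"
)
conn.execute(
    "INSERT INTO people VALUES (1, 'Ada', 'Lovelace', ?)",
    ("x" * 100_000,),  # stand-in for a large column nobody asked for
)

# Wasteful: pulls every column, including the 100 KB bio.
wide_row = conn.execute("SELECT * FROM people WHERE id = ?", (1,)).fetchone()

# Better: name only the columns the caller needs.
narrow_row = conn.execute(
    "SELECT last_name, first_name FROM people WHERE id = ?", (1,)
).fetchone()

print(len(wide_row), len(narrow_row))  # prints "4 2"
```

On a real server the difference shows up as extra disk reads, network traffic and memory per request, multiplied by every request.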

Re:Call me skeptical (5, Funny)

Cylix (55374) | more than 3 years ago | (#34274118)

I just select * from * and then sort it out with grep and cut.

Re:Call me skeptical (1)

Hoi Polloi (522990) | more than 3 years ago | (#34275060)

Good thing all my db's are massive, flat text files.

Re:Call me skeptical (1, Insightful)

Anonymous Coward | more than 3 years ago | (#34274170)

The real problem is scale. Any SQL DB server will cope with most applications fine, but add live data on a public-facing site with a decent volume of users, and it will crawl to a slow death. This is why non-crucial sites use denormalization and do the dastardly deed of data duplication to speed up their bad queries and suspect table design.

So what's the solution? Very large and expensive boxes for the simplest method (no one likes this these days), and then lots of boxes performing certain tasks. Which has its own huge costs, because skilled people in this field are very few and far between, and are already working for Yahoo, Google and Fartybook.

Re:Call me skeptical (1)

nschubach (922175) | more than 3 years ago | (#34274464)

Is it people that don't know how to lay out a database or that you need to know how to lay out a database so it does fit with their need?

I see a lot of hate around alternatives to SQL and most of them blame the design of data retention rather than accepting that there may be another way to achieve what is needed. It sounds to me like people trying to justify their job (which may not be necessary under a different model that doesn't need someone to "design" anything.)

Honest question there...

Re:Call me skeptical (0)

Anonymous Coward | more than 3 years ago | (#34274892)

Sometimes it's the difference between what works and what's better than what is possible. A lot of things are possible; it doesn't mean they're actually a good way to do anything. That said, I know pretty much nothing about NoSQL, so I can't add anything more useful than my previous warning.

Re:Call me skeptical (1)

hedpe2003 (1735078) | more than 3 years ago | (#34274024)

... Somehow I don't see a startup needing NoSQL unless they specialized in processing huge amounts of data...

Can we pretend that he does - and actually offer some useful information? Thanks

From the guy who knows nothing about NoSQL other than what wikipedia just provided him.

Re:Call me skeptical (0)

Anonymous Coward | more than 3 years ago | (#34274428)

How is telling you that you are barking up the wrong tree not useful information about NoSQL? The reality is you really don't need NoSQL solutions unless you're an internet giant, and somebody setting up a NoSQL environment is going to hurt their performance more than help it and just generally make things harder for themselves. If you need NoSQL, you know you need it and you probably have the talent to implement it.

Re:Call me skeptical (0)

Anonymous Coward | more than 3 years ago | (#34274640)

NoSQL is about sacrificing different parts of ACID for speed. It really is a matter of which one are you ok with. If you do not want to give up any of the letters then you are stuck with SQL.
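As a reminder of what one of those letters buys you, here is a minimal atomicity sketch using Python's built-in sqlite3 module. The accounts table and the transfer helper are hypothetical; the point is only that a failed transaction leaves no half-applied changes behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money atomically: both updates commit, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(conn, "alice", "bob", 500)  # would overdraw alice, so it fails
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, balances)  # bob's credit was rolled back together with the failed debit
```

Without that guarantee, the application has to detect and repair the half-finished transfer itself.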

However, the dude hit all the 'you're doing it wrong' buttons: startup, post on Slashdot, jumping to the NoSQL buzzword. It could be anything from a badly written application, to a bad index, to badly laid out data. There are literally hundreds of ways to grind a SQL server to a halt. There are also just as many fixes for those bad things.

What it comes down to is many times people are afraid to test large datasets. 'oh that will never happen' is the rallying cry. So they do not even try realistic scenarios. Just 'small' data sets. So all the interfaces 'work' but the data storage/retrieval is horrid.

DB optimization is fairly straightforward. But if you want it done 'easy' it usually costs 150 an hour and you hire a contractor that does it for a living.

Also, in this case I would say he is trolling around for a 'swap out' that is just 'faster'. NoSQL does not get you that if your application is acting crappy. It is a matter of stop thinking and LOOK at what the thing is doing. But many times devs like to speculate about what is wrong. He is actually looking for a justification for a rewrite of whatever he works on, but wants a swap-out. Not going to happen. He would really be better off looking at why things are not performing. And if he is unwilling to do that, then NoSQL is not going to get him what he wants either.

Re:Call me skeptical (2, Insightful)

Anonymous Coward | more than 3 years ago | (#34274272)

I'm currently responsible for operations at a software-as-a-service startup, and we're increasingly hitting limitations in what we can do with relational databases...

Call me skeptical, but there are companies out there with massive amounts of data in relational databases. If you as a startup are "constantly hitting limitations", you're either a very odd startup or using it very wrong.

Agreed. My knee-jerk response once I saw the sentence in the article's summary was "No, you're not [hitting the limitations in what we can do with relational databases]. You're hitting the limits of what you know about performance tuning and scalability with the relational databases you have."

NoSQL, BigTable, and Cassandra are designed for extremely fast key-value pair lookups over enormous datasets (as one poster puts it, > exabyte-sized.) With these solutions alone, you lose:

a) ACID
b) FK relations/semantic modeling

which is huge. (If you don't know why losing ACID and FK relations is such a bad thing, you might as well stop here, hit the library for a good database textbook, read and understand it, then come back in 3-6 months and rephrase your question.)

If you *really* have > exabyte-sized data in a table or two and you really are hitting the limits of what current RDBMS engines can provide (and if you haven't looked at DB2 or Oracle, maybe you should - their optimizers are better than Postgres or (laugh) MySQL), you'd probably want to work around (a) and (b) by using some sort of enterprise transaction management system (e.g. JTA if you're using Java EE), then incorporate the tables you need into NoSQL, Cassandra, or BigTable by providing middleware to interface with these hash stores that provides support for two-phase distributed commit and fakes the FK relationship to cross datastore boundaries.

And if you think that doesn't sound too bad, think again: what I just described is a HUGE undertaking. Are you really sure you haven't exhausted all other options to stick with proven database technology that performs well up to exceptionally large-sized datasets? Maybe it's time to hire, you know, a real DBA - this type of analysis is what they get paid the big bucks for.
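For a sense of what "two-phase distributed commit" means at its very simplest, here is a toy in-memory sketch. The Participant class and two_phase_commit function are hypothetical names, not any real transaction manager's API, and everything that makes the real thing hard (timeouts, durable logs, coordinator crashes, recovery) is deliberately left out.

```python
class Participant:
    """Stand-in for one datastore taking part in a distributed transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self) -> bool:
        # Phase 1: vote yes only if the local changes can be made durable.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "aborted"


def two_phase_commit(participants) -> bool:
    # Phase 1: collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2: commit everywhere only if the vote was unanimous.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False


healthy = [Participant("rdbms"), Participant("kv-store")]
broken = [Participant("rdbms"), Participant("kv-store", can_commit=False)]
print(two_phase_commit(healthy), two_phase_commit(broken))  # prints "True False"
```

Even this toy shows the coupling: one participant that cannot prepare forces every other store to roll back.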

Re:Call me skeptical (0)

Anonymous Coward | more than 3 years ago | (#34274796)

Data consistency will always be the anchor on any system. Unless you don't care about data reliability and accuracy you'll always need systems to be either centralized or constantly in synch which means locking, etc.

CAP is fine (0)

Anonymous Coward | more than 3 years ago | (#34273766)

Translattice is not consistent... it is eventually consistent ...

Is it a technical or a budget problem? (4, Insightful)

ducomputergeek (595742) | more than 3 years ago | (#34273772)

Given my past 12 years between working at consultancies and start ups, I've seen this a few times. It's usually not a technical hurdle, it's a "We can't solve this problem within our budget" problem. Either by going out and hiring someone who is an expert at performance tuning with their DB of choice or moving from certain db's to real databases that could handle the work like MSSQL, DB2, Oracle, or in some cases Teradata if dealing with Data warehousing.

Because I've worked around some very large database installs in my day. Every time the scaling question/problem came up, it was solvable with RDBMS's, but the solution wasn't cheap.

Re:Is it a technical or a budget problem? (1)

Cylix (55374) | more than 3 years ago | (#34274160)

There are a few other players in the field next to teradata, but when you move to that format there is nothing that would be associated with the word cheap.

However, generally when it gets to that level of field the amount of data in storage usually makes it very obvious.

In some scenarios, we have avoided going to those rather massive solutions by really digging down and seeing if we really needed to store everything.

Re:Is it a technical or a budget problem? (3, Interesting)

PRMan (959735) | more than 3 years ago | (#34274372)

My experience is that there is a lot you can do that is very cheap.

One time, I walked into a mortgage company (I'm a developer, not a DBA) and they were complaining that they couldn't run a required government report breaking down their fee codes because it would time out after 2 minutes. The table had millions of records. I looked at the table and immediately noticed that they didn't have an index on fee code, which the report was trying to sort and total by. I told the manager that I would add an index on the fee code column after hours and run the report. He wasn't sure it would work so he said, "Go ahead and add it now."

I added the index (which took about 30 seconds) and ran the report again. It finished in 45 seconds.

I looked at the report. Whoever wrote it for them was concatenating strings all over the place. Millions of them. I switched the app to StringBuilder using a search-and-replace.

I ran the report again. 8 seconds. In less than an hour I took a report that wasn't finishing in 2 minutes down to 8 seconds. That wasn't expensive for them and it wasn't hard to do.

At another client, they were complaining about database slowness and the DBA wasn't having much luck fixing it. They fired him and asked me to look at it. I simply recorded a profiler log (a little slower for that day, but it's already dog slow so who would notice), found the longest duration and most common queries and then searched the source code repository and rewrote them. Many of these queries were cross-joins, missing indexes on the joined field or other really obvious problems. One was doing a data conversion on every record instead of data converting the passed in input once. It took me about 2-3 days to solve massive slowness problems. At the end, the employees were saying, "I'm glad they finally bought a new database server." This was at one of the country's largest mortgage companies with tens of millions of records in the database. And the fixes should have been brain-dead obvious to anyone with a few years of SQL experience.
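The missing-index story is easy to reproduce in miniature. The sketch below uses Python's built-in sqlite3 module and asks the planner how it intends to run a query before and after an index is added. The fees table, the index name, and the simplified query (a filter on one fee code rather than the full report) are assumptions for illustration, and the exact plan text differs between database engines and versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fees (id INTEGER PRIMARY KEY, fee_code TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO fees (fee_code, amount) VALUES (?, ?)",
    [(f"F{i % 50:02d}", 1.0) for i in range(10_000)],
)

def query_plan(conn, sql, params=()):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 is the detail text

report = "SELECT SUM(amount) FROM fees WHERE fee_code = ?"

plan_before = query_plan(conn, report, ("F07",))  # full table scan
conn.execute("CREATE INDEX idx_fees_code ON fees (fee_code)")
plan_after = query_plan(conn, report, ("F07",))   # search using the new index

print(plan_before)
print(plan_after)
```

Most databases have an equivalent (EXPLAIN, execution plans, a profiler), and looking at it is usually the cheapest tuning step available.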

Re:Is it a technical or a budget problem? (2, Insightful)

Hoi Polloi (522990) | more than 3 years ago | (#34275190)

I wish most tuning efforts only required fixing glaring index issues. You eventually find yourself dealing with large dbs with all the basic tuning done and now they want to get app X to return in 8 secs instead of 10. Then you go down the rabbit hole of initialization params, hints, etc. Sadly design considerations are almost always off the plate at this point.

you're doing something wrong (4, Insightful)

Surt (22457) | more than 3 years ago | (#34273790)

"I'm currently responsible for operations at a software-as-a-service startup, and we're increasingly hitting limitations in what we can do with relational databases. "

Relational databases scale to pretty amazing heights. The notion that you are hitting some limit of relational databases at a startup stretches the imagination. I mean, really, you've already hit exabyte data sizes? That's typically where relational starts to struggle.

You really need to define your problem with much greater specificity to get a valuable answer.

Re:you're doing something wrong (2, Insightful)

Stradenko (160417) | more than 3 years ago | (#34273940)

Relational databases scale to pretty amazing heights

Horizontally?

Re:you're doing something wrong (1)

C_Kode (102755) | more than 3 years ago | (#34274076)

Some people use sharding to scale horizontally.
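A minimal sketch of application-level sharding, assuming the simplest possible scheme: hash the key and take it modulo the number of shards. The host names and the shard_for helper are made up for illustration.

```python
import hashlib

# Hypothetical database hosts, one per shard.
SHARDS = ["db0.example.internal", "db1.example.internal", "db2.example.internal"]

def shard_for(key: str, shards=SHARDS) -> str:
    """Pick a shard deterministically from a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]

# The same customer always lands on the same shard.
print(shard_for("customer-42") == shard_for("customer-42"))  # prints True
```

The catch with plain modulo hashing is that changing the number of shards remaps most keys, and any query that spans customers now has to visit several servers. Those two costs are most of what makes sharding painful.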

Re:you're doing something wrong (3, Informative)

mlyle (148697) | more than 3 years ago | (#34274382)

And that's what Translattice does, actually: for the database part of the system, we transparently shard large tables behind the scenes, and figure out how to store it to the computing resources available taking into account historical usage patterns and administrators' policies on how data must be stored (for redundancy and compliance purposes). A different population of nodes is used to store each shard and the redundancy is effectively loosely coupled, so when a failure or partition occurs, the work involved in re-establishing redundancy is fairly shared over all nodes. This provides linear scalability for many workloads and better redundancy properties, and can also as a side benefit position data closer to where it's consumed.

When it comes time to access the data, the query planner in our database figures out how to efficiently dispatch the query to the minimal necessary population of nodes, introducing map and reduce steps to provide for data reduction and efficient execution.

All of the table storage is directly attached to the nodes, eliminating much of the need for a storage area network and scaling beyond where shared-disk database clusters can go.
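As a generic illustration of "map and reduce steps" over sharded data (not Translattice's actual implementation, which is not described in detail here), consider computing an average across shards: each shard reduces its own rows to a small partial result, and only the partials travel to the coordinator. The data and function names are invented for the example.

```python
from functools import reduce

# Rows of one column as they might be spread across three shards.
shards = [
    [12.0, 8.0, 10.0],
    [20.0],
    [5.0, 15.0],
]

def map_step(rows):
    """Runs on each shard: collapse local rows to a (sum, count) pair."""
    return (sum(rows), len(rows))

def reduce_step(a, b):
    """Runs on the coordinator: merge two partial results."""
    return (a[0] + b[0], a[1] + b[1])

partials = [map_step(rows) for rows in shards]
total, count = reduce(reduce_step, partials, (0.0, 0))
average = total / count
print(total, count)  # prints "70.0 6"
```

Note that averaging the per-shard averages would give the wrong answer when shards hold different numbers of rows, which is why the partial result carries both sum and count.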

Re:you're doing something wrong (3, Insightful)

camperdave (969942) | more than 3 years ago | (#34274062)

Relational databases scale to pretty amazing heights. The notion that you are hitting some limit of relational databases at a startup stretches the imagination. I mean, really, you've already hit exabyte data sizes? That's typically where relational starts to struggle.

You really need to define your problem with much greater specificity to get a valuable answer.

Given that the title of the story is "Horizontal Scaling of SQL Databases?" the notion that relational databases are able to scale to pretty amazing heights is irrelevant.

You really need to define your problem with much greater specificity to get a valuable answer.

That's definitely true. It may be, in fact, that an RDBMS is not what is needed at all.

Re:you're doing something wrong (1, Funny)

Anonymous Coward | more than 3 years ago | (#34275054)

You really need to define your problem with much greater specificity to get a valuable answer.

The real problem is he lied on his resume, has no idea what he's really talking about, and now they're asking about it at his job...

Re:you're doing something wrong (1)

Civil_Disobedient (261825) | more than 3 years ago | (#34275432)

You really need to define your problem with much greater specificity to get a valuable answer.

The OP said they were using NoSQL. That alone explains everything.

Solution (to the OP, not the parent who clearly understands what they're talking about): go learn how to use relational databases properly. Normalize your data. Nine times out of ten, if you're repeating information in multiple tables, you're doing something wrong. DO NOT USE BUSINESS KEYS. Surrogate keys only. Why? Because you do not own a crystal ball.
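A small sketch of the surrogate key argument, using Python's built-in sqlite3 module. The customers and orders tables are hypothetical. The email address is the business key: it stays unique, but nothing else references it, so it can change without touching any child rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute(
    "CREATE TABLE customers ("
    " id INTEGER PRIMARY KEY,"        # surrogate key: meaningless and stable
    " email TEXT NOT NULL UNIQUE)"    # business key: unique, but may change
)
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers(id),"
    " total REAL)"
)

cur = conn.execute("INSERT INTO customers (email) VALUES (?)", ("old@example.com",))
customer_id = cur.lastrowid
conn.execute(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)", (customer_id, 19.99)
)

# The customer changes their email; no order rows need to be rewritten.
conn.execute(
    "UPDATE customers SET email = ? WHERE id = ?", ("new@example.com", customer_id)
)

row = conn.execute(
    "SELECT c.email, o.total FROM orders o JOIN customers c ON c.id = o.customer_id"
).fetchone()
print(row)  # prints "('new@example.com', 19.99)"
```

Had orders referenced the email directly, the same change would have meant updating every order, or living with stale references.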

Ever thought of AppEngine (0)

Anonymous Coward | more than 3 years ago | (#34273808)

Let Google worry about it. Pricing is stupid cheap.

Look at your code (0)

Anonymous Coward | more than 3 years ago | (#34273826)

I used to work at a managed service provider and we often had clients complain that the SQL Server was slow or did not scale. 99 times out of 100 the issue was that their code was horribly inefficient. Either it was eating up connections or executing inefficient queries thousands of times more than necessary.

It's often hard to convince the developers that their code is bad, but if you do some profiling, capture the most frequent queries, and show them the results, that may help.

If in fact the code is behaving and you are still having trouble scaling, here are a few hints:
1. See if there is some caching that you can do on the application tier
2. Reorganize and index your data structure to optimize for the queries that you find are inefficient
3. Separate the database logically onto separate servers.
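The profiling step and hint 1 can be sketched roughly like this. The query log, the `load_setting_from_db` function and the counter are all stand-ins invented for the example; in practice the log would come from whatever query capture your database offers.

```python
from collections import Counter
from functools import lru_cache

# Hypothetical captured log: one normalized query string per execution.
QUERY_LOG = [
    "SELECT * FROM settings WHERE name = ?",
    "SELECT * FROM users WHERE id = ?",
    "SELECT * FROM settings WHERE name = ?",
    "SELECT * FROM settings WHERE name = ?",
]


def most_frequent(log, n=1):
    """Return the n most frequently executed queries with their counts."""
    return Counter(log).most_common(n)


db_calls = 0  # counts round trips to the (pretend) database


def load_setting_from_db(name):
    """Stand-in for a real database lookup."""
    global db_calls
    db_calls += 1
    return f"value-of-{name}"


@lru_cache(maxsize=1024)
def get_setting(name):
    """Application-tier cache: repeated lookups never reach the database."""
    return load_setting_from_db(name)


for _ in range(1000):
    get_setting("theme")

print(most_frequent(QUERY_LOG))  # the settings query, executed 3 times
print(db_calls)                  # 1
```

An in-process cache like this only suits data that can tolerate being stale; anything that changes often needs an invalidation strategy, which this sketch leaves out.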

What company? (0, Flamebait)

MeanMF (631837) | more than 3 years ago | (#34273944)

Please post the name of your company so we can learn more about what kind of data you're storing and what kind of issues you are seeing. And so we can avoid using your services until you hire somebody competent. Thanks.

Re:What company? (3, Insightful)

jlusk4 (2831) | more than 3 years ago | (#34274454)

Geez, you guys. There's a real person behind the question. Do you HAVE to be an asshole?

Wow (5, Informative)

mlyle (148697) | more than 3 years ago | (#34273960)

I didn't expect we'd be on Slashdot just yet. I'm Michael Lyle, CTO and cofounder of Translattice.

With regards to the original submitter's question, we'd love to talk to him. How much we can help, of course, depends on the specific scenario he's hitting.

What we've built is an application platform constituted from identical nodes, each containing a geographically decentralized relational database, a distributed (J2EE compatible) application container, and distributed load balancing and management capabilities. Massive relational data is transparently sharded behind the scenes and assigned redundantly to the computing resources in the cluster, and a distributed consensus protocol keeps all of the transactions in flight coherent and provides ACID guarantees. In essence, we allow existing enterprise applications to scale out horizontally while keeping the benefits of the existing programming model for transactional applications, by letting computing resources from throughout an organization combine to run enterprise workloads.
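To make "sharded and assigned redundantly" concrete, here is a toy placement function. It is only an illustration of the general idea, not a description of how Translattice assigns data; the node list, the replica count and the hash-then-walk scheme are all assumptions.

```python
import hashlib

NODES = ["node-0", "node-1", "node-2", "node-3"]  # hypothetical cluster
REPLICAS = 2                                      # copies kept of each shard


def owners(key, nodes=NODES, replicas=REPLICAS):
    """Return the nodes that hold a copy of the row identified by `key`.

    The key is hashed to pick a starting node, and the following nodes in
    list order hold the extra copies, so the owners are always distinct.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(replicas)]


print(owners("customer:42"))
```

With plain modulo placement like this, adding a node moves most keys; real systems usually reach for consistent hashing or an explicit placement map to avoid that.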

Current stacks are really complicated, multi-vendor, and require extensive integration/custom engineering for each application install. We're striving to create a world where massively performing infrastructure can be built from identical pieces.

Re:Wow (4, Insightful)

Cylix (55374) | more than 3 years ago | (#34274194)

He posted to slashdot.... do you really think he can afford you?

Lyle Can Do Anything Better Than You (0)

Anonymous Coward | more than 3 years ago | (#34274268)

http://thedailywtf.com/Articles/Lyle-Can-Do-Anything-Better-Than-You.aspx

*SCNR*, don't take it personally :)

Re:Wow (-1, Flamebait)

Nadaka (224565) | more than 3 years ago | (#34274370)

Where is my -1 obvious marketing shill for both this guy and the original poster?

Re:Wow (1)

aclarke (307017) | more than 3 years ago | (#34274576)

I think it's hiding behind the giant "I think everything is a conspiracy" badge I just awarded you.

Re:Wow (1)

joib (70841) | more than 3 years ago | (#34274414)

So you're claiming ACID; IOW you are saying your system provides consistency as per the definition used in CAP?

How do you deal with network partitions? That is, per the CAP theorem, if you have C, is your system CA or CP?

Thanks,

Re:Wow (1)

joib (70841) | more than 3 years ago | (#34274498)

Replying to myself, TFA contains some info about this. Hey, this is slashdot, who has time to read TFA?

Re:Wow (4, Interesting)

mlyle (148697) | more than 3 years ago | (#34274656)

The short answer is, CA/CP/AP on a transaction-by-transaction basis depending on application requirements. Also of note: network delay is effectively a special "partition", requiring an engine that can have massive workloads in flight and reconcile/order non-commutative changesets in a distributed fashion.
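A small illustration of why non-commutative changes are the hard part. This is not the engine described above, just a toy: two replicas may receive the same pair of changes in different orders, and the sketch checks whether every ordering ends in the same state. The `add` and `set_to` operations are invented for the example.

```python
from itertools import permutations


def add(n):
    """A commutative change: add n to the balance."""
    return lambda balance: balance + n


def set_to(n):
    """A non-commutative change: overwrite the balance with n."""
    return lambda balance: n


def apply_all(state, ops):
    """Apply the operations to the state in the given order."""
    for op in ops:
        state = op(state)
    return state


def outcomes(ops, start=100):
    """Set of final states reachable by applying ops in every possible order."""
    return {apply_all(start, order) for order in permutations(ops)}


commutative = outcomes([add(10), add(-30)])
non_commutative = outcomes([add(10), set_to(0)])

print(commutative)      # {80}: order does not matter
print(non_commutative)  # two different results: replicas must agree on an order
```

When the outcome set has more than one element, the nodes cannot simply apply changes as they arrive; something has to decide on one order for everybody.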

Re:Wow (2, Interesting)

Crimey McBiggles (705157) | more than 3 years ago | (#34274518)

The problem with identical pieces is that, in order for them to be interoperable among myriad applications, they must be very small, and there must be a great number of them. No two businesses operate in an identical manner. If relational databases aren't solving the problem, it is more than likely due to poor data structure. The main difference that NoSQL exposes to a novice database administrator is that it promotes key-value pairs. This is no different from what exists in a relational database, except that in an RDBMS the admin is allowed and often compelled to create tables with multiple fields. More tables with fewer fields is the solution in either case.

Re:Wow (5, Funny)

Squeebee (719115) | more than 3 years ago | (#34274622)

Congratulations, you just won Slashdot's buzzword bingo, please collect your prize at the cashier window in the back of the hall.

Justification for new toys? (5, Insightful)

StuartHankins (1020819) | more than 3 years ago | (#34273978)

The post is so vaguely worded, I imagine the author is merely trying to find some justification to purchase some new toys. "See, Slashdot people think this is a good idea!"

I agree with most of the posts so far -- if you're truly hitting a limit, you are most likely doing something wrong. Hire an outside DBA to make recommendations if you don't have the resources in-house. I strongly suspect this is the real issue.

Another Sales Pitch Posing As A ( +1, Helpful ) (0)

Anonymous Coward | more than 3 years ago | (#34273984)

story. It's pretty obvious from "I was intrigued by the most recent entry, about Translattice, which purports to provide many of the same scaling advantages for SQL databases."

Yours In Akademgorodok,
Kilgore Trout

Hire a DBA (1)

knavel (1155875) | more than 3 years ago | (#34274048)

Most of the time, when someone says they're having trouble scaling their database, it's a case of a developer that has an incorrectly configured database. Installing MySQL is easy, but configuring it is VERY difficult. That's why you need a person with very specialized knowledge to properly configure a database for efficiency or throughput or whatever you're going for in your specific case.

It would be like saying that anyone can go to a hardware store, buy some lumber, and nail it together to create a rudimentary shelter, but if you want a *house*, something that will weather the elements and keep you warm and comfortable and secure, you need to hire a professional carpenter.

We're using MongoDB in production (1)

josef.salyer (1130615) | more than 3 years ago | (#34274064)

Setup and scaling have been really easy in comparison to similar MySQL clusters I have set up previously.

hbase is an option to NoSQL and Cassandra. (3, Informative)

ooglek (98453) | more than 3 years ago | (#34274068)

I recently read that someone moved their large operation from Cassandra to Hbase, a hadoop file system. http://hbase.apache.org/ [apache.org]

HBase is the Hadoop database. Use it when you need random, realtime read/write access to your Big Data. This project's goal is the hosting of very large tables -- billions of rows X millions of columns -- atop clusters of commodity hardware.

HBase is an open-source, distributed, versioned, column-oriented store modeled after Google's Bigtable: A Distributed Storage System for Structured Data by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Hadoop. HBase includes:

  * Convenient base classes for backing Hadoop MapReduce jobs with HBase tables
  * Query predicate push down via server side scan and get filters
  * Optimizations for real time queries
  * A high performance Thrift gateway
  * A REST-ful Web service gateway that supports XML, Protobuf, and binary data encoding options
  * Cascading, hive, and pig source and sink modules
  * Extensible jruby-based (JIRB) shell
  * Support for exporting metrics via the Hadoop metrics subsystem to files or Ganglia; or via JMX

HBase 0.20 has greatly improved on its predecessors:

  * No HBase single point of failure
  * Rolling restart for configuration changes and minor upgrades
  * Random access performance on par with open source relational databases such as MySQL

What you should really be doing... (2, Funny)

ADRA (37398) | more than 3 years ago | (#34274200)

Is to write better queries, I mean how hard can it be:

select * from (select * from A,B,C,D,E,F,G WHERE A.ID=B.AID(+) AND B.ID=C.BID(+) AND C.ID=D.CID(+) AND D.ID=E.DID(+) AND E.ID=F.EID(+) AND F.ID=G.FID(+) order by F.name ASC) where F.name='zzzzz'
Everything will work out, I swear.

Re:What you should really be doing... (3, Insightful)

Nerdfest (867930) | more than 3 years ago | (#34274994)

I think I've seen SQL written by you before. I realize your post is a joke, but I see people aliasing bad table names down to even less readable single letters. It's a maintenance nightmare. Treat SQL like a language and write it so it's readable and maintainable. It even frequently helps when you're trying to resolve performance problems ... they're much easier to spot in well written SQL.
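As a small illustration of the difference, here is the kind of query I mean, run through Python's built-in `sqlite3` module. The `author` and `book` tables are invented for the example; the point is full table names, explicit JOIN conditions, and the filter in the WHERE clause instead of wrapped around a sorted subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE book (
        id        INTEGER PRIMARY KEY,
        author_id INTEGER REFERENCES author(id),
        title     TEXT NOT NULL
    );
    INSERT INTO author (id, name) VALUES (1, 'Ada'), (2, 'Bob');
    INSERT INTO book (author_id, title) VALUES (1, 'Notes'), (1, 'Engines');
""")

# Explicit join syntax and real table names: the join condition and the
# filter can each be read (and tuned) on their own line.
READABLE_QUERY = """
    SELECT author.name, book.title
    FROM author
    LEFT JOIN book ON book.author_id = author.id
    WHERE author.name = ?
    ORDER BY book.title
"""

rows = conn.execute(READABLE_QUERY, ("Ada",)).fetchall()
print(rows)  # [('Ada', 'Engines'), ('Ada', 'Notes')]
```

Written this way, a missing join condition or a filter applied too late tends to stand out on a quick read.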

Relational DB limitation or app design limitation? (1)

Arawak (98728) | more than 3 years ago | (#34274220)

Are you sure you are hitting a limitation of the RDBMS or a limitation in the way your services are built? I'm just a little skeptical that a SaaS startup is already hitting limits with what you "can do with relational databases". How many hundred terabytes are you talking about here?

Usually when I hear this I see a PHP application which hits the database synchronously for every request. Or worse, a Java/Python/Ruby/.NET/whatever application built like it was a PHP app.

Is this a slashvertisement or so? (1)

guruevi (827432) | more than 3 years ago | (#34274322)

What limits are you hitting? And why are you mentioning only one of the many solutions to your problem, one which is probably mighty expensive compared to the others?

If you're genuinely hitting a limit, you're doing it wrong. You're probably not Google so most likely you're having issues scaling your proprietary and expensive SQL database (Oracle, MSSQL) but don't want to buy more $10-20k licenses. Most likely you can fix it by simply throwing better and more hardware at it (SSD, more hard drives and RAM) and while you're at it changing to a cheaper database solution (MySQL or PostgreSQL) which you can scale further for less money.

Re:Is this a slashvertisement or so? (1)

cheesedog (603990) | more than 3 years ago | (#34275310)

Google isn't the only company in the world that has to deal with petabytes of data. It's also not the only company that has to deal with incredibly large volumes of structured data.

I speak from experience, son. Your relational DB can't handle successful internet-scale loads, no matter how many awesome dbas you hire, and no matter how much money you fork over to Oracle.

MySQL scales just fine. (4, Interesting)

poptix_work (79063) | more than 3 years ago | (#34274432)

I work with some very high traffic sites, storing large data sets (100GB+).

  Depending on the application (if it allows for different write-only/read-only database configurations) we'll have a master-master replication setup, then a number of slaves hanging off each MySQL master. In front of all of this is haproxy* which performs TCP load balancing between all slaves, and all masters. Slaves that fall behind the master are automatically removed from the pool to ensure that clients receive current data.

  This provides:
  * Redundancy
  * Scaling
  * Automatic failover
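For anyone curious what the routing logic amounts to, here is a rough sketch in Python. This is not haproxy configuration and not our actual health check; the lag threshold, the server names and the function names are made up. A lag of `None` stands for a slave whose replication has stopped.

```python
import random

MAX_LAG_SECONDS = 5  # hypothetical threshold for "fallen behind"


def healthy_replicas(lag_by_replica, max_lag=MAX_LAG_SECONDS):
    """Return the slaves whose replication lag is known and acceptable."""
    return sorted(name for name, lag in lag_by_replica.items()
                  if lag is not None and lag <= max_lag)


def route(sql, masters, lag_by_replica, rng=random):
    """Send reads to a healthy slave and everything else to a master.

    If no slave is healthy, reads fall back to the masters.
    """
    is_read = sql.lstrip().upper().startswith("SELECT")
    pool = healthy_replicas(lag_by_replica) if is_read else []
    return rng.choice(pool or masters)


lag = {"slave-1": 0, "slave-2": 42, "slave-3": None}
print(healthy_replicas(lag))                      # ['slave-1']
print(route("SELECT 1", ["master-1"], lag))       # slave-1
print(route("UPDATE t SET x = 1", ["master-1"], lag))  # master-1
```

Classifying statements by their first keyword is a simplification; in the setup above the application already chooses between the read-only and read-write endpoints.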

  The whole NoSQL movement is as bad as the XML movement. I'm sure it's a great idea in some cases, but otherwise it's a solution looking for a problem.

(*) http://haproxy.1wt.eu/ [1wt.eu]

Re:MySQL scales just fine. (1, Informative)

cheesedog (603990) | more than 3 years ago | (#34275258)

100GB+ is not a large dataset.

Look at columnar... (0)

Anonymous Coward | more than 3 years ago | (#34274512)

depending on what specifically you're trying to do, it may be the way to go.

Look at ParAccel.

FIFY (1, Insightful)

Anonymous Coward | more than 3 years ago | (#34274522)

"I'm currently responsible for operations at a software-as-a-service startup, and we're increasingly hitting limitations in what [OUR OUTSOURCED INDIAN DEVELOPERS] can do with relational databases."

FIXED IT FOR YOU MY PRETTY LITTLE OPERATIONS MANAGER. (Just using all caps to make you feel more at home)

MS SQL Server has Horizontal Partitioning (1)

danparker276 (1604251) | more than 3 years ago | (#34274802)

Save the headaches and just use SQL Server

Voltdb (1, Informative)

Anonymous Coward | more than 3 years ago | (#34275072)

Have you looked at VoltDB? http://www.voltdb.com
My 2 cents.

InfiniDB? (0)

Anonymous Coward | more than 3 years ago | (#34275168)

Depending on your intended application this may help: http://www.infinidb.org/

Losers mock (0, Flamebait)

cheesedog (603990) | more than 3 years ago | (#34275238)

You guys posting that traditional relational databases can handle the load of internet scale applications kill me. You mock this guy who has a legit problem that everyone who has ever run an internet scale technology is very familiar with.

NoSQL isn't some passing fad invented by high school kids.

Luckily, most of you will probably never discover that fact for yourselves, because you'll never have experience with a successful internet-scale architecture. Relational DBs are just fine for internal "enterprisey" apps, or for your hobby website that drives an astounding 1200 page views/month, or for your failed attempt at launching a web service that only ever garners 300,000 users, so you can continue to delude yourselves that there just isn't a problem here, and SQL is the only skillset you'll ever need.

For the elite few who actually achieve success, you'll totally know where the OP is coming from. Intimately. And you'll either be very glad that there is a path (hadoop, cassandra, mongodb, etc) to migrate to that solves your problems, or you'll be very glad that you started with one of those solutions in the first place.

fast and extremely scalable (1)

bhcompy (1877290) | more than 3 years ago | (#34275454)

The fastest DB I've ever used is based on PICK OS/DB. Reality is the retail name for it now (essentially an emulator with an API for *nix/Windows). The military used it for inventory tracking, and various companies still use it today for a great many things. ADP uses it for extremely large databases with tons of history for accounting, financials, inventory, etc. Even very old systems with 20+ years of data are very responsive (these systems are running Digital Unix 4 on Alpha processors). Pick/Reality is a hashfile-oriented multivalue database. Wikipedia has a pretty good explanation, and I believe Northgate Systems markets Reality today.