
Yale Researchers Prove That ACID Is Scalable

CmdrTaco posted more than 3 years ago | from the i-could-prove-lunch dept.

Databases 272

An anonymous reader writes "There has been a lot of buzz in the industry lately about NoSQL databases helping Twitter, Amazon, and Digg scale their transactional workloads. But there has been some recent pushback from database luminaries such as Michael Stonebraker. Now, a couple of researchers at Yale University claim that NoSQL is no longer necessary now that they have scaled traditional ACID-compliant database systems."

272 comments

I have to admit (0, Offtopic)

Even on Slashdot FOE (1870208) | more than 3 years ago | (#33437576)

I have a different image of ACID on Windows than they do.

Pfah. (5, Interesting)

stonecypher (118140) | more than 3 years ago | (#33437610)

NoSQL never was necessary. Traditional SQL databases - not just terascale, but even simple ones like MySQL - regularly deal with data volumes at Google and Walmart that make the sites that built these databases in desperation look positively tiny.

Digg's engineers wear clown shoes to work.

Re:Pfah. (-1, Flamebait)

Anonymous Coward | more than 3 years ago | (#33437726)

I hear you smoke a mean pole. Do you swallow and would you let me fuck your ass after you blow me?

Re:Pfah. (-1, Flamebait)

Anonymous Coward | more than 3 years ago | (#33437864)

what kind of gaytitude is that? Don't anthropomorphize fags, they don't like it when you do it.

Re:Pfah. (0, Offtopic)

baka_toroi (1194359) | more than 3 years ago | (#33438238)

It's better to fuck their asses BEFORE the blow you :D (Ass2mouth FTW!) Thanks for the laugh!

Re:Pfah. (1, Interesting)

Anonymous Coward | more than 3 years ago | (#33437746)

NoSQL never was necessary. Traditional SQL databases - not just terascale, but even simple ones like MySQL [...]

Is MySQL ACID?

Re:Pfah. (5, Informative)

Trieuvan (789695) | more than 3 years ago | (#33438296)

It is if you use innodb .

Re:Pfah. (3, Insightful)

TheSunborn (68004) | more than 3 years ago | (#33437818)

It was never database size that was the problem, but the number of queries per second (aka performance) that could be executed.

You can run a Google-size database on MySQL, but you can't use MySQL* to implement a search solution with performance like Google's without requiring much, much, much more hardware.

*Or any other SQL database.

Re:Pfah. (1, Insightful)

Anonymous Coward | more than 3 years ago | (#33438374)

Right, raw size is only one component. As a practical matter, if you have 100 trillion records in a DB, you probably also have ferocious insertion and query rates, as well. Not enforcing ACID has its advantages under those conditions.

Whether such a tack was logically required is an interesting question...

Re:Pfah. (3, Insightful)

Anpheus (908711) | more than 3 years ago | (#33438806)

Well, and if you don't need it [the guarantees of ACID], why pay for it? I mean, if you have to spend any amount of time thinking about "How do I make that work?" that's a cost.

Whereas if all you care about is updating individual records without global consistency, well, don't enforce global consistency.

Re:Pfah. (5, Insightful)

mini me (132455) | more than 3 years ago | (#33437830)

NoSQL is not really about scalability, it is about modelling your data the same way your application does.

There is a strong disconnect between the way SQL represents data and the way traditional programming languages do. While we've come up with some clever solutions like ORM to alleviate the problem, why not just store the data directly without any mapping?

I am not suggesting that SQL is never the right tool for the job, but it most certainly is not the right tool for every job. It is good to have many different kinds of hammers, and perhaps even a screwdriver or two.

Re:Pfah. (5, Insightful)

bluefoxlucid (723572) | more than 3 years ago | (#33438098)

There is a strong disconnect between the way SQL represents data and the way traditional programming languages do.

Yes but there is a strong disconnect between computer RAM and information. Computer RAM contains DATA; information comes in associated tables. Relational databases represent data in tables with indexes, keys, etc. A Person is unique (has a unique ID), but they may share First Name, Last Name, and even Address (junior/senior in same household). There are many Races, and a Person will be of a given Race (or mix, but this is horribly difficult to index anyway). A Person will own a specific Car; that Car, in turn, will be a particular Make-Model-Year-Trim, which itself is a hierarchy of tables (Trim and Year are pretty separate, Model however will be of a particular Make, while a particular car available is going to be Model-Year-Trim).

Indexing and relating data in this way turns it into information, which is what we want and need. Separating the data eliminates redundancies and lets us use fewer buffers along the way, crunching down smaller tables and making fast comparisons to small-size keys before we even reference big, complex tables. Meanwhile, we're still essentially asking questions like "Find me all people who own a 1996-2010 Year Toyota Prius." Someone might own 15 cars, so we're looking in the table of all individual Cars with MYT where table MYT.Model = (Toyota Prius) and .Year is between 1996 and 2010, and pulling all entries in table Persons for each unique Cars.Owner = Persons.ID (an inner join).
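To make the Prius question above concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (persons, myt, cars, trim_level) and the sample rows are invented for illustration; the post only describes the schema loosely.

```python
import sqlite3

# Hypothetical schema loosely following the post: persons, a
# make/model/year/trim table (myt), and individual cars linking the two.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE persons (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE myt (id INTEGER PRIMARY KEY, make TEXT, model TEXT,
                  year INTEGER, trim_level TEXT);
CREATE TABLE cars (vin TEXT PRIMARY KEY,
                   owner INTEGER REFERENCES persons(id),
                   myt_id INTEGER REFERENCES myt(id));
""")
conn.executemany("INSERT INTO persons VALUES (?, ?, ?)",
                 [(1, "Ann", "Lee"), (2, "Bob", "Ray"), (3, "Cy", "Fox")])
conn.executemany("INSERT INTO myt VALUES (?, ?, ?, ?, ?)",
                 [(1, "Toyota", "Prius", 2004, "Base"),
                  (2, "Toyota", "Prius", 2012, "Two"),
                  (3, "Honda", "Civic", 2004, "LX")])
conn.executemany("INSERT INTO cars VALUES (?, ?, ?)",
                 [("VIN1", 1, 1), ("VIN2", 1, 3), ("VIN3", 2, 2),
                  ("VIN4", 3, 3), ("VIN5", 1, 1)])

# Inner joins: cars -> myt narrows to 1996-2010 Priuses, cars -> persons
# pulls the owners. DISTINCT collapses someone who owns several such cars.
prius_owners = conn.execute("""
    SELECT DISTINCT p.id, p.first_name, p.last_name
    FROM cars AS c
    JOIN myt AS m ON c.myt_id = m.id
    JOIN persons AS p ON c.owner = p.id
    WHERE m.make = 'Toyota' AND m.model = 'Prius'
      AND m.year BETWEEN 1996 AND 2010
    ORDER BY p.id
""").fetchall()
print(prius_owners)
```

The small myt table is filtered first and the big persons table is only touched for matching owners, which is the "fast comparisons to small-size keys" idea in the post.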

Information theory versus programming. We're studying information here. We might have something more interesting to do than look in a giant array of Cars[VIN] = &Owners[Index]. For the actual data, the model we use makes sense; programmers get an API that says "Yeah, ask me a specific structured question and I'll give you a two-dimensional array to work with as an answer." That two-dimensional array is suitable for programming logic to manipulate specific structured data; extracting that data from the huge store of structured information is complex, but handled by a front-end that has its own language. You tell that front-end to find this data based on these parameters and string it together; it does tons of programming shit to search, sort, select, copy, and structure the data for you.

Re:Pfah. (0)

Anonymous Coward | more than 3 years ago | (#33438112)

Never use a hammer after using four or more screwdrivers. Bad things happen and insurance doesn't cover them.

Re:Pfah. (0)

logjon (1411219) | more than 3 years ago | (#33438502)

I just use a 16 lb sledge for everything.

Re:Pfah. (1)

sarkeizen (106737) | more than 3 years ago | (#33438274)

Isn't that a very specific context, though? The underlying assumption seems to be that there is one dataset per application, which may well be the general case. In other words, what is the "same way your application does but without mapping" when your applications are written in different frameworks or languages, or when the data is accessed via, say, a reporting environment?

Re:Pfah. (1)

Tablizer (95088) | more than 3 years ago | (#33438492)

For this reason I suggest that app language designers work on better fitting RDBMS and SQL rather than the other way around (at least for data-driven apps). OOP may be nice, but it inherently conflicts with relational concepts and patterns. Generally, one is based around attribute-handling idioms and the other behavior-handling idioms. OOP also tends to be nested, hierarchical, and/or graph-shaped, while relational is set-centric. Either you de-emphasize one or the other, or deal with complicated and expensive translation layers. Barring some revolutionary breakthrough, something has to give. Right now it's like men wearing women's underwear and vice versa.

Re:Pfah. (2, Insightful)

GWBasic (900357) | more than 3 years ago | (#33438568)

NoSQL is not really about scalability, it is about modelling your data the same way your application does.

I 100% agree. Earlier this year I moved a prototype application built around SQLite and flat files to MongoDB. MongoDB is SQL-like in its ability to have queries and indexes, but it stores its data in a way that doesn't require me to deconstruct all of my data structures into tables. This dramatically reduced complexity in code that used to deal with 5-6 SQLite tables. In the case of MongoDB, I was able to replace 5-6 tables with a single collection of structured documents. MongoDB lets me write queries against data that's deeply nested, yet it can return the full data structure, so I don't have the performance hit (and programmer time hit) of running (and writing) many queries to hydrate data structures around foreign key relationships.

The other advantage to MongoDB is that its schemaless approach makes it much easier to handle inheritance. I can have documents with common parts for base classes, and varying parts for child classes. This is much harder in SQL, because I either need to design a super-table that can handle all variations of the base class, or I need to use a multi-join around all potential classes that I can query. MongoDB's document-based approach, as opposed to SQL's table approach, lets me write a single query that can handle future subclassing of the data, and future variations of the data.
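A rough sketch of the document idea described above, using plain Python dicts as a stand-in for a document store. This is not MongoDB's API: the find and get_path helpers, the dotted-path convention as implemented here, and the vehicle documents are all hypothetical.

```python
def get_path(doc, dotted):
    """Follow a dotted path such as 'engine.type' through nested dicts.

    Returns None when any step of the path is missing.
    """
    current = doc
    for key in dotted.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def find(collection, path, predicate):
    """Return whole documents whose value at `path` satisfies `predicate`."""
    matches = []
    for doc in collection:
        value = get_path(doc, path)
        if value is not None and predicate(value):
            matches.append(doc)
    return matches


# One collection, no fixed schema: every document shares the "base class"
# fields (kind, name, wheels) and carries its own extra fields.
vehicles = [
    {"kind": "car", "name": "Prius", "wheels": 4,
     "engine": {"type": "hybrid", "battery_kwh": 1.3}},
    {"kind": "truck", "name": "Hauler", "wheels": 6, "payload_kg": 9000},
    {"kind": "bike", "name": "Commuter", "wheels": 2},
]

# A query on a shared field works across every "subclass" ...
four_plus = find(vehicles, "wheels", lambda w: w >= 4)
# ... and a query on a nested field returns the full nested document.
hybrids = find(vehicles, "engine.type", lambda t: t == "hybrid")
print([d["name"] for d in four_plus], hybrids)
```

Adding a new kind of vehicle later means adding documents with new fields; the existing queries keep working without a schema change.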

Re:Pfah. (0)

Anonymous Coward | more than 3 years ago | (#33438860)

In the case of MongoDB, I was able to replace 5-6 tables with a single collection of structured documents. MongoDB lets me write queries against data that's deeply-nested, yet it can return the full data structure so I don't have the performance hit (and programmer time hit) of running (and writing) many queries to hydrate data structures around foreign key relationships.

That's why we have views, stored procedures, and triggers. Do it in the database and then suddenly you don't need to do all the other bullshit. That's the point.

All too often people do things poorly because they first learned to do things poorly on MySQL. Use a proper SQL RDBMS and then use the provided capabilities properly, and suddenly you won't have to do things poorly, over and over again, in all of your applications. You'll find you only have to do it right, once, and all your applications suddenly benefit.
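As a small illustration of the "do it once in the database" point, here is a sketch of a view using Python's built-in sqlite3 module. The customers/orders schema and the customer_totals view name are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (id INTEGER PRIMARY KEY,
                     customer_id INTEGER NOT NULL REFERENCES customers(id),
                     total_cents INTEGER NOT NULL);

-- The join and aggregation are written once, inside the database.
CREATE VIEW customer_totals AS
    SELECT c.id AS customer_id,
           c.name AS name,
           COUNT(o.id) AS order_count,
           COALESCE(SUM(o.total_cents), 0) AS spent_cents
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id, c.name;

INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO orders VALUES (1, 1, 1500), (2, 1, 2500);
""")

# Every application can now read the view as if it were a plain table.
rows = conn.execute(
    "SELECT name, order_count, spent_cents "
    "FROM customer_totals ORDER BY customer_id"
).fetchall()
print(rows)
```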

Re:Pfah. (0)

Anonymous Coward | more than 3 years ago | (#33437908)

Google? No. They pretty much pioneered most of the NoSQL stuff. Their internal BigTable data store was the inspiration for stuff like Cassandra (which Digg uses).

Re:Pfah. (5, Interesting)

bluefoxlucid (723572) | more than 3 years ago | (#33437914)

NoSQL never was necessary. Traditional SQL databases - not just terascale, but even simple ones like MySQL - regularly deal with data volumes at Google

Google uses BigTable, a NoSQL database.

Re:Pfah. (2, Insightful)

TooMuchToDo (882796) | more than 3 years ago | (#33438084)

Google initially used MySQL for Adwords, tried to switch away from it, and then switched back (if I recall correctly). Your Googling May Vary.

Re:Pfah. (5, Funny)

Splab (574204) | more than 3 years ago | (#33438160)

"Your Googling May Vary."

Yes, that is exactly the problem with NoSQL.

Re:Pfah. (4, Funny)

GooberToo (74388) | more than 3 years ago | (#33438942)

Funny. Insightful. Informative. So many options with your post. I'm sure at least one moderator will get it figured out.

Re:Pfah. (1)

Shados (741919) | more than 3 years ago | (#33437976)

Depends for what part, but Walmart's site runs at least partly on a "NoSQL" (I use the term loosely in this case) system.

Re:Pfah. (4, Insightful)

DragonWriter (970822) | more than 3 years ago | (#33438114)

NoSQL never was necessary. Traditional SQL databases - not just terascale, but even simple ones like MySQL - regularly deal with data volumes at Google and Walmart that make the sites that built these databases in desperation look positively tiny.

Database size was never the main driving force behind the new move toward NoSQL databases. Support for distributed architectures is. In part, this is about handling lots of queries rather than handling lots of data; it also -- particularly if you are Google -- deals with latency when the consumers of data are widely distributed geographically.

And note that one of the companies that is heavily involved in building, using, and supplying non-SQL distributed databases is Google, who, as you so well point out, is very much aware of both the capabilities and limits of scaling with current relational DBs.

This new research may offer new prospects for better databases in the future -- but TFA indicates that the new design has a limitation which seems common in distributed, strongly-consistent systems: "It turns out that the deterministic scheme performs horribly in disk-based environments".

In fact, given that it proposes strong consistency, distribution, and relies on in-memory operation for performance, it sounds a lot like existing distributed, strongly-consistent systems based around the Paxos algorithm, like Scalaris. And it seems likely to face the same criticism from those who think that durability requires disk-based persistence, and who object to replacing storage on disks (which, one should keep in mind, can also fail) with storage in memory simultaneously on a sufficient number of servers (which, yes, could all simultaneously fail, but durability is never absolute; it's at best a matter of the degree to which data is protected against probable simultaneous combinations of failures).

So -- reading only the blog post that is TFA announcing the paper and not the paper itself yet -- I don't get the impression that this is necessarily a giant leap forward, though more work on distributed, strongly-consistent databases is certainly a good thing.

Re:Pfah. (2, Funny)

Tablizer (95088) | more than 3 years ago | (#33438708)

After all, MySql is why slashdot is so relia~ `} v* m& + ' ,

Re:Pfah. (0)

Anonymous Coward | more than 3 years ago | (#33439168)

This technology has been around since the beginning.... Ah yes, they used to call it a mainframe. NOSQL database = mess, no sorry, job security.

Duh (0)

Codename Dutchess (1782238) | more than 3 years ago | (#33437612)

It all just depends on how long you want to trip, man.

Re:Duh (-1, Troll)

Pojut (1027544) | more than 3 years ago | (#33437672)

Liquid, paper tabs, gel tabs...it's all scalable, man.

Re:Duh (0)

Anonymous Coward | more than 3 years ago | (#33438398)

Liquid, paper tabs, gel tabs...it's all scalable, man.

Hey mods, put down the tard-goggles.

digg does not need to worry anymore (5, Funny)

Dan667 (564390) | more than 3 years ago | (#33437670)

digg has chased all their users away with the new version of their site so they could probably change over to MS Access and be ok.

Re:digg does not need to worry anymore (2, Insightful)

Pojut (1027544) | more than 3 years ago | (#33437728)

offtopic:

Considering how fanatical digg users can be, I can't possibly imagine why they thought it was a good idea to implement the changes they've made.

Re:digg does not need to worry anymore (2, Interesting)

Kaboom13 (235759) | more than 3 years ago | (#33437902)

Because the entire site had been completely overwhelmed by spammers? Digg went from a great site to go see what's new to a glorified RSS feed for cracked.com, college humor and reddit. They had to change something.

Re:digg does not need to worry anymore (5, Funny)

BabyDuckHat (1503839) | more than 3 years ago | (#33438384)

Yeah, now instead of being a glorified RSS feed for reddit, they're an actual RSS feed for reddit. Great change!

Re:digg does not need to worry anymore (4, Insightful)

Dan667 (564390) | more than 3 years ago | (#33438456)

Actually, most of the change was to allow auto-submitting of stories from big publishers/companies. They basically changed digg into a paid-for RSS ad service. If you hated the gaming of the old digg site, I am sure you just stopped using the new digg site altogether. No one goes to a website to read ads.

Berkeley DB (3, Funny)

nacturation (646836) | more than 3 years ago | (#33437690)

Didn't Berkeley prove back in the 60s and 70s that acid was scalable?

Re:Berkeley DB (0)

jgagnon (1663075) | more than 3 years ago | (#33437752)

If you're talking about the drug, I'm sure they did... ;)

Re:Berkeley DB (1, Funny)

Anonymous Coward | more than 3 years ago | (#33437806)

Your grasp of the obvious is on par with a character from a Dan Brown novel.

Re:Berkeley DB (-1, Flamebait)

Anonymous Coward | more than 3 years ago | (#33437972)

Maybe his penis is named Obvious?

Re:Berkeley DB (1)

Zak3056 (69287) | more than 3 years ago | (#33437974)

Didn't Berkeley prove back in the 60s and 70s that acid was scalable?

At the very least, they proved it was salable...

Cowboy Chic (0, Offtopic)

Ukab the Great (87152) | more than 3 years ago | (#33437700)

But ACID still lacks NoSQL's "Cowboy Chic".

Re:Cowboy Chic (1)

Tablizer (95088) | more than 3 years ago | (#33439130)

"Referential integrity? We ain't need no stinkin' referential integrity."

DUDE! (0)

Frosty Piss (770223) | more than 3 years ago | (#33437742)

Yale Researchers Prove That ACID Is Scalable...

It's all in their minds!

Re:DUDE! (-1, Offtopic)

Pojut (1027544) | more than 3 years ago | (#33437778)

::fast, alternating chopping motion back and forth::

YOU'RE RUNNING DOWN A HALL! YOU'RE RUNNING DOWN A HALL! ::more chopping motion::

Re:DUDE! (0, Offtopic)

nschubach (922175) | more than 3 years ago | (#33437932)

... someone opens a door. ::smack:: Stop running down the hall!

Please (0)

Anonymous Coward | more than 3 years ago | (#33437768)

Tell the yalies to stick to drama and sophistry. What they know about computer stuff?!

Interesting thesis (5, Interesting)

Peeteriz (821290) | more than 3 years ago | (#33437876)

In essence, TFA claims that if the traditional ACID guarantee "if three transactions (let's call them A, B and C) are active ... the resulting database state will be the same as if it had run them one-by-one. No promises are made, however, about which particular order execution it will be equivalent to: A-B-C, B-A-C, A-C-B" is not abandoned (as in NoSQL systems), but is even strengthened to a guarantee that the result will always be as if they arrived in A-B-C order, then it solves all kinds of possible replication problems, requires less networking between the many servers involved, and allows for high scaling while also keeping all the integrity constraints.
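A toy sketch of the ordering guarantee described above, in Python. It is not the Yale system's implementation; the three transactions and the single-key "database" are invented purely to show why a fixed order matters for replicas.

```python
from itertools import permutations


# Three hypothetical transactions that do not commute with each other.
def tx_a(state):
    state["x"] = state["x"] + 10


def tx_b(state):
    state["x"] = state["x"] * 2


def tx_c(state):
    state["x"] = state["x"] - 3


def run(order, initial):
    """Apply transactions one-by-one, in the given order, to a copy."""
    state = dict(initial)
    for tx in order:
        tx(state)
    return state


initial = {"x": 1}
txs = [tx_a, tx_b, tx_c]

# Traditional serializability: any serial order is a legal outcome, so
# replicas that pick different orders can end up with different states.
serial_outcomes = {run(order, initial)["x"] for order in permutations(txs)}

# Deterministic scheme: every replica applies the same agreed order
# (A-B-C here), so all of them reach the same state without having to
# coordinate during execution.
replica_states = [run(txs, initial) for _ in range(3)]

print(sorted(serial_outcomes), replica_states)
```

With plain serializability all four results are acceptable; with the stronger guarantee only the A-B-C result is, which is what lets each replica execute the log on its own.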

Re:Interesting thesis (1)

capnchicken (664317) | more than 3 years ago | (#33438204)

Determinism solves many things in DB design; that's why things like WITH SCHEMABINDING for views and user-defined functions in MS SQL make things run so much faster. With over 40 years of RDBMS design, it's odd that this path has never been gone down before. But the whole "It turns out that the deterministic scheme performs horribly in disk-based environments" thing makes perfect sense if this is something that scales very well in high-memory environments that didn't exist until now.

Now THIS is news for nerds, it's too bad I had to scroll through so many LSD/Acid (hurr hurr drugs) jokes to get down to a comment of someone who actually read this.

LET'S ALL TAKE ACID! (0)

Anonymous Coward | more than 3 years ago | (#33437948)

IT'S SCALABLE, WHOOO!

Possible != Practical (3, Insightful)

Tablizer (95088) | more than 3 years ago | (#33437952)

A bigger issue may be the cost of ACID even if it can in theory scale. Supporting ACID is not free. A free web service may be able to afford losing say 1 out of 10,000 web transactions. Banks cannot do it, but Google Experiments can. The extra expense of big-iron ACID may not make up for the relatively minor cost of losing an occasional transaction or customer. It's a business decision.

Re:Possible != Practical (0)

Anonymous Coward | more than 3 years ago | (#33438074)

It's a business decision.

If it is a business decision banks will do it. At least if they feel that the lost transactions don't cost them more than they can save.

Re:Possible != Practical (1)

Tablizer (95088) | more than 3 years ago | (#33438268)

Accounting must balance or you have boat-loads of headaches. I've seen some spend months tracking down a few pennies of difference. The transaction paths are often too complicated to just plug the difference with a fudge; it creates unintended consequences down the line. It's kind of like trying to lie about a subject that you don't know much about; your lie-web unravels on scrutiny from experts and the diligent.

And further if you are sued and it comes out that you skipped ACID to cut costs, the jury won't be very lenient. The car industry found this out for situations where they skipped safer designs to shave some bucks.
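For readers wondering what "must balance" means in practice, here is a minimal sketch of an atomic transfer using Python's built-in sqlite3 module. The accounts table, the CHECK constraint, and the transfer helper are assumptions made for the example, not anything a real bank uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "cents INTEGER NOT NULL CHECK (cents >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 500), ("bob", 100)])
conn.commit()


def transfer(conn, src, dst, cents):
    """Move money atomically: both updates commit, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET cents = cents + ? WHERE name = ?",
                (cents, dst))
            conn.execute(
                "UPDATE accounts SET cents = cents - ? WHERE name = ?",
                (cents, src))
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint; the credit made just
        # before it has been rolled back along with it.
        return False


ok = transfer(conn, "alice", "bob", 200)    # fits within alice's balance
bad = transfer(conn, "alice", "bob", 1000)  # would overdraw alice
balances = dict(conn.execute("SELECT name, cents FROM accounts"))
print(ok, bad, balances)
```

Without the transaction, the failed second transfer would leave bob credited and alice not debited, which is exactly the few-pennies-off hunt described above.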
   

Re:Possible != Practical (1)

sarkeizen (106737) | more than 3 years ago | (#33438332)

Ok all puns on "acid" aside (especially when you add adjectives like "big-iron"). The point of the article seems to be about scaling out - specifically with cheaper hardware. I agree that one's choice of tools is a business decision (so is everything in business) but it's not like using MySQL or postgres is somehow cost prohibitive.

Re:Possible != Practical (1)

Tablizer (95088) | more than 3 years ago | (#33438562)

Given all else being equal, supporting ACID is going to require more hardware than not supporting ACID (as long as living with or ignoring more integrity errors is not a huge problem, which is probably domain-specific).

Re:Possible != Practical (0)

Anonymous Coward | more than 3 years ago | (#33439144)

Using MySQL or postgres *is* cost prohibitive if it runs significantly slower than an alternative database or data-warehouse for your specific use cases, and therefore requires more expensive hardware.

So even if the thrust of TFA is correct in all respects, ACID will never come free; so there will probably be some use cases where NoSQL may prevail on its technical merits.

Re:Possible != Practical (2, Insightful)

Peeteriz (821290) | more than 3 years ago | (#33438476)

Typically the NoSQL approach just shifts the problems from the database layer to the application programmer - if it's simply ignored, a typical app can't cope with unpredictable/corrupt data being returned from the db, which results in weird bug reports that cost a lot of development time to find and fix; and with these fixes, parts of the ACID compliance are simply re-implemented in the app layer.

You gain some performance of the db, you lose some (hopefully less) performance in the app, and it costs you additional complexity and programmer-time in the app.

Re:Possible != Practical (1)

Tablizer (95088) | more than 3 years ago | (#33438662)

Not necessarily. For one, it could be dumping the extra processing to the user's browser, whose CPU cycles are generally not part of a company's cost (if they don't overdo it).

Second, the cost of dealing with problems may not be that big in some domains. They may simply dump, skip, or ignore the error or transaction(s). It's all about weighing the trade-offs. Do you spend X dollars to reduce the number of pissed users by Y percent, for example.

Re:Possible != Practical (1)

Troy Roberts (4682) | more than 3 years ago | (#33438792)

Now, if only you had read the article

Re:Possible != Practical (1)

Tablizer (95088) | more than 3 years ago | (#33439104)

Perhaps the word "big-iron" was misleading on my part. My apologies. As I mentioned nearby, ACID is going to cost more than non-ACID even if cheap boxes are being used. ACID will require more "cheap boxes" on average for the same user volume.

I hate SQL and Databases in General... (0, Troll)

paulsnx2 (453081) | more than 3 years ago | (#33437984)

... because on every application I have ever worked on, the Database has always been the performance bottleneck. Testing of DB applications is always a problem, because the running of tests generally changes the database, rendering tests unrepeatable without resetting the database. Configuring applications to use this database or that database also ends up being a problem for most applications.

Furthermore, while programming in general has continued to progress through many languages, exploring many different ways to describe problems, SQL is still SQL. SQL is fixed in a syntax and written with naming conventions and styles that can best be described as neo-Cobal.

Bottom line: SQL is tedious, ugly, slow, and difficult to test. And don't get me started on stored procedures and the difficulty of using source code management with stored procedures.

Last gripe: A traditional Relational database imposes ACID overhead on every application, even if you don't really need it or use it. This is like a programming language that imposes a SORT overhead on all your data structures even if you rarely or never need to sort them.

Why is it that we continue to use a technology based on a 1960's view of a problem when clearly there ARE other solutions and ways to approach said problem?
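On the repeatability complaint above, one common mitigation (an assumption on my part, not something the poster describes) is to give every test its own throwaway database. A minimal sketch with Python's built-in sqlite3 and unittest modules, using a made-up users table:

```python
import sqlite3
import unittest

# Hypothetical schema used only for this example.
SCHEMA = "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)"


class UserTableTest(unittest.TestCase):
    def setUp(self):
        # A fresh in-memory database per test: nothing leaks between tests.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(SCHEMA)

    def tearDown(self):
        self.conn.close()

    def count_users(self):
        return self.conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]

    def test_insert_one_row(self):
        self.conn.execute("INSERT INTO users (name) VALUES (?)", ("ann",))
        self.assertEqual(self.count_users(), 1)

    def test_starts_empty(self):
        # Passes regardless of test order, because setUp rebuilt the schema.
        self.assertEqual(self.count_users(), 0)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(UserTableTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

This only helps where an in-memory or disposable database is a fair stand-in for the production one; wrapping each test in a transaction that is rolled back is another option.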

Re:I hate SQL and Databases in General... (5, Insightful)

jeff4747 (256583) | more than 3 years ago | (#33438182)

Why is it that we continue to use a technology based on a 1960's view of a problem when clearly there ARE other solutions and ways to approach said problem?

Because it works.

"It's old" is a terrible reason to replace something. Go back to your previous arguments and you have a case. After all, a Core i7 is based on a 1960's view of a problem with an enormous number of band-aids applied in the intervening years, but you don't seem too concerned with replacing that.

Re:I hate SQL and Databases in General... (0, Redundant)

paulsnx2 (453081) | more than 3 years ago | (#33438266)

The "old" part isn't the problem, but that there ARE other solutions and better solutions (that might address all previous issues) that are being actively ignored.

You are not going to win points for reading comprehension if you don't read the whole sentence.

Re:I hate SQL and Databases in General... (1)

jeff4747 (256583) | more than 3 years ago | (#33438542)

You are not going to win points for reading comprehension if you don't read the whole sentence.

Irony much?

You might wanna read my 2nd sentence. I know, I know. That's really far into my post.

Re:I hate SQL and Databases in General... (1)

paulsnx2 (453081) | more than 3 years ago | (#33438936)

Uh.... I never said that "being old" is a reason to replace something, as you would have known if you had actually read the sentence you quoted. Given this observation, what am I to say about the fact that the Core i7 is based on a 1960's view of a problem? Besides, the Core i7 ISN'T a 1960's based solution, but is based on a 1960's solution. There is an important difference between the two statements.

Everything we do in CS is based on work that goes back to 1939 and even earlier. However, in the case of the Core i7 (as an example) we CHANGE the approach to try and fix various problems we have with our performance.

Personally, I think going back to old ideas and realizing that we can now implement them better/faster/cleaner is a great way to approach many problems. That a solution is "old" isn't a problem, but it is a problem if a solution has known issues, and we just live with them.

Re:I hate SQL and Databases in General... (5, Informative)

poet (8021) | more than 3 years ago | (#33438188)

Spoken with proud ignorance.

Anyone who has properly scaled an application knows the database isn't the problem. If it was, it wouldn't take 12 application servers to bring the thing to its knees. That said, most of your gripes equate to:

I am not a DBA and therefore I do not understand DBA and therefore I must complain.

Further, SQL has nothing to do with ACID. AT ALL!

Re:I hate SQL and Databases in General... (1)

paulsnx2 (453081) | more than 3 years ago | (#33438508)

I will absolutely agree that a well designed database does not have performance issues. However, I work in a segment of the industry that works with Health and Human services, and the databases have issues that make any reasonable DBA sick.

Nonetheless, database throughput is always an issue. Our applications scale just fine for our needs (as you imply), but it remains that even if only one person is running one application against the database, the throughput is just "meh" at best. This is because every operation requires queries against the database to move significant amounts of data from many different tables. Could we build applications with better performance? Absolutely, and using traditional Relational Databases too, if the Schema was properly designed.

All of this begs the question. The real question is why we use a technology that is so sensitive to bad schema design? Why use a technology that has such a high baseline overhead? Why use a technology that is so tedious? Why use a technology that is so hard to test?

Absolutely the developer doesn't have to build applications that inherit all these problems from the database. I have designed applications that sit on databases, and have none of these faults. But unfortunately not all the applications I work on were designed to avoid these issues.

Now you ARE right that I am not a DBA. But if I have a fault, it isn't because I don't understand the DBA, but that I don't understand the database....

And yeah, in my rant I criticized SQL and ACID and relational databases in general as if they were all the same. They are not, and in fact need not have any overlap at all. Still, I'll stand by my rant as an expression of my annoyance with various aspects (these and others) of this particular approach to the persistence problem.

Re:I hate SQL and Databases in General... (1)

gbjbaanb (229885) | more than 3 years ago | (#33439042)

C'mon, we use web services and only a few people complain about the inefficiencies there; we use XML and only some people complain about the sprawling XML documents you can get.

You need to go learn a bit about DBs. SQL is pretty easy, once you've grasped the list-based concepts behind it. Stick to the simple bits and you're 90% done. They're not as bad as you think - it's just your ignorance that's confusing you.

All technology suffers from the flaws you point out, all technology is fragile and easy to create total crap out of. (I know, I've worked with some 'professional' developers who make the most godawful mess, some of them even think they really are god's gift to coding).

DBs incidentally are one of those strange technologies where a 'clean, elegant and well designed' schema is a bad thing. If you over-normalise a DB performance will suffer, as will the code you have to write to use it. If you cobble everything into a few tables, it actually goes faster and is easier to code against. Strange, but true.

Re:I hate SQL and Databases in General... (2, Insightful)

GooberToo (74388) | more than 3 years ago | (#33439160)

All of this begs the question. The real question is why we use a technology that is so sensitive to bad schema design? Why use a technology that has such a high baseline overhead? Why use a technology that is so tedious? Why use a technology that is so hard to test?

Because fairly consistently, for the past forty years, every time someone says they've created something better than SQL and released to the market, the market proves them woefully and completely wrong. As such, as much as people piss and moan about SQL, SQL has consistently proven to be an excellent, general purpose solution and amazingly poorly understood by the masses. And solutions such as MySQL has only made things worse. That's not to say there are not superior niche solutions, only that SQL is one of the few database technologies which has continued to survive for decades as a general purpose solution, and rightfully so.

It's like the world suddenly doing their own plumbing, framing, and mechanical work and then proudly exclaiming that the state of architecture and the car industry stinks because the world is falling apart around them. In reality, that means we need far more qualified DBAs and far fewer people who can barely spell "SQL" designing and condemning the world around us.

It's literally been years since I've run into a qualified DBA, despite the fact "DBA" was part of their title. Turns out, being able to spell "DBA" is all too often enough to qualify one for such a position. And don't get me started on the all-too-common case of people who don't even know what a DBA does and yet are responsible for actually creating the schema/data model.

Re:I hate SQL and Databases in General... (1)

medv4380 (1604309) | more than 3 years ago | (#33438672)

This reminds me of a quote I have at my desk.

Normalization [simple-talk.com] is not just some plot by database programmers to annoy application programmers. (That is merely a satisfying side effect!)

Re:I hate SQL and Databases in General... (1)

mugnyte (203225) | more than 3 years ago | (#33438222)

Actually, if you look at set theory and declarative languages, SQL is coming to more traditionally procedural environments. (MS's LINQ, for example.) It's an amazing language, good at what it's supposed to do. You could nearly complain the same about XML transforms as SQL. They just collect & format data. It's the programmers who make it complex.

  Unavoidable bottlenecks in systems come from storage, searches and transforms. If you want to remove the DB from the equation, what layer of your system should be performing these things?

  BTW: The math in set theory hasn't changed since the 1960's, it doesn't "get old" and need replacing. And you should learn to spell COBOL, your rants will appear more credible.

Re:I hate SQL and Databases in General... (1)

DragonWriter (970822) | more than 3 years ago | (#33439092)

BTW: The math in set theory hasn't changed since the 1960's, it doesn't "get old" and need replacing.

It's worth noting that, in addition to the arguments from proponents of non-relational databases, SQL also gets criticism from proponents of actually doing set theory right (e.g., Date and Darwen).

Really, SQL and the databases using it are shaped as much by optimization of disk-based storage using popular computing architectures of the time at which it took shape as any mathematical model of data.

As computing architectures and performance attributes (not speed, but relative costs of different access patterns) of storage media change, underlying database implementations and the languages that best leverage them may change, even when you want to be generally guided by set theory.

You hate what you don't understand (5, Insightful)

frist (1441971) | more than 3 years ago | (#33438224)

Sounds like you don't really understand what you're talking about. The reason we continue to use ACID compliant RDBMS is because they work and they work well. If you don't think that RDBMS have changed over the years, you're simply lacking experience. I feel this is most likely the case as you complain about the interface language (SQL), and don't understand how to CM stored procedures, or how to test a DB (OMG I have to make a copy of the DB to test - so hard!) Complaining about the overhead of using an RDBMS in an application that doesn't require an RDBMS is tantamount to complaining about how hot you get while wearing a spacesuit when you jog in the park.

Re:I hate SQL and Databases in General... (0)

Anonymous Coward | more than 3 years ago | (#33438408)

Ok, first, there were no ACID compliant databases in the 1960s. Try late 1970s. I'm guessing you're not old enough to remember that. So remember this: hyperbole is an enemy of critical thinking. If you want a successful career in IT, drop the exaggeration. It also won't help you construct successful arguments to convince your colleagues to listen to your ideas.

Second, if you don't need ACID compliant database...don't use an ACID compliant database. Programming, and engineering in general, is about choosing the right solution for the problem you have.

There is no shortage of choices, NoSQL databases being one, but certainly not the first. Unfortunately, many young programmers today use a database when they should be using something much simpler... e.g., a file on disk does a great job of storing information and is very fast.

Re:I hate SQL and Databases in General... (1)

Have Brain Will Rent (1031664) | more than 3 years ago | (#33438764)

Absolutely true. I rewrote an application that had a 70 table database to use a simple tree structured representation - it ran two orders of magnitude faster and the code was easier to understand because the data representation conformed well to the actual problem domain. Relational databases are great but they aren't always the appropriate answer.

But as an aside I don't think hyperbole is the enemy of critical thinking - it is just a tool (perhaps weapon) the proper employment of which requires immensely more skill than most people possess.

Re:I hate SQL and Databases in General... (1)

gbjbaanb (229885) | more than 3 years ago | (#33438946)

hmm. or you could have put an index on the right columns... which generally are implemented as tree structures. I'm sure your code was perfectly understandable to all who came after you, thinking they were working with a DB :)

Re:I hate SQL and Databases in General... (1)

amorsen (7485) | more than 3 years ago | (#33438418)

The parent is not a troll, it is spot on. The problem is that the database backend and the language frontend are tied together. To invent a new query language you need to invent a database backend to go with it, and you can't try out a new query language on an existing database deployment. Similarly, any innovations in the database backend are hampered by the limited syntax of SQL. If you can't make a small extension to SQL to get it working, then you can forget about implementing it at all. This pretty much means game over for any database innovations.

Even Relational Algebra is infinitely easier to understand than the pseudo-English mess that is SQL. Much like even Haskell is easier to read than COBOL.

Re:I hate SQL and Databases in General... (1)

Peeteriz (821290) | more than 3 years ago | (#33438518)

Any decent framework abstracts out the SQL syntax for you in a nice manner (say, ARel in the Rails 3.0 framework is quite nice), but you gain a lot of compatibility by using SQL, which lets you choose from engines ranging from SQLite in a flat file to Oracle on a cluster.

Hear, hear. (1)

Cyberax (705495) | more than 3 years ago | (#33439008)

Yes, I'd like to be able to work with RDBMS data in REAL languages, not in ugly SQL or even uglier DB internal languages.

DB tables can be represented with lists, on which composable pure (side-effect free) functions could operate. So JOINs can be expressed as list comprehensions. 'where' naturally is expressed as filters, etc. Care should be taken to maintain purity of functions used in queries, so they can be optimized efficiently.
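To make the idea above concrete, here is a minimal sketch in Python rather than Haskell. The `Employee` and `Department` types, the sample rows, and the `well_paid` predicate are all made up for illustration; none of them come from a real schema.

```python
from typing import NamedTuple


class Employee(NamedTuple):
    name: str
    dept_id: int
    salary: int


class Department(NamedTuple):
    id: int
    title: str


# Hypothetical "tables" represented as plain lists of immutable rows.
employees = [Employee("ann", 1, 120), Employee("bob", 2, 90), Employee("cy", 1, 80)]
departments = [Department(1, "eng"), Department(2, "ops")]


def well_paid(e: Employee) -> bool:
    # Pure predicate: depends only on its argument, no side effects.
    return e.salary >= 90


# Roughly: SELECT e.name, d.title FROM employees e JOIN departments d ON e.dept_id = d.id
joined = [(e.name, d.title) for e in employees for d in departments if e.dept_id == d.id]

# Roughly: SELECT name FROM employees WHERE salary >= 90
well_paid_names = [e.name for e in filter(well_paid, employees)]
```

Note that the comprehension is a naive nested-loop join: nothing here reorders or optimizes the work the way a query planner would, which is the part that purity is supposed to make possible.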

LINQ in C# has beginnings of something similar.

PS: Am I describing Haskell, by any chance? :)
PPS: If your query requires complex and non-trivial optimizations by the RDBMS engine, then it's a bad query.

Re:I hate SQL and Databases in General... (1)

PotatoFarmer (1250696) | more than 3 years ago | (#33438500)

Why is it that we continue to use a technology based on a 1960's view of a problem when clearly there ARE other solutions and ways to approach said problem?

Which problem? Storing your data, retrieving your data, modifying your data while guaranteeing transactional integrity, analyzing your data in aggregate, providing ways to recover your data, providing ways to reset your data to a previous state?

I'm not saying a traditional relational database is the perfect solution to everything, but it's silly to think that every approach will address the same set of concerns.

Re:I hate SQL and Databases in General... (2, Informative)

davidbrit2 (775091) | more than 3 years ago | (#33438658)

...because on every application I have ever worked on, the Database has always been the performance bottleneck.

What alternative have you seen that handles the same workload more efficiently? Flat files? I've seen plenty of database-related performance issues, but it's almost never inherent in the database - it's the idiot that wrote the lousy table-scanning code that's reading a couple rows out of a table with millions that's the problem.
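For anyone who wants to see the table-scan problem for themselves, here is a rough sketch using Python's built-in `sqlite3` module. The `orders` table, its columns, and the index name are invented for the example, and the exact wording of the plan text differs between SQLite versions, so the code only prints it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 1000, float(i)) for i in range(10000)],
)


def plan(sql, params=()):
    # EXPLAIN QUERY PLAN returns one row per step; the last column is the description.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


query = "SELECT total FROM orders WHERE customer_id = ?"

# No index on customer_id yet, so the engine has to read every row.
plan_before = plan(query, (42,))

# Hypothetical index name; with it, the same query becomes an index lookup.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = plan(query, (42,))

print(plan_before)
print(plan_after)
```

The query text is identical in both runs; only the physical design changed, which is the point about the problem rarely being inherent in the database.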

Testing of DB applications is always a problem, because the running of tests generally changes the database, rendering tests unrepeatable without resetting the database.

If only you could start something like a "transaction", which you could then "roll back" after finishing the test, leaving the database in its original state. And if you could somehow "back up" the database and "restore" it on a test server, or under a different name. That would be awesome.

And don't get me started on stored procedures and the difficulty of using source code management with stored procedures.

Checking your create/change scripts into source control is no more difficult than checking your C source in prior to compiling it.

SQL is fixed in a syntax and written with naming conventions and styles that can best be described as neo-Cobal.

While I don't totally disagree on this point, calling SQL "fixed" is a bit like saying C# and Java are the same. I promise you any meaty SQL Server code will not run on Oracle without very significant changes that will have to be done by someone that will cost you a lot of money (and likewise with Oracle to SQL Server). The capabilities vary wildly by platform, and the syntax is only identical for the simplest of CRUD statements.

Last gripe: A traditional Relational database imposes ACID overhead on every application, even if you don't really need it or use it. This is like a programming language that imposes a SORT overhead on all your data structures even if you rarely or never need to sort them.

I have to give this one a LOLWUT. If you're using a big RDBMS, it's likely a multi-user system. If you've got multiple users and connections, you want ACID. This isn't like imposing sorting overhead on data structures, it's like imposing the basic memory protection, process isolation, and filesystem durability you find in any competent operating system. If you want to see what it's like without those protections, go use Mac OS 9 for a week or so, or an Access database used by a few dozen people over a network.

Re:I hate SQL and Databases in General... (1)

iamhigh (1252742) | more than 3 years ago | (#33438834)

SQL is still SQL. SQL is fixed in a syntax and written with naming conventions and styles that can best be described as neo-Cobal.

Has relational algebra changed (no, it's complete)? Why would the basics of SQL change then? Sounds like you just don't understand relational math and structured information basics.

Re:I hate SQL and Databases in General... (1)

Have Brain Will Rent (1031664) | more than 3 years ago | (#33439006)

I don't know if you are using SQL and "relational database" as equivalent... it seems that way. Anyhow a long time ago there were many different database solutions and most of them weren't relational databases. Then relational databases became popular and anything else almost seemed to disappear. I didn't really get this enormous shift because there are lots of domains where a relational database is not the natural representation of the information being modelled. But for most applications that most people are interested in relational databases work well and SQL represents the ideas behind relational databases quite well. So SQL is still here relatively unchanged decades later because nothing better has come along - apparently it fills its niche quite well - well enough that it hasn't been dislodged.

As for "neo Cobol" I think it was either Wirth or Dijkstra that said that typing speed was not the limiting factor in programming.

Re:I hate SQL and Databases in General... (0)

Anonymous Coward | more than 3 years ago | (#33439218)

We'll make you use punched IBM cards for 5 years, and you'll come around.

ACID does not imply SQL (2, Insightful)

LightningBolt! (664763) | more than 3 years ago | (#33438094)

For instance, Neo4J is a scalable graph-based "nosql" DB with ACID.

NoSQL is also about arbitrary schemas (1)

scorp1us (235526) | more than 3 years ago | (#33438206)

NoSQL's two big features are scalability and arbitrary schemas. While the paper covers the first (though I still think map/reduce has its place), NoSQL does do taxonomy-based (hierarchical) schemas better. The only way to do that in SQL is to have a property table, where the parent object is an object RID, and a huge table of properties and values attached to that. You might be able to get your indexes to perform reasonably well, but only by duplicating some of the data. And on top of that, just try writing a query for hierarchical data! You'll have sub-selects for each level of hierarchy. This means that in order to do something relatively simple, like the KPCOFGS species classification, you'll need a select and 6 sub-selects. At least that one is well defined. If it's not, you just don't know how many, and you have to write a recursive function to generate your select query, or process the results from it. Either way, you repeatedly consider 99% useless records at every level. True, you can cheat at this because there are always 7 levels. But that is not true for most other trees.

Re:NoSQL is also about arbitrary schemas (1)

capnchicken (664317) | more than 3 years ago | (#33438402)

There is more than one way to do hierarchical queries, it just depends on the RDBMS. Oracle has had it for years and SQL Server implemented it in the 2005 edition. You don't need sub-selects.

Re:NoSQL is also about arbitrary schemas (0)

Anonymous Coward | more than 3 years ago | (#33438596)

Oracle's CONNECT BY is much much slower than a custom index based on nested sets. I was using both (and implemented the latter in PL/SQL) and the difference was 100x in most cases. Tell me something about default SQL implementations...

Re:NoSQL is also about arbitrary schemas (1, Informative)

Anonymous Coward | more than 3 years ago | (#33438440)

ANSI defines a query mechanism for walking hierarchies in Common Table Expressions. Granted, the garbage that is MySQL has no support for this, but the latest releases of PostgreSQL, Oracle, DB2 and Microsoft SQL Server all do using the same syntax. I will say that SQL is not really the best suited for such things, but it does work.
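As a rough illustration of the Common Table Expression approach, here is a sketch that walks a taxonomy of arbitrary depth with one query and no per-level sub-selects. It uses SQLite through Python's `sqlite3` module purely to stay self-contained, which is an assumption on my part; the `taxon` table layout is made up, and the exact keyword (`WITH RECURSIVE` versus plain `WITH`) varies by engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE taxon ("
    " id INTEGER PRIMARY KEY,"
    " parent_id INTEGER REFERENCES taxon(id),"
    " name TEXT)"
)
conn.executemany(
    "INSERT INTO taxon VALUES (?, ?, ?)",
    [
        (1, None, "Animalia"),
        (2, 1, "Chordata"),
        (3, 2, "Mammalia"),
        (4, 3, "Primates"),
        (5, 3, "Carnivora"),
        (6, 1, "Arthropoda"),
    ],
)

# Start at the named row, then repeatedly join to the parent until the root.
ANCESTORS_SQL = """
WITH RECURSIVE lineage(id, parent_id, name, depth) AS (
    SELECT id, parent_id, name, 0 FROM taxon WHERE name = ?
    UNION ALL
    SELECT t.id, t.parent_id, t.name, l.depth + 1
    FROM taxon AS t JOIN lineage AS l ON t.id = l.parent_id
)
SELECT name FROM lineage ORDER BY depth DESC
"""


def lineage(name):
    # Returns the path from the root down to the named taxon.
    return [row[0] for row in conn.execute(ANCESTORS_SQL, (name,))]


primate_lineage = lineage("Primates")
```

The same query works whether the tree is 4 levels deep or 40, so the number of levels does not have to be known up front.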

As for schema-less data, there are a couple of solutions, which I believe are all DB-specific. Microsoft SQL Server allows storage of XML data, with SQL extensions that use XQuery to query into that data. It also supports indexing that data and using XML schemas to constrain the nature of that data if necessary. Microsoft SQL Server 2008 also added sparse table support, built on top of XML storage, which allows a table to have 30,000 columns and is optimized for the majority of those columns being NULL on any particular row. I know that ANSI does get into XML storage a little bit, but I'm not sure which DBs actually implement the standard, if any, especially to a level where it would be a workable solution.

They answered the wrong question (2, Insightful)

mysidia (191772) | more than 3 years ago | (#33438348)

We knew ACID can scale already.

With enough money poured into it, and new implementations, ACID can scale.

They solved some problems with scaling out, not necessarily the problems with it scaling up. Scaling does not necessarily just mean replicas and quick failover -- it means good performance without millions spent on hardware too, in terms of overhead, storage requirements, storage performance, server performance.

NoSQL scales in certain cases less expensively, with less work, and doesn't require complicated DBM algorithms. The representation of data is also simpler, and requires less work to maintain than tables.

It's just a result of major existing SQL implementations being so expensive with large datasets, that sometimes it costs more in terms of performance and required hardware, than simply using NoSQL.

I also love this gem from the article:

If the system is also stripped of the right to arbitrarily abort transactions (system aborts typically occur for reasons such as node failure and deadlock), then problem (b) is also eliminated. ... given an initial database state and a sequence of transaction requests, there exists only one valid final state. In other words, determinism.

I suppose the authors are from a land where hard drive space is infinite, database server resources are always guaranteed ahead of time... I/Os never have unrecoverable errors, syscalls never return error codes, RAM is infinite, programs never crash.

The conclusion that ACID alone is the bottleneck is not necessarily true. The SQL language itself requires a complex implementation just to parse and implement queries, that can add latency.

Not Proven (Yet) (1)

TheNinjaroach (878876) | more than 3 years ago | (#33438558)

I don't think they've proven it yet, they simply offer some solutions to what they admit is a very difficult problem. In other words, we'll see how their ideas pan out.

Just in case anybody else doesn't know... (3, Informative)

elwin_windleaf (643442) | more than 3 years ago | (#33438582)

From the Wikipedia Article (http://en.wikipedia.org/wiki/ACID [wikipedia.org] )

"In computer science, ACID (atomicity, consistency, isolation, durability) is a set of properties that guarantee database transactions are processed reliably. In the context of databases, a single logical operation on the data is called a transaction."

NoSQL is about a lot of things. (2, Interesting)

Ouija (93401) | more than 3 years ago | (#33438592)

SQL syntax is dated and very obtuse. Just look at the different syntax between insert and an update. ...wouldn't you rather just have "save"?

Object-relational mapping is cumbersome and mismatched in SQL. 1:many either yields n+1 queries or a monster cartesian product set. And what about inheritance? It just doesn't jibe.
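For readers who haven't hit the 1:many problem, here is a small sketch of both sides of that trade-off using Python's built-in `sqlite3` module. The `author` and `post` tables and both loader functions are hypothetical, not taken from any particular ORM.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE post (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO author VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO post VALUES (1, 1, 'a1'), (2, 1, 'a2'), (3, 2, 'b1');
    """
)


def load_n_plus_one():
    # One query for the parents, then one more query per parent row.
    queries = 1
    result = {}
    authors = conn.execute("SELECT id, name FROM author ORDER BY id").fetchall()
    for author_id, name in authors:
        rows = conn.execute(
            "SELECT title FROM post WHERE author_id = ? ORDER BY id", (author_id,)
        ).fetchall()
        queries += 1
        result[name] = [title for (title,) in rows]
    return result, queries


def load_with_join():
    # A single query, but the parent columns repeat on every child row
    # and the application has to regroup the flat result itself.
    sql = (
        "SELECT a.name, p.title FROM author AS a "
        "LEFT JOIN post AS p ON p.author_id = a.id ORDER BY a.id, p.id"
    )
    result = {}
    for name, title in conn.execute(sql):
        result.setdefault(name, [])
        if title is not None:
            result[name].append(title)
    return result, 1
```

Both loaders build the same object graph; the first pays in round trips, the second pays in duplicated parent data, and it gets worse once several 1:many relations are joined in the same statement.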

It isn't about losing ACID- although not every purpose needs ACID. Your average shared drive filesystem isn't ACID, for example.

When you have anemic domains that aren't nailed down and need to be readily flexible without big re-designs, JSON-based No-SQL works very well.
When you want to avoid n+1 and have well-defined data needs with 4MB of data across your object graph, No-SQL works... very very well.
When you want to segregate the business services and its backing data store from the separate concern of BI, No-SQL keeps the riff-raff out of your data store.

It's different. It solves different problems. Keep your mind open.

Not NoACID, NoSchema (2, Interesting)

bokmann (323771) | more than 3 years ago | (#33438606)

Interesting article (and yes, I read the article), but the point of the NoSQL movement isn't so much about SQL, or ACID, as much as it is about Schema.

Most applications today are written in object-oriented languages like Java, C#, Ruby, etc... and most common frameworks in these languages use object-relational models to essentially 'unpack' the object into a relational model, and then reconstitute the objects on demand. This post [tedneward.com] explains the kinds of problems better than most.

NoSchema is about storing data closer to the format we process it in today. Key-Value pairs. XML. Sets and Lists. Object-Oriented data structures. This is about abstractions that make developers more productive. It is a tool in a toolbox, and useful in some circumstance and not in others.

SQL databases do not have to be the 'one persistence data mechanism to rule them all'. We don't need one; we need many that solve differing classes of problems well.

Err... (1)

WSOGMM (1460481) | more than 3 years ago | (#33438706)

Acid is definitely scalable if you use blotter paper.

Prove? (1)

Troy Roberts (4682) | more than 3 years ago | (#33438842)

The editors have a loose definition of the word prove. I read the article and they provide some compelling arguments. However, I saw no proof in a mathematical or scientific sense.

It's scalable all right. (1)

Major Downtime (1840554) | more than 3 years ago | (#33438878)

Of course ACID is scalable, but you have to be very careful with the dosage. Even Albert Hofmann himself never doubted that.

Relaying my comments from the blog (1)

Cyberax (705495) | more than 3 years ago | (#33438898)

To achieve 'nonconcurrency' one needs to introduce a global ordering of transactions. Which WILL require a shared resource among ALL of the transactions. No way around it, sorry.

And what's funny, this resource has some of the problems of ACID systems. However, there should be advantages (no need for rollbacks, etc.).

Besides, all of this doesn't tackle another advantage of NoSQL systems: working with HUGE amounts of data. There'll still be problems in ACID systems if data access requires communication between several storage nodes.

And don't forget the CAP theorem. You can't get Consistency, Availability and Partition Tolerance at the same time. RDBMS typically 'solve' it by dropping the requirement for partition tolerance, usually by using quorum-based schemes, etc.

Hmmm NoSQL Database You Say... (0)

Anonymous Coward | more than 3 years ago | (#33439046)

This has been around for years. Yeah, I think we used to call it a mainframe.

The holiest of holy wars (0, Troll)

Angst Badger (8636) | more than 3 years ago | (#33439100)

I've never been sure why it is, but SQL (and the relational model it can be used to implement if you know what you're doing) attracts more wild-eyed fanatics than the Amiga and Ruby. Nowhere else will you find so many people so confidently and aggressively certain that they have the One True Way to do things, at least not without getting into actual religion. That anyone, anywhere does things differently (or even thinks about it) seems to deeply threaten them and provoke the sort of contempt that normal people reserve for child pornography. It frankly baffles me. DBA compensation isn't that good, and certainly not all DBAs can be such one-trick ponies.

The simple fact of the matter is that the relational model is probably the best general purpose data storage model we have, and it has the advantage of logical rigor and, as a result, the advantage of being extremely well-understood. But this in no way changes the fact that any general purpose approach, at least in some (but probably many if not most) cases, will be outperformed by a well-designed application-specific method. This remains the case no matter what your methodological hobby horse is, except in the tiny minority of cases where a truly optimal method can be rigorously proven.

Worse -- and this is true of all kinds of fanaticism, computer science-related and otherwise -- it tends to discourage research into unexplored areas that might yield new and better methods. E.F. Codd developed the relational model through precisely such an expedition into the mathematical unknown, and someday, the model that surpasses it (at least for certain cases) will be produced in the same way. It might be a descendant of one of the current so-called NoSQL approaches. It could be a reaction to their shortcomings. It might come from a completely unexpected corner. But wherever it comes from, you can be certain that we will enjoy its benefits later than we had to because it will have to push through the reactionary resistance of people who've stared at the relational model for so long that they can conceive of nothing else.

Field calls (1)

Florian Weimer (88405) | more than 3 years ago | (#33439116)

This seems to be a reinvention of field calls, with a slightly different purpose.
