Enthusiasts Convene To Say No To SQL, Hash Out New DB Breed

timothy posted more than 4 years ago

Data Storage 423

ericatcw writes "The inaugural NoSQL meet-up in San Francisco during last month's Yahoo! Apache Hadoop Summit had a whiff of revolution about it, like a latter-day techie version of the American Patriots planning the Boston Tea Party. Like the Patriots, who rebelled against Britain's heavy taxes, NoSQLers came to share how they had overthrown the tyranny of burdensome, expensive relational databases in favor of more efficient and cheaper ways of managing data, reports Computerworld."

Quit Whining (5, Funny)

KingPin27 (1290730) | more than 4 years ago | (#28565105)

Just use flat text files --- no need for expensive db's .... think of the freedom!

Re:Quit Whining (4, Insightful)

Anonymous Coward | more than 4 years ago | (#28565157)

The horrible lag I get when using address completion in Firefox 3 makes me wish more people thought that way!

Re:Quit Whining (1)

phantomfive (622387) | more than 4 years ago | (#28565553)

This is one of the main objectives of ReiserFS, to make such things easy, a project which unfortunately has run into some difficulty of late.

Re:Quit Whining (0)

Anonymous Coward | more than 4 years ago | (#28565591)

Like his wife?

Re:Quit Whining (1, Funny)

Anonymous Coward | more than 4 years ago | (#28565655)

She was screwing Mark Sanford!

Re:Quit Whining (2, Funny)

MichaelSmith (789609) | more than 4 years ago | (#28565727)

This is one of the main objectives of ReiserFS, to make such things easy, a project which unfortunately has run into some difficulty of late.

I wonder if I could sneak Hans an eeepc inside a birthday cake...

Re:Quit Whining (1)

rubycodez (864176) | more than 4 years ago | (#28565775)

I've lost data in two filesystems thanks to the Slasher's shoddy work.

Re:Quit Whining (4, Funny)

Paradise Pete (33184) | more than 4 years ago | (#28566195)

I've lost data in two filesystems thanks to the Slasher's shoddy work.

Have you looked near Redwood Regional Park? On the side of a hill?

A time and place for everything (3, Insightful)

Marillion (33728) | more than 4 years ago | (#28565143)

There is a time and place for SQL. There is a time and place to avoid SQL.
SQL is great for financial data. SQL is terrible for genetic data.

Re:A time and place for everything (0, Offtopic)

snl2587 (1177409) | more than 4 years ago | (#28565263)

It was the best of times, it was the worst of times,
it was the age of wisdom, it was the age of foolishness,
it was the epoch of belief, it was the epoch of incredulity,
it was the season of Light, it was the season of Darkness,
it was the spring of hope, it was the winter of despair,
we had everything before us, we had nothing before us,
we were all going direct to heaven, we were all going direct the other way

Re:A time and place for everything (3, Insightful)

Bromskloss (750445) | more than 4 years ago | (#28565367)

It would be interesting to hear why this is.

Re:A time and place for everything (0)

SendBot (29932) | more than 4 years ago | (#28565467)

I'm just throwing a half-educated guess out there, but genetic algorithms have so many outputs tied back into their inputs, all changing around quite frequently, that an SQL implementation would be painfully contorted.

But then, I don't quite see how neural network programs need mass replication the way db's do.

or would they.... ?

this is an interesting issue!

Re:A time and place for everything (3, Insightful)

Carewolf (581105) | more than 4 years ago | (#28565521)

Design an efficient table relating a tree structure. Then design queries to answer questions such as:
* Find the nodes in the subtree under B.
* Find all ancestors of G
* Find the nearest common ancestor of D and H

Trees are a well-known problem for SQL, but the fact is that SQL can't handle most data structures and complex relations, only very simple one-dimensional ones.
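
For anyone who wants to try the three questions above, here is a minimal sketch. It assumes an adjacency-list table and a database that supports recursive common table expressions (SQLite through Python's sqlite3 module in this case). The table name `node`, the sample tree, and the helper functions are made up for illustration, and the nearest-common-ancestor step is done in Python rather than in SQL. It says nothing about how efficient this is on large trees.

```python
import sqlite3

# Hypothetical sample tree: A is the root, B and C are its children,
# D and E sit under B, F under C, G under D, H under E.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (id TEXT PRIMARY KEY, parent TEXT REFERENCES node(id))")
conn.executemany("INSERT INTO node VALUES (?, ?)", [
    ("A", None), ("B", "A"), ("C", "A"), ("D", "B"),
    ("E", "B"), ("F", "C"), ("G", "D"), ("H", "E"),
])

def subtree(root):
    """Return every node below root, sorted by id."""
    rows = conn.execute("""
        WITH RECURSIVE sub(id) AS (
            SELECT id FROM node WHERE parent = ?
            UNION ALL
            SELECT n.id FROM node n JOIN sub s ON n.parent = s.id
        )
        SELECT id FROM sub ORDER BY id
    """, (root,)).fetchall()
    return [r[0] for r in rows]

def ancestors(start):
    """Return the ancestors of start, nearest first."""
    rows = conn.execute("""
        WITH RECURSIVE up(id, depth) AS (
            SELECT parent, 1 FROM node WHERE id = ?
            UNION ALL
            SELECT n.parent, u.depth + 1 FROM node n JOIN up u ON n.id = u.id
        )
        SELECT id FROM up WHERE id IS NOT NULL ORDER BY depth
    """, (start,)).fetchall()
    return [r[0] for r in rows]

def nearest_common_ancestor(a, b):
    """Walk a's ancestors nearest first and return the first one b shares."""
    others = set(ancestors(b))
    for node_id in ancestors(a):
        if node_id in others:
            return node_id
    return None

print(subtree("B"))
print(ancestors("G"))
print(nearest_common_ancestor("D", "H"))
```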

Re:A time and place for everything (1)

Threni (635302) | more than 4 years ago | (#28566007)

One dimensional? So my really-fast snowflake design doesn't exist in your world? 2D tables no good? Sigh...and I was so happy with them.

Re:A time and place for everything (5, Interesting)

E IS mC(Square) (721736) | more than 4 years ago | (#28566043)

>> Trees are a well-known problem for SQL, but the fact is that SQL can't handle most data structures and complex relations, only very simple one-dimensional ones.

Sorry, that's not true. Have you tried analytical functions? You would be amazed how complex scenarios can be handled easily with them. And they are part of ANSI SQL standards. And db providers (Oracle etc) have taken the concept and improved a lot on it.

I think the anti-sql 'movement' has more to do with new (internet era) languages and their developers than so called 'lack' of features. In my limited experience, I have observed people coming from C (and such) background have no problem with sql, while java developers (and this is probably true for most developers working on web-based applications) are the worst kind when it comes to understanding even basics of sql. All they want is their objects.

I strongly believe that a competent programmer designing or developing a system which includes data and data storage should at least know normalization, indexes, and what 3NF means. Programming language is one thing, database is another, and knowledge of both is required to build a decent system.

Re:A time and place for everything (2, Insightful)

MichaelSmith (789609) | more than 4 years ago | (#28565531)

It would be interesting to hear why this is.

My guess would be that because SQL is a Structured Query Language it is best used for handling structured data. If you have serial, unstructured data you have to invent your own format for it to use inside the database, and then the query language isn't helping you.

Re:A time and place for everything (3, Insightful)

Marillion (33728) | more than 4 years ago | (#28565879)

Right, I went into a little detail on another post. When I said "genetic," I meant genes - DNA. There are four main nucleobases in DNA: adenine, cytosine, guanine, and thymine, abbreviated ACGT. So you could store a gene sequence as ACGCCTGCAATC. But in other populations, Asian for example, the same gene is more commonly found as ACTCCTGCAATC. (The third nucleotide is different.) Exact string matches won't find matches between different population groups. So there are wildcard letters for this: K represents either G or T. ACKCCTGCAATC would then match both of the sequences commonly found in western and eastern populations. Data of this nature has no business being in a relational database. For that matter, it doesn't belong in these pseudo databases either.
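
To make the wildcard idea concrete, here is a rough sketch that expands ambiguity letters into a regular expression. The mapping is a subset of the IUPAC nucleotide codes (K for G or T is the one used above). The function names are invented for the example, and real sequence tools do far more than this.

```python
import re

# Subset of the IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "K": "[GT]", "M": "[AC]", "R": "[AG]", "Y": "[CT]",
    "S": "[CG]", "W": "[AT]", "N": "[ACGT]",
}

def pattern_to_regex(pattern):
    """Translate a sequence containing ambiguity letters into a compiled regex."""
    return re.compile("".join(IUPAC[ch] for ch in pattern))

def matches(pattern, sequence):
    """True if the whole sequence fits the pattern."""
    return pattern_to_regex(pattern).fullmatch(sequence) is not None

print(matches("ACKCCTGCAATC", "ACGCCTGCAATC"))
print(matches("ACKCCTGCAATC", "ACTCCTGCAATC"))
print(matches("ACKCCTGCAATC", "ACACCTGCAATC"))
```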

Re:A time and place for everything (2, Interesting)

Marillion (33728) | more than 4 years ago | (#28565613)

Genetic sequences are long strings of alphabetic characters. One of the most common representations is the FASTA format [wikipedia.org], which deals with the most common types of nucleotide polymorphisms. You can't use exact string searching to find a match, which makes BLOBs and CLOBs useless. That said, the metadata of genetic data is reasonably structured and does load into relational databases fairly well.

Re:A time and place for everything (0)

Anonymous Coward | more than 4 years ago | (#28565615)

See, I don't think there is ever a good time or place for SQL. Anyone who says so has never had to use it. I like to compare it with JavaScript. It's a language that is difficult to refactor, maintain, and while it's a standard, the standard is so vague that it's useless. Like JavaScript, people are trying to build other languages on top of it to hide its shortcomings -- for javascript you have tools like GWT, and for SQL you have HQL, Linq, etc.

Not to say that there is anything wrong with relational databases, we just lack a good tool to interface with them.

This is what happens (4, Funny)

Anonymous Coward | more than 4 years ago | (#28565145)

When you get a lot of morbidly obese nerds with no life to program for you.

Meanwhile SQL users get laid.

Re:This is what happens (5, Funny)

Anonymous Coward | more than 4 years ago | (#28565323)

It's true. I do a lot of INNER JOINing. Often with multiple tables.

Re:This is what happens (1)

Anonymous Coward | more than 4 years ago | (#28565971)

Jocks get to SELECT * FROM sys.tables, so they always get the tables with the lovely columns and big BLOB's. The ones we can access have a lot of constraints, but also integrity.

Don't Like Traditional Relational Databases? (3, Funny)

ChoboMog (917656) | more than 4 years ago | (#28565149)

Go fork yourself!

Re:Don't Like Traditional Relational Databases? (2, Funny)

CorporateSuit (1319461) | more than 4 years ago | (#28565931)

It seems an idiot has modded you down because they don't understand very basic database expressions.

No need to get mad at Slashdot's mod point system, because, after all, if they outlaw giving mod points to stupids, then only stupid outlaws will have mod points... or something like that.

Tilting at windmills (5, Insightful)

Anonymous Coward | more than 4 years ago | (#28565151)

Seems to be a silly thing to be against. Relational databases and the structured query language may not be perfect, but I bet these people could die in their 90's and people will still be using relational dbs and sql.

If you want to tout open or cheap dbs and more lightweight types of storage/db servers, then they might have some points, but being against sql is just plain dumb.

Re:Tilting at windmills (5, Insightful)

Qzukk (229616) | more than 4 years ago | (#28565391)

SQL isn't the only way possible to query relational databases. It's nice and does a really good job for even mildly complex queries and I would not want to ditch it just yet, but seriously... who hasn't had a business need for multiple levels of aggregates (eg averages of sums across multiple groupings, say "average across all customers' total balances") As it is, you end up splitting the logic between the database and the application, or creating a view of the first level of aggregation, then querying against that and hoping that the performance doesn't suck total ass.
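
As a small illustration of the "average across all customers' total balances" case, here is a sketch using a derived table (an inline view) in SQLite via Python's sqlite3. The `account` table and its rows are hypothetical, and the sketch makes no claim about how this performs on a real data set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (customer TEXT, balance REAL)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [
    ("alice", 100.0), ("alice", 50.0),
    ("bob", 30.0),
    ("carol", 200.0), ("carol", 10.0), ("carol", 30.0),
])

# Inner query: first level of aggregation (total balance per customer).
# Outer query: second level (average of those totals).
avg_total = conn.execute("""
    SELECT AVG(total)
    FROM (SELECT SUM(balance) AS total FROM account GROUP BY customer)
""").fetchone()[0]

print(avg_total)
```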

Re:Tilting at windmills (2, Insightful)

profplump (309017) | more than 4 years ago | (#28565579)

I agree, there are problems SQL doesn't solve well. But I think it's unlikely that other, better solutions to those problems will also be superior to SQL where it *does* perform well. As such, "no SQL" is probably not the right plan any more than "SQL only".

Re:Tilting at windmills (0)

Anonymous Coward | more than 4 years ago | (#28565587)

As it is, you end up splitting the logic between the database and the application

I always thought that was intentional. The DBMS is for data integrity and access. Business logic belongs in the applications.

RDBMS and application logic (4, Insightful)

gd2shoe (747932) | more than 4 years ago | (#28566053)

That is one view. It's nice and all, but incomplete. The issue is performance.

Any time you're dealing with a large quantity of data, it's always easiest to process or filter where it's located. Transmitting it, processing it, and transmitting back changes adds an unreasonable amount of overhead. Hence, SQL is a "Query" language. In other words, you have the RDBMS do reasonable data processing and filtering of records for you. Your application should only need to specify the operations performed, and should only process data if your computation is particularly unusual. This makes feasible computations that would otherwise be entirely unreasonable. (note that an application working on the same machine generally has the same issue as one working on a separate system. SQL servers present the application with a stream of data - pipe, socket, etc)

My opinion: SQL is horrendous. It's a pain to use, and many basic data transforms cannot be described in that language (at least without some huge, awful, convoluted command == maintenance nightmare).

Re:Tilting at windmills (0)

Anonymous Coward | more than 4 years ago | (#28565735)

(...) who hasn't had a business need for multiple levels of aggregates (eg averages of sums across multiple groupings, say "average across all customers' total balances") (...)

I think it's called partitioning. Recent versions of PostgreSQL have it and I think Oracle has it too.

Re:Tilting at windmills (3, Informative)

Strudelkugel (594414) | more than 4 years ago | (#28565861)

OLAP [wikipedia.org] was designed to answer that type of question. MDX [wikipedia.org] is the language used to perform multi-dimensional queries.

Re:Tilting at windmills (1)

quantum bit (225091) | more than 4 years ago | (#28565959)

who hasn't had a business need for multiple levels of aggregates (eg averages of sums across multiple groupings, say "average across all customers' total balances")

Funny you should mention that. Window functions in the SQL:2003 standard address that need, and there was an article on Slashdot earlier today about PostgreSQL 8.4 being released with support for them. Oracle has for a while now.

Re:Tilting at windmills (1)

CrashandDie (1114135) | more than 4 years ago | (#28565749)

Cheaper and more lightweight than Oracle?

Next thing we're going to hear people wanting a free DBMS...

Protest taxes, not databases (0, Troll)

Anonymous Coward | more than 4 years ago | (#28565167)

Too bad they can't protest the current regime's taxes with as much enthusiasm. At least it would be a protest against something that actually matters.

Flat Earth (3, Insightful)

Seumas (6865) | more than 4 years ago | (#28565199)

I've seen strong reactions from various camps with regard to concern over saying no to SQL. I'm not sure why people freak out over it. First, you have to strike out toward new things if you want to progress the world. Second, SQL hasn't caused people to stop using spreadsheets or Access databases. Third, there are groups that get together to dispute that the earth is round; insisting that it is flat. Or that gray aliens are visiting earth regularly and probing our anuses.

Bring on the next fascinating data technology. SQL will continue to have a major place for many years to come, no matter what happens.

Re:Flat Earth (3, Interesting)

syzler (748241) | more than 4 years ago | (#28565321)

I've seen strong reactions from various camps with regard to concern over saying no to SQL.. Third, there are groups that get together to dispute that the earth is round; insisting that it is flat.

Corporations represented in this group included the likes of Google, Last.fm, Amazon, and Facebook. Hardly the same caliber of people who claim the earth is still flat. I'm inclined to listen to engineers from these companies if they say that an SQL database does not scale well for vast amounts of data.

Re:Flat Earth (5, Insightful)

MightyMartian (840721) | more than 4 years ago | (#28565385)

And yet where are the other corporations: the oil companies, the banks, the large merchant conglomerates? In IT we seem to have this sort of myopic view that if it isn't an IT company of some kind, it doesn't exist. Google, as compared to the huge companies that use tools like Oracle, is a bit player. I know that's hard to deal with for all of us who have sucked at the teat of Silicon Valley for so long, but a significant amount of data processing that has nothing to do with social networking and finding pr0n goes on, and it does use tools like SQL.

Re:Flat Earth (1)

hachete (473378) | more than 4 years ago | (#28565659)

I work for a financial company and if the rest use their bright shiny Oracle databases like we do - and I don't think we're atypical - then, no, they have no idea how to use a database. Or build applications. At all. Not a clue. I can't begin to describe the inability, the sheer awesome crap-ness of what they do. The amount of work-arounds that the programmers implement to short-circuit the crap-ness. Really, you have no idea what you're talking about.

Re:Flat Earth (3, Informative)

Vellmont (569020) | more than 4 years ago | (#28565713)

And so you're saying this is all the fault of the relational database, and would all be solved by using some sort of object based database? That's the topic at hand here, not developers dealing with legacy systems patched together.

Re:Flat Earth (1)

kraut (2788) | more than 4 years ago | (#28565673)

Actually, the oil companies almost certainly have huge amounts of non-SQL data; I'm not sure whether seismology data comes in HDF, but it certainly doesn't come in SQL ;) Ditto banks have enormous amounts of non-SQL market data in specialised tick databases. That doesn't stop them from also having other important systems using SQL.

Vice versa, I'm pretty sure that while Google doesn't store its petabytes of web indexing info in a relational database (why on earth would you?), I'm equally sure that its billing, accounting and HR systems use relational databases; why on earth wouldn't they? Same thing applies to Amazon.

Horses for courses may be an old saying, but it's still true.

Re:Flat Earth (1)

syzler (748241) | more than 4 years ago | (#28565677)

In IT we seem to have this sort of myopic view that if it isn't an IT company of some kind, it doesn't exist.

I understand that not all companies that maintain large data sets are technology companies. My only point was that when a group of companies known to manage large sets of data say that SQL does not always fit the bill, then I am inclined to listen rather than calling them nuts.

Re:Flat Earth (1, Interesting)

Anonymous Coward | more than 4 years ago | (#28565693)

You sure there's absolutely no difference between the nature of a bank and the nature of a massive search engine?

And how sure are you that a bank's IT staff are on the leading edge of innovative technologies? If anything, they lag behind because it's "safer" than risking the untested new thing.

Try a few of the Post-Relational databases, read up on the CAP Theorem, understand the -nature- of the problem you're talking about, and then come back.

Or I'll save you some time. RDBMS systems focus on Consistency, and trade Availability for it. Your bank's computer can be down for an hour... inconvenient, but acceptable. But they cannot, under ANY circumstances, be incorrect. Period. Google, on the other hand, can handle some slightly incorrect data... but being offline is totally unacceptable.

Amazon's CTO gave a great example. He talked about how a Shopping Cart must have Availability, and slight inconsistencies in the data as that data propagates a network are acceptable. In the end, the data is eventually consistent anyways, and you NEVER want your customer to not be able to add a cart item. The checkout, however, is financial, and heavily needs Consistency. Alternatively, after the order is done, the list of past transactions again can lose consistency a tiny bit (since it's read-mostly anyways) in exchange for always being up.

Hmm... more to the issue than you thought? XD

Re:Flat Earth (1)

fabs64 (657132) | more than 4 years ago | (#28565813)

I look to the oil companies to innovate in drilling technologies.
I look to financial companies to hopefully not innovate too much anywhere :-)
I look to IT companies to innovate in IT.

I dunno about you, but I've seen an incredible amount of money spent in the last 10 years or so attempting to change those massive relational databases into formats that can be reported on, as well as huge amounts of energy put into moving from one relational schema to another.

Pretending the big conglomerates present the best answer just because they're big is a recipe for non-movement.

Re:Flat Earth (1)

Vellmont (569020) | more than 4 years ago | (#28565645)

I'm inclined to listen to engineers from these companies if they say that an SQL database does not scale well for vast amounts of data.

This statement, taken as a whole is pure nonsense. "Databases" scale quite well for "vast" amounts of data. There's retailers that store millions of transactions a day on relational databases that would be out of business if they didn't.

If I had to guess, I'd say that relational databases might not be a great solution for a quickly evolving web company with possibly constantly changing data structures and new requirements being added. Doing all that glue code sucks, and patchwork solutions like Hibernate aren't much better (and IMO worse).

It shouldn't be surprising that a tool developed for one purpose isn't well suited to all purposes. Creating some kind of "movement" out of it is about as stupid as being against hammers in favor of screwdrivers. Down with hammers! Yeah screwdrivers!!

Re:Flat Earth (1)

leenks (906881) | more than 4 years ago | (#28565969)

There's retailers that store millions of transactions a day on relational databases that would be out of business if they didn't.

He said vast...

Re:Flat Earth (2, Interesting)

MightyMartian (840721) | more than 4 years ago | (#28565345)

The whole thing is just reactionary mumbo-jumbo. There are kinds of data that relational databases are fantastic for, and kinds of data they're not, and sometimes none of it is exactly perfect. SQL is actually a pretty damned good, single-purpose language. It's not hard to learn, and once you learn it, the differences between RDBMS implementations become a little like Javascript, just something you have to put up with, not that a lot of people actually have to worry all that much about writing fully-portable SQL queries.

Re:Flat Earth (1)

moderatorrater (1095745) | more than 4 years ago | (#28565585)

Agreed. SQL is a generalized solution that works well for a lot of different things and works extremely well for a subset of those things. For other applications (like indexing the internet), more specialized solutions are going to kick its ass. It's the same way as any programming you do: the easier and more general the tool, the more you sacrifice for it in terms of speed, efficiency, scalability, whatever.

Re:Flat Earth (2, Informative)

Threni (635302) | more than 4 years ago | (#28566045)

> Second, SQL hasn't caused people to stop using spreadsheets or Access databases

If it weren't for SQL there wouldn't be any Access databases...

SQL.... (-1)

Darkness404 (1287218) | more than 4 years ago | (#28565227)

SQL has some great uses that it's meant for. However, like all OSS tools, once it is used enough for one purpose, people will try to reuse it elsewhere with varying success. SQL is great for financial data; however, for some of the places it is in, SQL just doesn't do the job well.

Re:SQL.... (1)

rjstanford (69735) | more than 4 years ago | (#28565997)

SQL is an OSS tool?

Why'd I pay so much to Informix around 1990 then I wonder?

For that matter... SQL is a thing? I always thought it was a spec and a language :)

I've been using text files and Excel (0)

Anonymous Coward | more than 4 years ago | (#28565235)

I keep track of all my car bills and cat names with Notepad and Excel. I don't know why anyone would need anything more than that. If I need to sort my text file, I go to this thing called the command line and use the "SORT" command. If I need to find something in my text file, likewise, I use the command line and the "FIND" command.


RDB (1)

MichaelSmith (789609) | more than 4 years ago | (#28565265)

I thought DEC RDB was a pretty good query language. I never got into SQL as a result. I am glad people are thinking about alternatives.

Next Up... (1, Funny)

grepya (67436) | more than 4 years ago | (#28565269)

...say no to the tyranny of... er.. English. Let's stick with the combination of grunts, squeals, crying and gesturing that has proven so effective for toddlers all over the globe for thousands of years. And if we surrendered the traditional languages that we are so irrationally attached to, who knows what revolutionary new communication scheme the next-generation kids will come up with.


The problem is performance not SQL (3, Interesting)

presidenteloco (659168) | more than 4 years ago | (#28565289)

The problem is the performance of transactions and persistence and distribution of data techniques, not
whether we are using a logic-like STRUCTURED QUERY LANGUAGE to ask for data matching certain conditions.

The latter is still, and will continue to be, very useful.

It's just that now that we can assume local clusters and WANs worth of co-operating data stores, there
are probably better, more performant ways of implementing persistence, replication, distribution of data
than traditional RDBMS implementations.

The two concerns: The logical model of how we QUERY for data (or combine it in bulk), which is the core of SQL,
and how we persist it and retrieve it quickly, now have more options for being separated.

Re:The problem is performance not SQL (1)

Crias (1388217) | more than 4 years ago | (#28565763)

The problem, I think, runs a little deeper than all that though.

SQL is unfortunately tied fairly tightly to an RDBMS implementation. All those "join" statements, various ways of expressing "constraints" such as "foreign keys" - all are considered "integral" parts of SQL.

No, you don't have to provide them. A Post-Relational like Amazon SimpleDB could, theoretically, use SQL for querying and just trim back the feature-set.

But perhaps it'd be wiser to look at a query language more specific to the Post-Relational model?

Perhaps SQL stopped being "SQL" and started being "Structured Relational Query Language". *shrugs*

Re:The problem is performance not SQL (4, Insightful)

oGMo (379) | more than 4 years ago | (#28565955)

It's just that now that we can assume local clusters and WANs worth of co-operating data stores, there are probably better, more performant ways of implementing persistence, replication, distribution of data than traditional RDBMS implementations.

You can also assume magical fairy dust and free energy, but that doesn't make it so. You can ask if there are better ways, but you can't assume it, and in the end you will find there is no magic.

Clusters and replication are NOT NEW. Not even remotely new. There is, in fact, nothing new architecturally at all that would indicate some new capability that hasn't already been repeatedly analyzed and tried. That doesn't mean you can't tweak something for a situation, or that you need a giant Oracle database for everything, but "the web" and "cheap hardware" change the equation by precisely nothing.

What has changed the equation is cheap, unimportant data, which covers the majority of the web. "Real" applications, where data integrity is important (like say, your bank account), and immediate accuracy guaranteed, require the main thing you use a database for: data integrity. Your facebook page, your google search, that blog entry, or some video on youtube: these don't matter. If it's a little slow, or doesn't update immediately, or you get an error, no one is losing money. No one cares.

In essence, if a reliable database isn't important for your app, your app isn't really handling important data. This may be fine; in the mainstream, there's a lot of noncritical stuff. But this doesn't make databases unimportant.

Not mutually exclusive (3, Insightful)

JobyOne (1578377) | more than 4 years ago | (#28565357)

It's pretty easy to say "yes" to alternatives without saying "no" to SQL.

Just because a crowbar can pull out a stubborn nail better doesn't mean they should replace all the hammers. Then what would we put nails in with? Different tools for different jobs.

Nailguns (1)

tehdaemon (753808) | more than 4 years ago | (#28565505)

Most nails are put in with nailguns. Hammers these days are mostly used for demolition of various sorts, including pulling nails. T

RDBs are good, but SQL is horrible (0)

Cyberax (705495) | more than 4 years ago | (#28565369)

The idea of RDB is cool, relational algebra is quite neat. But SQL itself is horrible.

I'd like to have a language which will allow me to access intermediate tuples cleanly and return hierarchic structures. For example, if I want to fetch all customers and all their bids in one query I have to use inner join. And that results in LARGE number of rows (Cartesian product of customers and their bids).

Also, I'd like to see stuff which is not easily expressed in relational algebra, like running sums or grouping on a computed field.

Re:RDBs are good, but SQL is horrible (1)

larry bagina (561269) | more than 4 years ago | (#28565447)

If you want hierarchical data, you could use a hierarchic database.

Re:RDBs are good, but SQL is horrible (1)

Cyberax (705495) | more than 4 years ago | (#28565519)

I do not want hierarchical data storage. I want to create trees from relational data.

I don't see anything that prevents me from doing this in theory. In fact, ANSI SQL already has support for hierarchic queries (which makes it Turing-complete, BTW).

Re:RDBs are good, but SQL is horrible (1)

Marcos Eliziario (969923) | more than 4 years ago | (#28565827)

The way to store tree like structures on a relational database is using nested sets, not pointer-like ids.
The main reason for the current backlash against databases is that most developers don't have a clue about set theory or relational algebra, let alone the inner workings of a concurrent database system.
Many of the problems solved by RDBMS are going to have to be solved again by those new tools that are promised to replace RDBMS.

Those who ignore history, are condemned to repeat it.
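
For readers who haven't met nested sets, here is a minimal sketch of the idea, run against SQLite through Python's sqlite3. Each node stores a left and right number from a depth-first walk, so descendants are the rows whose interval lies inside the parent's interval. The table name `nested`, the columns `lft` and `rgt`, and the sample tree are assumptions made for illustration, not anyone's production schema.

```python
import sqlite3

# Hypothetical tree: A is the root; B and C under A; D and E under B;
# F under C; G under D; H under E. lft/rgt come from a depth-first walk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nested (id TEXT PRIMARY KEY, lft INTEGER, rgt INTEGER)")
conn.executemany("INSERT INTO nested VALUES (?, ?, ?)", [
    ("A", 1, 16), ("B", 2, 11), ("D", 3, 6), ("G", 4, 5),
    ("E", 7, 10), ("H", 8, 9), ("C", 12, 15), ("F", 13, 14),
])

def subtree(root):
    """Descendants are the rows whose interval lies inside root's interval."""
    rows = conn.execute("""
        SELECT c.id FROM nested c JOIN nested p
          ON c.lft > p.lft AND c.rgt < p.rgt
        WHERE p.id = ? ORDER BY c.lft
    """, (root,)).fetchall()
    return [r[0] for r in rows]

def ancestors(start):
    """Ancestors are the rows whose interval encloses start's, nearest first."""
    rows = conn.execute("""
        SELECT p.id FROM nested p JOIN nested c
          ON p.lft < c.lft AND p.rgt > c.rgt
        WHERE c.id = ? ORDER BY p.lft DESC
    """, (start,)).fetchall()
    return [r[0] for r in rows]

def nearest_common_ancestor(a, b):
    """The enclosing interval with the largest lft is the nearest shared ancestor."""
    row = conn.execute("""
        SELECT p.id FROM nested p, nested x, nested y
        WHERE x.id = ? AND y.id = ?
          AND p.lft < x.lft AND p.rgt > x.rgt
          AND p.lft < y.lft AND p.rgt > y.rgt
        ORDER BY p.lft DESC LIMIT 1
    """, (a, b)).fetchone()
    return row[0] if row else None

print(subtree("B"))
print(ancestors("G"))
print(nearest_common_ancestor("D", "H"))
```

The trade-off with this layout is that reads need no recursion, while inserting or moving a node means renumbering part of the table.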

Re:RDBs are good, but SQL is horrible (0)

Anonymous Coward | more than 4 years ago | (#28565497)

The idea of RDB is cool, relational algebra is quite neat. But SQL itself is horrible.

I'd like to have a language which will allow me to access intermediate tuples cleanly and return hierarchic structures. For example, if I want to fetch all customers and all their bids in one query I have to use inner join. And that results in LARGE number of rows (Cartesian product of customers and their bids).

Also, I'd like to see stuff which is not easily expressed in relational algebra, like running sums or grouping on a computed field.

If you're getting a Cartesian product from a query like that, either the DB architect was a moron or (most likely) you need to learn about the WHERE clause.

Re:RDBs are good, but SQL is horrible (1, Informative)

Anonymous Coward | more than 4 years ago | (#28565539)

select * from customers c, bids b where c.customer_id=b.fk_customer_id order by c.customer_id, b.bid_date

Seems pretty simple. What's wrong with an inner join? You're getting exactly the number of rows that you need to answer your question, no more, no less.

A cartesian product would be more like: select * from customers c, bids b. But that's not what you want.
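
A quick way to see the difference is to count rows on a toy data set. The sketch below runs both statements in SQLite via Python's sqlite3. The column names `customer_id`, `fk_customer_id` and `bid_date` come from the query above; `name`, `bid_id` and the sample rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE bids (bid_id INTEGER PRIMARY KEY, fk_customer_id INTEGER, bid_date TEXT);
    INSERT INTO customers VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO bids VALUES (10, 1, '2009-07-01'), (11, 1, '2009-07-02'), (12, 2, '2009-07-01');
""")

# Join with the WHERE condition: one row per bid, paired with its own customer.
joined = conn.execute(
    "select * from customers c, bids b "
    "where c.customer_id=b.fk_customer_id "
    "order by c.customer_id, b.bid_date"
).fetchall()

# No join condition: every customer paired with every bid.
cartesian = conn.execute("select * from customers c, bids b").fetchall()

print(len(joined), len(cartesian))
```

With 2 customers and 3 bids, the join returns 3 rows and the unconstrained product returns 6.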

As for hierarchical structures, Oracle db has ways to do this, although I admit the syntax isn't that straight forward: http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/queries003.htm

Re:RDBs are good, but SQL is horrible (1)

godrik (1287354) | more than 4 years ago | (#28565593)

For example, if I want to fetch all customers and all their bids in one query I have to use inner join. And that results in LARGE number of rows (Cartesian product of customers and their bids).

I am not sure I get your point. If you do an inner join, it means you want all the tuples < player,bid > that make sense. If there are a lot of them, well, there are a lot of them; there is nothing to do about it. If you are complaining about each player being repeated across several bids (since they bid more than once), it should not be a problem: as long as you stay in the RDBMS, this should not incur any overhead. When you read them, you can just compress them on the fly.

If you really want to remove those "extra" player values, why do you want to have a single query ? You can just make a query for each player.
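
One way to read "compress them on the fly" is to fold the repeated customer column away while iterating over the joined rows. Here is a minimal sketch with itertools.groupby; the row layout (customer, bid id, amount) and the `nest` helper are assumptions for illustration, and the rows must already be ordered by customer, as an ORDER BY on the join would give you.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical joined rows, already sorted by customer.
rows = [
    ("alice", 10, 5.0),
    ("alice", 11, 7.5),
    ("bob", 12, 3.0),
]

def nest(rows):
    """Group (customer, bid_id, amount) rows into {customer: [(bid_id, amount), ...]}."""
    return {
        customer: [(bid_id, amount) for _, bid_id, amount in group]
        for customer, group in groupby(rows, key=itemgetter(0))
    }

print(nest(rows))
```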

Also, I'd like to see stuff which is not easily expressed in relational algebra, like running sums or grouping on a computed field.

Technically, they cannot be expressed in relational algebra; you have to add non-algebraic operators to do that. SUM and GROUP BY in SQL are not part of relational algebra. The problem with those operators is that it is difficult to do any optimization on them. However, the user may still want to have them. I would also be interested in a language that can express such things efficiently.

Re:RDBs are good, but SQL is horrible (1)

jyx (454866) | more than 4 years ago | (#28565641)

For example, if I want to fetch all customers and all their bids in one query I have to use inner join. And that results in LARGE number of rows (Cartesian product of customers and their bids)

You might want to explain yourself a bit more there, if I want all customers and all their bids I would expect a LARGE number of rows. What magic algorithm is out there that will give you all your data, but at the same time make it less than what it is?

Or do you want just one row of customer data and then all their orders under that? Good luck getting your admin staff creating reports off that spreadsheet.

I think what you are interested in is reporting tools - they do the things you ask for, often in a nice drag and droppy way - but you still need to get the data to those reports and I haven't found anything better for that job than SQL (yet).

Data is hard work; eventually any solution to querying databases is going to be as complicated as SQL because there is an infinite number of ways people will eventually want to look at it.

Re:RDBs are good, but SQL is horrible (1)

Cyberax (705495) | more than 4 years ago | (#28565761)

Let's suppose that we have 1000 customers and each customer has 100 bids, and each bid has 5 sub-items.

If we retrieve all of them using inner joins - we'll have to transmit and read 100*1000*5 rows. Quite a large number.

If we first fetch customers and then fetch their bids (using a second query) and then sub-items we'll have to read 1000+1000*100+1000*100*5 rows. However, each time we fetch only relevant data which can result in huge savings of bandwidth (some database protocols are naive enough to transmit full rows).
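The arithmetic can be spelled out as below. The row counts come from the comment; the column counts per table are made-up assumptions, added only to show why more rows can still mean less data transferred.

```python
# Figures from the comment above.
customers, bids_per_customer, items_per_bid = 1000, 100, 5

# Hypothetical column counts per table (assumed for illustration only).
customer_cols, bid_cols, item_cols = 10, 8, 6

n_bids = customers * bids_per_customer   # 100,000 bids
n_items = n_bids * items_per_bid         # 500,000 sub-items

# One big join: every row repeats the customer and bid columns.
joined_rows = n_items
joined_values = joined_rows * (customer_cols + bid_cols + item_cols)

# Three separate queries: more rows in total, but each row is narrow.
separate_rows = customers + n_bids + n_items
separate_values = (customers * customer_cols
                   + n_bids * bid_cols
                   + n_items * item_cols)

print(joined_rows, joined_values)      # 500000 12000000
print(separate_rows, separate_values)  # 601000 3810000
```

Under these assumed widths the separate queries return about 20% more rows but roughly a third of the values. The real ratio depends entirely on how wide the parent tables are and on which columns are actually selected.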

Re:RDBs are good, but SQL is horrible (1)

caerwyn (38056) | more than 4 years ago | (#28565897)

I'm not sure what your point here is. People do that sort of sequential querying all the time- each query simply asks for the subset of data of interest. What, exactly, are you unable to do in SQL that you want to be able to do?

Re:RDBs are good, but SQL is horrible (1)

Cyberax (705495) | more than 4 years ago | (#28565993)

Why do I need to do several queries? It would be nice to be able to do this in a single query.

Re:RDBs are good, but SQL is horrible (1)

caerwyn (38056) | more than 4 years ago | (#28565919)

Also, I'd like to see stuff which is not easily expressed in relational algebra, like running sums or grouping on a computed field.

Grouping on a computed field is quite easy, so if you're waiting for SQL to support it... you've been waiting too long, it already does.

As for running sums, that's the sort of thing that Oracle already has and that just went into the Postgres 8.4 release that was on Slashdot the other day.

Your SQL complaints are a bit out of date. :)
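Both features can be tried with a small sketch. It uses SQLite through Python's sqlite3 module rather than Oracle or Postgres, so it is a stand-in, not the syntax of either product; the bids table and its values are invented, and the window function needs SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bids (bid_id INTEGER PRIMARY KEY, bid_date TEXT, amount INTEGER);
    INSERT INTO bids VALUES
        (1, '2009-06-29', 10), (2, '2009-06-30', 20),
        (3, '2009-07-01', 5),  (4, '2009-07-02', 15);
""")

# Grouping on a computed field: the month is derived from the date string.
by_month = conn.execute("""
    SELECT substr(bid_date, 1, 7) AS month, SUM(amount)
    FROM bids
    GROUP BY substr(bid_date, 1, 7)
    ORDER BY month
""").fetchall()

# Running sum via a window function (requires SQLite 3.25+).
running = conn.execute("""
    SELECT bid_date, SUM(amount) OVER (ORDER BY bid_date) AS running_total
    FROM bids
    ORDER BY bid_date
""").fetchall()

print(by_month)  # [('2009-06', 30), ('2009-07', 20)]
print(running)   # [('2009-06-29', 10), ('2009-06-30', 30), ('2009-07-01', 35), ('2009-07-02', 50)]
```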

Cartesian Product? (2, Insightful)

gbutler69 (910166) | more than 4 years ago | (#28565941)

Epic Fail. You're wrong. It in no way results in a "Cartesian Product". That would be a "Cross Join", not an "inner join". From my experience, people who complain about SQL and relational databases are, for the most part, ignorant. They really don't even understand what they are saying or what they are talking about. I've seen so much abuse and misunderstanding of relational data and SQL in my career that I just have to laugh at this sort of thing.

Yeah, so why are they better? (5, Insightful)

Anonymous Coward | more than 4 years ago | (#28565371)

If I was to read the article, I bet somewhere someone would be wittering on about Key Value Datastores.

The brainchild of a generation brought up on high level collections, they learn one (in this case Map) and apply it to everything.

Sadly SQL, and RDBMS, works for most people. It maps object data well (oh whaaaa, I have to do foreign keys - GROW SOME FUCKING BALLS YOU LAZY GRADUATE!) and it is well understood. And with abstractions like LINQ to query them, even the lazy dumb Windows .NET programmer doesn't have to strain their brain to learn SQL.

And when you have terabytes of specific unique data, you clearly should go away to work out how best to store it. Even a RDBMS/SQL solution is too generic for all problems.

Re:Yeah, so why are they better? (1)

Hurricane78 (562437) | more than 4 years ago | (#28565577)

Well, file systems, databases, object inheritance trees, etc., are all based on the incomplete concept of hierarchical trees and maps, while in reality everything can be generalized through graphs. Generic graphs. Of course everyone has their own poor fix for this. File systems have links, databases have foreign keys, and OO languages have interfaces or multiple inheritance. It's a mess, because it is an afterthought.

I stopped using all those approximations of data structures, and use my own high-performance ontologic graph library for everything that I would use a treelike structure for. I also can stick it on top of a file system or RDBMS, and even have a UI element to browse it. I do not look back. :)

Re:Yeah, so why are they better? (1)

schnablebg (678930) | more than 4 years ago | (#28565929)

You use your own hand rolled libraries for standard data structures? I really hope I don't end up on a project with you or inheriting one you've worked on.

Re:Yeah, so why are they better? (1)

spiffmastercow (1001386) | more than 4 years ago | (#28565703)

I'd say LINQ is significantly harder to use than SQL most of the time. The only real exception is when you need to convert the value of a subquery into a comma-delimited list

Re:Yeah, so why are they better? (3, Insightful)

fabs64 (657132) | more than 4 years ago | (#28565869)

Saying RDBMSs map object data well is a bit of a stretch; they map relational data well and that's it.

http://www.codinghorror.com/blog/archives/000621.html [codinghorror.com] for some good background on the problems.

For me, using an RDBMS as the persistence layer for an object-oriented application has ALWAYS felt like a bit of a kludge. Like we're using it just because it's what we have, rather than the best tool for the job.

What's the benefit exactly? (3, Insightful)

SendBot (29932) | more than 4 years ago | (#28565389)

I'm not seeing anything that offers a real advantage over using advanced features like one finds in postgres combined with memcached. Some of my program likes to think of its data as a structured object while other parts like seeing that data as rows in a table (they even link up to other tables through foreign keys!).

Re:What's the benefit exactly? (1)

phantomfive (622387) | more than 4 years ago | (#28565611)

The main problem with relational databases is that they use a completely different storage scheme than your program does. Databases are organized into tables, rows and columns, but programs are organized into random access variables, structs, and classes. Thus, to use data from a relational database in your program, you need to have a conversion layer that converts from tables to random access, and back. These guys are saying it would be nice if we had a way to store this stuff that didn't require a conversion layer.
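A hand-written conversion layer of the kind described here might look like the sketch below. The Customer class, the table layout and the two helper functions are all hypothetical, chosen only to show the round trip between a row and an object.

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class Customer:
    """The in-program representation (hypothetical)."""
    customer_id: int
    name: str

def customer_to_row(c):
    # Object -> tuple in table column order.
    return (c.customer_id, c.name)

def row_to_customer(row):
    # Tuple from the table -> object.
    return Customer(customer_id=row[0], name=row[1])

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")

# Store an object by converting it to a row.
conn.execute("INSERT INTO customers VALUES (?, ?)",
             customer_to_row(Customer(1, "alice")))

# Load it back by converting the row to an object.
row = conn.execute(
    "SELECT customer_id, name FROM customers WHERE customer_id = ?", (1,)
).fetchone()
loaded = row_to_customer(row)
print(loaded)  # Customer(customer_id=1, name='alice')
```

ORMs automate this pair of functions; the mismatch the comment describes shows up once objects contain nested collections or references that do not fit a single row.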

And I agree. However I also think it would be nice if I could keep all my data in RAM all the time, for easy access. It's just not practical. If all your queries are straightforward selects on ID, then there really is no great reason to use a full database. But once you start doing more complex joins and searches, a database is, while not always convenient, still more convenient than the alternatives.

Re:What's the benefit exactly? (1)

hibiki_r (649814) | more than 4 years ago | (#28565707)

A conversion layer is wasteful when there's only one way to look at your data. In that case, key value pairs can perform better, no question.

The problem lies in situations where you need to look at the data in 5 different ways. or 50. Then, a single object model for your data is a whole lot less practical than having a conversion layer, and have the data in a very flexible format, like a relational model.
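A toy illustration of that trade-off, using a plain Python dict as a stand-in for a key-value store (the data and helper functions are invented for the example):

```python
# Key -> value store keyed by bid id (hypothetical data).
store = {
    10: {"customer": "alice", "item": "lamp", "amount": 10},
    11: {"customer": "bob", "item": "lamp", "amount": 12},
    12: {"customer": "alice", "item": "desk", "amount": 40},
}

# The one access path the store was designed for: direct lookup by key.
by_key = store[11]

def bids_for_item(store, item):
    """A second view of the data: needs a full scan or a hand-built index."""
    return [k for k, v in sorted(store.items()) if v["item"] == item]

def total_by_customer(store):
    """A third view: again a full scan, with the aggregation written by hand."""
    totals = {}
    for v in store.values():
        totals[v["customer"]] = totals.get(v["customer"], 0) + v["amount"]
    return totals

print(by_key["customer"])             # bob
print(bids_for_item(store, "lamp"))   # [10, 11]
print(total_by_customer(store))       # {'alice': 50, 'bob': 12}
```

Each new way of looking at the data costs another hand-written scan or index here, whereas in a relational model it would be another WHERE or GROUP BY over the same tables.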

Re:What's the benefit exactly? (1)

SendBot (29932) | more than 4 years ago | (#28566039)

Well, I have a conversion layer to create the object my program uses, but I can't think of a need to convert it back. All the things that make it what it is are a result of all the little things that interact with the db. Using triggers, it knows when to update parts of itself. The parts that interact with the db often don't care about the object, even if it's being used as an input to those parts.

When I DO need to care about the object, replicate it, or maintain persistence, then I use...


memcached. (I rtfa'd and even Amazon's thing said it was basically a key -> value system)

If I did this exclusively with my object instead of sql, I don't know right away how I would do all my searches and processing because everything is so hugely related... and I think the whole point of this nosql thing is that it's a non-relational alternative for when things are pretty basic, but comprise enormous data size.

Except in the end (1)

xednieht (1117791) | more than 4 years ago | (#28565475)

The Patriots themselves levied their own heavy taxes, emulating those against which they had originally rebelled.

In the end it's all just 1's and 0's.

What else is there to use besides SQL (1)

Orion Blastar (457579) | more than 4 years ago | (#28565485)

go back to flat files aka DAT files.

Use the old DBase III standard DBF files?

Use the old Lotus 123 WK1 files?

Use MS-Office MS-Access MS-Excel etc files?

Use comma separated values files?

SQL set a standard for relational databases, a structured query language that almost any database can use and then build extensions to it.

Will the Post-SQL age begin, and will it be object oriented and a fifth generation language?

Re:What else is there to use besides SQL (1)

godrik (1287354) | more than 4 years ago | (#28565635)

go back to flat files aka DAT files.

Technically, that is what they do. Basically, they just say that they do not need a classical RDBMS to do their job. I agree with them that an RDBMS makes a poor implementation of big dictionaries. :)

How about saying yes to the alternative (4, Insightful)

syousef (465911) | more than 4 years ago | (#28565535)

Saying no to SQL and relational databases is just fine if you've got something better to replace it with. However I know of no such thing. The reason they're popular is that they are so powerful for data storage. If something better came along you wouldn't even need to say no to SQL. You'd just say yes to the newer better rival.

XML / XPATH / XQUERY / XSLT / Xhausted (0)

Anonymous Coward | more than 4 years ago | (#28565555)

SQL can suck. The alternatives the PHB might choose aren't necessarily better. Be careful what you wish for.

Whatever (0)

Anonymous Coward | more than 4 years ago | (#28565557)

Like unix being dead - someone else thinks SQL is dead and worthless.

I disagree, there is no ONE solution. SQL works great for many types of data access. But an object based db can be great for other types.

SQL is dead. Long live SQL. :-)

Misses the point (1)

kc8jhs (746030) | more than 4 years ago | (#28565583)

There are plenty of ways to store data inexpensively in a RDBMS. There are plenty of GPL and low cost RDBMS available.

The real issue is that the more and more we move into complex data structures and we push the limits of what an ORM can do with those simple, inexpensive RDBMS, the more problems we run into trying to map our objects into rows in tables.

Here [appspot.com] is one of the more interesting solutions that I've seen to the problem, but it only works over relatively simplistic data where managing indexes by hand is ok, and it's okay for the indexes to be incomplete at any given moment. Ironically, that gives them more availability than trying to force MySQL to do indexes. But it really depends on the data and needs.

SQL is not a database (5, Insightful)

j. andrew rogers (774820) | more than 4 years ago | (#28565627)

SQL is not a database, it is a standard interface to a feature set commonly associated with relational models. Before everyone standardized on SQL, there were other relational query languages. The "No" part of "NoSQL" refers to the fact that some basic elements of relational implementations cannot be usefully expressed using a much simpler distributed hash table model.

All the "NoSQL" does is eliminate all the parts of traditional relational databases that do not scale -- discarding the bottleneck rather than fixing it. These are things like joins and external indexing. Unfortunately, discarding those things means you discard a lot of very important functionality as a practical matter, notably the ability to do fast, complex analytics. Adopting the NoSQL architecture runs contrary to the trend toward more real-time, contextual analytical processing. There are a great many analytical applications that are not amenable to batch-mode pattern-matching, and the NoSQL model is a lot less applicable than I think some people want to acknowledge. In its domain, it is a great tool but it has many, many prohibitive limits. We are essentially trading power for scale.

That said, do not take this as an endorsement of traditional SQL relational databases either, as they have a number of serious limitations themselves. As just mentioned, a number of the core analytical operations those models support are based on algorithms that scale poorly. The SQL language itself has mediocre support for many abstract data types (e.g. spatial) and data models (e.g. graph), which in part reflects the inadequacies of the assumed underlying database algorithms (e.g. B-trees) that are implicit in SQL. The inability to efficiently do event-driven/real-time applications is also more a reflection of the access methods used in databases than any intrinsic weakness in SQL; SQL may be clunky for that purpose, but that is not the real limiter.

A truly revolutionary deviation from SQL would usefully implement a superset of the features SQL supports, not take them away. Of course, we would need access methods more capable than hash tables and B-trees to usefully implement those features, which is a lot more work than discarding features that scale poorly. NoSQL is a stopgap technical measure for that small subset of applications where the serious tradeoffs are acceptable.

Pros & Cons of non-relational solutions (5, Interesting)

kpharmer (452893) | more than 4 years ago | (#28565701)

Note that most of these solutions come from the interwebs, social networks, etc. And it isn't so much anti-sql as it is anti-relational database (sql != rdb).

The basic premise is that we need different solutions that: can scale very high for very narrowly scoped reads & writes, don't need to perform ranged queries / reporting /etc, and don't need ACID compliance. And that may be the case. Sites like slashdot, facebook, reddit, digg, etc don't need the data quality that ebay needs.

On the other hand, ebay achieves scalability AND data quality with relational databases. And when I've worked with architectures that scale massively and avoid the relational trap for better solutions - they inevitably later regret the lack of data quality and complete inability to actually get trends and analysis of their data. It *always* goes like this:
    Me: So, is this thing (msg type, etc) increasing?
    Developer: No idea.
    Me: Ok, so let's find out.
    Developer: How?
    Me: I don't know - typical approach - let's query the database.
    Developer: It'll take four+ hours to write & test that query and then days to run. And when it's done we might find that we wrote the query wrong.
    Me: What?!?
    Developer: We had to do it this way, you can't report on 10TB databases anyhow
    Me: What?!? Are you on crack? There are dozens of *100TB* relational databases out there that people are reporting on
    Developer: well, we probably don't need to know what that trend is anyhow
    Me: I'm outta here
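For comparison, when the data does sit in a relational table, the "is this msg type increasing?" question is a single GROUP BY. The sketch below uses SQLite through Python's sqlite3 module; the messages table, its columns and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE messages (
        msg_id INTEGER PRIMARY KEY,
        msg_type TEXT,
        sent_date TEXT
    );
    INSERT INTO messages (msg_type, sent_date) VALUES
        ('alert', '2009-07-01'),
        ('alert', '2009-07-02'), ('alert', '2009-07-02'),
        ('info',  '2009-07-01'), ('info',  '2009-07-02');
""")

# Daily count for one message type: the trend the dialogue is asking about.
trend = conn.execute("""
    SELECT sent_date, COUNT(*)
    FROM messages
    WHERE msg_type = ?
    GROUP BY sent_date
    ORDER BY sent_date
""", ("alert",)).fetchall()

print(trend)  # [('2009-07-01', 1), ('2009-07-02', 2)]
```

Whether such a query runs quickly at 10TB or 100TB depends on indexing and partitioning, which this toy example says nothing about.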

Data out-lives applications (5, Insightful)

4to6Offshore (594235) | more than 4 years ago | (#28565865)

First: my mantra: Data belongs to the organization, not the application... if the app fails and data is accessible then we all go on - if the data fails or is locked away - what was the point of the app again?

In a SQL database the data is understood by the organisation, DBAs and data architects. If it is left to app developers taking an app-centric approach to data... I get nervous quickly.

So long as the data is just as definable and accessible as in current SQL databases, all good - give me an app with some odd-ball storage and it is bye-bye.

