
Is the One-Size-Fits-All Database Dead?

kdawson posted more than 7 years ago | from the specialized-and-optimized dept.

Databases 208

jlbrown writes "In a new benchmarking paper, MIT professor Mike Stonebraker and colleagues demonstrate that specialized databases can have dramatic performance advantages over traditional databases (PDF) in four areas: text processing, data warehousing, stream processing, and scientific and intelligence applications. The advantage can be a factor of 10 or higher. The paper includes some interesting 'apples to apples' performance comparisons between commercial implementations of specialized architectures and relational databases in two areas: data warehousing and stream processing." From the paper: "A single code line will succeed whenever the intended customer base is reasonably uniform in their feature and query requirements. One can easily argue this uniformity for business data processing. However, in the last quarter century, a collection of new markets with new requirements has arisen. In addition, the relentless advance of technology has a tendency to change the optimization tactics from time to time."


208 comments

Perl & CSV (1, Funny)

baldass_newbie (136609) | more than 7 years ago | (#17534108)

How did Perl & CSV fare?

Re:Perl & CSV (5, Funny)

Ingolfke (515826) | more than 7 years ago | (#17534202)

How did Perl & CSV fare?

It failed the "relational" part of the test. But it failed very quickly.

Re:Perl & CSV (1)

Tablizer (95088) | more than 7 years ago | (#17534602)

It would be interesting to see how it does on joins that don't fit in RAM.
     

Re:Perl & CSV (4, Funny)

patio11 (857072) | more than 7 years ago | (#17534854)

It failed the "relational" part of the test. But it failed very quickly.

Yep. On the plus side, the Perl hacker who put it together only wasted the time it took to write one line. Granted, the line was 103,954 characters long. He considered breaking it up into two lines to improve readability but ultimately rejected the notion -- anyone not capable of reading the program clearly had no business messing with it anyhow. (Quick question aside from the snark: since Perl has associative arrays can't it emulate a relational database? It was my understanding that after you've got associative arrays you can get to any other conceivable data structure... assuming you're willing to take the performance hit.)
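For the curious, a minimal sketch of the hash-as-table idea (the table and column names here are made up): each "table" is a hash of row hashes keyed on a primary key, and a join is just a nested loop plus a hash lookup.

#!/usr/bin/perl
use strict;
use warnings;

# Two "tables": every row is a hash of columns, keyed by primary key.
my %employees = (
    1 => { name => 'Alice', dept_id => 10 },
    2 => { name => 'Bob',   dept_id => 20 },
);
my %departments = (
    10 => { dept_name => 'Accounting'  },
    20 => { dept_name => 'Engineering' },
);

# A poor man's join: SELECT name, dept_name FROM employees JOIN departments.
for my $emp_id (sort keys %employees) {
    my $row  = $employees{$emp_id};
    my $dept = $departments{ $row->{dept_id} };   # the hash lookup plays the role of the index
    printf "%s works in %s\n", $row->{name}, $dept->{dept_name};
}

No constraints, no transactions, no query planner -- which is roughly where the performance hit (and the "failed the relational part" joke) comes in.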

Re:Perl & CSV (3, Interesting)

nuzak (959558) | more than 7 years ago | (#17534930)

> It was my understanding that after you've got associative arrays you can get to any other conceivable data structure

Once you have lambda you can get to any conceivable data structure. The question is, do you really want to?

sub Y (&) { my $le=shift; return &{sub {&{sub {my $f=shift; &$f($f)}}(sub {my $f=shift; &$le(sub {&{&$f($f)}(@_)})});}}}

Re:Perl & CSV (1)

Anpheus (908711) | more than 7 years ago | (#17535882)

Could a perl hacker explain the code for those of us who can't read crazy?

(No offense, I'm just not a fan of Perl syntax. If you can make working programs out of it, then it's my loss by not being able to debug them if I ever have to use Perl.)

Re:Perl & CSV (4, Interesting)

patio11 (857072) | more than 7 years ago | (#17536216)

I think it implements a Y combinator. Then again, it could just print out "Just another perl hacker". But I'm guessing on the Y combinator. Let's break it down so it's readable:

sub Y (&) {
    my $le=shift;
    return &{
        sub { ## SUB_A
          &{
              sub { ## SUB_B
                  my $f=shift;
                  &$f($f)
              }
          } ## Close the &{ } wrapper around SUB_B
          (sub { ## SUB_C
              my $f=shift;
              &$le(sub { ##SUB_D
                  &{
                      &$f($f)
                  }
                  (@_)
              }## END SUB_D
                  )} ##END SUB_C
                  ); ## Close the argument list enclosing SUB_C
          } ## END SUB_A
    } ## Close the return line
} ##Close sub Y

Y can have any number of parameters you want (this is sort of a "welcome to Perl, n00b, hope you enjoy your stay" bit of pain). The first line of the program assigns le to the first parameter and pops that one off the list. That & used in the next line passes the rest of the list to the function he's about to declare. So we're going to be returning the output of that function evaluated on the remaining argument list. Clear so far?

OK, moving on to SUB_A. We again use the & to pass the list of arguments through to ... another block. This one actually makes sense if you look at it -- take the first argument from the list, evaluate it as a function on itself. We're assuming that is going to return a function. Why? Because that opening paren means we have arguments, such as they are, coming to the function.

OK, unwrapping the arguments. There is only one argument -- a block of code encompassing SUB_C. (Wasted 15 minutes figuring that out. That's what I get for doing this in Notepad instead of an IDE that would auto-indent for me. Friends don't let friends read Perl code.)

By now, bits and pieces of this are starting to look almost easy, if no closer to actual readable computer code. We reuse the function we popped from the list of arguments earlier, and we use the same trick to get a second function off of the argument list. We then apply that function to itself, assume the result is a function, and then run that function on the rest of the argument list. Then we pop that up the call stack and we're, blissfully, done.

So, now that we understand WTF this code is doing, how do we know it's the Y combinator? Well, we've essentially got a bunch of arguments (f, x, whatever). We ended up doing LAMBDA(f, (LAMBDA(x, f (x x))) (LAMBDA(x, f (x x)))). Which, since I took a compiler class once and have the nightmares to prove it, is the Y combinator.

Now you want to know the REALLY warped thing about this? I program Perl for a living (under protest!), I knew the answer going in (Googled the code), and I have an expensive theoretical CS education which includes all of the concepts trotted out here... and the Perl syntax STILL made me bloody swim through WTF was going on.

I. Hate. Perl.

And the reason I hate Perl, more than the fact that the language makes it *possible* to have monstrosities like that one-liner, is that the community which surrounds the language actively encourages them. It's considered *clever* and a mark of great skill that you can strip out all the code that actually explains WTF your code is doing and be left with the perfectly compressed version. This is modeled as good Perl style to folks just starting with the language; the Llama book has lots of code which looks like that, and code samples you find will look like it too. It appears that the community largely does not teach Perl like it is a language that needs to be read.

Want a good example of a counterpoint? Download the code for POPFile sometime -- it proves you can make Perl readable by deciding on One Way To Do It, using descriptive variable names and comments, and avoiding use of the syntactic sugar that is murder to decode. But this is just anathema to people who love Perl, so when Perl code comes across my desk it will look a heck of a lot more like the grandparent's quoted snippet than like POPFile.

Write-only languages (4, Insightful)

mysticgoat (582871) | more than 7 years ago | (#17536252)

As any English teacher will tell you, any language that will support great poetry and prose will also make it possible to write the most gawdawful cr*p. Perl bestows great powers, but the perl user must temper his cleverness with wisdom if he is to truly master his craft.

However in this specific case Google reveals that

## The Y Combinator
sub Y (&) {
    my $le=shift;
    return &{
        sub {
            &{sub { my $f=shift; &$f($f) } }
            (sub { my $f=shift; &$le(sub { &{&$f($f)}(@_) }) });
        }
    }
}
was simply "borrowed" from y-combinator.pl [synthcode.com] . This is an instance of Perl being used in a self-referential manner to add a new capability (the Y combinator allows recursion of anonymous subroutines (why anyone would bother to do such an arcane thing comes back to the English teacher's remarks)). Self-referential statements are always difficult to understand because, well, they just are that way (including this one).

Re:Perl & CSV (1)

mysticgoat (582871) | more than 7 years ago | (#17536016)

since Perl has associative arrays can't it emulate a relational database?

I've built actual relational databases to run in memory using Perl's hashes. This was a good way of doing some prototyping for user feedback before telling the MUMPS coders what it was exactly that we wanted them to do. (Their titles were "Programmer/Analyst", but neither one had any interest or skill in analyzing clinical needs: they were both happy to be just codemonkeys.) Performance with Perl was pretty snazzy but my constant worry was that some clever user would find a repeatable way to thrash the disk cache and make the project look bad— but that never happened. Persistence was with modified csv files (using the pipe char as the delimiter since it never occurred in the data sets). The memory resident tables were loaded on startup and written back to disk on shutdown, and we didn't worry about losing data in crashes since these were prototypes, not live. We could open up the disk files between runs with Excel, and use it to do some sanity checking, or introduce strange conditions. The biggest problem was cajoling the doctors and nurses to drop by and play with the prototype, and then try to get useful feedback out of some of them.
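A bare-bones sketch of that load-on-startup / write-on-shutdown pattern (the file name and columns here are hypothetical): one hash of row hashes keyed on an ID, persisted as a pipe-delimited file.

#!/usr/bin/perl
use strict;
use warnings;

my $file = 'patients.psv';                 # hypothetical pipe-delimited "table"
my @cols = qw(id name ward);

# Startup: slurp the file into a hash of row hashes keyed on id.
my %patients;
open my $in, '<', $file or die "open $file: $!";
while (my $line = <$in>) {
    chomp $line;
    my %row;
    @row{@cols} = split /\|/, $line, scalar @cols;
    $patients{ $row{id} } = \%row;
}
close $in;

# ... the prototype works on %patients entirely in memory ...

# Shutdown: write every row back out, pipe-delimited.
open my $out, '>', $file or die "write $file: $!";
for my $id (sort { $a <=> $b } keys %patients) {
    print {$out} join('|', @{ $patients{$id} }{@cols}), "\n";
}
close $out;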

"In the last quarter century..." (2, Funny)

AndroidCat (229562) | more than 7 years ago | (#17534110)

Well it's about time we had some change around here!

Stonebraker has a vested interest in stream DBs (2, Informative)

Anonymous Coward | more than 7 years ago | (#17534732)

He's the CTO of Streambase, so he's not just a "neutral" academic.

http://www.streambase.com/about/management.php [streambase.com]

Was there ever a one-size-fits-all database? (1)

Ant P. (974313) | more than 7 years ago | (#17534118)

The closest thing I can think of that fits that description is Postgres.

Re:Was there ever a one-size-fits-all database? (2, Funny)

Architect_sasyr (938685) | more than 7 years ago | (#17534740)

There's a difference between fitting and being forced to fit into something ;)

Re:Was there ever a one-size-fits-all anything? (1)

EmbeddedJanitor (597831) | more than 7 years ago | (#17534790)

Languages, OSs, file systems, databases, microprocessors, cars, VCRs, diskdrives, pizzas, .... none of these are one-size-fits-all.

There never has been, and probably never will be. A small embedded database will never be replaced by a fat-assed SQL database any more than Linux will ever find a place in the really bottom-end microcontroller systems.

Re:Was there ever a one-size-fits-all anything? (3, Funny)

Fred_A (10934) | more than 7 years ago | (#17536310)

Languages, OSs, file systems, databases, microprocessors, cars, VCRs, diskdrives, pizzas, .... none of these are one-size-fits-all.

There never has been, and probably never will be.
Aren't most condoms sold in one size fits all ?

Maybe they could make rubber databases ?

(or it's a bit of a stretch)

Re:Was there ever a one-size-fits-all anything? (0, Flamebait)

EmbeddedJanitor (597831) | more than 7 years ago | (#17536538)

Speak for yourself weener-boy!

Re:Was there ever a one-size-fits-all database? (1)

mwanaheri (933794) | more than 7 years ago | (#17536186)

Well, you might say that an xxl-sized shirt fits all, but only if you say that if you can get in, it fits you. For most of my s-uses, postgres offers far more than I need (still, postgres is my default).

Re:Was there ever a one-size-fits-all database? (1)

trACE666 (731643) | more than 7 years ago | (#17536228)

If I am not mistaken, both Oracle and IBM use the same code base for all the versions of their RDBMS products.

Performance benefits (-1, Troll)

Anonymous Coward | more than 7 years ago | (#17534122)

A factor of 10 or more generally isn't that significant.

Noticed how roll your own is faster? (2, Interesting)

BillGatesLoveChild (1046184) | more than 7 years ago | (#17534126)

Have you noticed that when you code your own routines for manipulating data (in effect, your own application-specific database) you can produce stuff that is very, very fast? In the good old days of Internet Bubble 1.0 I took an application-specific database like this (originally for a record store) and generalized it into a generic database capable of handling all sorts of data. But every change I made to make the code more general also made it less efficient. The end result wasn't bad by any means: we sold it as an eCommerce database into a number of solutions, but as far as the original record store went, the original version was by far the best. Yes, I *know* generic databases with fantastic optimization engines designed by database experts should be faster, but have you noticed how much time you have to spend with the likes of Oracle or MySQL trying to get them to do what is, to you, an exceedingly obvious way of doing something?

Re:Noticed how roll your own is faster? (4, Interesting)

smilindog2000 (907665) | more than 7 years ago | (#17534244)

I write all my databases with the fairly generic DataDraw database generator. The resulting C code is faster than if you wrote it manually using pointers to C structures (really). http://datadraw.sourceforge.net [sourceforge.net]. It's generic, and faster than anything EVER.

Re:Noticed how roll your own is faster? (5, Informative)

Anonymous Coward | more than 7 years ago | (#17534430)

Looks interesting, will check it out. Working URL for the lazy: http://datadraw.sourceforge.net/ [sourceforge.net]

Re:Noticed how roll your own is faster? (1)

BillGatesLoveChild (1046184) | more than 7 years ago | (#17535148)

That link is broken. It's actually http://datadraw.sourceforge.net/ [sourceforge.net], but thanks SmilingDog. Checking it out now. Looks interesting.

Re:Noticed how roll your own is faster? (1)

trimbo (127919) | more than 7 years ago | (#17535154)

Looks interesting. Would be nice if it worked with C++ classes. Has anyone tried creating a C++ app around this?

Re:Noticed how roll your own is faster? (1)

The Real Nem (793299) | more than 7 years ago | (#17535246)

It's hard to take any project seriously (professional or not) when it's web page has such glaring mistakes as random letter b's in its source (clearly visible in the all the browsers I've tried), more white space than anyone can reasonably shake a stick at and poor graphics (I'm looking at the rounded corners of the main content).

As interesting as it sounds, it makes me wonder what could be wrong with the code...

Taken seriously (3, Funny)

matria (157464) | more than 7 years ago | (#17535698)

Almost as bad as trying to take seriously someone who dosn't know his it's from his its, right?

Re:Taken seriously (0)

Anonymous Coward | more than 7 years ago | (#17536300)

considering he only got 1 out of 3 wrong I would say that puts him head and shoulders above 90% of /. poster's.

Prediction... (4, Insightful)

Ingolfke (515826) | more than 7 years ago | (#17534140)

1) More and more specialized databases will begin cropping up.
2) Mainstream database systems will modularize their engines so they can be optimized for different applications and they can incorporate the benefits of the specialized databases while still maintaining a single uniform database management system.
3) Someone will write a paper about how we've gone from specialized to monolithic...
4) Something else will trigger specialization... (repeat)

Dvorak if you steal this one from me I'm going to stop reading your writing... oh wait.

Re:Prediction... (3, Interesting)

Tablizer (95088) | more than 7 years ago | (#17534506)

2) Mainstream database systems will modularize their engines so they can be optimized for different applications and they can incorporate the benefits of the specialized databases while still maintaining a single uniform database management system.

I agree with this prediction. Database interfaces (such as SQL) do not dictate implementation. Ideally, query languages only ask for what you want, not tell the computer how to do it. As long as it returns the expected results, it does not matter if the database engine uses pointers, hashes, or gerbils to get the answer. It may, however, require "hints" in the schema about what to optimize. Of course, you will sacrifice general-purpose performance to speed up a specific usage pattern. But at least they will give you the option.

It is somewhat similar to what "clustered indexes" do in some RDBMS. Clusters improve the indexing by a chosen key at the expense of other keys or certain write patterns by physically grouping the data by that *one* chosen index/key order. The other keys still work, just not as fast.
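A small illustration of that interface/implementation split, sketched with DBD::SQLite (the table and index names are made up): the query text never changes, only the engine's chosen access path does, as EXPLAIN QUERY PLAN shows before and after an index exists.

#!/usr/bin/perl
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '', { RaiseError => 1 });
$dbh->do('CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)');

my $sql = 'SELECT total FROM orders WHERE customer = ?';
for my $step ('no index', 'with index') {
    $dbh->do('CREATE INDEX idx_orders_customer ON orders(customer)')
        if $step eq 'with index';
    # Ask the engine how it would answer the same declarative query.
    my $plan = $dbh->selectall_arrayref("EXPLAIN QUERY PLAN $sql", undef, 'acme');
    print "$step: $_->[-1]\n" for @$plan;   # last column is the plan description
}

Typically this prints a full table scan the first time and an index search the second time -- the SQL itself stayed the same.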
       

Re:Prediction... (2, Interesting)

Pseudonym (62607) | more than 7 years ago | (#17535468)

Interfaces like SQL don't dictate the implementation, but they do dictate the model. Sometimes the model that you want is so far from the interface language that you need to either extend or replace the interface language for the problem to be tractable.

SQL's approach has been to evolve. It isn't quite "there" for a lot of modern applications. I can foresee a day when SQL can efficiently model all the capabilities of, say, Z39.50, but we're not there now.

Re:Prediction... (1)

Tablizer (95088) | more than 7 years ago | (#17535660)

Z39.50 is specific to text searches, no? SQL and Z39.50 are apples and oranges.

     

Re:Prediction... (1)

dkf (304284) | more than 7 years ago | (#17536608)

SQL and text searching? Check out the FTS1 module [sqlite.org] for SQLite [sqlite.org] ...

Re:Prediction... (1)

theshowmecanuck (703852) | more than 7 years ago | (#17535328)

The reasons for this "one size fits all" (OSFA) strategy include the following:
Engineering costs...
Sales costs...
Marketing costs...

What about the cost of maintenance for the customer?

Maybe people will keep buying 'one size fits all' DBMSs if they meet enough of their requirements and they don't have to hire specialists for each type of database they might have for each type of application. That is, it is easier and cheaper for a company to maintain a smaller number of *standard* architectures (e.g. one). Otherwise you have to pay for all sorts of different types of specialists. Now if your company only does, say, data warehousing, then that is another matter and it is smart to purchase a specialized system. Or if you are a mega corporation you might be able to afford a number of specialist teams for each type of system. But I think smaller shops might need to make do with the poor old vanilla DBMS.

No specifics (1)

PlatinumRiver (846268) | more than 7 years ago | (#17534156)

I was hoping the article would mention results for specific relational databases (Oracle, PostgreSQL) versus the specialized ones.

Re:No specifics (2, Interesting)

dedrop (308627) | more than 7 years ago | (#17536612)

There's a reason for that. Many years ago, the Wisconsin database group (David DeWitt in particular) authored one of the first popular database benchmarks, the Wisconsin benchmarks. They showed that some databases performed embarrassingly poorly, which made a lot of people really angry. In fact, Larry Ellison got so angry, he tried to get DeWitt fired (Ellison wasn't clear on the concept of tenure). Since then, major databases have a "DeWitt clause" in their end-user license, which says that the name of the database can't be used when reporting benchmark results.

And this was years ahead of Microsoft not allowing users to benchmark Vista at all!

one size fits 90% (5, Insightful)

JanneM (7445) | more than 7 years ago | (#17534158)

It's natural to look at the edges of any feature or performance envelope: people who want to store petabytes of particle accelerator data, do complex queries to serve a million webpages a second, or have hundreds of thousands of employees doing concurrent things to the backend.

But for most uses of databases - or any back-end processing - performance just isn't a factor and hasn't been for years. Enron may have needed a huge data warehouse system; "Icepick Johnny's Bail Bonds and Securities Management" does not. Amazon needs the cutting edge in customer management; "Betty's Healing Crystals Online Shop (Now With 30% More Karma!)" not so much.

For the large majority of uses - whether you measure in aggregate volume or number of users - one size really fits all.

Re:one size fits 90% (1)

smilindog2000 (907665) | more than 7 years ago | (#17534296)

This is more true all the time. I work in the EDA industry, in chip design. The database sizes I work with are naturally well correlated with Moore's Law. In effect, I'm a permanent power user, but my circle of peers is shrinking into oblivion...

Re:one size fits 90% (1, Insightful)

TubeSteak (669689) | more than 7 years ago | (#17535984)

For the large majority of uses - whether you measure in aggregate volume or number of users - one size really fits all.
I'm willing to concede that...
But IMO it is not 100% relevant.

Large corporate customers usually have a large effect on what features show up in the next version of [software]. Software companies put a lot of time & effort into pleasing their large accounts.

And since performance isn't a factor for the majority of users, they won't really be affected by any performance losses resulting from increased specialization/optimizations. Right?

this is goa%tsex (-1, Troll)

Anonymous Coward | more than 7 years ago | (#17534190)

Imagine that.... (4, Insightful)

NerveGas (168686) | more than 7 years ago | (#17534210)

... a database mechanism particularly written for the task at hand will beat a generic one. Who would have thought?

steve

(+1 Sarcastic)

Dammit (4, Insightful)

AKAImBatman (238306) | more than 7 years ago | (#17534238)

I was just thinking about writing an article on the same issue.

The problem I've noticed is that too many applications are becoming specialized in ways that are not handled well by traditional databases. The key example of this is forum software. Truly hierarchical in nature, the data is also of varying sizes, full of binary blobs, and generally unsuitable for your average SQL system. Yet we keep trying to cram them into SQL databases, then get surprised when we're hit with performance problems and security issues. It's simply the wrong way to go about solving the problem.

As anyone with a compsci degree or equivalent experience can tell you, creating a custom database is not that hard. In the past it made sense to go with off-the-shelf databases because they were more flexible and robust. But now that modern technology is causing us to fight with the databases just to get the job done, the time saved from generic databases is starting to look like a wash. We might as well go back to custom databases (or database platforms like BerkeleyDB) for these specialized needs.

Re:Dammit (1)

Tablizer (95088) | more than 7 years ago | (#17534570)

The key example of this is forum software. Truly hierarchical in nature, the data is also of varying sizes, full of binary blobs, and generally unsuitable for your average SQL system. Yet we keep trying to cram them into SQL databases, then get surprised when we're hit with performance problems and security issues. It's simply the wrong way to go about solving the problem.

But is this actually happening? Has slashdot had to abandon general-purpose RDBMS? Not all slashdot display orders are hierarchical anyhow.
         

Re:Dammit (1)

modmans2ndcoming (929661) | more than 7 years ago | (#17534836)

soooooo... you set up the code that deals with comments to access a hierarchical DB and everything else on a sql DB.

Re:Dammit (2, Insightful)

AKAImBatman (238306) | more than 7 years ago | (#17534868)

But is this actually happening? Has slashdot had to abandon general-purpose RDBMS?

I wasn't referring to Slashdot in particular, but rather general web forum software. Your PhpBB, vBulletins, and JForums of the world are more along the lines of what I'm referring to. After dealing with the frustrations of setting up, managing, and hacking projects like these, I've come to the conclusion that the backend datastore is the problem. The relational theories still hold true, but the SQL database implementations simply aren't built with CLOBs and BLOBs in mind.

That being said, Slashdot is a fairly good example of how they've worked around the limitations of their backend database at a cost equalling or far exceeding the cost of building a customized data store. A costly venture that bit them in the rear [slashdot.org] when they reached their maximum post count.

Not that I'm criticizing Slashcode. Hindsight is 20/20. It's just becoming more and more apparent that for some applications the cost of using an off-the-shelf database has become greater than the cost of building a custom datastore.

Re:Dammit (2, Insightful)

Tablizer (95088) | more than 7 years ago | (#17534984)

The relational theories still hold true, but the SQL database implementations simply aren't built with CLOBs and BLOBs in mind.

That is very true. They don't seem to have perfected the performance handling of highly variable "cells".

That being said, Slashdot is a fairly good example of how they've worked around the limitations of their backend database at a cost equalling or far exceeding the cost of building a customized data store. A costly venture that bit them in the rear

Last night we crossed over 16,777,216 comments in the database. The wise amongst you might note that this number is 2^24, or in MySQLese an unsigned mediumint. Unfortunately, like 5 years ago we changed our primary keys in the comment table to unsigned int (32 bits, or 4.1 billion) but neglected to change the index that handles parents.


It would be nice if more RDBMSes offered flexible integers so that you didn't have to pick a size up front. Fixed sizes (small-int, int, long) are from an era when variable-sized column calculations were too expensive CPU-wise. Since then, CPU has become cheap relative to "pipeline" issues, so variable-width columns are just as efficient as fixed ones but take only the space they need.

But it would not have been hard for Slashdot to use a big integer up front. They chose to be stingy and made a gamble; it was not forced on them. It may have cost a few cents more early on, but it would have prevented that disaster. Plus, bleep happens no matter what technology you use. I am sure dedicated-purpose databases have their own gotchas and trade-off decision points. Being dedicated probably also means they are less road-tested.
     

Re:Dammit (1)

dcam (615646) | more than 7 years ago | (#17535670)

...the SQL database implementations simply aren't built with CLOBs and BLOBs in mind.

This is extremely true.

I work on a web application that stores a lot of documents (one of our clients stores 50+ GB). The database back end is SQL Server (yeah I know). When it was designed (~8 years ago) we decided to store the documents in the filesystem and store the paths in the database. This was largely for performance reasons, although some other considerations were the size of database backups and general db management. It was anticipated that in the future we would move the documents into the db when performance improved sufficiently. It hasn't.

According to Inside SQL Server 2000 [microsoft.com], all data in SQL Server is stored on 8K pages in B-trees. BLOBs and CLOBs are broken up into 8K chunks. Performance on reading and writing this data is obviously not fantastic, particularly when you have largish files (we have files over 100 MB; the average size would be ~2 MB). In addition, the tools in SQL Server for adding and retrieving BLOBs are a major headache.

SQL Server is not designed for BLOBs. I can't comment on other relational databases, but I suspect that they would suffer similar issues.

Re:Dammit (1)

newt0311 (973957) | more than 7 years ago | (#17535736)

Bah, limited experience. Postgres has very easy methods for handling files (called large objects). The basic ones are the I/O methods: one takes a file location and returns an OID, and another takes a path and an OID and extracts the large object to the specified path. There are also methods for searching. In fact, searching and retrieving parts of the file from inside the db is actually slightly faster, because the db is very good at optimizing disk access.
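Roughly what that looks like from Perl via DBD::Pg, if memory serves (the documents table, paths, and connection details here are hypothetical); the large-object calls are happiest inside a transaction:

#!/usr/bin/perl
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:Pg:dbname=docs', 'user', 'pass',
                       { RaiseError => 1, AutoCommit => 0 });

# File path in, OID out; the OID is then tracked in an ordinary table.
my $oid = $dbh->pg_lo_import('/tmp/report.pdf');
$dbh->do('INSERT INTO documents (name, blob_oid) VALUES (?, ?)',
         undef, 'report.pdf', $oid);
$dbh->commit;

# Later: OID plus a path, and the large object is written back out to disk.
$dbh->pg_lo_export($oid, '/tmp/report_copy.pdf');
$dbh->commit;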

Re:Dammit (1)

dcam (615646) | more than 7 years ago | (#17535988)

Comments like this one [thescripts.com] would suggest that others have different experience. I've been hunting for details on the storage mechanisms of pgsql to try to work out whether it might be faster but no luck so far.

Re:Dammit (2, Funny)

Jason Earl (1894) | more than 7 years ago | (#17534894)

Eventually the folks working on web forums will realize that they are just recreating NNTP and move on to something else.

Re:Dammit (1)

Jerf (17166) | more than 7 years ago | (#17535050)

Truly heirarchical in nature, the data is also of varying sizes, full of binary blobs, and generally unsuitable for your average SQL system.
Actually, I was bitching about this very problem [jerf.org] (and some others) recently, when I came upon this article about recursive queries [teradata.com] on the programming reddit [reddit.com] .

Recursive queries would totally, completely solve the "hierarchy" part of the problem, and halfway decent database design would handle the rest.

My theory is that nobody realizes that recursive queries would solve their problems, so nobody asks for them, so nobody ever discovers them, so nobody ever realizes that recursive queries would solve their problem. I don't know of an open source DB that has this, and I'd certainly never seen this in my many years of working with SQL. I wish we did have it, it would solve so many of my problems.

Now, if we could just deal with the problem of having a key that could relate to any one of several tables in some reasonable way... that's the other problem I keep hitting over and over again.
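For reference, this is roughly what such a recursive query looks like from Perl, using DBD::SQLite and a made-up comments(id, parent_id, body) table (modern SQLite and PostgreSQL both support WITH RECURSIVE; neither did when this was written):

#!/usr/bin/perl
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:SQLite:dbname=forum.db', '', '', { RaiseError => 1 });

# Hypothetical schema: comments(id INTEGER PRIMARY KEY, parent_id INTEGER, body TEXT).
# One recursive query pulls the whole subtree under a given root comment.
my $sql = q{
    WITH RECURSIVE thread(id, parent_id, body, depth) AS (
        SELECT id, parent_id, body, 0 FROM comments WHERE id = ?
        UNION ALL
        SELECT c.id, c.parent_id, c.body, t.depth + 1
        FROM comments c JOIN thread t ON c.parent_id = t.id
    )
    SELECT id, depth, body FROM thread ORDER BY depth, id
};
my $rows = $dbh->selectall_arrayref($sql, undef, 42);   # 42 = hypothetical root comment id
printf "%s#%d %s\n", '  ' x $_->[1], $_->[0], $_->[2] for @$rows;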

Re:Dammit (0)

Anonymous Coward | more than 7 years ago | (#17535966)

Now, if we could just deal with the problem of having a key that could relate to any one of several tables in some reasonable way... that's the other problem I keep hitting over and over again.

In other words, distributed foreign keys.. this has been discussed to some length by Chris Date and other people who work with the relational model. It's a pretty basic constraint, yet no SQL database seems to implement it.

A quick Google turns up a hopeful PostgreSQL discussion [postgresql.org] , but it quickly turns to PostgreSQL's "table inheritance" feature which is a very flawed idea.

I'd love to see this implemented in a mainstream database, I think about 3 out of every 4 apps I've written needed this.

Re:Dammit (1)

a_ghostwheel (699776) | more than 7 years ago | (#17536044)

Or just use hierarchical queries - like START WITH / CONNECT BY clauses in Oracle. Probably other vendors have something similar too - not sure about that.

Re:Dammit (1)

Imsdal (930595) | more than 7 years ago | (#17536506)

My theory is that nobody realizes that recursive queries would solve their problems, so nobody asks for them, so nobody ever discovers them, so nobody ever realizes that recursive queries would solve their problem.


It used to be that execution plans in Oracle were retrieved from the plan table via a recursive query. Since even the tiniest application will need a minimum amount of tuning, and since all db tuning should start by looking at the execution plans, everyone should have run into recursive queries sooner rather than later.


My theory is instead that too few developers are properly trained. They simply don't know what they are doing or how it should be done. During my years as a consultant, I spent a lot of time improving db performance, and never even once did I run into in-house people who even knew what an execution plan was, let alone how to interpret it. (And, to be honest, not all of my consultant colleagues knew either...)


Software development is a job that requires the training of a surgeon, but it's staffed by people who are trained to be janitors or, worse, economists. (I realise that this isn't true at all for the /. crowd. I'm talking about all the others all of us have run into on every job we have had.)

Duh (5, Insightful)

Reality Master 101 (179095) | more than 7 years ago | (#17534258)

Who thinks that a specialized application (or algorithm) won't beat a generalized one in just about every case?

The reason people use general databases is not because they think it's the ultimate in performance, it's because it's already written, already debugged, and -- most importantly -- programmer time is expensive, and hardware is cheap.

See also: high level compiled languages versus assembly language*.

(*and no, please don't quote the "magic compiler" myth... "modern compilers are so good nowadays that they can beat human written assembly code in just about every case". Only people who have never programmed extensively in assembly believe that.)

Re:Duh (5, Informative)

Waffle Iron (339739) | more than 7 years ago | (#17534468)

*and no, please don't quote the "magic compiler" myth... "modern compilers are so good nowadays that they can beat human written assembly code in just about every case". Only people who have never programmed extensively in assembly believe that.

I've programmed extensively in assembly. Your statement may be true up to a couple of thousand lines of code. Past that, to avoid going insane, you'll start using things like assembler macros and your own prefab libraries of general-purpose assembler functions. Once that happens, a compiler that can tirelessly do global optimizations is probably going to beat you hands down.

Re:Duh (4, Insightful)

wcbarksdale (621327) | more than 7 years ago | (#17534670)

Also, to successfully hand-optimize you need to remember a lot of details about instruction pipelines, caches, and so on, which is fairly detrimental to remembering what your program is supposed to do.

Re:Duh (1)

mparker762 (315146) | more than 7 years ago | (#17534932)

It sounds like "global optimization" means something different to you than it does to a compiler writer. For a compiler, this simply means optimizing across basic blocks, not optimizing across functions and files (that's usually called "whole-program optimization" or something like that). Humans optimize across basic blocks very easily, it's actually difficult to stop a programmer from doing fairly extensive optimizations at this scale -- programs just look untidy and needlessly redundant without it. Compilers still have trouble doing a decent job of this type of optimizations for non-functional languages (like C).

Even using assembler macros and prefab libraries of general-purpose assembler functions you're generally no worse off than the compiler. What the heck do you think the standard C runtime is?

The bigger danger to doing lots of code in assembler is that you're tempted to use simpler algorithms over tricky-but-fast ones, and you're tempted to optimize too early (though this is a problem in any language. Assembler just makes this trap particularly easy to fall into).

Re:Duh (1)

Waffle Iron (339739) | more than 7 years ago | (#17535136)

Even using assembler macros and prefab libraries of general-purpose assembler functions you're generally no worse off than the compiler.

I don't know how true that is, given that assembler macros and fixed assembler APIs won't be particularly good at inlining calls and then integrating the optimizations of the inlined code with the particular facets of the surrounding code for each expansion.

Re:Duh (1)

Pseudonym (62607) | more than 7 years ago | (#17535424)

The reason why assembly programmers can beat high-level programmers is they can write their code in a high-level language first, then profile to see where the hotspots are, and then rewrite a 100 line subroutine or two in assembly language, using the compiler output as a first draft.

In other words, assembly programmers beat high-level programmers because they can also use modern compilers.

Re:Duh (1)

smilindog2000 (907665) | more than 7 years ago | (#17534558)

I've never heard the "magic compiler myth" phrase, but I'll help educate others about it. It's refreshing to hear someone who understands reality. Of course, a factor of 2 to 4 improvement in speed is less and less important every day...

Re:Duh (2, Interesting)

suv4x4 (956391) | more than 7 years ago | (#17534580)

"modern compilers are so good nowadays that they can beat human written assembly code in just about every case". Only people who have never programmed extensively in assembly believe that.

Only people who haven't seen recent advancements in CPU design and compiler architecture will say what you just said.

Modern compilers apply optimizations at such a sophisticated level that it would be a nightmare for a human to keep a solution that optimized by hand.

As an example, modern Intel processors can process certain "simple" commands in parallel and other commands are broken apart into simpler commands, processed serially. I'm simplifying the explanation a great deal, but anyone who read about how a modern CPU works, branch prediction algorithms and so on is familiar with the concept.

Of course "they can beat human written assembly code in just about every case" is an overstatement, but still, you gotta know there's some sound logic & real reasons behind this "myth".

Re:Duh (2, Insightful)

mparker762 (315146) | more than 7 years ago | (#17534776)

Only someone who hasn't recently replaced some critical C code with assembler and gotten substantial improvement would say that. This was MSVC 2003 which isn't the smartest C compiler out there, but not a bad one for the architecture. Still, a few hours with the assembler and a few more hours doing some timings to help fine-tune things improved the CPU performance of this particular service by about 8%.

Humans have been writing optimized assembler for decades, the compilers are still trying to catch up. Modern hand-written assembler isn't necessarily any trickier or more clever than the old stuff (it's actually a bit simpler). Yes compilers are using complicated and advanced techniques, but it's still all an attempt to approximate what humans do easily and intuitively. Artificial intelligence programs use complicated and advanced techniques too, but no one would claim that this suddenly makes philosophy any harder.

Your second point about the sophistication of the CPU's is true but orthogonal to the original claim. These sophisticated CPU's don't know who wrote the machine code, they do parallel execution and branch prediction and so forth on hand-optimized assembly just like they do on compiler-generated code. Which is one reason (along with extra registers and less segment BS) that it's easier to write and maintain assembler nowadays, even well-optimized assembler.

Re:Duh (0)

Anonymous Coward | more than 7 years ago | (#17535074)

Still, a few hours with the assembler and a few more hours doing some timings to help fine-tune things improved the CPU performance of this particular service by about 8%.

Woo hoo, so that one little loop that accounts for 5% of the total running time was sped up 8%! Did you not have anything better to do?

Modern hand-written assembler isn't necessarily any trickier or more clever than the old stuff (it's actually a bit simpler).

And after a day you might have a decent for loop. After that I have an entire web service.

The speed of execution goes down in importance as the speed to market goes up and the number of programmers goes up. Why waste your time using a hammer when doing a roof, get a nail gun.

Re:Duh (1, Insightful)

suv4x4 (956391) | more than 7 years ago | (#17535618)

This was MSVC 2003 which isn't the smartest C compiler out there, but not a bad one for the architecture. Still, a few hours with the assembler and a few more hours doing some timings to help fine-tune things improved the CPU performance of this particular service by about 8%... These sophisticated CPU's don't know who wrote the machine code, they do parallel execution and branch prediction and so forth on hand-optimized assembly just like they do on compiler-generated code. Which is one reason (along with extra registers and less segment BS) that it's easier to write and maintain assembler nowadays, even well-optimized assembler.

Do you know which types of commands when ordered in quadruples will execute at once on a Core Duo? Incidentally those that won't on a Pentium 4.

I hope you're happy with your 8% improvement, enjoy it until your next CPU upgrade that requires different approach to assembly optimization.

The advantage of a compiler is that compiling for a target CPU is a matter of a compiler switch, so compiler programmers can concentrate on performance and smart use of the CPU specifics, and you can concentrate on your program features.

If you were that concerned about performance in first place, you'd use a compiler provided by the processor vendor (Intel I presume) and use the intel libraries for processor specific implementations of common math and algorithm issues needed in applications.

Most likely this would've given you more than 8% boost and still keep your code somewhat less bound to a specific CPU, than with assembler.

An example of "optimization surprise" i like, is the removal of the barrel shifter in Pentium 4 CPU-s. You see, lots of programmers know that it's faster (on most platforms) to bit shift, and not multiply by 2, 4, 8, etc (or divide).

But bit shifting on P4 is handled by the ALU, and is slightly slower than multiplication (why, I don't know, but it's a fact). Code "optimized" for bit shifting would be "antioptimized" on P4 processors.

I know some people adapted their performance critical code to meet this new challenge. But then what? P4 is obsolete and instead we're back to the P3 derived architecture, and the barrel shifter is back!

When I code a huge and complex system, I'd rather buy a 8% faster machine and use a better compiler than have to manage this hell each time a CPU comes out.

Re:Duh (3, Insightful)

try_anything (880404) | more than 7 years ago | (#17535854)

Modern compilers apply optimizations at such a sophisticated level that it would be a nightmare for a human to keep a solution that optimized by hand.

There are three quite simple things that humans can do that aren't commonly available in compilers.

First, a human gets to start with the compiler output and work from there :-) He can even compare the output of several compilers.

Second, a human can experiment and discover things accidentally. I recently compiled some trivial for loops to demonstrate that array bounds checking doesn't have a catastrophic effect on performance. With the optimizer cranked up, the loop containing a bounds check was faster than the loop with the bounds check removed. That did not inspire confidence.

Third, a human can concentrate his effort for hours or days on a single section of code that profiling revealed to be critical and test it using real data. Now, I know JIT compilers and some specialized compilers can do this stuff, but as far as I know I can't tell gcc, "Compile this object file, and make the foo function as fast as possible. Here's some data to test it with. Let me know on Friday how far you got, and don't throw away your notes, because we might need further improvements."

I hope I'm wrong about my third point (please please please) so feel free to post links proving me wrong. You'll make me dance for joy, because I do NOT have time to write assembly, but I have a nice fast machine here that is usually idle overnight.

Re:Duh (2, Insightful)

kfg (145172) | more than 7 years ago | (#17534606)

The reason people use general databases is not because they think it's the ultimate in performance, it's because it's already written, already debugged, and -- most importantly. . .

. . .has some level of definable and guaranteed data integrity.

KFG

I thought I was an assembler demon (1)

ratboy666 (104074) | more than 7 years ago | (#17534622)

I had a "simple" optimization project. It came down to one critical function (ISO JBIG compression). I coded the thing by hand in assembler, carefully manually scheduling instructions. It took me days. Managed to beat GNU gcc 2 and 3 by a reasonable margin. The latest Microsoft C compiler? Blew me away. I looked at the assembler it produced -- and I don't get where the gain is coming from. The compiler understands the machine better than I do.

Go figure -- I hung up my assembler badge. Still a useful skill for looking at core dumps, though. And for dealing with micro-controllers.

So, have you had at it and benchmarked your assembler vs. a compiler's?

Re:I thought I was an assembler demon (1)

Reality Master 101 (179095) | more than 7 years ago | (#17535350)

I looked at the assembler it produced -- and I don't get where the gain is coming from. The compiler understands the machine better than I do.

All that proves is that the compiler knew a trick you didn't (probably it understood which instructions will go into which pipelines and will parallelize). I bet if you took the time to learn more about the architecture, you could find ways to be even more clever.

I'm not arguing for a return to assembly... it's definitely too much of a hassle these days, and again, hardware is cheap, and programmers are expensive. Just that given enough programmer time, humans can nearly always do better than the compiler, which shouldn't be surprising since humans programmed the compiler, and humans have more contextual knowledge of what a program is trying to accomplish.

Bad myth! (0)

Anonymous Coward | more than 7 years ago | (#17534730)

Hardware is cheap.

Developer time slightly less so.

Operational expenses will break your kneecaps and charge you for the bat. So often I have heard developers piss and moan about another debugging stage or standardisation in terms of installation procedures or configuration options or whatever, and haul out the hoary chestnut about their time being heinously expensive, when in fact what they could fix once will mean major benefits again, and again, and again, often on a weekly or monthly basis for scheduled tasks. Even accelerating a simple weekly data update from a three-hour, five-man task to a one-hour, one-man task saves you fourteen man-hours. (This example taken from real life with details stripped to protect the idiotic.) You don't have to have an MBA from Harvard to count these beans.

Remember, kids:
Up front costs (including dev time) are cheap.
Recurring costs are expensive.

Re:Duh (1)

BillGatesLoveChild (1046184) | more than 7 years ago | (#17535190)

> "modern compilers are so good nowadays that they can beat human written assembly code in just about every case". Case in point: SQL Optimizers in every SQL Database Product I have ever used. Often, they will find a very stupid way of doing something where a human (who has greater insight into the data despite the UPDATE STATISTICS command) can find a much faster way. Much of my time maintaining databases is trying to get it not to do stupid things. No one trusts automatic C++ code generators. So why do we trust automatic SQL code generators?

Re:Duh (1)

Electrum (94638) | more than 7 years ago | (#17535444)

No one trusts automatic C++ code generators. So why do we trust automatic SQL code generators?

We don't. That's why we have explain plans and hints.

Re:Duh (1)

BillGatesLoveChild (1046184) | more than 7 years ago | (#17535498)

> That's why we have explain plans and hints.

That's right. To fix the fact they're broken.

Parallel databases (1)

meta-monkey (321000) | more than 7 years ago | (#17534288)

This reminds me of the parallel databases class I took in college. Sure, specialized parallel databases (not distributed, mind you, parallel) using specialized hardware were definitely faster than the standard SQL-type relational databases...but so what? The costs were so much higher they were not feasible for most applications.

Specialized software and hardware outperforms generic implementations! Film at 11!

SQL is Dead - Long Live SQL (1)

Doc Ruby (173196) | more than 7 years ago | (#17534488)

SW platform development always features a tradeoff between general purpose APIs and optimized performance engines. Databases are like this. The economic advantages for everyone in using an API even as awkward and somewhat inconsistent as SQL are more valuable than the lost performance in the fundamental relational/query model.

But it doesn't have to be that way. SQL can be retained as an API, but different storage/query engines can be run under the hood to better fit different storage/query models for different kinds of data/access. A better way out would be a successor to SQL that is more like a procedural language for objects with all operators/functions implicitly working on collections like tables. Yes, something like object lisp, best organized as a dataflow with triggers and events. So long as SQL can be automatically compiled into the new language, and back, for at least 5 years of peaceful coexistence.

Re:SQL is Dead - Long Live SQL (1)

Tablizer (95088) | more than 7 years ago | (#17535080)

A better way out would be a successor to SQL that is more like a procedural language for objects with all operators/functions implicitly working on collections like tables.

If you mean like cursor-oriented approaches (explicit loops), that tends to make automatic optimization harder. If you note, in SQL you generally don't specify order or explicit loops. The idea is that the RDBMS figures out the best performance path so that you don't have to. It is like refactoring a math equation. The less your query language is like math and the more like procedural steps (loops and conditionals), the harder it is to auto-optimize.

As far as "objects", I think OOP set relational progress back 15 years. OO conflicts with relational on a number of fronts and has bloated up code with ugly translation layers. But that is another debate for another day.

As far as replacing SQL with a more flexible query language, I have proposed SMEQL [c2.com] (Structured Meta-Enabled Query Language). It allows things like column lists to be "calculated" via a query also. It also makes it easier to split big queries into smaller ones by using named references instead of (or in addition to) nesting. Some have complained about the "string clauses", but these are just one of multiple implementation approaches.
     

Re:SQL is Dead - Long Live SQL (1)

Tablizer (95088) | more than 7 years ago | (#17535118)

Here is an example of the draft language SMEQL:

This example returns the top 6 earners in each department based on this table schema: table: Employees, columns: {empID, dept, empName, salary}

    srt = orderBy(Employees, (dept, salary), order)
    top = group(srt, ((dept) dept2, max(order) order))
    join(srt, top, a.dept=b.dept2 and b.order - a.order <= 5)

Re:SQL is Dead - Long Live SQL (1)

Doc Ruby (173196) | more than 7 years ago | (#17535352)

Collections don't (necessarily) have an order.

Objects don't have to be C++ objects. They can be just class blueprints inherited from other classes, for instantiated objects, which are just related logic and the data accessed.

Your SMEQL looks a lot like lisp.

Something like object lisp for large collections of multidimensional (even asymmetric) objects could bring benefits of encapsulation/reuse and relations to a syntax that better reflects both the data model and the sequence of operations, in rules like policies. A dataflow version would be easy to read, debug and maintain.

Re:SQL is Dead - Long Live SQL (1)

Tablizer (95088) | more than 7 years ago | (#17535684)

Collections don't (necessarily) have an order.

I am not sure what this is a response to.

Objects don't have to be C++ objects. They can be just class blueprints inherited from other classes, for instantiated objects, which are just related logic and the data accessed.

There is no uniform practice/standard for handling collections of objects. Under pure OO, each object/class handles itself. Thus, standardizing collection handling is against encapsulation. Standardizing it turns the OO engine into a quasi-database, but 60's style navigational databases are the kind of things that frustrated Dr. Codd and motivated him to invent relational. Navigational and relational battled it out in the 70's, and relational won... until OO fans tried to bring back navigational. Relational offers more discipline and consistency to design than navigational. This is largely because it borrowed from set theory and predicate logic.

Your SMEQL looks a lot like lisp.

I disagree. It does borrow heavily from functional programming, but the similarity to Lisp ends there.
     

Re:SQL is Dead - Long Live SQL (1)

hypersql (954649) | more than 7 years ago | (#17535418)

A successor to SQL - NewSQL: http://newsql.sf.net/ [sf.net]

This has been known for years already (2, Interesting)

TVmisGuided (151197) | more than 7 years ago | (#17534566)

Sheesh...and it took someone from MIT to point this out? Look at a prime example of a high-end, heavily-scaled, specialized database: American Airlines' SABRE. The reservations and ticket-sales database system alone is arguably one of the most complex databases ever devised, is constantly (and I do mean constantly) being updated, is routinely accessed by hundreds of thousands of separate clients a day...and in its purest form, is completely command-line driven. (Ever see a command line for SABRE? People just THINK the APL symbol set looked arcane!) And yet this one system is expected to maintain carrier-grade uptime or better, and respond to any command or request within eight seconds of input. I've seen desktop (read: non-networked) Oracle databases that couldn't accomplish that!

Re:This has been known for years already (1)

Tablizer (95088) | more than 7 years ago | (#17535824)

Look at a prime example of a high-end, heavily-scaled, specialized database: American Airlines' SABRE.

But are there *other* airlines that are doing fine with "standard" RDBMS, such as Oracle or DB2?
     

Re:This has been known for years already (3, Insightful)

sqlgeek (168433) | more than 7 years ago | (#17535930)

I don't think that you know Oracle very well. Let's say you want to scale and so you want clustering or grid functionality -- built into Oracle. Let's say that you want to partition your enormous table into one physical table per month or quarter -- built in. Oh, and if you query the whole giant table you'd like parallel processes to run against each partition, balanced across your cluster or grid -- yeah, that's built in too. Let's say you almost always get a group of data together rather than piece by piece, so you want it physically colocated to reduce disk I/O -- built in.

This is why you pay a good wage for your Oracle data architect & DBA -- so that you can get people who know how to do these sort of things when needed. And honestly I'm not even scratching the surface.

Consider a data warehouse for a giant telecom in South Africa (with a DBA named Billy in case you wondered). You have over a billion rows in your main fact table, but you're only interested in a few thousand of those rows. You have an index on dates, another index on geographic region, and another index on customer. Any one of those indexes will reduce the 1.1 billion rows to 10's of millions of rows, but all three restrictions will reduce it to a few thousand. What if you could read three indexes, perform bitmap comparisons on the results to get only the rows that match the results of all three indexes, and then only fetch those few thousand rows from the 1.1 billion row table. Yup, that's built in and Oracle does it for you behind the scenes.

Now yeah, you can build a faster single-purpose DB. But you'd better have a god-damned lot of dev hours allocated to the task. My bet is that you'll come out way ahead in cash and time to market with Oracle, a good data architect, and a good DBA. Any time you want to put your money on the line, you let me know.

MIT, not Berkeley (0)

Anonymous Coward | more than 7 years ago | (#17534616)

Back when he did Postgres, it was at Berkeley. He then moved on to the private world to do a start-up based on it. So now he is at MIT. Well, at least MIT picks up good ones.

Re:MIT, not Berkeley (1)

hey hey hey (659173) | more than 7 years ago | (#17535622)

Back when he did Postgres, it was at Berkeley. He then moved on to the private world to do a start-up based on it.

He was still a professor at Berkeley while working with RTI/Ingres Inc. He didn't leave until his wife wanted to move back near her family (which was after the sale of both Ingres and Illustra). I was at (no doubt just one of) the going away lunches.

So now he is at MIT. Well, at least MIT picks up good ones.

Certainly true. Mike is as bright as they come.

Please reduce lameness (4, Insightful)

suv4x4 (956391) | more than 7 years ago | (#17534660)

We're all sick of "new fad: X is dead?" articles. Please reduce lameness to an acceptable level!
Can't we get used to the fact that specialized and new solutions don't magically kill an existing popular solution to a problem?

And it's not a recent phenomenon, either; I bet it goes back to when the first proto-journalistic phenomena formed in early human societies, and it haunts us to this very day...

"Letters! Spoken speech dead?"

"Bicycles! Walking on foot dead?"

"Trains! Bicycles dead?"

"Cars! Trains dead?"

"Aeroplanes! Trains maybe dead again this time?"

"Computers! Brains dead?"

"Monitors! Printing dead yet?"

"Databases! File systems dead?"

"Specialized databases! Generic databases dead?"

In a nutshell: don't forget that a database is a very specialized form of storage system; you can think of it as a very special sort of file system. It didn't kill file systems (as noted above), so specialized systems will thrive just as well without killing anything.

Re:Please reduce lameness (-1)

Anonymous Coward | more than 7 years ago | (#17535228)

man, you're a fag and an idiot.

Re:Please reduce lameness (1)

suv4x4 (956391) | more than 7 years ago | (#17535704)

man, you're a fag and an idiot.

The world's not perfect you know. You're a troll, anonymous and a coward.

I'd still pick me over you if given the chance.

Re:Please reduce lameness (2, Funny)

msormune (808119) | more than 7 years ago | (#17535752)

I'll chip in: Public forums! Intelligence dead? Slashdot confirms!

Death to Trees! (2, Interesting)

Tablizer (95088) | more than 7 years ago | (#17535770)

Don't forget that a database is a very specialized form of a storage system, you can think of it as a very special sort of file system. It didn't kill file systems

Very specialized? Please explain. Anyhow, I *wish* file systems were dead. They have grown into messy trees that are unfixable because trees can only handle about 3 or 4 factors; beyond that you either have to duplicate information (repeat factors), or play messy games, or both. They were okay in 1984 when you only had a few hundred files, but they don't scale. Category philosophers knew, even before computers, that hierarchical taxonomies are limited.

The problem is that the best alternative, set-based file systems, have a longer learning curve than trees. People pick up hierarchies pretty fast, but sets take longer to click. Power does not always come easy. I hope that geeks start using set-oriented file systems and then others catch up. The thing is that set-oriented file systems are enough like relational that one might as well use relational. If only the RDBMS were performance-tuned for file-like uses (with some special interfaces added).
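To make that concrete, here is a minimal sketch (the schema and tag names are invented, not taken from any real set-oriented file system) of what file lookup by sets of attributes could look like if the metadata lived in a relational store:

    -- Hypothetical file-metadata tables.
    CREATE TABLE files (
      file_id  INTEGER PRIMARY KEY,
      name     VARCHAR(255),
      modified DATE
    );

    CREATE TABLE file_tags (
      file_id INTEGER REFERENCES files(file_id),
      tag     VARCHAR(64),
      PRIMARY KEY (file_id, tag)
    );

    -- "Everything that is both a 2006 tax document and a scan", with no need
    -- to decide up front whether /taxes/2006/scans or /scans/taxes/2006 is
    -- the one true path.
    SELECT f.name
    FROM   files f
    JOIN   file_tags t1 ON t1.file_id = f.file_id AND t1.tag = 'taxes-2006'
    JOIN   file_tags t2 ON t2.file_id = f.file_id AND t2.tag = 'scan';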

Re:Death to Trees! (2, Insightful)

suv4x4 (956391) | more than 7 years ago | (#17535874)

Anyhow, I *wish* file systems were dead. They have grown into messy trees that are unfixable because trees can only handle about 3 or 4 factors and then you either have to duplicate information (repeat factors), or play messy games, or both.

You know, I've seen my share of RDBMS designs to know the "messiness" is not the fault of the file systems (or databases in that regard).

Sets have more issues than you describe, and you know very well that Vista had lots of set-based features that were later scaled down, hidden, and reduced -- not because WinFS was dropped (the sets in Vista don't use WinFS; they work with indexing too), but because they were terribly confusing to users.

Isn't it just stating the obvious? (5, Funny)

Dekortage (697532) | more than 7 years ago | (#17534804)

I've made some similar discoveries myself!

  • Transporting 1500 pounds of bricks from the store to my house is much faster if I use a big truck rather than making dozens (if not hundreds) of trips with my Honda Civic.
  • Wearing dress pants with a nice shirt and tie often makes an interview more likely to succeed, even if I wear jeans every other day after I get the job.
  • Carving pumpkins into "jack-o-lanterns" always turns out better if I use a small, extremely sharp knife instead of a chainsaw.

Who woulda thought that specific-use items might improve the outcome of specific situations?

it's all (okay, mostly) in the queries (1)

yagu (721525) | more than 7 years ago | (#17535032)

I've seen drop dead performance on flat file databases. I've seen molasses slow performance on mainframe relational databases. And I've seen about everything in between.

What I see as a HUGE factor is less the database chosen (though that is obviously important) and more how interactions with the database (updates, queries, etc) are constructed and managed.

For example, we once had a relational database cycle application that ran for over eight hours every night, longer than the allotted time for all nighttime runs. One of our senior techs took a look at the program, changed the order of a couple of parentheses, and the program ran in less than fifteen minutes, with correct results.

I've also written flat-file "database" applications, specialized with known characteristics, that operated on extremely large databases (for the time, greater than 10G), and transactions were measured in milliseconds (typically .001 to .005 seconds) under heavy load. This application would never have held up under any kind of moderate requirement for updates, but I knew that.

I've many times seen overkill, with hugely expensive databases hammering lightweight applications into some mangled relational solution.

I've never seen the world as a one-size-fits-all database solution. Vendors of course would tell us all different.

Databases are like history and mistakes (1, Troll)

mlwmohawk (801821) | more than 7 years ago | (#17535042)

The problem with database articles like this is that they pretty much ignore the core objectives of a database. Wait, I hear you: "That's the point: by ignoring ACID and transactions, we can improve performance." Therein lies the difference between understanding the "whole problem" and understanding only a part of it.

Specialized solutions typically accomplish their objective by removing one or more aspects of good database design.

Databases are complex beasts and the good ones encompass a LOT of expertise and theory of data access, stuff that takes many months or even years to really understand. Specialized systems tend to focus exclusively on the highest level "problem" while ignoring the inherent problems of data access and modifications.

While there are specialized solutions that work within a limited range of criteria and do, in fact, improve performance, they should be the exception rather than the rule, because the good SQL databases (MySQL is not one of them) are REALLY good at parsing a query and producing a good plan.

Re:Databases are like history and mistakes (0, Redundant)

NineNine (235196) | more than 7 years ago | (#17535284)

Well said.

No way. (0)

Anonymous Coward | more than 7 years ago | (#17535072)

We still use Access every day. It's not dead yet!

This is news? (0)

Anonymous Coward | more than 7 years ago | (#17535254)

Who has _not_ known this for years? The only reasons for generic DBs have been low cost and flexibility. Does it take studies to point out common knowledge now?

This is spot on -- I did some benchmarking, too (1)

relstatic (1049148) | more than 7 years ago | (#17535386)

I actually started a Free/Open Source Social Networking project (FlightFeather [bosstats.com] ) based on similar reasoning, which I also covered in a talk at LinuxWorld San Francisco 2006 [linuxworld.com] .

Specifically, when dealing with Web applications, get as much of the stored data as possible in HTML format -- then serve the resulting static pages. Much faster than pulling the data out of a generic SQL database, and rebuilding the page on every request. Even the page generation side can be very efficient. My benchmarking indicates that a single one-CPU machine should be able to handle several hundred comments per second in a discussion forum.

in other news (1, Redundant)

timmarhy (659436) | more than 7 years ago | (#17535700)

Boats perform better on water than cars, stones don't work well as parachutes, and fire warms your house better than blocks of ice. How long did it take this genius to come up with this?

One size still fits all (1)

bytesex (112972) | more than 7 years ago | (#17536348)

It's just not called an SQL-driven RDBMS. It's called Sleepycat.