
The NoSQL Ecosystem

kdawson posted more than 4 years ago | from the no-relation dept.

Databases 381

abartels writes 'Unprecedented data volumes are driving businesses to look at alternatives to the traditional relational database technology that has served us well for over thirty years. Collectively, these alternatives have become known as NoSQL databases. The fundamental problem is that relational databases cannot handle many modern workloads. There are three specific problem areas: scaling out to data sets like Digg's (3 TB for green badges) or Facebook's (50 TB for inbox search) or eBay's (2 PB overall); per-server performance; and rigid schema design.'


Why worry? (5, Funny)

Anonymous Coward | more than 4 years ago | (#30042488)

Microsoft Access is here!

Re:Why worry? (4, Funny)

MichaelSmith (789609) | more than 4 years ago | (#30042494)

Don't forget excel!

Re:Why worry? (1)

socceroos (1374367) | more than 4 years ago | (#30042628)

Ah yes, Excel 97 - the days when you could be in a flight simulator and legitimately tell your boss you were crunching numbers.

Re:Why worry? (1, Funny)

Anonymous Coward | more than 4 years ago | (#30043222)

ya don't forget!

Re:Why worry? (1)

mikael_j (106439) | more than 4 years ago | (#30043006)

Sadly, there is plenty of production code that uses Access databases for things they just shouldn't be used for. At a previous job I actually built several production websites that used Access as the DB backend because the client didn't want to use MySQL (open source is scary!) and they didn't want to pay for MSSQL...

/Mikael

Re:Why worry? (1)

rainhill (86347) | more than 4 years ago | (#30043296)

Seriously... why the +5 funny?

Re:Why worry? (4, Funny)

Manos_Of_Fate (1092793) | more than 4 years ago | (#30043360)

Because there's no "scary because it's true" mod.

Re:Why worry? (2, Insightful)

Linker3000 (626634) | more than 4 years ago | (#30043502)

Oh Great! I have just migrated 5 offices from a veterinary management system based around Access 97 onto the new, MS-SQL-based one.

How can I expect to maintain my value to the company if they stick with old, reliable systems instead of moving onto more sophisticated 'solutions' that require a shit-load of tweaking and technical guesswork to keep them running smoothly?

And I am missing it greatly on Linux (2, Interesting)

Errol backfiring (1280012) | more than 4 years ago | (#30043582)

MS-Access had some really great features: it could be accessed both with SQL and with a blazingly fast (because it ran almost on the bare OS) ISAM-style library. I still haven't found anything like it on Linux. SQLite is a file-system database, but why on earth should it parse full-blown SQL at runtime, and why on earth should my program write another program in SQL at runtime just to load some data? Get serious. Parsing and building SQL is just overhead, and parsing SQL in particular is no easy and light task.

Since I switched to OO programming, most (95%) of my queries are "This table/index. Number 5 please." In essence that is the get/put method, or the ISAM style method. I really would like something like that to exist on Linux. The closest thing around is MySQL's HANDLER statement, but that can only be used for constant data (because it does dirty reads) and for reading only.

SQLite could even be faster if it just accepted some basic "get row by index" and "put row by index" commands that do not try to parse, optimize or outsmart anything. The problem with "modern" databases is that they are either "SQL" or "NoSQL". That's awful. Some programs speak SQL (because of compatibility, because it is a reporting program or just because the programmer does not know anything else) and some programs are better off with direct row management. That does not mean that the data should not be accessible by both programs. I really wish that the regular SQL databases would develop ISAM-style access methods. Programming would be a hell of a lot easier then, and the programs themselves would speed up significantly as well.

This is no idle remark. I worked a lot with MS-Access, and most rants about it being slow come from the fact that most programmers treat the file-system database as a server. So it must emulate a server and do a lot of housekeeping parsing, without even having a physical server to take the load.
But if you know how to program a file-system database with ISAM-style methods, MS-Access is by far the fastest database I ever encountered. No joke. Really. It can be fast because there is no need to do all that housekeeping just to dig up a row.
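As a rough illustration of the get/put access style described in the parent, here is a minimal Python sketch using the standard dbm module as a stand-in for an ISAM-style store; the file name and record layout are invented for the example, and this is not MS-Access's actual API:

    import dbm
    import json

    # Open (or create) a file-backed key-value store; no SQL parsing involved.
    # "inventory.db" and the record layout are made up for illustration.
    with dbm.open("inventory.db", "c") as db:
        # "put row by key" -- the value is just bytes, here a JSON-encoded record
        db[b"part:5"] = json.dumps({"name": "widget", "qty": 12}).encode()

        # "get row by key" -- a direct lookup, nothing to parse or optimize
        record = json.loads(db[b"part:5"].decode())
        print(record["name"], record["qty"])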

Re:And I am missing it greatly on Linux (1)

QuoteMstr (55051) | more than 4 years ago | (#30043606)

Parsing and building SQL is just overhead, and especially parsing SQL is no easy and light task.

No optimization without quantification. Parsing is very fast, especially compared to disk IO. Are you sure the SQL is slowing your program to any appreciable (or even measurable) degree? You should be able to measure any supposed effect with a profiler.

Nevertheless, if SQLite so offends your sensibilities, you can always use Berkeley DB [oracle.com] . It gives you a similarly powerful storage engine without the necessity (or ability) to write SQL to access it.

Re:And I am missing it greatly on Linux (1)

Errol backfiring (1280012) | more than 4 years ago | (#30043654)

Nevertheless, if SQLite so offends your sensibilities, you can always use Berkeley DB [oracle.com]. It gives you a similarly powerful storage engine without the necessity (or ability) to write SQL to access it.

Well, that's exactly the problem, isn't it? Once I put my data in a "NoSQL" database, I can not reach it with SQL anymore and vice versa. For some applications (especially reporting), SQL is the best tool. The problem is that "the warehouse determines what trucks you should use". And I just want to be able to use light and small trucks for small urgent jobs and large trucks for bulk storage.

bad design (2, Insightful)

girlintraining (1395911) | more than 4 years ago | (#30042510)

So... every time I open my inbox in Facebook, it has to search through 50TB of data? That sounds like a design problem. What has always floored me is why people think everything needs to be stuffed into a database. Terabyte-sized binary blobs? You know, there's a certain point where people need to stop and actually think about the implementation.

Re:bad design (5, Funny)

bennomatic (691188) | more than 4 years ago | (#30042518)

I'm a terabyte sized binary blob, you insensitive clod!

Re:bad design (1)

Tablizer (95088) | more than 4 years ago | (#30043016)

Go on a diet then.

Re:bad design (-1, Troll)

Anonymous Coward | more than 4 years ago | (#30042526)

That would require intelligence. Haven't looked outside of the basement lately, have you?

Re:bad design (5, Insightful)

munctional (1634709) | more than 4 years ago | (#30042560)

Ever heard of bloom filters? Sharding? Indexes? (A toy bloom-filter sketch follows at the end of this comment.) They are clearly not doing a table scan over 50 TB of data every time you open your Facebook inbox.

You know, there's a certain point where people need to stop and actually think about the implementation.

Um, they do. They regularly blog about their solutions to their problems and open source their solutions and contributions to existing projects. They come up with amazing solutions to their large scale problems. They're running over five million Erlang processes for their chat system!

http://developers.facebook.com/news.php?blog=1 [facebook.com]

http://github.com/facebook [github.com]

Also, when was the last time you tried to visit Facebook and it was down? They're doing quite well for people who need to stop and actually think about their "implementation".
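For anyone who hasn't met the first of those terms, below is a toy bloom filter in Python. The sizes and the trick of slicing one MD5 digest into several bit positions are arbitrary illustrative choices, not anything Facebook is known to use:

    import hashlib

    class BloomFilter:
        """A tiny, illustrative bloom filter: fast membership tests that may
        return false positives but never false negatives."""

        def __init__(self, size_bits=1 << 20, num_hashes=5):
            self.size = size_bits
            self.num_hashes = num_hashes
            self.bits = bytearray(size_bits // 8)

        def _positions(self, item: str):
            # Derive several bit positions from one MD5 digest (arbitrary choice).
            digest = hashlib.md5(item.encode()).digest()
            for i in range(self.num_hashes):
                chunk = digest[i * 3:i * 3 + 3]          # 3 bytes per hash
                yield int.from_bytes(chunk, "big") % self.size

        def add(self, item: str):
            for pos in self._positions(item):
                self.bits[pos // 8] |= 1 << (pos % 8)

        def might_contain(self, item: str) -> bool:
            return all(self.bits[pos // 8] & (1 << (pos % 8))
                       for pos in self._positions(item))

    bf = BloomFilter()
    bf.add("alice@example.com")
    print(bf.might_contain("alice@example.com"))  # True
    print(bf.might_contain("bob@example.com"))    # almost certainly False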

Re:bad design (5, Funny)

socceroos (1374367) | more than 4 years ago | (#30042650)

Ever heard of bloom filters? Sharding? Indexes?

Don't forget flux capacitors, FTL drives and crossfading splicers.

Re:bad design (2, Funny)

TheModelEskimo (968202) | more than 4 years ago | (#30042684)

Yeah. And those guys down the street, the tweakers, nose jobs, and johnny-come-latelys.

Re:bad design (1)

Linker3000 (626634) | more than 4 years ago | (#30043504)

..reverse tachyon beams...oh, and more cowbell.

Oh no... (0, Funny)

Anonymous Coward | more than 4 years ago | (#30042786)

I just sharded

Re:bad design (2, Insightful)

kestasjk (933987) | more than 4 years ago | (#30042798)

They use bloom filters for messaging? What for?

Re:bad design (1)

DarkAxi0m (928088) | more than 4 years ago | (#30042844)

clearly to make the messages look better for the people with fancy video cards...
http://www.vgcats.com/comics/?strip_id=224

Re:bad design (1, Informative)

Anonymous Coward | more than 4 years ago | (#30042942)

Bloom Filter [wikipedia.org]

Bloom filters have been around since long before the term "bloom" was used to describe computer graphics lighting effects.

Re:bad design (1)

oldhack (1037484) | more than 4 years ago | (#30043160)

To reverse the polarity when the flux capacitor is overloaded.

Re:bad design (0, Offtopic)

syousef (465911) | more than 4 years ago | (#30042924)

Ever heard of bloom filters? Sharding? Indexes?

What does World of Warcraft have to do with it?

Re:bad design (1)

Hal_Porter (817932) | more than 4 years ago | (#30043000)

They're running over five million Erlang processes for their chat system!

http://developers.facebook.com/news.php?blog=1 [facebook.com]

If the objective were to maximize the number of Erlang processes, that would indeed be an impressive achievement.

Re:bad design (2, Interesting)

gutter (27465) | more than 4 years ago | (#30043152)

Sounds like you don't know much about Erlang. Erlang processes are MUCH lighter weight than unix processes, and are designed to scale to millions of processes. Generally, you want one Erlang process for each concurrent task in the system, like maybe one process for each active chat session. So, having 5 million Erlang processes would be as designed.

Re:bad design (2, Funny)

Zombywuf (1064778) | more than 4 years ago | (#30043676)

Sounds like you don't understand sarcasm. I'll spell it out for you: the fact that Facebook is running 5 million processes is neither here nor there. The impressive thing is that it actually works (from what I hear it does, anyway). Whether it does it with one process or 5 million has nothing to do with the relative weight of Erlang and Unix processes.

Next up, tying your own shoelaces...

Re:bad design (3, Interesting)

Ragzouken (943900) | more than 4 years ago | (#30043072)

"Also, when was the last time you tried to visit Facebook and it was down? They're doing quite well for people who need to stop and actually think about their "implimentation"."

When was the last time you tried to use Facebook or Facebook chat and didn't get failed transport requests, unsent chat messages, unavailable photos, or random blank pages?

Re:bad design (1)

KiwiSurfer (309836) | more than 4 years ago | (#30043202)

Mod parent up. Facebook chat is probably one of the most unreliable IM systems I've used to date.

Perhaps you just aren't that popular? (1, Insightful)

Anonymous Coward | more than 4 years ago | (#30043232)

Has this ever occurred to you: maybe people just choose not to answer you? :)

Re:bad design (1)

donaggie03 (769758) | more than 4 years ago | (#30043240)

None of that ever happens to me, and I use facebook all the time. Maybe facebook just doesn't like you!

Re:bad design (4, Interesting)

JavaPunk (757983) | more than 4 years ago | (#30042600)

Yes, it does (look through 50TB of data), and how would you design it? It has to access all of your friends and find their postings. Robert Johnson gave an excellent talk on Facebook's design two weeks ago at OOPSLA (it should be in the ACM digital library soon). He stated that there is no clear segregation of data; the (friend) network is too connected and extracting groups of friends isn't possible. Basically they have a huge MySQL farm with memcached on top. Loading an inbox will hit multiple servers (maybe even a different server for each of your friends) across the farm.
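As a sketch of that fan-out pattern, here is a toy Python shard router; the shard names, hashing scheme, and query shape are invented for illustration, not Facebook's actual design:

    import hashlib

    # Hypothetical shard list; in reality this would be a farm of MySQL servers
    # fronted by memcached.
    SHARDS = [f"db{n:02d}.example.internal" for n in range(16)]

    def shard_for(user_id: int) -> str:
        """Deterministically map a user id to the shard holding that user's rows."""
        digest = hashlib.md5(str(user_id).encode()).hexdigest()
        return SHARDS[int(digest, 16) % len(SHARDS)]

    def plan_inbox_queries(friend_ids):
        """Fan-out plan: group friends by shard so each shard is queried once;
        the per-shard results then get merged by the application."""
        by_shard = {}
        for fid in friend_ids:
            by_shard.setdefault(shard_for(fid), []).append(fid)
        return by_shard

    # One query per shard, e.g. SELECT ... FROM messages WHERE sender_id IN (...)
    print(plan_inbox_queries([7, 99, 1234, 5678]))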

Re:bad design (4, Funny)

ErikTheRed (162431) | more than 4 years ago | (#30042712)

So... every time I open my inbox in Facebook, it has to search through 50TB of data? That sounds like a design problem. What has always floored me is why people think everything needs to be stuffed into a database. Terabyte-sized binary blobs? You know, there's a certain point where people need to stop and actually think about the implementation.

Could be worse. They could try to find something on my desk.

Re:bad design (1)

mabhatter654 (561290) | more than 4 years ago | (#30042818)

All electronic data is stored in a "database"... even a file system is a type of database, you just can't query it with SQL... it's just that database programs that use SQL as the front end have more tools built for them.

You have a point that the implementation has to be thought about. If you RTFA you'll see the issue is more that RDBMS implementations like Oracle rely on breaking up clusters so you can cram as much of the data as possible into RAM on fast CPUs. In the case of something like Facebook that's just not possible; it's not reasonable to build that type of system, so they have to optimize their systems in new ways. This is also why mainframes and midrange systems still rule the enterprise arena: they have massive amounts of RAM, cache, and directly connected disk. They don't architecturally fix the problem, they just throw more resources at it.

Google and Facebook are built on small, cheap machines in massive arrays, but RDBMSs don't really fit that storage model. Also, internet clusters don't get "batch downtime" like mainframes do -- time to clean up tables and rebuild indexes. The big joke I took away from my Oracle class was how much manual "tending" the package took. Where Oracle systems try to have PERFECT data, Facebook or Google simply "evolve" the data as they build new banks of servers and throw out the unoptimized nodes; they know they'll never have perfect data, so they stop employing armies of DBAs to rearrange the house of cards. (In their defense, Oracle is building "perfect" systems because that's how accounting, inventory, shipping, etc. NEED to be. If you lose a Facebook post or Google loses a month's worth of bot data it's not the end of the world; lose a month of TPS reports and there's hell to pay!)

It's a parallel problem on your home computer. Why do I care about my OS having a "file system", and why can't my data, contacts, pictures, music, documents, etc. be in some kind of database that is self-contained, properly metatagged, and enables a simple method of backup? In many ways the first 30 years of home computing were about making things generalized and marketable... the next will be about building systems that ignore "files" and manage the raw data and metatags directly.

Re:bad design (1)

Tablizer (95088) | more than 4 years ago | (#30043036)

So... every time I open my inbox in Facebook, it has to search through 50TB of data?

You do if you are Paris Hilton.
   

hmm (4, Insightful)

buddyglass (925859) | more than 4 years ago | (#30042514)

With regard to scalability, it strikes me that the problem isn't so much SQL but the fact that current SQL-based RDBMS implementations are optimized for smaller data sets.

Re:hmm (5, Insightful)

phantomfive (622387) | more than 4 years ago | (#30042602)

The biggest problem is the cloud. A lot of cloud APIs don't allow full relational database access, so now it seems we are coming up with all these justifications for why we don't really need it. Notice that this blog is from a company pushing a cloud based solution.

Re:hmm (4, Insightful)

MightyMartian (840721) | more than 4 years ago | (#30042642)

That's my take as well. We have these crippled semi-databases that lack a lot of useful features that anyone who has used RDBMSs over the last few decades has gotten used to, so suddenly it becomes a justification game; "Well, SQL doesn't deliver the output we need, so here's some half-way-to-SQL tools which are really better, kinda... oh yes, and Netcraft confirms it, SQL is dying!!!!"

I have a feeling that this is part hype, part inept programmers who don't actually understand SQL or database optimization. The first sign for me that someone is selling bullshit is when they try to act like this is some never-before-seen problem, when in fact there is a good four decades of research on database optimization.

Re:hmm (0)

Anonymous Coward | more than 4 years ago | (#30042704)

Well, that and the fact that relational/transactional guarantees don't scale particularly well, especially if you want to cluster your solution.

Sometimes the only way to get the performance you need is to change the model.

Re:hmm (1)

Tablizer (95088) | more than 4 years ago | (#30042756)

One can skip transactions for speed in some RDBMSs. MySQL grew in popularity largely because it has an engine choice that excludes transaction handling, making it faster at the expense of the risk of incomplete or inconsistent data. It's a matter of choosing your trade-offs.

Re:hmm (1)

tkinnun0 (756022) | more than 4 years ago | (#30043442)

Isn't that the point, though? Drop all the features of SQL databases that hinder scalability and you end up with NoSQL.

Re:hmm (4, Interesting)

buchner.johannes (1139593) | more than 4 years ago | (#30042874)

The first sign for me that someone is selling bullshit is when they try to act like this is some never-before-seen problem, when in fact there is a good four decades of research on database optimization.

Your point is valid, but I think there is more to it. And the problems these solutions try to solve are quite old too. For example:

Ever tried to design a database, but got the requirement that you should be able to reconstruct the modification history? It boils down to not deleting (ever), "deleted" flag fields, and other ugliness. A multi-version relational database would be nice; you actually don't need modification/delete operations in this scenario, just "updates" that add to the previous status. CouchDB [blogspot.com] does append operations.

In some cases you may not need a complete SQL database, just key->value relations, but have them scaling very well. http://project-voldemort.com/ [project-voldemort.com] states: "It is basically just a big, distributed, persistent, fault-tolerant hash table." Then they state that they provide horizontal scalability, which MySQL doesn't (OTOH, we should really look at Oracle for these things).

And you can't really say MapReduce/Hadoop [apache.org] is pointless.

Re:hmm (2, Interesting)

phantomfive (622387) | more than 4 years ago | (#30042972)

Ever tried to design a database, but got the requirement that you should be able to reconstruct the modification history? It boils down to not deleting (ever), "deleted" flag fields, and other ugliness.

I did it by taking an exact copy of every INSERT, DELETE, or UPDATE query and dumping it into a special table in the database (along with a stack trace of where it was called from). To reconstruct, I could just run those commands straight from the database, up to whatever point was desired. It was simple, straightforward and efficient, although I'm sure someone else has a better idea.
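A minimal sqlite3 sketch of that approach, with invented table names; each mutating statement is recorded verbatim so the log can be replayed later:

    import sqlite3
    import traceback

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    # Hypothetical audit table holding every mutating statement, verbatim.
    conn.execute("CREATE TABLE audit_log (ts DATETIME DEFAULT CURRENT_TIMESTAMP,"
                 " sql TEXT, params TEXT, stack TEXT)")

    def execute_logged(sql: str, params: tuple = ()):
        """Run a mutating statement and record it so history can be replayed."""
        conn.execute(sql, params)
        conn.execute("INSERT INTO audit_log (sql, params, stack) VALUES (?, ?, ?)",
                     (sql, repr(params), "".join(traceback.format_stack(limit=5))))
        conn.commit()

    execute_logged("INSERT INTO users (name) VALUES (?)", ("alice",))
    execute_logged("UPDATE users SET name = ? WHERE name = ?", ("bob", "alice"))

    # Replaying the logged statements against an empty copy of the schema
    # reconstructs the data at any point in time.
    for sql, params in conn.execute("SELECT sql, params FROM audit_log"):
        print(sql, params)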

Re:hmm (2, Interesting)

h4rm0ny (722443) | more than 4 years ago | (#30043464)

It was simple, straightforward and efficient, although I'm sure someone else has a better idea.

I'd love someone to post it if they do. We use the same method, and the one time we had to replay the sequence to get what we wanted, it took most of a day. Yes, that was because our last snapshot "starting point" was nearly a week old, but nonetheless... if technology has moved on and there's a better way of doing this, then I'm sure a lot of us will be interested.

Re:hmm (2, Insightful)

CaptainZapp (182233) | more than 4 years ago | (#30043134)

I have a feeling that this is part hype, part inept programmers who don't actually understand SQL or database optimization. The first sign for me that someone is selling bullshit is when they try to act like this is some never-before-seen problem, when in fact there is a good four decades of research on database optimization.

Thank you very much for this comment; you put it far more eloquently than the venting I was about to grace this thread with. The real kicker though is

There are three specific problem areas: scaling out to data sets like Digg's (3 TB for green badges) or Facebook's (50 TB for inbox search) or eBay's (2 PB overall); per-server performance; and rigid schema design.

This statement is just so full of shit. And the real larff riot, for me at least, is when people or shops employing MySQL (for heaven's sake!) make such statements.

Ej, folks: Rigid schema design is an asset, not a liability!

Re:hmm (3, Funny)

Hognoxious (631665) | more than 4 years ago | (#30043516)

Rigid schema design is an asset, not a liability!

Not to people who think a free format text field is the ideal place to store the price, quantity and delivery date of an order. Why not, it's long enough for it all to fit. And it saves all that moving between fields.

Re:hmm (0)

Anonymous Coward | more than 4 years ago | (#30043316)

Or it could be people who have to deal with far, far more data than any RDBMS is actually capable of handling. Dismissing everything here as "part hype, part inept programmers who don't actually understand SQL, or database optimization" is simply stupid - do you honestly think that Google, Yahoo, Facebook, eBay, Microsoft, and so on don't have one person between them who understands SQL?

Seriously - show me an RDBMS that can handle 50TB of index data without needing to resort to sharding. If you have to resort to sharding, you lose all the features that make an RDBMS worth using in the first place.

Vendor Hype Orange Alert (Re:hmm) (3, Interesting)

Tablizer (95088) | more than 4 years ago | (#30042666)

Notice that this blog is from a company pushing a cloud based solution.

That is indeed suspicious. But if they want to sell clouds, then make an RDBMS that *does* scale across cloud nodes instead of bashing SQL. (SQL as a language doesn't define implementation; that's one of its selling points.) It may be that since there's not one out yet, they instead hype the existing non-RDBMSs that can span clouds.

(I agree that SQL could use some improvements, such as named sub-queries instead of massive deep nesting to make one big run-on statement. Some dialects already have this to some extent.)
         

Re:Vendor Hype Orange Alert (Re:hmm) (1)

QuoteMstr (55051) | more than 4 years ago | (#30043224)

named sub-queries

What do you think stored [microsoft.com] functions [mysql.com] and [oracle.com] procedures [eioba.com] are?

Re:hmm (1)

Firehed (942385) | more than 4 years ago | (#30042730)

What now? The problem is that relational databases suck at scaling, and as a result we have to come up with absurd hacks like sharding to fix problems that are the fault of the storage engines. (If the engine has to do that to not fall apart when dealing with large datasets, fine; but that should be entirely behind the scenes and transparent to the application.) If these various NoSQL tools are faster than traditional databases and your data isn't particularly relational, then great! But I'd much rather see effort put into solving the lack of horizontal scalability associated with relational DBs, especially since they have no problem accommodating non-relational data.

Amazon offers full MySQL cloud-based hosting (in addition to their own simpledb stuff), and every managed cloud platform I've looked at also uses standard SQL DBs (MySQL, postgres, etc.). Then again, the term "cloud" has really just come to mean "someone else's datacenter" which includes pretty much every casual web hosting plan on the planet and plenty of higher-end stuff as well. The only problem with the cloud is that the term is wildly overused.

Shards and clusters and servers, oh my! (1)

shmlco (594907) | more than 4 years ago | (#30042842)

Worse, sharding and other such solutions usually end up requiring the application to know way, way too much about the back end structure, how tables are split, where they are split, and so on.

And your solution to improving the storage engine doesn't help. At some point in a RDBMS you need to do joins and so forth, and that assumes that the machine doing the join is capable of doing so AND of handling the load and the number of transactions being tossed at it. Hence we start getting into clusters and other solutions that again need to be understood and managed.

The NoSQL solution lets you toss your request out to the "cloud" and get an answer without needing to know about clusters, shards, tables, or really anything on the physical implementation side of the fence.

Re:hmm (1)

countach (534280) | more than 4 years ago | (#30042960)

"But I'd much rather see effort put into solving the lack of horizontal scalability associated with relational DBs"

I think I'd rather see the opposite: that non-relational DBs become the mainstream, and have SQL added for the odd occasion it is useful. Relational has some nice properties for ad-hoc querying, but for everything else it's a nuisance.

Re:hmm (4, Insightful)

QuoteMstr (55051) | more than 4 years ago | (#30043190)

I think I'd rather see the opposite: that non-relational DBs become the mainstream, and have SQL added for the odd occasion it is useful. Relational has some nice properties for ad-hoc querying, but for everything else it's a nuisance.

Berkeley DB [wikipedia.org] is a very good non-relational database with multiple language bindings, several storage engines, and transaction support. It's been around for 24 years, and has seen some appreciable use.

But that use was nothing compared to the database explosion that SQLite [sqlite.org] brought about when it was released. SQLite is almost exactly like Berkeley DB, except that it has a SQL engine on top. Almost everyone is using SQLite, and many Berkeley DB users are moving over to it.

Why? Because SQLite is relational! That constitutes some serious evidence that relational databases are more than "a nuisance".

like the network effect and developer laziness (1)

Colin Smith (2679) | more than 4 years ago | (#30043602)

It's simpler to switch to a different RDBMS when your queries are already in SQL.

It's mostly just human ignorance and laziness.

Re:hmm (1)

PhrostyMcByte (589271) | more than 4 years ago | (#30042970)

There's a reason Google, Amazon, and Microsoft all designed their cloud databases without SQL -- it has a lot of features that don't scale well when your data spans a crap ton of servers. Imagine a website that does several JOIN queries for each page view -- now if you've got data spanning 50 servers, that's a hell of a lot of I/O that will be very hard to scale. When you take out these extra features, you end up not having much more than the basics -- usually just a simple insert, update, and delete with very limited transaction support. Without all the bells and whistles, there's no point in pretending you support SQL anymore.

Think of the current concurrency push, and how most people consider it very challenging to do correctly, and are hoping some magic silver bullet comes along to fix it all. The old designs don't translate to multi-core very well. This is the same thing. Cloud databases require a fundamental redesign of how you structure and query your data. The only difference is, concurrency can usually be added in steps. With Cloud databases, it's usually all-in or nothing with no hand holding.

I think a bigger problem is that the name "NoSQL" sounds like a direct attack at SQL. Really, it's just a stupid name for a larger mantra that most programmers try to abide by: use what's best for the job. SQL is great for a lot of jobs, but not all. Maybe someone will come along with that "magic silver bullet" for SQL that makes it super-scalable in all situations, but until then, we've now got more options for when it's not.

Re:hmm (2, Informative)

QuoteMstr (55051) | more than 4 years ago | (#30043174)

I don't think you've thought clearly about the problem.

If a JOIN is causing problems because it's causing too much non-local data access, then you're going to run into the same problem when you re-code the JOIN in the application. In fact, it might hit you worse because you won't benefit from the database's query optimizer.

The solution is clearly to improve locality of reference. You can do that by duplicating some data, denormalizing the database, and so on. But you can do all those things just as easily within a RDBMS, and without losing the other benefits a RDBMS gives you.

Really, your problem is that some of the things RDBMSes allow hurt when a database grows beyond a certain size. The solution is to not do the things that hurt, not ditch the things that RDBMSes do allow.

It's like complaining that your feet are sore if you walk 20 miles, then cutting off your leg to make it stop.
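A toy sqlite3 example of the denormalization trade-off described above (the schema is invented): the hot read path touches a single table instead of doing a join, at the cost of keeping the copied column in sync:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    -- Normalized: reading a post's author name requires a JOIN.
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, body TEXT);

    -- Denormalized: the author's name is copied onto each post, so the hot
    -- read path touches one table (at the cost of keeping copies in sync).
    CREATE TABLE posts_denorm (id INTEGER PRIMARY KEY, user_id INTEGER,
                               user_name TEXT, body TEXT);
    """)

    conn.execute("INSERT INTO users VALUES (1, 'alice')")
    conn.execute("INSERT INTO posts VALUES (10, 1, 'hello')")
    conn.execute("INSERT INTO posts_denorm VALUES (10, 1, 'alice', 'hello')")

    # Join required:
    print(conn.execute("SELECT p.body, u.name FROM posts p"
                       " JOIN users u ON u.id = p.user_id").fetchall())
    # Single-table read:
    print(conn.execute("SELECT body, user_name FROM posts_denorm").fetchall())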

Re:hmm (1)

PhrostyMcByte (589271) | more than 4 years ago | (#30043490)

These databases are all schemaless, so it's not like they could use an established RDBMS... they had to make something new. I was trying to give an example of why they wouldn't bother implementing something like JOIN in something new that they planned to scale from the beginning.

I'd actually be pretty stoked to find a schemaless DB that uses mostly standard SQL, because I find schemaless makes a lot of sense for use beyond cloud-level scalability. Most of the time they let you store things in a more natural, accessible way than what tables provide.

But a big benefit they have for scalability (and I suspect the reason those companies went schemaless) is that if you decide you want to change your application-defined object schema, you don't have to copy everything to a new table or ALTER the existing one. If you've got terabytes of data, it can be well worth it to simply keep code around that knows how to read the old object type and update them on demand.

Re:hmm (1)

QuoteMstr (55051) | more than 4 years ago | (#30043534)

There exists some schema for every data set. The lack of a schema is (being generous) actually a lack of imagination and ingenuity on the part of the programmer.

you don't have to copy everything to a new table or ALTER the existing one

How is that any better than creating a second table with the new schema, and slowly (or lazily) migrating records from the old to the new table?
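A minimal sketch of that lazy-migration idea with sqlite3; the v1/v2 schema and the name-splitting rule are invented for illustration:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE profiles_v1 (id INTEGER PRIMARY KEY, fullname TEXT);
    CREATE TABLE profiles_v2 (id INTEGER PRIMARY KEY, first TEXT, last TEXT);
    """)
    conn.execute("INSERT INTO profiles_v1 VALUES (1, 'Ada Lovelace')")

    def get_profile(profile_id: int):
        """Read from the new table; fall back to the old one and migrate the
        record on first touch (lazy migration, no bulk ALTER or copy needed)."""
        row = conn.execute("SELECT first, last FROM profiles_v2 WHERE id = ?",
                           (profile_id,)).fetchone()
        if row:
            return row
        old = conn.execute("SELECT fullname FROM profiles_v1 WHERE id = ?",
                           (profile_id,)).fetchone()
        if old is None:
            return None
        first, _, last = old[0].partition(" ")
        conn.execute("INSERT INTO profiles_v2 VALUES (?, ?, ?)",
                     (profile_id, first, last))
        conn.execute("DELETE FROM profiles_v1 WHERE id = ?", (profile_id,))
        return (first, last)

    print(get_profile(1))   # migrated on demand
    print(get_profile(1))   # now served from the new table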

Re:hmm (1)

jilles (20976) | more than 4 years ago | (#30043104)

That's just another way of saying SQL databases are a poor match for the requirements big websites face. SQL databases used at scale almost always throw characteristic features like transactions, joins, or even ACID out of the window in order to scale. Once you start doing that, SQL databases just become a really complicated way to store stuff. The one database that is really popular on big websites is MySQL, which started out its popularity as a non-transactional database. While most common features have been slapped on since, the proper way to use MySQL at large scale still involves not relying on those. The way sites like Facebook, eBay, etc. use it is as a dumb key-value store. Apparently, Amazon does not use database transactions. That's pretty remarkable for a billion-dollar-plus revenue e-commerce site that handles millions of financial transactions per day. I'm pretty sure they'd use database transactions if it was feasible to do so. Instead they handle transactions at the application level.

So the problem is not APIs but the fact that the underlying technology can't live up to the requirements. Never mind what is possible in theory, because that's not worth shit in practice. Now there are several non-SQL storage systems under development, with different designs, that are built from the ground up to be scalable and to have all sorts of desirable qualities regarding data integrity, and a growing number of people rely on them in real-world situations.

Re:hmm (1)

xtracto (837672) | more than 4 years ago | (#30043352)

I find the NoSQL approach stupid.

Although, as they say in TFA, NoSQL is more "Not Only SQL" than "no SQL at all", I think people should make a distinction between the "Structured SQL Language" ;-) and the implementation.

SQL is only a language ("artificial language designed to express computations that can be performed by a machine, particularly a computer" [wikipedia.org] ), used to query data from some place.

I am sure that a lot of the problems the guy is talking about in his blog can be solved by using the appropriate backend technology. There is stuff like RAID for data replication, faster access, etc. There surely are other solutions for transparent distribution of data (say, a layer between the MySQL driver and the actual data) and the like.

But that does not mean that on the client side we should stop doing a "SELECT * FROM MY_FRIENDS WHERE SEX='F'".

Re:hmm (0)

Anonymous Coward | more than 4 years ago | (#30043472)

That is the main problem with this no-SQL hype. I worked for a while at a small company doing SaaS, with the buzzwords "cloud" and "scalable" repeated on their website. They were so full of themselves about being able to make "scalable" solutions that if you wanted to talk about SQL you were mocked. Quite truly, you were not allowed to talk about SQL, or any query formalism. It was a curse word.

Their product and customers so far only basically needed a web portal in front of a hash table: you give a key, or a range of keys (as they are actually timestamps), and the server fetches you a page with the values. Once you log in, that identifies the table your keys will come from, and every customer usually had only one person who might be asking for these values. There wasn't even much parallelism in the sense of multiple people asking for the same values. So you could just put up a server for every customer. Well, they did manage a small group of servers to store the actual data for each customer, but we're still talking about a handful of dedicated machines.

It was nothing like Facebook or Ebay. Basically the simplest kind of a storage problem you can imagine with just enough volume that a single Microsoft SQL database cannot handle it, and the solution was to restrict the users to be able to do only one simple query at a time.

For them this was a ground-breaking discovery and they considered themselves to be a gift from god. Their CEO actually told several professors of computer science that they should concentrate on scalable systems and that his company could offer some expertise on that. The same CEO also thought that their inventions were so staggering that it should not be a problem to write a few articles and get them published. Needless to say, this did not happen quite that easily.

Re:hmm (4, Insightful)

KalvinB (205500) | more than 4 years ago | (#30042644)

For the vast majority of use cases, large data sets can be made logically small with indexes or physically small with hashes.

If you're dealing with massive data you're probably not dealing with complex relationships. E-mail servers associate data with only one index: the e-mail address. Google only associates content with keywords. E-mail servers logically and physically separate email folders. Google logically and physically separates the datasets for various keywords. So by the time you hit it, it knows instantly where to look for what you want. You don't have a whole complex system of relationships between the data. It looks at the keywords, finds the predetermined results for each and combines the results.

Re:hmm (2, Insightful)

tkinnun0 (756022) | more than 4 years ago | (#30043590)

What if you ARE dealing with massive data AND complex relationships?

Re:hmm (2, Interesting)

Prof.Phreak (584152) | more than 4 years ago | (#30042674)

Depends. We've been using Netezza with ~100T of data, and... well... it takes seconds to search tables that are 30T in size. I'd imagine Teradata, greenplum and other parallel db's get similar performance---all while using standard SQL with all the bells and whistles you'd normally expect Oracle SQL to have (windowing functions, etc.).

Re:hmm (0)

Anonymous Coward | more than 4 years ago | (#30042916)

If you're talking about table scans, it depends on how much money (and operational expense - space, maintenance, power) you spend. For a fixed time requirement for a scan of a lot of data, Teradata's probably going to cost you both arms and legs compared to Netezza which in turn is probably going to cost you an arm and leg compared to Greenplum which in turn might cost you an arm compared to Paraccel which, in turn might just cost you your little finger (depending on the nature of the constraints).

If your data is sufficiently static that you can afford to maintain a lot of indexes, the number of limbs you have to sacrifice is probably more similar across the products.

Of course, "seconds" is an eternity in a consumer facing web app.

Re:hmm (4, Insightful)

mzito (5482) | more than 4 years ago | (#30042788)

Uh, no, that is not correct. Relational DBMSes such as Oracle, Teradata, DB2, even SQL Server are all designed to scale into the multi-terabyte to petabyte range. The issue is one of a couple of things:

- Cost - "real" relational databases are expensive. I once had a conversation with someone who worked at Google, who talked about how much infrastructure they have written/built/maintain to deal with MySQL. Many of those problems were solved in an "enterprise" DBMS 3-10 years ago. However, the cost of implementing one of those enterprise DBMS is so high that it is cheaper to build application layer intelligence on top of a stupid RDBMS than purchase something that works out of the box
- Workload style - most of the literature around tuning DBMS is for OLTP or DSS workloads. Either small question, small response time (show me the five last things I bought from amazon.com) or big question, long response time (look through the last two years worth of shipping data and figure out where the best places to put our distribution centers would be). Many of these workloads are combos - there could be very large data sets and complex data interdependencies, with low latency requirements. It may be possible to write good SQL that does these things (in fact, I know a couple luminaries in the SQL space that will claim just that), but the community knowledge isn't there.
- Application development - when you're building your app from scratch, you can afford to work around "quirks" (bugs) and "gaps" (fatal flaws) to get what you need. This dovetails with the other issues, but when your core business is building infrastructure, it's worth your while to deal with this. When your core business is selling insurance or widgets, or whatever, it is not.

None of this is to say that the "nosql" movement is a bad thing, or that there's no reason for its existence, or that no one should bother looking at it. However, there is a definite trend of "this is so much better than SQL" for no good reason. SQL has scaled for years, and I know loads of companies who work with terabytes and terabytes of data on a single database without any issue.

A far more interesting discussion is the data warehouse appliance space - partitioning SQL down to a large number of small CPUs and pushing those as close to the disk as possible.

Re:hmm (1)

FlyingGuy (989135) | more than 4 years ago | (#30042894)

Well said.

Lack of imagination is another problem when people look at the dry subject of data storage and retrieval. Yes at some point the math has to work, but in the meantime looking outside the box and taking an RDBMS in different and new directions within the engine itself can give those mountains of scale that are the holy grail these days.

While I was writing that paragraph I thought it might be interesting to take a structure such as an n-node B-tree, make the leaves table references and the nodes pointers to those tables, and overlay that structural idea onto a single table. That single table could then be queried to gain access to a very interesting amount of data. Now if the leaves are not required to be unique, then the entry that would normally be a record pointer could instead be an instance pointer, a server pointer, or a cluster pointer, and you could simply take the original query and have the result be returned from just about anywhere.

Ehh someone undoubtedly thought of that already...

Dynamic Relational: change it, DON'T toss it (5, Interesting)

Tablizer (95088) | more than 4 years ago | (#30042546)

The performance claims will probably be disputed by Oracle whizzes. However, the "rigid schema" claim bothers me. RDBMS can be built that have a very dynamic flavor to them. For example, treat each row as a map (associative array). Non-existent columns in any given row are treated as Null/empty instead of an error. Perhaps tables can also be created just by inserting a row into the (new) target table. No need for explicit schema management. Constraints, such as "required" or "number" can incrementally be added as the schema becomes solidified. We have dynamic app languages, so why not dynamic RDBMS also? Let's fiddle with and stretch RDBMS before outright tossing them. Maybe also overhaul or enhance SQL. It's a bit long in the tooth.

More at:
http://geocities.com/tablizer/dynrelat.htm [geocities.com]
(And you thought Geocities was dead.)
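A toy, in-memory Python sketch of the dynamic-relational idea described above (rows as maps, missing columns read as empty, schema growing on insert); this is only an illustration of the behaviour, not a real storage engine:

    class DynamicTable:
        """Toy sketch of the 'dynamic relational' idea: rows are maps, missing
        columns read as None, and the schema grows as rows are inserted."""

        def __init__(self):
            self.columns: set[str] = set()
            self.rows: list[dict] = []

        def insert(self, **row):
            # New column names are accepted silently instead of raising an error.
            self.columns.update(row)
            self.rows.append(row)

        def select(self, where=lambda r: True):
            # Every row is reported with the full current column set.
            return [{c: r.get(c) for c in sorted(self.columns)}
                    for r in self.rows if where(r)]

    people = DynamicTable()
    people.insert(name="alice")                  # table and column appear implicitly
    people.insert(name="bob", phone="555-0100")  # new column appears on the fly
    print(people.select())
    # [{'name': 'alice', 'phone': None}, {'name': 'bob', 'phone': '555-0100'}]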

Re:Dynamic Relational: change it, DON'T toss it (2, Insightful)

Prodigy Savant (543565) | more than 4 years ago | (#30042774)

What you are suggesting is to mimic a key-value design with something like a json or serialized data as the value.

This would work if you never had to index on any of the values in the JSON. All your SQL queries must have their WHERE clauses running off the key.

This is a problem that couchdb and mongodb solve.

I am not trying to paint SQL in an unflattering shade -- there would still be a lot of situations where an RDBMS design would be optimal. In fact, I am currently working on a MongoDB/MySQL hybrid solution for a large web site (larger than /.).

Re:Dynamic Relational: change it, DON'T toss it (2, Interesting)

Tablizer (95088) | more than 4 years ago | (#30042976)

What prevents indexing a dynamic-relational DB? I said that you didn't need a data-definition language, but that doesn't mean one *must* skip the DDL (for things such as indexes). Another thing to explore is auto-indexing: if enough queries keep filtering by a given column, the engine could automatically put an index on it.

Re:Dynamic Relational: change it, DON'T toss it (1)

TubeSteak (669689) | more than 4 years ago | (#30042982)

Let's fiddle with and stretch RDBMS before outright tossing them.

This isn't "it ain't broke, don't fix it"
Instead we're dealing with "I have a hammer, so every problem looks like a nail"

The desire to "fiddle with and stretch" software instead of sinking dollars into something new is
part of the reason we have a clusterfark of decades old technologies & hardware that won't go away.
Sometimes you have to accept that a hammer isn't the right tool for the job.

NoSQL? That'd Be DL/I, Right? (4, Informative)

BBCWatcher (900486) | more than 4 years ago | (#30042548)

I think I've heard of non-relational databases before. There's a particularly famous one, in fact. What could it be [ibm.com] ? Let's see: first started shipping in 1969, now in its eleventh major version, JDBC and ODBC access, full XML support in and out, available with an optional paired transaction manager, extremely high performance, and holds a very large chunk of the world's financial information (among other things). It also ranks up there with Microsoft Windows as among the world's all-time highest grossing software products.

....You bet non-relational is still highly relevant and useful in many different roles. Different tools for different jobs and all.

Re:NoSQL? That'd Be DL/I, Right? (2, Informative)

Tablizer (95088) | more than 4 years ago | (#30042708)

IMS is very efficient for known query patterns, but not very flexible for stuff not anticipated. This is a common characteristic of non-relational databases: optimize for specific query paths at the expense of general queries (variety).

Often IMS data is exported and re-mapped nightly or periodically to a RDBMS so that more complex queries can be performed on the adjusted copy. The down-side is that it's several hours "old".

Note that it's also possible to optimize RDBMS for common queries using well-planned indexing and techniques such as clustered indexes, which put the physical data in the same order as the primary or target key. Whether that can be as fast as non-relational techniques is hard to say. It may depend on the skills of the tuner.
                 

Re:NoSQL? That'd Be DL/I, Right? (1)

Trepidity (597) | more than 4 years ago | (#30042808)

To some extent relational databases are making a similar bet, optimizing for particular kinds of access at the expense of more general queries. There are more expressive database languages that support more general kinds of queries, including quantification and variable binding and such, like Datalog, but they're harder to make efficient (though they can also be optimized for common query paths). I see SQL as something of a middle-of-the-road choice: more general than tuple stores and some other approaches, but less general than Datalog and similar approaches. That middle-of-the-road choice might be the optimal one for a lot of applications, but it's not clear to me that it's some sort of global optimum.

Re:NoSQL? That'd Be DL/I, Right? (1)

Tablizer (95088) | more than 4 years ago | (#30043056)

Well, I'll buy that. Maybe DataLog performance would be improved if enough people used it to justify spending resources on engine improvements. But it has a much longer learning curve than SQL for most people. So unless somebody can show a "killer app", SQL is still the alpha male on the street.

Starting to love the idea (4, Interesting)

Just Some Guy (3352) | more than 4 years ago | (#30042562)

I'm a huge PostgreSQL fan and took classes in formal database theory in college. I'm saying this as someone who understands and thoroughly appreciates relational databases: I'm starting to love schema-less systems. I've only been playing with CouchDB for a few weeks but can certainly see what such stores bring to the table. Specifically, a lot of the data I've stored over the years doesn't neatly map to a predefined tuple, and while one-to-one tables can go a long way toward addressing that, they're certainly not the most elegant or efficient or convenient representation of arbitrary data.

I'm certainly not going to stop using an RDBMS for most purposes, but neither am I going to waste a lot of time trying to shoehorn an everchanging blob into one. Each tool has its place and I'm excited to see what niche this ecosystem evolves to fill.

Hstore (1, Informative)

Anonymous Coward | more than 4 years ago | (#30043674)

You are aware of PostgreSQL's hstore? It's a type representing basically a name-value mapping (think Perl hash or Python dictionary). You can put an index on it and answer queries like "find all records where the field has the mapping foo => bar" or "contains the mappings {foo => bar, baz => grumble}", and more.

Cool stuff.
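For readers who haven't seen it, here is roughly what such queries look like via psycopg2. This assumes a reachable PostgreSQL server with permission to create the extension; the connection string, table, and data are invented:

    import psycopg2

    # Assumes a running PostgreSQL instance; DSN and table name are made up.
    conn = psycopg2.connect("dbname=test")
    cur = conn.cursor()
    cur.execute("CREATE EXTENSION IF NOT EXISTS hstore")
    cur.execute("CREATE TABLE IF NOT EXISTS docs"
                " (id serial PRIMARY KEY, attrs hstore)")
    # A GIN index makes the containment queries below fast.
    cur.execute("CREATE INDEX IF NOT EXISTS docs_attrs_idx"
                " ON docs USING gin (attrs)")

    cur.execute("INSERT INTO docs (attrs)"
                " VALUES ('foo => bar, baz => grumble'::hstore)")

    # Containment query: all records whose attrs include foo => bar.
    cur.execute("SELECT id, attrs FROM docs WHERE attrs @> 'foo => bar'::hstore")
    print(cur.fetchall())
    conn.commit()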

Hashes are your friend (0)

KalvinB (205500) | more than 4 years ago | (#30042564)

In the example of inboxes, no user has to look at another user's inbox, so the first step is to simply find the current user's mail.

I typically use MD5 since it's very good at evenly distributing information. For example stock symbols are heavily weighted to common letters so there are lots of stock symbols that start with "s". But, if you MD5 the stock symbol you get an even distribution based on the first two hash characters to put the historical data into 256 tables. You could also just put it all in one massive table and use the first two characters in their own column with an index. The advantage of using multiple tables is that it's easier to later split the tables onto multiple physical systems.

So MD5 the Facebook user ID. Use the first four characters to pick the database server. Use the next four characters to pick the table and then select from there. By the time you're even referencing the table you're down to a handful of accounts sharing one table. Searching the User's email is then trivial as the dataset is small.

Another example of MD5 awesomeness is finding a URL and associated data very quickly (useful for DMOZ data). In MySQL varchars can be up to 255 characters while URLs with various parameters can be any length so you could try to index the TEXT field OR you simply hash the URL and when you want to look up a URL you search for the easily indexed hash.

Working with large sets of data is only a problem if you don't devise ways to break up the data. If Facebook needs to search all the user's email for various stuff then they can run a script that goes through every table in every database. They don't have to run a single query which would take forever. With distinct sets of data you can quickly start getting results to verify your code is accurate and start digging through the results while the script continues to run.
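A minimal Python sketch of that hashing scheme; the split widths and server/table names are purely illustrative (the parent uses wider hex prefixes for the server and table picks):

    import hashlib

    def route(key: str):
        """Hash a key and use hex prefixes of the digest to pick a server and a
        table, roughly as described above (the split sizes are illustrative)."""
        digest = hashlib.md5(key.encode()).hexdigest()
        server = f"dbserver_{digest[:2]}"   # 256 possible servers
        table = f"history_{digest[2:4]}"    # 256 tables per server
        return server, table

    for symbol in ("S", "SBUX", "GOOG"):
        print(symbol, "->", route(symbol))

    # The same trick makes arbitrarily long URLs easy to index: store and look
    # up the fixed-length hash instead of indexing the raw TEXT column.
    url = "http://example.com/very/long/path?with=many&query=parameters"
    print(hashlib.md5(url.encode()).hexdigest())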

Re:Hashes are your friend (2, Insightful)

MightyMartian (840721) | more than 4 years ago | (#30042616)

In the olden days you didn't have centralized message stores. That's largely a relic of PC-based networking schemes like Novell, Lotus Notes and Exchange. The Unix model used individual mailboxes (in fact, the whole breakdown was that all of a user's data lived in their own hierarchy). Obviously the Unix mailbox scheme wasn't that great as we started saving many megabytes of data, so you create indexed systems, but each user's mail is still effectively independent. I've used Pine to navigate my old mbox archives and it can move through even unindexed email at speeds that put bloated monsters like Exchange to shame.

Clearly the issue with scalability in general is simply one of optimization. If you're returning relatively small pieces of information, then an RDBMS is the way to go. If all your databases are basically blobs, well then it's probably not going to be that effective. I still feel that blobs are heavily abused.

I think part of the problem with RDBMSs is simply that a lot of people don't use them properly, and create the bottlenecks through bad design.

Re:Hashes are your friend (1)

Malc (1751) | more than 4 years ago | (#30042928)

I bet it can't find old messages at the speed I do with X1 + 10 years of Exchange-based email (more than 250,000 messages). I stuck with Pine through the end of the 90s, when everybody else I worked with was switching to Netscape Communicator. I wouldn't go back now though.

Re:Hashes are your friend (1)

Firehed (942385) | more than 4 years ago | (#30042822)

That's very clever and all (and I'm sure quite effective), but it doesn't address the original issue: RDBMSs suck at scaling. We should be able to throw a rack of servers with a load balancer and a SAN at the problem and have it go away. We shouldn't have to rewrite our application logic to scale it out any more than we currently have to write special code because our hard drives are in RAID5 (read: not at all).

The storage engines and their indexing should take care of all of this nonsense automatically. You might have to help them out by being a bit more specific than key `user_id` (`user_id`) (your stock tickers are a good example), but fundamentally the code that helps scale out a database should be part of the database and not the application that's using it.

But, life isn't so kind to us. Oh well, maybe in time.

Re:Hashes are your friend (0)

Anonymous Coward | more than 4 years ago | (#30042954)

How do Teradata or Greenplum or Netezza not scale?

Sure, they will cost you, and for limited query patterns it may be cheaper and more sensible to implement a special-purpose solution. The customer-facing application portion of things like Facebook, Gmail, Amazon, LinkedIn and the like is pretty simple and predictable, so it's relatively easy to implement application-specific solutions. In data warehousing and hard-core BI, there are often many tables (and new ones appearing from time to time) and the queries are quite unpredictable - analysts want to ask their question now, based on business questions, and get an answer real soon - not submit a Work Order to the MapReduce Question Department (like in the old days).

hi monkeys (0)

Anonymous Coward | more than 4 years ago | (#30042580)

Hi monkeys. There are MPP databases that scale way past this and allow you speedy access, including ANSI SQL access (petabytes, in Teradata's case). The newer compressed column-store engines destroy Hadoop in many analytics use cases, both in performance and in needing far fewer machines, plus you keep the ability to use SQL.

Stop the hype.

Everything old is new again (5, Interesting)

QuoteMstr (55051) | more than 4 years ago | (#30042656)

We didn't start with relational databases. RDBMSes were responses to the seductive but unmanageable navigational databases [wikipedia.org] that preceded them. There were good reasons for moving to relational databases, and those reasons are still valid today.

Computer Science doesn't change because we're writing in Javascript now instead of PL/1.

Curiously spurious (1)

KeensMustard (655606) | more than 4 years ago | (#30042740)

Collectively, these alternatives have become known as NoSQL databases. The fundamental problem is that relational databases cannot handle many modern workloads.

I'm sceptical. Why is the problem worse now than in the past? Relational theory in practice is about abstracting the data such that a human/application can understand it as logical constructs. How the data is PHYSICALLY organised is a matter of implementation - relational theory doesn't place any constraint (!) on how the data is organised/retrieved/updated - except that by giving a broad design pattern, duplication is minimised, and so then is processing overhead. MPP (massively parallel processing) lends itself quite neatly to any large set of data - many implementations will continue to scale linearly above the PB size (e.g. Teradata). Looks to me like a sales pitch.

10 years ago, they had the same problem (2, Interesting)

johnlcallaway (165670) | more than 4 years ago | (#30042772)

I was an admin on a system that spread the data across 10 database servers. Each server had a complete set of some data, like accounts, but the system was designed so that ranges of accounts stored their transaction-type data on a specific server, and each server held about the same number of accounts and transactions. As data came in, it was temporarily housed on the incoming server until a background process picked it up and moved it to the "correct" one. This is a very simplistic view, but the reality was that it worked quite well. Occasionally, there was a re-balancing that had to be done. But it was very scalable. The incoming data wasn't so time sensitive, so if it took a few hours to get moved, everything was still OK. When an "online" session needed data, it knew which server to connect to to get it. Processing was done overnight on each server, then summarized and combined as needed.

So yes .. .people have been coming up with innovative ways to solve these problems for a very long time.

And they will continue to do so.
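A toy sketch of that kind of range-based routing in Python; the ranges and server names are invented:

    import bisect

    # Each tuple is (upper bound of account-id range, server); invented values.
    RANGES = [(100_000, "db01"), (200_000, "db02"), (300_000, "db03")]
    BOUNDS = [upper for upper, _ in RANGES]

    def server_for(account_id: int) -> str:
        """Route an account to the server owning its id range."""
        i = bisect.bisect_left(BOUNDS, account_id)
        if i == len(RANGES):
            raise ValueError("account id outside all known ranges; rebalance needed")
        return RANGES[i][1]

    print(server_for(42))        # db01
    print(server_for(150_000))   # db02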

I/O bottleneck (1, Interesting)

Begemot (38841) | more than 4 years ago | (#30042800)

Let's not forget where the bottleneck is - the I/O. It's expensive but once you build a fast and solid storage system, correctly configure it and partition your data properly over a sufficiently large number of hard drives, RAIDs, LUNs etc., you might be able to use SQL. We run a database of 10TB on MS SQL with hundreds of millions of records with an equal rate of reads and writes and could not be happier.

Re:I/O bottleneck (1)

FlyingGuy (989135) | more than 4 years ago | (#30042932)

WOW! Did you get hold of an older Sybase version?!

But seriously, yeah, even MS-SQL can hang in there if you set it up just so and never let MS patch the thing ever after.

You pick the DBMS that works for you (1)

mkairys (1546771) | more than 4 years ago | (#30042878)

Most RDBMS implementations on the web are generally only used to store data and perform very basic queries such as get and store operations. Personally, I don't really see the issue of using one for web applications, since they are proven to work well and, with the right design and caching solution, are more than capable of handling a popular website such as Digg or Facebook. The only real issue with these sites is that to prevent bottlenecks you would generally need to throw more hardware at it than may be necessary (although memory is very cheap these days, so it's a non-issue for most companies).

Memcached has been shown to really help solve many performance issues for relational databases, since the database won't constantly perform complex queries to grab data; the application just pulls the result from a hashed index stored in memory. MemcachedDB http://memcachedb.org/memcachedb-guide-1.0.pdf [memcachedb.org] is looking very promising as a way to get rid of an RDBMS altogether for certain data, such as user sessions, since it focuses on performance rather than functionality. Even then, I think it all really boils down to choosing the right tool for the job: if there's data that you know is going to be a performance bottleneck in the database, you look for more creative solutions to store and process that data. There's nothing stopping you from running two or more different types of databases for the task at hand.
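
For readers unfamiliar with the pattern, here is a minimal sketch of the cache-aside approach the comment above describes, in Python. It assumes a memcached server on localhost and the pymemcache client library; run_user_query() is a hypothetical stand-in for the real database call.

    # Cache-aside: check memcached first, fall back to the RDBMS on a miss.
    import json
    from pymemcache.client.base import Client

    cache = Client(("localhost", 11211))

    def get_user_profile(user_id, run_user_query, ttl=300):
        key = "user_profile:%d" % user_id
        cached = cache.get(key)
        if cached is not None:
            return json.loads(cached)                 # hit: no SQL executed
        profile = run_user_query(user_id)             # miss: one trip to the database
        cache.set(key, json.dumps(profile).encode("utf-8"), expire=ttl)
        return profile

The database only sees traffic for keys that have expired or were never cached, which is why this takes so much load off complex, frequently repeated queries.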

30 Years? (1)

uncqual (836337) | more than 4 years ago | (#30042974)

...the traditional relational database technology that has served us well for over thirty years...

Hmm... Before 1979, market share for RDBMS was TINY. It really didn't begin to "serve us well" until the mid 80's.

This again (2, Interesting)

Twillerror (536681) | more than 4 years ago | (#30043158)

Wow, an "object oriented" database discussion again. I've never read one of these :P I've only been doing this 15 years, and I lost count of these talks a long time ago.

What is the difference between schema-less and schema-rigid anyway? I don't see what that has to do with performance. The real issue is uptime and transaction support. People want to add a column or index without taking the system down. That is different than dealing with PBs of data. Most table structures can easily deal with that much data.

If you have a DB that is big, you have lots of outs. Pay...get the Enterprise version of whatever. Break it into many DBs/tables and merge together. Archive. Archiving, I bet, will get most people by. Does eBay really need all that bidding info for items more than a few weeks old...only for analysis, maybe. Move that old stale data out of the active, heavily hit data tiers.

The fact remains that MySQL should be able to scale to TBs of data. The fact that it can't is a failure of the product. All the others have been able to for a while. Why can't it...I don't know...the fact that it uses an F'in different file for each index on a table. If you don't understand how old-school that is, start using Paradox. Just because it is open source doesn't mean it has to be so damn out of date. Please, for the love of god, save multiple tables/indexes in the same pre-sized file...god.

Google has all the power to go and use something different. Google gets to cheat. Google is a collection of pretty static data. They scan the internet a lot, but imagine if every time you did a search Google had to scan every web page on the planet, index them, and then give you search results. That would be impractical for sure. So for now they just store big collections of blobs and a big fast index for searching keywords and links to pages. Impressive nonetheless, but it's not like your typical app. GMail is...funny that it is one system they've had problems with. Even then, EMAIL DOESN'T CHANGE. It's user specific, but it's still f'in static. GoogleTastic if you ask me.

The fact is people are using RDBMSes right now to solve real world problems. Some start-up finds a way to tweak MySQL to do something cool and posts it on a blog...then all of a sudden RDBMS is dead. RDBMS is fine; it will be fine for at least 10 years if not longer. In that time it will evolve as well, so that it will be around for even longer. MySQL in 5 years will have online index addition, performance-hitless online column addition, partitioning, geo indexing, XML columns, BigASS table support, Oracle RAC-like support, and a thousand other features that some RDBMSs have today and some will not see for even longer. Then developers that spent all that cash developing custom shit will revert and post comments like this one.

That's the way it goes in software development. The middle tier gets bigger, gets inept, custom shit comes out, it gets integrated into the middle tier shit....continue;

Instead of pronouncing death, start talking about how dated a 2-dimensional result set is. JOINs should return N-dimension result sets similar to XML, with butt loads of metadata. ODBC/JDBC are dated...so update them.

select u.login, ul.when from users u join user_logins ul ON ul.user_id = u.user_id where u.name = 'me' should return something like a nested XML packet instead of duplicated crap when there is more than one user_logins row.
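
To make the complaint concrete, here is a minimal sketch, in Python, of what "nesting" the duplicated rows from that join could look like. The rows are hypothetical examples of the flat 2-D result a driver hands back today.

    # Collapse a flat users JOIN user_logins result into one entry per user
    # with the login times nested, instead of the user row repeated per login.
    from collections import defaultdict

    flat_rows = [
        {"login": "me", "when": "2009-11-10 08:00"},
        {"login": "me", "when": "2009-11-10 09:30"},   # same user, repeated
    ]

    def nest(rows):
        grouped = defaultdict(list)
        for row in rows:
            grouped[row["login"]].append(row["when"])
        return [{"login": login, "logins": whens} for login, whens in grouped.items()]

    print(nest(flat_rows))
    # [{'login': 'me', 'logins': ['2009-11-10 08:00', '2009-11-10 09:30']}]

Whether this grouping belongs in the driver (as the parent suggests) or in application code is the real argument; today every application ends up writing some version of it by hand.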

SQL is NOT the Physical Storage or RDBMS engine (1)

Invisible Now (525401) | more than 4 years ago | (#30043216)

Can we agree that SQL is a high-level language for capturing set-theory query logic, and is COMPLETELY INDEPENDENT of the engine and physical storage that actually generates the query plan and makes the heads fly to cache and return data?

Structured
Query
Language

not

Stupid
Quixotic
Layout
(Of tables, pages, indexes, drives, heads, spindles, SANs, etc...)

Right?

TFA is bullshit (1)

WarwickRyan (780794) | more than 4 years ago | (#30043324)

I've seen OLAP systems in the 100TB range which work fantastically well on Oracle.

Object databases could be a nice idea, but not for performance or scaling reasons. An object-oriented database would be beneficial as a way to sidestep ORM, so you can, effortlessly and without any significant amount of extra work, persist the state of your objects.

Then you can build POxOs to represent your objects and just implement a few lines of code to have them persisted.

Not sure if anything like that already exists. I certainly don't know of anything in the C# world, but I expect there's some funky named java project which does it.

Re:TFA is bullshit (1)

QuoteMstr (55051) | more than 4 years ago | (#30043452)

How on earth is that better than using an existing ORM library? Even if you have to write your own, an ORM isn't particularly difficult to write.

Re:TFA is bullshit (1)

WarwickRyan (780794) | more than 4 years ago | (#30043568)

I can only talk about nHibernate or LINQ2SQL, but in either of those cases I have to do something in the database and/or write some XML. That's duplication of work: you've already defined the properties of the object in the class, so adding anything else on top of that is a waste of time.

With ORM you usually end up keeping three separate definitions in sync - the database table, the ORM metadata (mappings) and the object. That costs time, and time costs money.

Rails has solved this through scaffolding and use of ActiveRecord. That's nice, but your definition's in a database, not in your programming language. Which is dumb, because as a developer you're spending 90%+ of your time coding in your programming language. So it follows that the best place to define the entities and relationships is there. Then let an object-store handle the persistence, searching, indexing etc.

In my job (building small-scale LOB applications) the bottleneck is more my time than scaling or performance (which'd be a problem with an object-store), so an object-cache solution would be ideal.
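
As a rough illustration of "define the entity once in your language and let an object-store handle persistence", here is a minimal sketch in Python; the standard-library shelve module stands in for a real object database, and Customer is a made-up example class.

    # Define the entity once as a plain class; no table, no mapping XML.
    import shelve
    from dataclasses import dataclass

    @dataclass
    class Customer:
        id: int
        name: str
        email: str

    # Persist and reload the object by key; shelve pickles it transparently.
    with shelve.open("customers.db") as store:
        store["42"] = Customer(42, "Ada", "ada@example.com")

    with shelve.open("customers.db") as store:
        print(store["42"].name)   # -> Ada

A real object database adds indexing, querying and concurrency on top of this, but the appeal described above is the same: the class is the only definition you maintain.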

Re:TFA is bullshit (1)

QuoteMstr (55051) | more than 4 years ago | (#30043586)

Is writing SQL table definitions instead of equivalent assertions in your language of choice really the bottleneck of your development? Switching to an entirely different database paradigm solely to avoid writing "CREATE TABLE" sounds inefficient.

Re:TFA is bullshit (1)

WarwickRyan (780794) | more than 4 years ago | (#30043650)

The problem is not that it's 'instead of', it's 'as well as'. Sure, you can go all ActiveRecord, but then you lose a lot when it comes to your object design. Which you really don't want to do.
