Slashdot: News for Nerds

NoSQL Document Storage Benefits and Drawbacks

samzenpus posted more than 2 years ago | from the pros-and-cons dept.

Data Storage 96

Nerval's Lobster writes "NoSQL databases sometimes feature a concept called document storage, a way of storing data that differs in radical ways from the means available to traditional relational SQL databases. But what does 'document storage' actually mean, and what are its implications for developers and other IT pros? This SlashBI article focuses on MongoDB; the techniques utilized here are similar in other document-based databases."

96 comments

Same as the old boss (5, Interesting)

greg1104 (461138) | more than 2 years ago | (#40213101)

It's so cute how NoSQL developers have reinvented the XML database.

Re:Same as the old boss (1)

tepples (727027) | more than 2 years ago | (#40213217)

True, but this time it's "web scale", whatever that means. And a lot of the concerns in that video [youtube.com] appear to have been addressed, with durability provided through a journal file.

Re:Same as the old boss (2, Informative)

Anonymous Coward | more than 2 years ago | (#40213257)

The article lied. It mentioned benefits and drawbacks in the title, but all it described was a collection of collections of key-value pairs. Is that really what this whole NoSQL thing is about?

Re:Same as the old boss (1)

Anonymous Coward | more than 2 years ago | (#40217411)

Yes. It's a column-based store, as opposed to a row-based store.

Re:Same as the old boss (1)

Anonymous Coward | more than 2 years ago | (#40213367)

XML as a database has a hierarchy; NoSQL looks like a flat, unstructured mess of keys (_id) and values (hashes). I'm not fond of XML/XPath/Xwhatever, but at least I think it has more structure compared to NoSQL.

Re:Same as the old boss (2)

greg1104 (461138) | more than 2 years ago | (#40213511)

JSON is relevant to NoSQL because it gives a good answer to "how do I store more complicated things than key/value lookups?", one that's even possible to decode in a web browser nowadays. XML databases gave an answer to "how do I store schemaless data in a relational database?", a similar issue. Both combinations--relational + XML, NoSQL + JSON--end up providing the same basic capabilities.

Re:Same as the old boss (3, Informative)

Nadaka (224565) | more than 2 years ago | (#40213685)

JSON is crap for storing arbitrary structured data and collections for web applications.

In javascript you can easily construct an object that is both an "Array" and has named attributes (an associative array). However, you can't recreate that object with valid JSON.

JSON also introduces a fantastic new method of inserting arbitrarily executing code into a web application, demanding yet another set of defenses against insertion attacks to be developed.

It is a problem masquerading as a solution to a problem it can't actually solve.

JSON sans eval (1)

tepples (727027) | more than 2 years ago | (#40213855)

JSON also introduces a fantastic new method of inserting arbitrarily executing code into a web application

How so, if you parse the JSON in your own code [google.com] instead of eval()ing it?

Re:JSON sans eval (2)

Nadaka (224565) | more than 2 years ago | (#40213987)

That was the point. Not everyone does that.

The de facto standard for instantiating an object from JSON is still eval().

Re:JSON sans eval (1)

arose (644256) | more than 2 years ago | (#40214861)

Just like the standard is to feed forms straight into MySQL. That's the reason MySQL sucks after all.

Re:JSON sans eval (1)

wirelessduck (2581819) | more than 2 years ago | (#40217693)

All major web browsers have supported JSON.parse() for a long time; including IE8. Anyone who is still using eval() for parsing JSON should come out of their cave and get with the times. I doubt you would find any serious web developer who still uses eval(), except in "extreme" circumstances. There's just no real need for it in most applications.
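
The difference being argued over can be sketched roughly as follows. The payload strings are made up for illustration and safeParse is a hypothetical helper name, not part of any library; JSON.parse itself is the standard built-in.

```javascript
// Hypothetical helper: parse untrusted text as JSON without ever evaluating it.
function safeParse(text) {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (err) {
    // JSON.parse throws a SyntaxError on anything that is not strict JSON.
    return { ok: false, error: err.name };
  }
}

// Made-up payloads: one well-formed, one that hides an assignment in a value.
const wellFormed = '{"user": "bob", "roles": ["admin"]}';
const hostile = '{"user": (globalThis.pwned = true)}';

const good = safeParse(wellFormed);
const bad = safeParse(hostile);

console.log(good.ok, good.value.roles[0]);
console.log(bad.ok, bad.error);

// eval("(" + hostile + ")") is the pattern being warned against: the text is a
// valid JavaScript object literal, so the engine would run the assignment.
```

The hostile string is only rejected because it goes through a JSON parser; nothing in this sketch ever hands it to eval().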

Wrong, wrong, and wrong. (0)

Anonymous Coward | more than 2 years ago | (#40213909)

You can create associative arrays in JSON.

"associations": {
"foo":"bear",
"fu":"bar",
"one":1
}

As for executable code, all JSON is JavaScript but not all JavaScript is JSON.

You can put arbitrary code in any string, regardless the encoding.

Re:Wrong, wrong, and wrong. (1)

Nadaka (224565) | more than 2 years ago | (#40213959)

Ok, take that associative array and add non associative elements to it.

Or more accurately, take a non associative array and add associative elements to it

["a", "b", "c", "foo":"bar"] is not valid JSON
neither is {"a", "b", "c", "foo":"bar"}

Yet I can do:
var stuff = ["a", "b", "c"];
stuff.foo = "bar";

That JavaScript object cannot be serialized to valid JSON.
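
A quick way to see what happens to such an object, as a minimal sketch using only the standard JSON.stringify and JSON.parse built-ins (the names wire and copy are picked for illustration):

```javascript
// The array-with-a-named-property from the post above.
var stuff = ["a", "b", "c"];
stuff.foo = "bar";

// JSON.stringify serializes arrays by index only, so "foo" is not written out.
var wire = JSON.stringify(stuff);
var copy = JSON.parse(wire);

console.log(wire);          // ["a","b","c"]
console.log(copy.foo);      // undefined
console.log(stuff.length);  // 3 (the named property is not an element)
```

The output is valid JSON, but the round trip silently drops foo, so the object that comes back is not the one that went in.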

Re:Wrong, wrong, and wrong. (4, Informative)

gazbo (517111) | more than 2 years ago | (#40214113)

{"1":"a","2":"b","3":"c","foo":"bar"}

Sure it won't create an instance of Array, but if you're using an Array to also be an associative array then really I think JSON is the least of your worries.
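
For what it's worth, here is a small sketch of what that workaround gives you on the receiving end. It assumes the keys are written as quoted strings, which JSON requires; the variable names are just for illustration.

```javascript
// The numeric-keys workaround, written as strict JSON (keys quoted).
const text = '{"1":"a","2":"b","3":"c","foo":"bar"}';
const parsed = JSON.parse(text);

console.log(Array.isArray(parsed));     // false
console.log(parsed.length);             // undefined
console.log(parsed["1"], parsed.foo);   // a bar
console.log(Object.keys(parsed));       // integer-like keys first, then "foo"
```

Everything survives the trip, but as a plain object: no length, no push(), no Array methods.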

Re:Wrong, wrong, and wrong. (3, Funny)

gazbo (517111) | more than 2 years ago | (#40214159)

Or if you want to be avant garde, I suppose you could begin the numbering at zero *blames wine*

JavaScript under Wine (1)

tepples (727027) | more than 2 years ago | (#40217751)

Wine? Which web browser's JavaScript engine changes its array indexing behavior in this way when run under a free reimplementation of the Windows API [winehq.org] ?

Seriously? (0)

Anonymous Coward | more than 2 years ago | (#40223181)

You really have no idea what you are doing [blogspot.com] .

Your variable, stuff, is in fact an Array with literal strings at its first three indices. But assigning stuff.foo = "bar" does not add to them. Instead, what you have created is a new property on that instance called foo, which joins other Array object properties like length. Any half-intelligent JSON serialization routine will notice the object has type Array and will go about looking only at the indexed values.

Why you would ever want to do something so confusing as combining indexed and mapped values in an object this way is beyond me.

By the way, you can very easily iterate over object properties.


for (var property in object) {
    // for...in also visits inherited enumerable properties; check hasOwnProperty if that matters
    var value = object[property];
}
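
Run against the array from the parent post, a loop like that visits the three indices and then foo. A small sketch (the seen array is just a name picked for illustration):

```javascript
var stuff = ["a", "b", "c"];
stuff.foo = "bar";

var seen = [];
for (var property in stuff) {
  // Skip anything inherited through the prototype chain.
  if (Object.prototype.hasOwnProperty.call(stuff, property)) {
    seen.push(property + "=" + stuff[property]);
  }
}

console.log(seen.join(", "));  // 0=a, 1=b, 2=c, foo=bar
```

Note that length does not show up, because it is not enumerable.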

Seriously, learn the tool before you start criticizing it. And, as it happens, this is one reason JavaScript has developed such a bad reputation: clueless hacks like you apply the language in utterly bizarre and foolish ways, and then go on about how “LOL Javascripts are Teh SUX0R d00dz, it's sl0w and lAaMe. U shuld all codez teh Ruby.”

Re:Seriously? (1)

Nadaka (224565) | more than 2 years ago | (#40223567)

I know exactly what I am doing. I know I can iterate over all properties and array contents using in. You merely have poor reading comprehension.

I am adding a named property to an object, one that also happens to be an array. Exactly like I said I was. I am fully aware that I am not adding another element to the array when I do stuff.foo = "bar";

Your half-intelligent JSON serialization routine that ignores the properties added to an array is wrong. Just like the other guy's suggestion to implement it as an associative array with numeric identifiers. If you hack it to produce valid JSON, you do not get the same object out the other end of the pipe. You get something else that is almost but not quite entirely unlike tea.

I know the tool, I know the technology. My criticism is precisely because I use this shit every day.

Here is a question for you. What is the simplest and most efficient way to create an XML mapping to JavaScript objects? XML elements have attributes and contents that may be other XML elements or strings. The names of those elements are arbitrary; one cannot simply assume that an XML element won't have an attribute called "contents" or "innerXML" or whatever you decide to call it.

The answer to this question is this: You take a javascript array, the objects that the array contains are the child elements of the element. You then add the xml attributes as properties of the object (that is also an array).
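
That mapping can be sketched as below. makeNode is a hypothetical helper, element names are left out to keep it short, and the sketch assumes no XML attribute is named like a built-in Array property such as length.

```javascript
// Hypothetical helper: children become array elements, XML attributes become
// named properties on that same array. Assumes no attribute is called "length"
// (or any other built-in Array property); this sketch does not guard against it.
function makeNode(attributes, children) {
  var node = children.slice();
  Object.keys(attributes).forEach(function (name) {
    node[name] = attributes[name];
  });
  return node;
}

// Roughly <a href="x" contents="y">text<b/></a>
var b = makeNode({}, []);
var a = makeNode({ href: "x", contents: "y" }, ["text", b]);

console.log(a.length);           // 2 children
console.log(a[0], a.contents);   // child 0 and the "contents" attribute coexist
console.log(JSON.stringify(a));  // ["text",[]]
```

The last line is the whole complaint in miniature: the attributes are there in memory, but a standard JSON serialization of the node carries only the children.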

More clueless moderators. (0)

Anonymous Coward | more than 2 years ago | (#40223231)

There is no reason this should be moderated anything other than “-1, Misinformed.”

Worse than the old boss (5, Interesting)

jd (1658) | more than 2 years ago | (#40213613)

The "old old boss" would be the CDF/NetCDF/HDF family of self-describing distributed storage solutions. They predate XML by a long way and are - I believe - the first true self-describing method of storing, indexing and searching data.

For the most part, they support network interconnections between instances, so you can have your virtual storage distributed over as many physical systems as you like. The users will never see the difference except in terms of speed. This gives you all the benefit of NoSQL's distributed model (which XML lacks) but with several decades more development in the database design.

But wait! There's more! If you order in the next gazillion years, you get OPeNDAP absolutely free! (Which it is anyway.) OPeNDAP will translate between any two data formats, so if one site wants to view the data as, say, a conventional database, another wants to look at it as a collection of spreadsheets and a third is expecting XML data, you'd have OPeNDAP translate between client form and central repository form.

I have no objections to Mongo or Memcache, they're very powerful and are very useful, but we're still ultimately talking about technology everyone else has had since 1985, thanks be to NASA, and many NoSQL technologies are really just network-aware versions of the DBM/NDBM/BDB/GDBM/QDBM family which have existed since Unix began.

NoSQL definitely has a place - I would not want to try serving cached web data from HDF5 - and it's an important place. But that's just as true for Hierarchical Databases, Star Databases (aka "Data Warehouses"), "genuine" (ie: actually complies with Codd's rules) relational databases (SQL isn't truly relational in the Codd model, merely a subset), and so on.

It's time we got away from one-size-fits-all ideas, which violates the Unix ethos anyway, and get back to using best solutions for specific problems rather than passable solutions that fail at everything. These are all wonderful, highly specialized solutions to highly specific problem types. Treating them as such will always produce a better answer than force-fitting solutions into not-quite-failing with problems they aren't designed for.

Re:Worse than the old boss (1)

bzipitidoo (647217) | more than 2 years ago | (#40216423)

It's time we got away from one-size-fits-all ideas

What do you mean? We shouldn't use ASCII? Or Unicode? How about what we in the West know as the Arabic numbering system? Universality has its place. Standards are useful and important. PL/1 failed perhaps because programming is more complicated. Though computing is universal, we have not yet managed to come up with a good universal programming language. But data may be simpler.

Having said that, I think HDF5, NetCDF, JSON, YAML, and SQL (and NoSQL) all fail on the universality front. You would not want to use any of them as a basis for a file system.

I would not want to try serving cached web data from HDF5

Neither would I. But why? Why isn't HDF5 good enough for that? Same reasons none of them are suitable for file systems. JSON, YAML, and SQL have no mechanism for handling fixed size blocks. HDF and NetCDF do have such mechanisms, but they are too limited. Can't readily overlay two different fixed size block structures on the same data. Makes it difficult to have CRCs for blocks of a size that is optimized for the storage medium and which have nothing whatever to do with the organization of the data.

I think the idea of unifying the file system, the database, and packet streams is possible and useful, that we can have universal representation and handling of data. I do not buy this contention that these are fundamentally different problems and that we should therefore accept inferior, specialized solutions to them.

Re:Worse than the old boss (1)

jd (1658) | more than 2 years ago | (#40216917)

The thing with self-descriptive data is it doesn't matter if you personally use ASCII, EBCDIC, Unicode or wide characters. You can map whatever to whatever. There is a standard, but it is in the description and not in the described. Specialized solutions are superior - in their niche. I would take a toolbox with a thousand types of saw, hammer and blade over a single Swiss Army Knife because each tool is superior even though no single blade can do everything.

A universal system would be an object-oriented Codd DBM. It would be able to do absolutely anything. Slowly.

Re:Worse than the old boss (0)

Anonymous Coward | more than 2 years ago | (#40217259)

A universal system would be an object-oriented Codd DBM. It would be able to do absolutely anything.

We don't even need to bring in any kind of object-oriented terminology (since all the abuses in the last decades), let's just say that an RDBMS with a properly implemented type system would be such a universal system. But there's still quite some research to do, Date's Third Manifesto lays down the foundation but it still leaves some open problems.

Slowly.

I disagree here. A system based on the relational model is not inherently slow, it's just that more research needs to be done in several directions like how to physically store data and how to handle transactions.

Think of it: row-oriented storage vs. columnar storage, or have a look at the Transrelational approach http://bookboon.com/en/textbooks/it-programming/go-faster [bookboon.com] (e-book free download). OK, it ended in nothing, probably because its efficiency was overestimated, but the point is that we have a lot more to explore about the physical implementations. Think about all the indexing techniques for specialized data or think about specialized scenarios: how far can we push filter predicates to the hardware level (e.g. Netezza), even directly to disks; how efficient are b-trees (or any kind of tree) vs. skip lists when 99% of the database is in memory? And so on and so forth. Even research on transaction protocols needs to be done more; think for example of Calvin http://cs-www.cs.yale.edu/homes/dna/papers/calvin-sigmod12.pdf [yale.edu] that promises to work on top of any kind of physical implementation.
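
To make the row vs. columnar point concrete, here is a toy sketch. The records and function names are invented for illustration, and real engines deal with disk pages and compression rather than in-memory arrays.

```javascript
// The same three made-up records in two physical layouts.
const rowStore = [
  { id: 1, state: "DC", amount: 10 },
  { id: 2, state: "NY", amount: 25 },
  { id: 3, state: "DC", amount: 7 }
];

const columnStore = {
  id: [1, 2, 3],
  state: ["DC", "NY", "DC"],
  amount: [10, 25, 7]
};

// Row layout: an aggregate over one field still walks whole records.
function sumRows(rows, field) {
  let total = 0;
  for (const row of rows) total += row[field];
  return total;
}

// Column layout: the aggregate touches only the one array it needs.
function sumColumn(columns, field) {
  let total = 0;
  for (const value of columns[field]) total += value;
  return total;
}

// Column layout: rebuilding one full record means visiting every column.
function fetchRecord(columns, position) {
  const record = {};
  for (const name of Object.keys(columns)) record[name] = columns[name][position];
  return record;
}

console.log(sumRows(rowStore, "amount"), sumColumn(columnStore, "amount"));
console.log(fetchRecord(columnStore, 1));
```

Same logical relation, same answers; only the physical access pattern differs, which is the part that is still open to research.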

The big problem I see here, addressed by NoSQL systems, is that people don't want to understand their data, they simply want to store it and sort of retrieve it by simple means. Which doesn't quite fit with the relational model.

Re:Worse than the old boss (1)

vegiVamp (518171) | more than 2 years ago | (#40216935)

> What do you mean? We shouldn't use ASCII? Or Unicode? How about what we in the West know as the Arabic numbering system?

None of those *are* one-size-fits-all. ASCII and Unicode are very good at encoding text in human-readable forms; but I wouldn't want to encode my porn in them. The Arabic numbering system is very good at expressing discrete quantities, but kindly refrain from writing a whole novel in it.

The point is that this NoSQL stuff is being hailed as the next big thing, which shall Smite the Relational Unbeliever with Fire, Brimstone and JSON. It isn't - it's merely a network-aware reiteration of an old idea - and it's not like NoSQL is just one thing, either - there's dozens of them, each with their own strong and weak points. In the end, it's web developers who didn't /quite/ grasp something they thought was cool, and that then got turned into managementspeak.

Re:Worse than the old boss (1)

julesh (229690) | more than 2 years ago | (#40217537)

I don't think anybody is claiming NoSQL is new. Many NoSQL products are just incremental improvements over old-style object-oriented databases, after all.

All that is new is the concerted push to point out to people that RDBMSs and SQL shouldn't necessarily be the automatic solution to every problem. They're extremely good at certain tasks, perhaps even for a large majority of tasks, but there are some instances where they are not the best tool for the job. The NoSQL people just want to make sure we all consider this when choosing the tech for our next projects, and perhaps evaluate whether one of their systems is better for us than the SQL system we'd otherwise automatically default to.

Re:Worse than the old boss (1)

vegiVamp (518171) | more than 2 years ago | (#40217689)

Unfortunately, the NoSQL people come across as if they believe - and many actually do - that RDBMSes are utterly useless now that they have found Je- err, their new toy.
Of course what is now suddenly known as NoSQL has its place - hell, how many of us haven't been using Memcached or something similar? Mozilla (and many others) uses RDF stores - yep, that's also NoSQL now. It's just not the ONLY solution, let alone always the BEST one - and of course you need to pick the right tool for the job.

There's been a turnaround, fortunately - and the more moderate and/or smarter specimens have retconned NoSQL to mean Not Only SQL. Still pretty obvious where it came from, but at least that's an acknowledgement of what life is like in the real world.

I have no problem with using NoSQL or whatever tech where it's appropriate. Almost all our sites have memcached fronts. We've been using Redis and AWS (ZOMG! Cloud!) for specific high-burst things. We've got a Cassandra (although I question if that was the right choice for that bit...) and we're going to be looking at MySQL Cluster (which, yes, is also NoSQL even though regular SQL can also be used to query it).

I just have a problem with the religious conversion types who usually have barely a few years and one or two technologies in their fingers and suddenly need to convince the world that they've seen the light and so should you because pancakes.

Re:Same as the old boss (1)

Forever Wondering (2506940) | more than 2 years ago | (#40214279)

It's so cute how NoSQL developers have reinvented the XML database.

Actually, XML is a comparative latecomer.

NoSQL uses JSON which has "name: { blah:val, blah:val }" style syntax. I needed a text database format for some [perl/awk] scripts I wrote in the 80's. I ended up creating a similar curly brace format--no big deal.

Before relational databases even existed, there were CODASYL-compliant databases. These didn't even have SQL as we know it today.

Re:Same as the old boss (2)

cait56 (677299) | more than 2 years ago | (#40214343)

Actually I think they reinvented the flat file.

Re:Same as the old boss (1)

Anonymous Coward | more than 2 years ago | (#40214841)

It's so cute how you exclude XML database developers from NoSQL developers. NoSQL gave the trend a name, the currently popular examples tend to not be XML based, but they are all part of the same family. The biggest difference is probably that XML developers love formal schemas whereas the current crop prefers informal ones.

Re:Same as the old boss (2)

Pseudonym (62607) | more than 2 years ago | (#40215157)

What I find especially cute is that nobody in this thread seems to have heard of Z39.50 or WAIS.

Re:Same as the old boss (1)

greg1104 (461138) | more than 2 years ago | (#40216091)

The whole "XML in databases!" trend came out of people being frustrated with not being able to stuff arbitrary data into a relational database. This "new" document storage idea is addressing the exact same problem in a similar way, only it's a different schemaless storage scheme/database pair. That's why I was amused by the similarity.

Z39.50 and WAIS were implementing a client/server protocol that wasn't tied to any particular database storage backend. If I were searching for a historical precedent for flexible data transfer mechanisms, that concept surely goes back earlier than Z39.50. I'd expect to find prior art for that broader idea among the earlier 50's and 60's research into data storage.

Re:Same as the old boss (1)

Pseudonym (62607) | more than 2 years ago | (#40229053)

Z39.50 and WAIS were implementing a client/server protocol that wasn't tied to any particular database storage backend.

That's certainly true. I only know of one Z39.50 database engine that actually speaks Z39.50 natively.

Nonetheless, Z39.50 was designed with SGML in mind. It implements a very flexible documents-with-nested-and-repeating-fields schema, and did so in 1988.

Re:Same as the old boss (0)

Anonymous Coward | more than 2 years ago | (#40215277)

You mean M. See GlobalsDB. That stuff dates back to the 1960s.

Re:Same as the old boss (1)

gadzook33 (740455) | more than 2 years ago | (#40215617)

You mean lotus notes?

The article is barely a description of MongoDB... (5, Informative)

Nadaka (224565) | more than 2 years ago | (#40213195)

The article is barely a description of MongoDB records. It does not really detail any real drawbacks or benefits beyond "look ma, random structure in my record!"

Re:The article is barely a description of MongoDB. (3, Insightful)

Moses48 (1849872) | more than 2 years ago | (#40213517)

I read this article with the hope of seeing some of the benefits and drawbacks (as the title implied). No talk of scalability, indexing, speed, etc. I actually feel dumber for having read the article.

Re:The article is barely a description of MongoDB. (1)

nullchar (446050) | more than 2 years ago | (#40214743)

The comments on SlashBI are great too. I also wanted to know how to query data out of your "documents" as the Wikipedia page doesn't describe that. Using the SlashBI example, show me all contact objects with state = "DC" or all records where last name ilike 'o_ama'. Does performing a search like that iterate over all records? Do you need to enable some full-text indexing of your entire document store to be able to execute queries like that?
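
As a rough illustration of what those two lookups amount to when nothing is indexed, here is an in-memory sketch. The documents and field names are made up rather than taken from the SlashBI article, and scan is a hypothetical helper. In MongoDB itself such lookups are written as query documents passed to find(), and an unindexed field does mean examining the whole collection.

```javascript
// Made-up contact documents; the field names are assumptions, not the article's.
const contacts = [
  { _id: 1, first: "Barack", last: "Obama", address: { state: "DC" } },
  { _id: 2, first: "Jane", last: "Osama", address: { state: "NY" } },
  { _id: 3, first: "Sam", last: "Smith", address: { state: "DC" } },
  { _id: 4, first: "No", last: "Address" }
];

// Without an index, every document is examined once (a full scan).
function scan(docs, predicate) {
  return docs.filter(predicate);
}

// state = "DC": documents with no address at all simply do not match.
const inDC = scan(contacts, d => d.address !== undefined && d.address.state === "DC");

// ilike 'o_ama': "_" matches exactly one character, compared case-insensitively.
const likeObama = scan(contacts, d => /^o.ama$/i.test(d.last));

console.log(inDC.map(d => d._id));
console.log(likeObama.map(d => d._id));
```

Because the documents are schemaless, the predicate has to cope with records that lack the field entirely, which a fixed table definition would have ruled out up front.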

Re:The article is barely a description of MongoDB. (0)

Anonymous Coward | more than 2 years ago | (#40214927)

MongoDB has no way to guarantee data integrity (no defined fields, foreign keys, constraints or triggers), nor can it provide much in the way of security, but it is very fast at querying huge data sets, great for making persistent objects in many languages, and even better for treating those objects more like a database (finding objects that have parameters in common). Lack of data validation means a risk of errors, and you wouldn't want to run a mission-critical system on it, but it's great for data gathering and scrubbing before moving it elsewhere. The best example is scraping and storing web pages, most likely for later parsing or use, but I've found many a time where I needed to gather buckets of data prior to actually figuring out how to best manipulate and use it, or just wanted to store data that took a lot of I/O to gather and I didn't want to repeat it. Here's a simple Perl example for storing and pulling an object

Re:The article is barely a description of MongoDB. (1)

hlavac (914630) | more than 2 years ago | (#40215957)

I never understood this No-SQL fad. You can turn off transaction isolation and cram serialized record data into a single BLOB field, and you will get the same thing right? Or, use a freaking filesystem?

Why do they keep patting themselves on the shoulder over performance of particular implementation that is due to lack of features and safety, and comparing it to relational databases in general as if it was somehow superior? Apples and oranges. Like saying the MyISAM backend is superior to InnoDB in MySQL because it is faster. SQL as a performance bottleneck? Having to escape certain characters? mysql_real_escape_string()? They probably never heard of bound parameters and prepared statements.

Once they find out they need to start addressing things like durability (when they acknowledge successful completion of a transaction to the remote client, and then there is a crash immediately after that, will the transaction be lost?) and isolation (multiple concurrent transactions modifying the same data jeopardizing the integrity of the data), they will eventually find out that transaction processing is about more than just atomic updates, and find themselves doing the stuff they loathe on the SQL databases. And it will hurt performance to do it compared to the case where they don't care about these things, surprise, surprise.

Reminds me of the PostgreSQL guys when they thought they could somehow make their great idea of snapshot-based concurrency control into a proper serializable isolation, and everyone else was doing it wrong. They couldn't; it only works for read-only transactions. Now they know.

Re:The article is barely a description of MongoDB. (1)

julesh (229690) | more than 2 years ago | (#40217605)

You can turn off transaction isolation and cram serialized record data into a single BLOB field, and you will get the same thing right?

Not really. Schemaless databases provide indexing and search capabilities that are impossible to achieve using SQL blobs without either loading all your data back into memory whenever you want to search for something or providing your own index mechanism.
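
A toy sketch of that difference, with made-up rows and hypothetical helper names (searchBlobs, buildIndex); it is an in-memory stand-in, not how any particular database implements its indexes.

```javascript
// Hypothetical "BLOB column": each row is an opaque serialized string.
const blobRows = [
  '{"type":"page","url":"http://a.example/","status":200}',
  '{"type":"page","url":"http://b.example/","status":404}',
  '{"type":"image","url":"http://a.example/logo.png","bytes":5120}'
];

// Searching blobs means deserializing every row on every query.
function searchBlobs(rows, field, value) {
  let parsedCount = 0;
  const hits = rows.filter(row => {
    parsedCount += 1;
    return JSON.parse(row)[field] === value;
  });
  return { hits: hits.length, parsedCount: parsedCount };
}

// A store that understands documents can keep a secondary index on a field,
// even though the documents need not share a schema.
function buildIndex(docs, field) {
  const index = new Map();
  docs.forEach((doc, position) => {
    if (!(field in doc)) return;  // documents lacking the field are skipped
    const key = doc[field];
    if (!index.has(key)) index.set(key, []);
    index.get(key).push(position);
  });
  return index;
}

const docs = blobRows.map(row => JSON.parse(row));
const byStatus = buildIndex(docs, "status");

console.log(searchBlobs(blobRows, "status", 404));  // every row parsed for one hit
console.log(byStatus.get(404));                     // positions found via the index
```

With blobs the database cannot see inside the value, so the application ends up writing buildIndex itself, which is the "providing your own index mechanism" case.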

Or, use a freaking filesystem?

Which as well as lacking indexing and search as the SQL-based system would, also does not provide any useful mechanism for concurrent updates, or for ensuring consistency (whether eventual or guaranteed at all times). It would also probably be much slower.

Why do they keep patting themselves on the shoulder over performance of particular implementation that is due to lack of features and safety, and comparing it to relational databases in general as if it was somehow superior? Apples and oranges.

They don't. You're misreading the articles because you haven't spotted the context: people are using oranges and complaining that the cider they're ending up with just doesn't taste right. I.e., a lot of people are using SQL databases for tasks for which one of the various NoSQL systems would be better suited simply because they don't realise there's a better tool for them. These people need to see comparisons between SQL & the NoSQL systems that are available in order to realise this.

SQL as a performance bottleneck? Having to escape certain characters? mysql_real_escape_string()? They probably never heard of bound parameters and prepared statements.

My experience is that SQL isn't so much the bottleneck (although it is slow, even with prepared statements) as object-relational mapping. And yes, my application does need some form of ORM (whether handbuilt or using an off-the-shelf library) because I'm working with many types of polymorphic object in single collections. Schemaless databases make this much, much simpler, as they remove the need for large numbers of table joins to make a deep inheritance hierarchy work.

Another NoSQL article on /. (5, Insightful)

Sarten-X (1102295) | more than 2 years ago | (#40213271)

Oh, look, it's a NoSQL article.

Cue the hundreds of Slashdotters who proclaim "Oh, they've reinvented obsolete databases" and "Just wait until they need ACID, then they'll be fucked", the NoSQL blind-faith followers who harp about pure scalability and clustering, and at least a dozen references to an animated video of a retarded strawman saying "webscale" repeatedly.

Somewhere in the depths of poorly-researched comments will be some guy who thinks that NoSQL is a tool that really just might be useful for particular use cases, and should be used where appropriate, and nowhere else. Sadly, his post will be missed because everyone's too busy talking about how everything can be done just as easily on a $500,000 server farm running Oracle's latest and greatest turd.

Re:Another NoSQL article on /. (-1)

Anonymous Coward | more than 2 years ago | (#40213333)

Where are the mod points when you need them ... Please mod parent up.

Re:Another NoSQL article on /. (1)

Anonymous Coward | more than 2 years ago | (#40213545)

You don't get mod points because you're AC.

Re:Another NoSQL article on /. (-1)

Anonymous Coward | more than 2 years ago | (#40213721)

You don't get mod points because you're AC.

Where are the mod points when you need them ... please mod parent up.

Re:Another NoSQL article on /. (1)

Fwipp (1473271) | more than 2 years ago | (#40213337)

+1 Preemptively made other posts Redundant

Re:Another NoSQL article on /. (1)

busyqth (2566075) | more than 2 years ago | (#40213377)

Oh there are definitely use cases for mongo: It's the cheap/fast selection on the "cheap/fast/good: pick 2 of 3" scale.
Kind of like the McDonalds of data storage.

Re:Another NoSQL article on /. (0)

flimflammer (956759) | more than 2 years ago | (#40213389)

Welp, looks like that's it for the thread, folks. Move along.

Re:Another NoSQL article on /. (0)

Anonymous Coward | more than 2 years ago | (#40213447)

That presumes that it is the correct tool for any job. Without making a comment on the truth of the matter, it is quite possible for NoSQL to be the wrong choice for any possible job when compared to other technologies.

As a comment, I feel like flexible data models increase the importance of institutional knowledge and make the maintenance burden higher in all cases. It does not lead to faster development time. However, I don't know enough about the performance characteristics to make a blanket statement that it has no valid use cases.

Re:Another NoSQL article on /. (5, Informative)

greg1104 (461138) | more than 2 years ago | (#40213453)

Sadly, his post will be missed because everyone's too busy talking about how everything can be done just as easily on a $500,000 server farm running Oracle's latest and greatest turd.

Actually, I was going to talk about how PostgreSQL 9.2 (expected in Q3 of this year) will include JSON support [postgresql.org] . The database also has non-relational key value [postgresql.org] storage, and that feature is even available in Heroku deployments [heroku.com] now.

PostgreSQL also lets you relax ACID for performance when that makes sense, at the transaction level, using synchronous_commit parameter [postgresql.org] and unlogged tables [depesz.com] .

There are two things PostgreSQL doesn't do as well as MongoDB. It won't do simple key/value lookups quite as fast; I normally eliminate that problem by putting a memcached server in at some level. And you can't split writes among multiple nodes easily yet.

Re:Another NoSQL article on /. (2)

dkleinsc (563838) | more than 2 years ago | (#40213681)

Glad I'm not the first to bring up PostgreSQL, which gives you serious amounts of awesomeness at 0% of the cost of Oracle.

Re:Another NoSQL article on /. (0)

Anonymous Coward | more than 2 years ago | (#40215105)

Strictly speaking, you can't split writes among multiple pure PostgreSQL nodes easily, but if you don't mind inserting a pgpool-II layer, it becomes trivially easy to do so. Being statement-level replication, it is no worse than MongoDB's.

Re:Another NoSQL article on /. (1)

greg1104 (461138) | more than 2 years ago | (#40216139)

I was strictly speaking, because pgpool-II's statement replication is neither built-in nor without limitations--compared to the full feature set of PostgreSQL. Another write scaling approach is to use the PL/Proxy language to wrap database access. There's also people doing PostgreSQL sharding in their application layer, connecting to one of multiple databases based on what they need. None of these ideas are popular nor built-in to the database yet though.

Re:Another NoSQL article on /. (0)

Anonymous Coward | more than 2 years ago | (#40213491)

... you forgot making fun of "SlashBI" and lamenting the post-Taco corpification of /.

Re:Another NoSQL article on /. (0)

Anonymous Coward | more than 2 years ago | (#40213575)

Somewhere in the depths of poorly-researched comments will be some guy who thinks that NoSQL is a tool that really just might be useful for particular use cases, and should be used where appropriate, and nowhere else. Sadly, his post will be missed because everyone's too busy talking about how everything can be done just as easily on a $500,000 server farm running Oracle's latest and greatest turd.

Here's a thought: not everything made is useful. Sometimes crap is just crap. And another thought: sometimes taking a balanced approach doesn't always present an impression of cleverness or diplomacy, but rather indecisiveness and incompetence.

Re:Another NoSQL article on /. (1)

jd (1658) | more than 2 years ago | (#40213651)

Agreed, but that's the peril of living in a world where everything is tightly-coupled and highly-integrated. People forget that you can mix-n-match, they look no further than using one system for everything. NoSQL does indeed have a purpose, and just like an F1 car, it is in a class of its own when used for that purpose. But I'd no more use Memcache as a substitute for NetCDF or Ingres than I would use an F1 car to go off-road sight-seeing.

Re:Another NoSQL article on /. (3, Funny)

Hognoxious (631665) | more than 2 years ago | (#40217065)

For the benefit of readers in the US, F1 is like Indycar but the cars can turn both right and left.

my article about porn stars using kickstarter (0)

decora (1710862) | more than 2 years ago | (#40214699)

my article about porn stars using kickstarter and other donation websites to provide each other with healthcare (because there is no healthcare in the porn industry) got instantly blacklisted.

instead, you get this.

shrug.

Re:Another NoSQL article on /. (1)

sco08y (615665) | more than 2 years ago | (#40217913)

Somewhere in the depths of poorly-researched comments will be some guy who thinks that NoSQL is a tool that really just might be useful for particular use cases, and should be used where appropriate, and nowhere else.

Yes, those "appropriate uses for NoSQL" are like unicorns... often rumored, only apparently seen in huge companies like Amazon and Google where they have dozens of PhDs working on them. They're very similar, unfortunately, to the "seemingly appropriate uses for NoSQL" but you can't really tell until you've wasted months of development effort...

Re:Another NoSQL article on /. (1)

DavidTC (10147) | more than 2 years ago | (#40225027)

And when people point out there appears to be no actual use, the NoSQL people feel the need to expand NoSQL to include completely random things like Memcache, which is not any sort of 'database' at all, and the Memcache people would be completely baffled to be included in this group.

I'm frankly surprised they haven't started claiming that filesystems are NoSQL. And everyone uses those! So everyone uses NoSQL!

And, in fact, they're correct. That's really what NoSQL is. It's not a replacement for any sort of existing database, it's a replacement for a filesystem. If you are dealing with a massive amount of random chunks of information of differing types, then you need NoSQL because filesystems only hold so much and are not geared that hugely.

And, I should point out, that almost no one is doing that. Google, yes. Some DNA researcher, yes.

Everyone else, no. And most of the people who are...just need some sort of file replication and locking system, with smarter caching. Or memcache or something like that.

I swear, it's like the database universe got invaded with a bunch of idjits saying 'Use unstructured files instead of databases, they're much better at storing unstructured data like spreadsheets and PDFs and images'.

Yeah, thanks, we already knew that. In fact, we're pretty certain everyone already knew that, considering that was invented before databases. The fact some people with massive amounts of files have invented a sort of ultimate user-space file store for their huge amounts of data...well, good for them, but that's not actually relevant to anyone else.

(And the fact that the NoSQL people seem intent on rigging JSON or XML or whatever back on top of them, to get 'fields' in their unstructured data, is just hilarious. I keep waiting for them to also invent storing multiple rows in a single NoSQL block, and then they can even come up with a query language to search in them! Uh...guys?)

Article is not useful (5, Insightful)

claytongulick (725397) | more than 2 years ago | (#40213283)

I'm not sure what the point of this "article" is. It is light on actual information or anything useful, it's basically just a few paragraphs that say "a NoSQL database called Mongo stored data in JSON format. This may or may not work for you".

If we're going to have "BI" articles, they should be informative, containing useful information that we couldn't have gathered ourselves in 10 secs of googling.

How about some comparisons between various NoSQL solutions? How about binary access API vs. RESTful approach à la Couch? How about clustering, replication and scalability? How about stability concerns (with Couch, for example)? Real world use cases? Examples of companies using them for specific solutions? Performance comparisons with RDBMSs? Problem domains that a NoSQL/schemaless DB is more suited to than an RDBMS?

I'm not trying to be pointlessly critical here, I'm trying to provide some constructive feedback on the new slashdot BI format. This article wasn't useful to me at all. I'll probably not spend time reading these articles in the future if the content is as light as this article.

Re:Article is not useful (1)

mrtwice99 (1435899) | more than 2 years ago | (#40213579)

I have to say I agree totally. I read it honestly interested in what it might have to say and came away thinking, why even bother writing that!

Re:Article is not useful (0)

Anonymous Coward | more than 2 years ago | (#40215381)

I'm guessing this article is designed as an 'intro' to being a Nerd and Mattering.

It'll save someone from 20 minutes of Googling and a lifetime of understanding.

Re:Article is not useful (1)

PuZZleDucK (2478702) | more than 2 years ago | (#40215679)

Can you write the next one! :p

Unstructured Data (4, Interesting)

Bigby (659157) | more than 2 years ago | (#40213413)

I don't know when unstructured data turned into NoSQL or Big Data, but it is a pretty simple concept with complex Enterprise level requirements. I work in this field and have for various companies. The biggest obstacle is conforming to the laws of various jurisdictions and levels of government.

You have unstructured data, but it NEEDS some level of structure. That structure is there to restrict access to certain groups within the organization and also for retention rules, which differ by type of data being stored. Not to mention that you must store certain documents in the country of origin, so structured field-based distributed storage plays a role. Oh yea, laws/policies around encryption and whether or not an index violates those laws/policies.

This doesn't work well with a relational database. Sure, you can jam it into a RDBMS like IBM Content Manager, but it becomes inflexible. However, there are constraints that must be followed and all documents need some kind of structure wrapped around them in a RDBMS-like fashion.

I haven't dove into these NoSQL systems myself. They seem like a good idea, but I hesitate if they are too loose. In an Enterprise with sensitive information, you need to deny first. Also, how do they index the fields? Like when you have 100,000,000 documents with invoice numbers...

Re:Unstructured Data (0)

Anonymous Coward | more than 2 years ago | (#40213469)

"Also, how do they index the fields? Like when you have 100,000,000 documents with invoice numbers..."

My guess is the _id would be the invoicenumber.

Re:Unstructured Data (2)

greg1104 (461138) | more than 2 years ago | (#40213533)

Identifying which field is the primary key is not the same as indexing the fields, plural.

Re:Unstructured Data (0)

Anonymous Coward | more than 2 years ago | (#40213663)

The other fields are hidden in the JSON hash, but what fields are appropriate for a document? I'd store useful info (amounts, customers, whatever) about the invoices somewhere else (rdbms) myself.

Re:Unstructured Data (3, Interesting)

Bigby (659157) | more than 2 years ago | (#40213647)

Some of those documents with invoice numbers are not invoices. In fact, they could have many invoice numbers. Invoice numbers are just an example. There is a lot of value to a company in finding all documents relating to product #XYZ that was shipped to company ABC. Maybe throw some date constraints in there. And they don't want useless garbage in the results. Also, all invoices should have an invoice number. And an invoice number should have a certain pattern. Otherwise, garbage-in garbage-out.

Also, the part where RDBMS based document storage falls flat on its face is versioning of the schema itself. Business requirements change; they want to require a field that wasn't required before. They want to make one optional. They want to change the type or the pattern format. But the searches should still go across all those documents. NoSQL based stuff, assuming they are properly and efficiently indexed, may do better in this department.

Re:Unstructured Data (-1)

Anonymous Coward | more than 2 years ago | (#40214429)

Huh? Ever heard of many-to-many relationship tables? RDBMS can do everything "NoSQL" can. If your requirements are changing that much, you don't change your database model ("schema") all the time, you just accommodate all future requirements with your current model. Keep it simple stupid, or you'll lose it all.

"NoSQL" is for braindead people that don't know how to use properly engineered tools for the right purposes, or for extremely specialized purposes where you'd basically need to care for every detail and try to squeeze out the last drop of performance while doing highly unusual stuff with a "database". I'm saying "database" here, because once you venture away from RDBMS, you're usually on your own, with very limited guarantees and a high risk of simply doing it wrong.

Yes, RDBMS is boring. It puts food on the table though. Eventually, most "NoSQL" projects hit a snag, and either have to spend a lot of time and resources dealing with it in detail, or just move over to another tool. Hopefully, it will be an RDBMS, or else, rinse and repeat until boredom sets in.

RDBMS is the way for most projects. No need to overcomplicate matters. Users won't care for "NoSQL". In fact, they often care about SQL, which is a big bonus (hello Crystalreports!). "Databases" such as Filemaker, will die one day, while RDBMS will keep on going and giving.

Re:Unstructured Data (1)

Bigby (659157) | more than 2 years ago | (#40218121)

I agree that you need structure, much like RDBMS. However, there are advantages to a NoSQL-like model with Enterprise document storage. There are disadvantages to RDBMS as well. It needs something in the middle.

Sure, a traditional RDBMS can do it. IBM Content Manager is exactly that (with an unstructured component for storing docs). Have you used RDBMS for Enterprise Content Management? Holding documents to strict schemas can be ineffective, because documents change over time. Sure, you can just create more and more tables, but that requires administrators and time. It also creates a mess. Also, users want to search the system, not just a table.

Suppose the add a field to invoices. And it is required on all future invoices. With RDBMS, you need to create a new table with a NOT NULL constraint. Now they want to search for all invoices to a specific vendor between certain dates. They need to search both tables, because the new column is inconsequential to their business need. That table separation is worthless to the user. Are you going to UNION 10 tables and then ORDER BY the results? The way around this would be to have a bunch of two column tables with ID and ATTR. But then you run into other issues.

And again, yes, it can all be done in a RDBMS. It can all be done with NoSQL. There is a lot of stuff that can be built around it. But something more like an Associative database works much better actually.

Re:Unstructured Data (1)

dbguy (129369) | more than 2 years ago | (#40227709)

... Holding documents to strict schemas can be ineffective, because documents change over time. Sure, you can just create more and more tables, but that requires administrators and time....

Suppose the(y) add a field to invoices. And it is required on all future invoices. With RDBMS, you need to create a new table with a NOT NULL constraint.

There are good reasons to choose NoSql (I like my paycheck), but that's not one of them.
The major SQL products have supported "add column" for decades. I implemented add column in the '80s. ISTR that oracle, db/2 and sybase also supported it at the time.
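A rough sketch of the "add column" point, using Python's built-in sqlite3 module as a stand-in for the larger SQL products. The `invoices` table and the `po_number` column are invented for illustration. Requiring the new field on future rows only would take a default value or a check constraint, which this sketch leaves out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoices ("
    "id INTEGER PRIMARY KEY, vendor TEXT NOT NULL, issued TEXT NOT NULL)"
)
conn.execute("INSERT INTO invoices (vendor, issued) VALUES ('ABC', '2012-01-15')")

# The business adds a field: extend the existing table in place
# instead of creating a second table.
conn.execute("ALTER TABLE invoices ADD COLUMN po_number TEXT")
conn.execute(
    "INSERT INTO invoices (vendor, issued, po_number) "
    "VALUES ('ABC', '2012-06-01', 'PO-77')"
)

# One query still covers old and new invoices; old rows show NULL (None).
rows = conn.execute(
    "SELECT id, issued, po_number FROM invoices "
    "WHERE vendor = ? AND issued BETWEEN ? AND ? ORDER BY issued",
    ("ABC", "2012-01-01", "2012-12-31"),
).fetchall()
print(rows)
```

No UNION across tables is needed: the vendor-and-date search runs against the single table regardless of when each row was written.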

Re:Unstructured Data (1)

Yogs (592322) | more than 2 years ago | (#40216101)

Relatively free form key value pairs except some other stuff that matters for your domain works just fine in a relational db, you just have to query for it when you need it. If you already have a db and an ORM, which would be the common case in any enterprise environment, you'll get your getters and setters for free once you specify class/member->table/column and you can have an attribute table in the without breaking step. How this would be hard to set up or use compared to a key/value store is a mystery to me.

I've worked on a couple apps that were admin configurable via extended properties and did exactly this against a relational db at runtime. It works fine. Actually, better than fine. Let's say in the future you want to use them in aggregates or use them to filter rows on large datasets. Keeping everything in a relational db, dealing with that need after the fact is easy, and efficient if you choose to make it so. Split your stores and not so much.
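As an illustration of the attribute-table approach described above, here is a small sketch with Python's sqlite3 module. The table and column names (`documents`, `document_attrs`) are hypothetical; the point is only that free-form key/value pairs sit beside the structured columns and can still be filtered and aggregated in SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE document_attrs (
        document_id INTEGER NOT NULL REFERENCES documents(id),
        name  TEXT NOT NULL,
        value TEXT,
        PRIMARY KEY (document_id, name)
    );
    """
)
conn.executemany(
    "INSERT INTO documents (id, title) VALUES (?, ?)",
    [(1, "Invoice A"), (2, "Packing slip B")],
)
conn.executemany(
    "INSERT INTO document_attrs (document_id, name, value) VALUES (?, ?, ?)",
    [(1, "region", "EU"), (1, "priority", "high"), (2, "region", "US")],
)

# Filter rows by an extended property.
eu_titles = [
    row[0]
    for row in conn.execute(
        "SELECT d.title FROM documents d "
        "JOIN document_attrs a ON a.document_id = d.id "
        "WHERE a.name = ? AND a.value = ? ORDER BY d.title",
        ("region", "EU"),
    )
]

# Use the same property in an aggregate after the fact.
region_counts = conn.execute(
    "SELECT value, COUNT(*) FROM document_attrs "
    "WHERE name = 'region' GROUP BY value ORDER BY value"
).fetchall()
print(eu_titles, region_counts)
```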

YMMV always, but I think your vendor just sucks if your keys and indexes aren't encrypted along with the data. Anyways, if you want to encrypt/decrypt in the application you can, the nosql folks who haven't gotten around to supporting that (or COMPRESSION in some cases, really) like that argument.

But hey, let's say you want really, really unstructured data without any mapping into your model? Fine, use a lob, and pardon me while wretch. Point is the tool doesn't actually have any simple functional shortcomings compared to a key/value or simple document store, it's just that it gives you the option to impose a little more discipline, and from painful experience most people have chosen that option.

I'm not saying these things don't have a place, but it's more limited than acknowledged, and purported advantage of being schema-less is pretty stupid. The only related argument I've heard that has any sense to it is about uptime, but ultimately that fails for me, too. First 99% of schema changes are simple additive ones that have no impact on uptime. Second even the 1% can be handled pretty well in most cases restricting writes on a limited basis. Lastly, if this update outage is really putting you in knots you can keep modification dates (or just add them for this use case) do the 99.99% of the data transfer ahead of time then the write outage window can be TINY because the final set of data to transfer is so small.

Re:Unstructured Data (0)

Anonymous Coward | more than 2 years ago | (#40217197)

Fine, use a lob, and pardon me while wretch.

Sounds purple until a pentagram.

Re:Unstructured Data (1)

dhasenan (758719) | more than 2 years ago | (#40228507)

In RavenDB, you create indices by creating a query returning the field you want to index and telling RavenDB to index that. For instance, if you are going to query your User documents based on email address often, you would write an index:

from user in Users select new { user.EmailAddress }

And then you can query:

from user in Users where user.EmailAddress == "bob.dobbs@example.org"

You can do this without an index, but it will be slow. Though in the case of RavenDB, I believe the database will add indices based on query patterns -- your query will be slow for the first few times, then indexing kicks in, and eventually your query gets fast.

I'm using Mongo (1)

xmateosx (2654735) | more than 2 years ago | (#40213811)

With MongoDB, the lack of a hard schema requirement doesn't mean your data model can be all willy-nilly. You have to put some thought into it. Several people have mentioned they are looking for the benefits and drawbacks. I'm really enjoying it. Here's my short list of drawbacks.

Case sensitive: If you load data with mixed case you will have to use a regular expression to find it all.

Data type un-strictness: If you load zip codes as a string "11223" and try to find them with a numeric 11223 you are out of luck. Also, if you load your data with mixed data types for the same set of keys, good luck finding it again without having to work for it.

Rich documents are cool: Until you try to remove a struct inside an array that's inside a struct. There is little documentation that explains how to manipulate complex data structures. You can join the mongodb-user Google group; they are very responsive. Plus, learning to use JSON as your query language is a different way of thinking from SQL queries.

Things I like about MongoDB: It is fast. Replicating shards is very handy and relatively easy to do. It has a growing audience and user base. And finally, it's fun to watch people's faces when you tell them "I replicated a bunch of shards today, what did you do?" peace
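The case and type pitfalls can be mimicked in a few lines of plain Python. This is only a sketch over in-memory dicts, not the MongoDB driver or its query language; `find_exact` and `find_regex` are hypothetical helpers standing in for an equality match and a case-insensitive regular expression match.

```python
import re

documents = [
    {"name": "Alice", "zip": "11223"},  # zip loaded as a string
    {"name": "alice", "zip": 11223},    # same zip loaded as a number
    {"name": "Bob", "zip": "90210"},
]


def find_exact(docs, field, value):
    """Equality match: the string "11223" never equals the number 11223."""
    return [d for d in docs if field in d and d[field] == value]


def find_regex(docs, field, pattern):
    """Case-insensitive match, the workaround for mixed-case data."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [d for d in docs if isinstance(d.get(field), str) and rx.fullmatch(d[field])]


by_string = find_exact(documents, "zip", "11223")
by_number = find_exact(documents, "zip", 11223)
any_case = find_regex(documents, "name", "alice")
print(len(by_string), len(by_number), len(any_case))
```

Each zip lookup finds only the document stored with the matching type, while the regular expression finds both spellings of the name.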

Just wondering... (2)

ilsaloving (1534307) | more than 2 years ago | (#40213845)

Where is Lotus Notes in all this brouhaha? They were the original NoSQL system.

Re:Just wondering... (0)

Anonymous Coward | more than 2 years ago | (#40216225)

Been developing Domino applications for the past 15 years.

The NoSQL thing is nothing new.... Non relational databases work great for collaboration, workflow, wiki's and etc.

For last 10 years keep being told that Notes/Domino is going away... But it still manages to stay relevant....

Re:Just wondering... (1)

Hyperhaplo (575219) | more than 2 years ago | (#40216505)

Where is Lotus Notes in all this brouhaha?

Still at the same place they were decades ago?

*not* flamebait here. Truth. I had to deal with this not that long ago on a daily basis... and I wonder, I really really do.

Re:Just wondering... (1)

ilsaloving (1534307) | more than 2 years ago | (#40218087)

Having done some development work on it a long time ago, your news is disappointing.... yet... somehow not surprising.

Re:Just wondering... (1)

Hyperhaplo (575219) | more than 2 years ago | (#40267513)

Well, the "good news" is where I work is now dumping Lotus for Outlook... and all MS software along with it.

I am not entirely sure if this is a step forwards or backwards..

Re:Just wondering... (0)

Anonymous Coward | more than 2 years ago | (#40217237)

I just use an excel spreadsheet. Who needs a "database" anyway?

Disappointing (2)

g0es (614709) | more than 2 years ago | (#40213973)

I was really hoping for a more in-depth description of what NOSQL has to offer over other DB options.

NoSQL is just an umbrella term (2)

aq1018 (2654781) | more than 2 years ago | (#40214913)

... to describe non-relational databases. There are many different ways to store data in various database designs. They are mainly of the following: Relational (mysql, postgresql), Column based (HBase, Cassandra), Document based (Mongodb, CouchBase), Hash based (redis, memcached), and Graph based (neo4j).

Databases can also be categorized as single system databases or distributed databases. Mongodb and most (all?) relational dbs are single system databases. HBase, Cassandra, CouchBase, Riak are distributed databases.

NoSQL was developed to solve problems that traditional SQL databases are unable to solve or solve only with difficulty, such as real time updates on a massive scale, being able to scale horizontally with relative ease, or lightning fast query speeds. They also have drawbacks such as eventual consistency and synchronization issues.

A good architect / programmer should have a solid analysis of the use case against different flavors of DBs and perform benchmarks, be familiar with their failover / recovery procedures, and have a good understanding of the underlying technologies before considering using them in a serious production environment.

What is document storage? (1)

DogDude (805747) | more than 2 years ago | (#40215281)

What is the point of document storage in a noSQL database? If you're not going to store docs in a RDBMS, why not just store them in a filesystem? What is the point of Mongo or whatever this stuff is?

Re:What is document storage? (-1)

Anonymous Coward | more than 2 years ago | (#40215553)

But, but, noSQL is kewl! All the leet kewl folks use it! Cause, it's like, not the old SQL stuff that older folks use! That's gotta be bad. Cause it's been around for a long time. And tested and stuff.

Re:What is document storage? (1)

crackspackle (759472) | more than 2 years ago | (#40215673)

What is the point of document storage in a noSQL database? If you're not going to store docs in a RDBMS, why not just store them in a filesystem? What is the point of Mongo or whatever this stuff is?

They are JavaScript Object Notation (JSON) documents and you can query into fields of the object-document without the database having to read the whole "document" in the same way you can read rows based on some set of columns in an RDBMS. Given objects like { a = { b,c } } or just d = f you could read a.b where = c or just d where = f. It's multidimensional as opposed to the flat column format of an RDBMS. Unfortunately, there are no data types, constraints, foreign keys or triggers. Data integrity has to be done programmatically.
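A minimal sketch of what "querying into fields" means, written as plain Python over in-memory dicts. The `get_path` and `find` helpers are hypothetical; a real document database would answer such a query from an index rather than scanning every document as this does.

```python
def get_path(document, path):
    """Follow a dotted path such as "a.b"; return None when a step is missing."""
    current = document
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def find(documents, path, value):
    """Return the documents whose field at the dotted path equals value."""
    return [d for d in documents if get_path(d, path) == value]


docs = [
    {"a": {"b": "c"}, "d": "f"},
    {"a": {"b": "x"}, "d": "f"},
    {"d": "g"},  # no "a" at all: documents need not share a shape
]

nested_hits = find(docs, "a.b", "c")
flat_hits = find(docs, "d", "f")
print(len(nested_hits), len(flat_hits))
```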

Re:What is document storage? (1)

DavidTC (10147) | more than 2 years ago | (#40225217)

They are JavaScript Object Notation (JSON) documents and you can query into fields of the object-document without the database having to read the whole "document" in the same way you can read rows based on some set of columns in an RDBMS.

Yes, congratulations on off-loading the 'reading the document into memory and parsing it' into an entirely different process. I'm sure that will make it much faster than simply keeping the document mmapped in the application that needs it, and in no way be counterproductive as data is pointlessly transferred back and forth and memory is consumed for no useful reason.

Look: Either the document is already in memory, in which case it should be in the memory of the damn program that wants to use it, or it's not in memory, and someone is going to have to read it in and parse the data fields.

Re:What is document storage? (0)

Anonymous Coward | more than 2 years ago | (#40216285)

You just explained it. The filesystem is the problem.

1. Sometimes the file system is not fast enough (or stable) vs. the speed of incoming data. Therefore a mechanism that uses something like multiple machines, memory, and/or queues/mailboxes/etc is used to prevent the system from being murdered.

2. Most file systems in practical use such as ext(insert number), NTFS, HFS, FAT, etc. have some serious design limitations. Folders in terms of organizing data are a serious WTF. Say for example I want to store my pr0n by star name. What happens if the star uses multiple names? What if I also have a parent folder for hair color and she/he dyes hair? The organization system in most file systems is easy to explode, and no amount of symbolic links, shortcuts, whatever can fix it easily.

3. Disk IO is seriously limiting, and SSDs help, but are still not the solution in terms of speed, cost, and size still. Eventually you want to retrieve all that data out of the file system. If you had to go to the file system all the time and rely on hardware and OS caching, you might be in some serious performance pain.

4. Analytics and queries. What file system is this not painful? Do you want an API on top of the file system to do this better? In that case, you've essentially just written a database.

5. The documents are ultimately stored in the file system. Same goes for non-"document" NoSQL databases such as graph and object databases.

Does that help?

Not just k/v/blob store, but delegating queries (1)

csirac (574795) | more than 2 years ago | (#40215439)

We used MongoDB as a query/cache accelerator for semi-structured data. The key bottleneck was delegating queries outside of the application (pre-filtering results according to ACLs, date transforms, etc.).

We don't have a shockingly huge dataset, and site traffic wouldn't be considered as webscale, but the ad-hoc schema and ability to delegate complex queries to the DBs as JS was really powerful and bought us a lot of performance for very little effort.

And it's only a cache of the authoritative data store, so we can trash mongo and re-load the whole dataset in a few hours.

My Experience (0)

Anonymous Coward | more than 2 years ago | (#40215783)

I'm currently using MongoDB for analytics data in a fairly large company. I must say this article was complete weaksauce. Aside from the fairly well documented scalability, and yes, it delivers, I've found that working with objects rather than related tables has given me a new perspective on what high "normalization" is. Being schema-less is not just about storing whatever you want. It's about storing what you need, and also being able to perform some rather interesting observations on that data.

A working example is the fact that I can store, in most cases, only differentiating data. For example, it is very easy for me to take the URL that someone is on, compare it to the URL that someone is going to (via a click), and store only the difference in information. If the domain does not differ, I can store it in a single location. Could the same be achieved with SQL... of course, two tables, each with similar structure, but one allowing null values, and then another table which links them. However, then my queries become increasingly complex for pretty simple data.

In this same area, it is important for me to have things like query parameters normalized. I want to be able to easily query if the user hit a URL with a particular parameter, and a particular value for that parameter. Using an embedded object solves this quite nicely, and there is very little code required... the property names are the parameter name, their values, the values passed. Compare this to what you might have to do in SQL... either store the full query string and try to get away with LIKE '%param=value%' or parse this query string each time... or worse, in my opinion, create a table with columns like url_id, param_name, value and then store one record for each parameter, not to mention the fact that having a schema would basically require me to say all parameters are the same type. And while that is true in the URL sense (they are all strings), my data model clearly understands that my 'page' parameter is an integer for searches and my 'query' parameter is a string... why shouldn't my database be able to do the same thing with ease?
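As a sketch of the embedded-object idea, the following Python turns a URL into a document whose `params` field keeps one property per query parameter, with types applied by the application. The `INT_PARAMS` rule and the `url_to_document` helper are assumptions made for illustration; the actual storage call is left out.

```python
from urllib.parse import parse_qsl, urlsplit

# Hypothetical typing rule: parameters the application treats as integers.
INT_PARAMS = {"page"}


def url_to_document(url):
    """Build a nested document with one property per query parameter."""
    parts = urlsplit(url)
    params = {}
    for name, value in parse_qsl(parts.query):
        if name in INT_PARAMS and value.isdigit():
            params[name] = int(value)  # stored as a number
        else:
            params[name] = value       # stored as a string
    return {"domain": parts.netloc, "path": parts.path, "params": params}


doc = url_to_document("http://example.org/search?query=nosql&page=2")
print(doc)
```

A lookup such as "URLs hit with page equal to 2" then becomes a match on the nested field `params.page`, with no string parsing at query time.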

In addition to this, I like a number of features such as aggregate operations. A single URL object can easily have its click count aggregated in a single autonomous "query." No need to select the value first and then update, not to mention wrap it in a transaction to avoid the possibility of concurrency issues messing with the count.

Additionally, some bringing up XML databases (which I agree I'm not very familiar with) seem to be missing the point that JSON data, but more specifically, BSON, can carry type information without the need for additional parsing and/or standards. Not everything is a 'string' -- and point of fact, you can even have the possibility of more complex objects which are stored as natural types... MongoDB supports date objects for example, as well as geolocation data which is query-able via concepts specific to that data.

These are, in my opinion, the more powerful features of MongoDB and presumably other object store databases that don't get discussed. There are some limitations, of course. For users who get too carried away, they may, for example, run into the maximum object size limitations. Furthermore, the benefits may not be immediately realized when coming from a relational background. It took me some time to really understand the distinction.

Re:My Experience (1)

DavidTC (10147) | more than 2 years ago | (#40225871)

Could the same be achieved with SQL... of course, two tables, each with similar structure, but one allowing null values, and then another table which links them.

Uh, no. In SQL, you store data by putting each record in a row. No one has the slightest idea what you're talking about, or why you'd need to 'store only the differences', or why you'd need three tables for that.

However, then my queries become increasingly complex for pretty simple data.

Oh noes! Complicated queries!

Compare this to what you might have to do in SQL... either store the full query string and try to get away with LIKE '%param=value%' or parse this query string each time

That sure is a complicated query. Why, it's so complicated you managed to type the important part in a slashdot post off-hand.

And while that is true in the URL sense (they are all strings), my data model clearly understands that my 'page' parameter is an integer for searches and my 'query' parameter is a string... why shouldn't my database be able to do the same thing with ease?

Yes, especially as MongoDB somehow does magically understand the types of URL parameters...oh, wait, it doesn't? You're having to parse each one as you put it in and figure out the type? Well then that hardly seems relevant to complain that SQL would also require that, does it? (Not that I have any idea why you're complaining about having to store something as a string that actually is a string. There's not any sort of loss there. If you want to treat it as a number, you can do that just as easily after you pull it from the database as before.)

In addition to this, I like a number of features such as aggregate operations. A single URL object can easily have its click count aggregated in a single autonomous "query." No need to select the value first and then update, not to mention wrap it in a transaction to avoid the possibility of concurrency issues messing with the count.

Yes, it sure is complicated to do something like: UPDATE urltable SET counter = counter + 1 WHERE url = 'blah'; (With INSERT...ON DUPLICATE KEY if needed.)
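For reference, the in-place increment can be sketched with Python's sqlite3 module. Since INSERT...ON DUPLICATE KEY is MySQL syntax, this sketch swaps in a portable update-then-insert inside one transaction; the `record_click` helper is a hypothetical name, and with several concurrent writers a real system would want the database's own upsert instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE urltable (url TEXT PRIMARY KEY, counter INTEGER NOT NULL)")


def record_click(conn, url):
    """Increment the counter for url, creating the row on first sight."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE urltable SET counter = counter + 1 WHERE url = ?", (url,)
        )
        if cur.rowcount == 0:  # no existing row was updated
            conn.execute(
                "INSERT INTO urltable (url, counter) VALUES (?, 1)", (url,)
            )


for _ in range(3):
    record_click(conn, "blah")
record_click(conn, "other")

counts = dict(conn.execute("SELECT url, counter FROM urltable"))
print(counts)
```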

NoSQL is not New, and Not Always Schemaless (0)

Anonymous Coward | more than 2 years ago | (#40216245)

While "NoSQL" the term itself is not new, what most of the posters here really are complaining about is relational databases vs. a subset of everything else. I am also not sure where the assumption comes from that all "NoSQL" databases have no schema, or the consequent attachment to schema. Overall, use the right tool for the right job, and sometimes that even means multiple tools. Just because something worked for you on 10 projects doesn't mean it will fit every situation for everyone else.

Firstly, there is no reason past, present, or future why a relational database cannot forgo using SQL. SQL is simply a tool for relational databases, or rather a semi-standard powerful DSL with an accompanying API per implementation. It's a decent fit with relational databases. Is a relational DB without SQL "NoSQL?" Honestly, "NoSQL" is probably one of the dumbest terms and just reminds me of people who complain like little children about the failings (of which there are indeed many) of SQL against relational databases.

Moreover, some of the oldest databases in software history are not relational and do not use SQL. For instance, there are quite a number of object databases that are as old as or older than many popular relational implementations. These databases do not use SQL. Are they NoSQL? Supposedly, yes. They also most definitely can have schemas. The schemas are usually defined by classes, which creates many advantages in various OO languages vs. the relational model. Some of the largest databases in the world are object databases, with applications ranging from nuclear and genetic sciences to shipping. The GemStone object database is used for shipping and is baked in with Smalltalk. If anything, this means such a database can actually have a more complex and rigid schema than any SQL implementation, because you can define classes, business rules, and callback-like behavior to handle and enforce your schema. (Aside: this raises an important question. Why not use more functional languages with DBs, since they are a closer fit than OO, hence less impedance mismatch and so on?)
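To make the "schema defined by classes" idea concrete, here is a rough sketch in plain Python. It is not GemStone's or any other object database's API; the Shipment class, its fields, and the rules it checks are all hypothetical, and no persistence is shown. It only illustrates how fields plus business rules in a class can act as a schema.

```python
from dataclasses import dataclass


@dataclass
class Shipment:
    """Hypothetical persistent class: the fields and checks are the schema."""

    container_id: str
    weight_kg: float
    destination: str

    def __post_init__(self):
        # Business rules run whenever an object is created, so an
        # invalid object never exists to be stored in the first place.
        if not self.container_id:
            raise ValueError("container_id is required")
        if self.weight_kg <= 0:
            raise ValueError("weight_kg must be positive")

    def reroute(self, new_destination):
        """Callback-like behavior: changes go through a checked method."""
        if not new_destination:
            raise ValueError("destination is required")
        self.destination = new_destination


shipment = Shipment("C-1", 1200.0, "Rotterdam")
shipment.reroute("Hamburg")
print(shipment)
```

An object database would store instances of such a class directly, so the rules live in one place instead of being split between table constraints and application code.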

Another category of NoSQL databases is graph databases. These again are some of the largest databases on the planet. In most implementations, whether we're talking about Neo4j, InfiniteGraph, sones, AllegroGraph, whatever, these databases can typically have a level of schema control similar to object databases, or very little. Sometimes that is the point of the graph. You can not only enforce "column"-like functionality via class fields, hashes, or something similar, but actually enforce how the structure of the graph can grow. This is usually done via programmer logic. In other words, the responsibility in both object and graph databases can be passed to the programmer to augment the schema, rather than forcing one more general schema as in the relational world. Therefore in a graph database, I can typically create directed and undirected graphs, b-trees, tries, binary trees, red-black trees, linked lists, whatever.
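As a toy illustration of enforcing how a graph may grow through programmer logic, here is an in-memory sketch in Python. It does not use Neo4j's or any real graph database's API; the BinaryTreeGraph class and its rules are hypothetical. The structural "schema" here is a binary tree: each node has at most one parent and at most two children, and no edge may create a cycle.

```python
class BinaryTreeGraph:
    """Directed graph whose add_edge only allows binary-tree growth."""

    def __init__(self):
        self.children = {}  # node -> list of child nodes
        self.parent = {}    # node -> its single parent

    def add_edge(self, parent, child):
        # Rule 1: a node may have only one parent.
        if child in self.parent:
            raise ValueError(f"{child!r} already has a parent")
        # Rule 2: a node may have at most two children.
        if len(self.children.get(parent, [])) >= 2:
            raise ValueError(f"{parent!r} already has two children")
        # Rule 3: no cycles, so child must not be parent or its ancestor.
        node = parent
        while node is not None:
            if node == child:
                raise ValueError("edge would create a cycle")
            node = self.parent.get(node)
        # All rules passed, so the graph is allowed to grow.
        self.children.setdefault(parent, []).append(child)
        self.children.setdefault(child, [])
        self.parent[child] = parent


graph = BinaryTreeGraph()
graph.add_edge("root", "left")
graph.add_edge("root", "right")
print(graph.children["root"])
```

Swap the three rules for different ones and the same store holds a linked list, a trie, or an unconstrained graph, which is the flexibility being described.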

Additionally, we have the newer set of databases which is probably what most people think of including mongodb, couchdb, cassandra, hadoopfs, riak, and so on. These databases have their purpose like any other. Would I use Neo4j or Oracle as my persistent cache? Probably not. Redis? A much better chance. Really none of these are a magic bullet and they are all growing like any other. I honestly can't stand some of them in my own work, but that doesn't make them bad. Some use JSON, some don't. This isn't a magic language or the new SQL. You can easily write a JSON layer for almost any database, so that's all it is - a way of talking. Don't like it? In most cases, you can use something else or write your own.

The more I hear about NoSQL, the more I want to puke. Yay, some idiots invented a new term akin to renaming air "non-solid." None of this is new, and we've all seen it before, we'll see it again. It's the same annoyance level as cloud computing which we've already also seen. At the end of the day, a combination of ignorance, stupidity, trendiness, boredom, and shiny new toy syndrome polarizes people on these topics. If you've never used these databases in production and you have not educated yourself on the alternatives, you probably shouldn't open your mouth. So much either FUD or Fanboi going on here it's crazy. Once again, know your tools and pick the right one(s).

Graph Databases (0)

Anonymous Coward | more than 2 years ago | (#40217073)

Most talk of NoSQL seems to focus on the key-value / document-oriented databases (Dynamo, Cassandra, Mongo, etc.). IMHO graph databases (e.g. neo4j) sound a lot more interesting and relevant to most use cases (at least the ones I deal with), focusing on the relationships among the data elements and new ways to query that data.

NoSQL needs better PR. (1)

Millennium (2451) | more than 2 years ago | (#40218059)

I, for one, would have a much better time taking NoSQL seriously if so many of the arguments for it didn't reduce to -and to truly express this reduction properly I need to put on my best Barbie voice- "The relational model is haaaaard." Some say SQL instead (for example, whoever came up with the NoSQL moniker), but except for a couple of arguments that amount to pure syntax baw it reduces to pretty much the same thing.

NoSQL has its place: there are some things it does really well. The problem is that the things it does best are not the things most of its advocates call for.
