Ask Slashdot: Which NoSQL Database For New Project?

Soulskill posted about 6 months ago | from the mo-sql-mo-problems dept.

Databases 272

DorianGre writes: "I'm working on a new independent project. It involves iPhones and Android phones talking to PHP (Symfony) or Ruby/Rails. Each incoming call will be a data element POST, and I would like to simply write that into the database for later use. I'll need to be able to pull by date or by a number of key fields, as well as do trend reporting over time on the totals of a few fields. I would like to start with a NoSQL solution for scaling, and ideally it would be dead simple if possible. I've been looking at MongoDB, Couchbase, Cassandra/Hadoop and others. What do you recommend? What problems have you run into with the ones you've tried?"

Do you need a database? (2, Insightful)

tubs (143128) | about 6 months ago | (#46702801)

Do you need a database to do what you're trying to do? Why not just write the information to a text file (CSV or tab-separated?), and use other programs to query the data?

Re:Do you need a database? (5, Funny)

Anonymous Coward | about 6 months ago | (#46702811)

Excel Spreadsheet, maybe?

Re:Do you need a database? (0)

Anonymous Coward | about 6 months ago | (#46702827)

Absolutely, write to a file and you can Splunk the file.

Re:Do you need a database? (2)

jythie (914043) | about 6 months ago | (#46703553)

*gasp* a sensible solution using readily available mature tools? *faints*

Re:Do you need a database? (1, Insightful)

Anonymous Coward | about 6 months ago | (#46702831)

Definitely use a CSV or tab-separated file. A NoSQL database is wayyyy overkill. Even a SQL database is overkill for what you're trying to do.

Re:Do you need a database? (4, Informative)

DarkOx (621550) | about 6 months ago | (#46703047)

I disagree, he is concerned about scaling. The last thing in the world he should do is use a bunch of flat files, unless he really just needs to store the data, but he already said he needs to do reports and totals on it.

Also he is working in Ruby. The smart thing for him to do IMHO is write his program against ruby/DBI. It isn't the prettiest database API, but it supports plenty of different backend options and it does not sound like his program needs especially complex database operations or queries. He can start working with something like SQLite as the database "server", and move up to something else, perhaps PostgreSQL (which can be every bit as fast as the NoSQL solutions unless you are getting highly, highly custom) without needing to alter his program.

Re:Do you need a database? (0)

Anonymous Coward | about 6 months ago | (#46703263)

Pathos is what databases are for. Or do you think that manually locking the CSV file is a scalable solution?

Re:Do you need a database? (0)

Anonymous Coward | about 6 months ago | (#46702833)

That would be sqlite, and isn't what he wants.
It might seem easier to just use a text file, although I'm not quite sure why it would seem that way, but it isn't.

Re:Do you need a database? (0)

Anonymous Coward | about 6 months ago | (#46702863)

^ This is probably what I would do if the data was not required for immediate usage, just log the data with timestamp to a text file. It would be the quickest thing to do. You could also put it in a log rotation for archiving.

Re:Do you need a database? (5, Insightful)

mwvdlee (775178) | about 6 months ago | (#46702927)

Basically the question is: what's the expected volume of records, and how many fields per record?

A solution for 100 records a week with 4 fields each would be different from 1000 records per second with 30 fields each.
1000 records/sec with 4 fields would be yet another solution.

Re:Do you need a database? (3, Informative)

DorianGre (61847) | about 6 months ago | (#46703767)

We are looking at 99% incoming data, 10-12 fields, 1000-2000 per session per week, X as many users as we can get.
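To put those figures on the scale mwvdlee describes, here is a rough back-of-the-envelope sketch. It assumes one session per user per week and the upper figure of 2000 records, both of which are guesses at what the numbers above mean, and the user counts are purely illustrative.

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604,800 seconds


def average_rows_per_second(users: int, rows_per_user_per_week: int) -> float:
    """Average insert rate if the load were spread evenly across the week."""
    return users * rows_per_user_per_week / SECONDS_PER_WEEK


# Illustrative user counts only; nobody in the thread has stated real ones.
for users in (1_000, 100_000, 1_000_000):
    rate = average_rows_per_second(users, 2_000)
    print(f"{users:>9,} users -> {rate:8.1f} rows/sec on average")
```

Under those assumptions a million users works out to roughly 3,300 inserts per second on average. Real traffic is bursty, so peaks would sit well above the average.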

Re:Do you need a database? (3, Interesting)

Richard_at_work (517087) | about 6 months ago | (#46702959)

There's probably an element of multithreaded access that needs to be taken into consideration here. Writing to a single text file may get you into trouble if the receiving webserver is multithreaded, because the threads will either have to queue for write locks or write to different files.

Database engines don't have this issue, so while it may be overkill, there may be reasons to have one regardless.
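For anyone who does go the flat-file route, the "queue for write locks" option might look something like the sketch below. The class name and record format are made up for illustration, and the lock only protects threads inside one process.

```python
import threading


class LockedLogFile:
    """Append-only text log that serializes writers with a lock (hypothetical helper)."""

    def __init__(self, path: str) -> None:
        self._path = path
        self._lock = threading.Lock()

    def append(self, line: str) -> None:
        # Only one thread at a time may open and write, so records never interleave.
        with self._lock:
            with open(self._path, "a", encoding="utf-8") as handle:
                handle.write(line.rstrip("\n") + "\n")
```

If the webserver runs several worker processes rather than threads, this is not enough. You would need OS-level file locking or, more simply, a database.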

Re:Do you need a database? (5, Insightful)

FyRE666 (263011) | about 6 months ago | (#46702979)

Please don't do this (use a flat file) to store data for a web app that's likely to be accessed by more than one device at a time. Unless you implement your own file locking mechanism, you'll eventually end up with corrupt entries. Even if you do implement your own locking scheme, it's probably not going to be as efficient as using a DB. It's a 5 minute job to set up a new MySQL DB and associated query to push data in, then you can filter and report on it much more easily. It's something DBs are very good at!

Unless you have a specific need to scale horizontally, it's generally better to stick with a SQL DB for web apps. I've used MySQL, PostgreSQL and Oracle for this. MySQL is by far the easiest to work with, hence its popularity. I don't actually know of any advantage to using PostgreSQL; it doesn't perform any better, and is (or at least used to be) much less user friendly.

Re:Do you need a database? (1)

Raumkraut (518382) | about 6 months ago | (#46703049)

For storing and querying arbitrarily-structured data, which is what the submitter seems to be wanting, a traditional relational SQL database is not necessarily the best way to do it.

And if anything, MongoDB is easier to start using than any relational database, IME. No need to create databases, schemas, or tables (collections) beforehand - you just install MongoDB, start writing data, and it gets stored.

Re:Do you need a database? (5, Insightful)

Richard_at_work (517087) | about 6 months ago | (#46703099)

I think many people get stuck in thinking "one single database, that's it, my initial decision condemns me forever", when in fact there's no shame in having many databases.

Stick the raw data into one database, choose the database that suits that.

Transform the data from the raw database into something you can use day to day, that's well structured etc, choose the database for that.

Transform the data from the day to day schemas into something that's more suitable for archiving and long term reporting, again choose the database for that.

You don't have to have one single database type, every particular one has its strengths, so use them!
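A minimal sketch of the first two stages, assuming the raw store holds one JSON document per incoming POST. For brevity both stages live in a single SQLite file here, where the suggestion above would use separate databases; the table and field names (`raw_events`, `readings`, `device_id`, `recorded_at`, `amount`) are invented for the example.

```python
import json
import sqlite3


def load_raw_into_reporting(conn: sqlite3.Connection) -> int:
    """Copy raw JSON payloads into a typed reporting table; returns rows copied."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings ("
        "device_id TEXT NOT NULL, recorded_at TEXT NOT NULL, amount REAL NOT NULL)"
    )
    rows = []
    for (payload,) in conn.execute("SELECT payload FROM raw_events").fetchall():
        doc = json.loads(payload)
        # Validation happens here: a missing key or a bad number raises immediately.
        rows.append((doc["device_id"], doc["recorded_at"], float(doc["amount"])))
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)
```

The raw side stays schema-free and cheap to write, while the reporting side gets the types and constraints that make queries predictable.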

Re:Do you need a database? (2)

Jody Bruchon (3404363) | about 6 months ago | (#46703247)

Create a table, get a POST, Insert contents of POST into table...I don't really see how this isn't the best way to do it.
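Spelled out with SQLite standing in for whatever SQL database you pick, that is roughly the following. The field names are hypothetical, since the submitter hasn't said what the 10-12 fields are.

```python
import sqlite3

FIELDS = ("device_id", "recorded_at", "amount")  # hypothetical field names


def create_table(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events "
        "(device_id TEXT, recorded_at TEXT, amount REAL)"
    )


def store_post(conn: sqlite3.Connection, form: dict) -> None:
    """Insert one POST body; named placeholders keep the values parameterized."""
    conn.execute(
        "INSERT INTO events (device_id, recorded_at, amount) "
        "VALUES (:device_id, :recorded_at, :amount)",
        {name: form[name] for name in FIELDS},  # ignore any unexpected extra fields
    )
    conn.commit()
```

A web framework handler would call `store_post` with the parsed form data; nothing else about the app needs to know which database sits behind it.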

Re:Do you need a database? (2, Informative)

Anonymous Coward | about 6 months ago | (#46703711)

>For storing and querying arbitrarily-structured data, which is what the submitter seems to be wanting
I dunno. I read TFS and it looks more like he wants rows of tabular data. Were this a STX site, I'd vote to close as too broad since he hasn't actually said anything useful about what he's storing.

So default answer to "Which NoSQL database should I use?" is always "Don't use NoSQL."

Re:Do you need a database? (1)

Lennie (16154) | about 6 months ago | (#46703103)

There are a whole lot of ways in which PostgreSQL used to be less user friendly, but they take their time and keep improving it in a consistent way. It has many, many features.

Personally I really like PostgreSQL. It scales really well.

There might still be things missing that some people want.

But I think you'll find they will be added in the next 3 releases. 9.4 is now in development:
- upsert/merge in 9.4
- basis of logical replication in 9.4 (has been available in out of tree tools for many years), upcoming versions will build on that.

I'm not sure what people still need if those are done other than multi-master. And this is where logical replication can really help. We don't know if the developers will implement it of course. These things take effort and time.

Re:Do you need a database? (1)

jythie (914043) | about 6 months ago | (#46703567)

No need to develop your own locking system, just use whatever logging functionality the server has.

Re:S3 better than files on disk (2)

xelah (176252) | about 6 months ago | (#46703211)

Now scale that. Or just lock it properly.

If you want simple, scalable and low sysadmin overhead and all you need are key -> value lookups then Amazon's S3 can be an excellent choice. You don't need to manage it, you don't need to work out how to add servers and it's well proven at extremely large scales.

However, like a lot of other posters, I'm very sceptical that NoSQL is the place to start. SQL databases can do a LOT for you, are very robust and can scale very considerably. As your requirements grow you might find yourself wanting things like indexes, transactions, referential integrity, the ability to manually inspect and edit data using SQL and the ability to store and access more complex structures. You're likely to give yourself a lot of pain if you go straight for NoSQL, and even if you DO need to scale later combining existing SQL and new NoSQL data stores can be a useful way to go.

Use PostgreSQL (5, Informative)

Anonymous Coward | about 6 months ago | (#46702803)

If you need to store less than a few hundred million rows just use PostgreSQL.
It supports JSON and transactions.

Re:Use PostgreSQL (4, Insightful)

Lennie (16154) | about 6 months ago | (#46702873)

Yes, that is what I wanted to point out too.

Also, PostgreSQL 9.4 has jsonb, which was, in certain tests less than a year ago, faster than MongoDB.

Re:Use PostgreSQL (2)

Lennie (16154) | about 6 months ago | (#46702883)

Also if you want a key/value store, there is also http://symas.com/mdb/ [symas.com] from a company of some of the OpenLDAP developers.

Which really seems to have the fastest read performance of them all.

Re:Use PostgreSQL (1)

Anonymous Coward | about 6 months ago | (#46703041)

Object languages like PHP and relational databases have impotence mismatch.

Re:Use PostgreSQL (1)

zauberberg51 (1015659) | about 6 months ago | (#46703707)

impotence => impedance for those who are not too impudent

native populations die after meeting us (-1)

Anonymous Coward | about 6 months ago | (#46702809)

who knows why http://www.youtube.com/results?search_query=unrepentant&sm=3 we're buggy?

Another troll by Soulskill (-1)

Anonymous Coward | about 6 months ago | (#46702817)

Christ all mighty.... slashdot just gets worse and worse.

CouchBase (0)

Anonymous Coward | about 6 months ago | (#46702823)

CouchBase/CouchDB is probably the easiest and most available one out there. It's particularly well suited for app backends too, as both the backend and mobile apps can talk to the same database, in theory eliminating the need for the backend to handle data syncing.

One caveat though: the last time I used Couch (which was a few years back now) I encountered problems with its map/reduce implementation. Specifically, you can't (or at least couldn't at the time) chain map/reduces, which severely limits how you can query your data. With the requirements you listed, you should be fine though.

Re:CouchBase (2)

grcumb (781340) | about 6 months ago | (#46703499)

CouchBase/CouchDB is probably the easiest and most available one out there. It's particularly well suited for app backends too, as both the backend and mobile apps can talk to the same database, in theory eliminating the need for the backend to handle data syncing.

Those are good reasons, and it's also true that CouchDB will use a lot less resource overhead than a full-bore RDBMS under load. Depending on the use case, it might also prove decidedly easier to scale.

But the place where NoSQL really shines is storing amorphous or heterogeneous data. Because you have no constraints about what goes into a given record, you can record more or less name/value pairs at your whim. As with Perl, though, freedom comes at the cost of potential disorder.

But honestly, with the tiny amount of detail provided, it seems like it's really six of one and half a dozen of the other. If it's just call data being recorded, and the same call data every time, it won't make a huge difference if you use a full-blown RDBMS or a NoSQL database. Either one has its costs (individual PUTs and POSTs in CouchDB, for example, can be expensive, whereas queuing and write contention might cause headaches at extreme scales in PostgreSQL or Oracle).

Both an RDBMS and a NoSQL database will deal with replication fairly well, though my personal inclination is to prefer the simplicity of replication in CouchDB right up until the noise level gets out of hand.

Sounds like you need a database (5, Insightful)

Anonymous Coward | about 6 months ago | (#46702825)

You might want to consider a SQL database.

Please specify a better scenario (2)

prefec2 (875483) | about 6 months ago | (#46702829)

Based on your information no one can give you solid advice. It depends heavily on the load you expect and on the data model you will use. For a simple Twitter-like service, you can use a log file or any NoSQL technology. If you only have a few transactions and not billions of entries, you could use PostgreSQL or even MySQL. However, PostgreSQL scales better. If you want to make complex interpretations of graph-like data you may consider Neo4J as a graph DB.

Re:Please specify a better scenario (4, Insightful)

OzPeter (195038) | about 6 months ago | (#46703147)

Based on your information no one can give you solid advice.

IMHO the question is deliberately designed to be vague. iPhones and Android devices, PHP and Ruby On Rails .. that is such a shotgun blast of specifications that are totally unrelated to the DB use on the back end that the entire question smells of click bait to me.

Re:Please specify a better scenario (5, Insightful)

khchung (462899) | about 6 months ago | (#46703435)

Based on your information no one can give you solid advice.

IMHO the question is deliberately designed to be vague. iPhones and Android devices, PHP and Ruby On Rails .. that is such a shotgun blast of specifications that are totally unrelated to the DB use on the back end that the entire question smells of click bait to me.

Either that, or the OP simply has no idea how databases work at all.

If OP had any idea how a database (any database, not just relational) works, he would be talking about data and transaction volumes, access patterns, transactional requirements, data integrity constraints, retention and housekeeping requirements, etc.

Instead, as you said, he talked about device platforms, communication protocols, language and runtime environment, which are all irrelevant to choosing a database. (OK, the last may be a bit relevant depending on which database is used.)

Re:Please specify a better scenario (1)

jythie (914043) | about 6 months ago | (#46703585)

And here I am out of mod points.

At first reading something seemed off about the question, and I think you summed it up nicely.

To me it comes across a bit as the OP asking 'I need some vaguely authoritative sounding reasons for a sexy solution, look at my keywords and tell me what is "in" with that community'

Re:Please specify a better scenario (1)

DorianGre (61847) | about 6 months ago | (#46703745)

Not bait, simply dipping my toe into the NoSQL waters. I have been a developer for almost 20 years now and can spin this up with a SQL database in under an hour. The big thing here is that it be highly scalable (thus the iPhone/Android - you never know how big these will get, or how fast) and that we are able to get some kind of structured time-based reports out on the back end.

NoSQL? (5, Insightful)

aaaaaaargh! (1150173) | about 6 months ago | (#46702839)

I would like to start with a NoSQL solution for scaling

And there it is, the proverbial premature optimization ...

Re:NoSQL? (1)

louaish88 (731196) | about 6 months ago | (#46702923)

But, but, but NoSQL is Webscale!

Re:NoSQL? (4, Funny)

gnoshi (314933) | about 6 months ago | (#46703187)

Shards! It has shards!

Re:NoSQL? (3, Funny)

VortexCortex (1117377) | about 6 months ago | (#46703443)

Shards! It has shards!

Heal The Dark Crystal, Gelfling!

Only then can the two be made one! [mypopescu.com]

Re:NoSQL? (1)

Wootery (1087023) | about 6 months ago | (#46703691)

It's true! [youtube.com]

Re:NoSQL? (2)

mwvdlee (775178) | about 6 months ago | (#46702941)

Being able to scale from 1 billion records a day to 10 billion a day does not a premature optimization make.

The simple fact is that there's not enough information to give any reasonable advice.

NoSQL (0)

Anonymous Coward | about 6 months ago | (#46703083)

Which is why the question is just technological masturbation

Re:NoSQL? (-1)

Anonymous Coward | about 6 months ago | (#46703277)

I call bullshit. There's a very clear clue in OP's RFA: I'll need to be able to pull by date or by a number of key fields, as well as do trend reporting over time on the totals of a few fields. And the winner is, use a relational db. Always use a relational db until you know that you don't need the flexibility or that it can't keep up.
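To make the "pull by date, trend reporting on totals" part concrete, here is a small sketch using SQLite. The `events` table and its columns are hypothetical, and timestamps are assumed to be stored as ISO-8601 text.

```python
import sqlite3


def daily_totals(conn: sqlite3.Connection, start: str, end: str) -> list:
    """Total of `amount` per calendar day for start <= recorded_at < end."""
    return conn.execute(
        "SELECT date(recorded_at) AS day, SUM(amount) "
        "FROM events WHERE recorded_at >= ? AND recorded_at < ? "
        "GROUP BY day ORDER BY day",
        (start, end),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (device_id TEXT, recorded_at TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [("a", "2014-04-08T10:00:00", 1.5), ("b", "2014-04-09T09:00:00", 4.0)],
)
print(daily_totals(conn, "2014-04-08", "2014-04-10"))
```

One `GROUP BY` covers the whole reporting requirement as stated, which is the flexibility being argued for here.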

Also, look at what he's logging connections to. Really, you think inserting one row is going to be the scaling chokepoint in that setup? LOL!

Re:NoSQL? (1)

DorianGre (61847) | about 6 months ago | (#46703753)

We don't know how big to scale. A few thousand users, a few million?? Apps in the wild are like this sometimes. New to NoSQL and really just wanted a good place to start with a platform that would let us scale. I don't want the Oracle on million dollar hardware problem again.

Re:NoSQL? (5, Insightful)

Sarten-X (1102295) | about 6 months ago | (#46703397)

As an expert (relative to most of Slashdot) in NoSQL databases, with a significant amount of experience in Hadoop and HBase systems, I agree wholeheartedly.

NoSQL solutions can be ridiculously fast and scale beautifully over billions of rows. Under a billion rows, though, and they're just different from normal databases in various arguably-broken ways. By the time you need a NoSQL database, you'll be successful enough to have a well-organized team to manage the transition to a different backend. For a new project, use a RDBMS, and enjoy the ample documentation and resources available.

Re:NoSQL? (1)

tigersha (151319) | about 6 months ago | (#46703415)

Thank you. Someone who talks sense around here.

Re:NoSQL? (2, Insightful)

Anonymous Coward | about 6 months ago | (#46703545)

As an expert (relative to most of Slashdot) in NoSQL databases, with a significant amount of experience in Hadoop and HBase systems, I agree wholeheartedly.

NoSQL solutions can be ridiculously fast and scale beautifully over billions of rows. Under a billion rows, though, and they're just different from normal databases in various arguably-broken ways. By the time you need a NoSQL database, you'll be successful enough to have a well-organized team to manage the transition to a different backend. For a new project, use a RDBMS, and enjoy the ample documentation and resources available.

Agreed. I used a NoSQL database on a project I'm working on at the moment, and stick by that decision even though I don't even have millions of rows, but my situation is somewhat different to the OP's: my data model is very difficult to map to SQL (I have hundreds of different entity types, each of which has different field storage requirements, and need to be able to associate between entities of different types according to a variety of rules, meaning that some entity types may have hundreds of different types of entity associated with them; SQL quite simply sucks for this kind of data, but thankfully applications where you end up with this kind of data are few and far between). OP's data sounds like an ideal candidate for storage in a relational database; he has one basic entity type, no need to make any kind of connection between entities, and apparently no complicating factors at all.

Re:NoSQL? (1)

DorianGre (61847) | about 6 months ago | (#46703761)

Thanks. This is how we were going, but as we have a blank canvas at the moment, this was a why not sort of decision to look at NoSQL solutions.

2 comments, both useless (1)

Anonymous Coward | about 6 months ago | (#46702841)

To answer the question "Which NoSQL Database For New Project?" there are 2 comments:
  - A relational database
  - A plain text file

The user gave an argument: "I would like to start with a NoSQL solution for scaling"

NoSQL is a good solution for horizontal scaling, CSV and SQL DB are not.

I would recommend MongoDB if the transactional aspect is not important for your purpose: easy to learn, easy to use.

Re:2 comments, both useless (2, Informative)

Anonymous Coward | about 6 months ago | (#46703613)

NoSQL is a good solution for horizontal scaling, CSV and SQL DB are not.

I'd like to dispute this. Based on the OP's description of his application, two things come to mind:

  • His application is mostly-write-only. He probably does not need instant query ability, but may need to be able to handle a very large number of inserts per second (assuming he's justified in his assertion that he needs scalability). For this kind of application, logging your incoming data to a plain text file (or sequentially-appended binary data file, or any other write-only plain file approach) can be a significant performance improvement. These files can then periodically (e.g. every hour, every minute, whatever time frame suits) be pulled off local storage, merged, and inserted as a batch into a central database against which read queries are performed. Single batched updates are much more efficient than large numbers of small updates.

  • His queries are easily parallelized. He needs to perform only two operations: selecting data based on simple criteria, and simple numerical summarization. Both of these are trivially scaled horizontally by using systems with local SQL databases and a simple service running on the machines as nodes in a map/reduce architecture.
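Both points can be sketched in a few lines. This assumes each front-end node appends CSV lines to a local log and that per-node totals come back as plain dictionaries; the function names, the three-column `events` table, and the use of SQLite are illustrative choices, not anything the OP specified.

```python
import csv
import sqlite3
from collections import Counter


def load_batch(conn: sqlite3.Connection, log_path: str) -> int:
    """Insert every CSV line of one log file in a single transaction."""
    with open(log_path, newline="", encoding="utf-8") as handle:
        rows = [tuple(record) for record in csv.reader(handle) if record]
    with conn:  # commits once for the whole batch, rolls back on error
        conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    return len(rows)


def merge_totals(partials: list) -> dict:
    """Reduce step: add up the per-key totals reported by each node."""
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds values rather than replacing them
    return dict(merged)
```

Each node would run its own `SELECT ... GROUP BY` locally (the map step) and send back a small dictionary, so the central service only ever does the cheap merge.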

Blanket statements like yours above can't really be made without reference to the intended application, as some applications scale much more easily than others, and OP's sounds like it's one of the easy kind.

MongoDB (2)

timkofu (2552496) | about 6 months ago | (#46702845)

These guys are committed, meaning mongo has a future. 2.6 that came out the other day has some nice new features and many bug fixes.

Re:MongoDB (1)

Anonymous Coward | about 6 months ago | (#46703671)

Plus it's web scale. You just plug it in and it scales right up.

light (3, Insightful)

invictusvoyd (3546069) | about 6 months ago | (#46702853)

SQLite is a relational database management system contained in a C programming library. In contrast to other database management systems, SQLite is not a separate process that is accessed from the client application, but an integral part of it.

Database Scaleability. (5, Insightful)

tonywestonuk (261622) | about 6 months ago | (#46702855)

"I'll need to be able to pull by date or by a number of key fields"

So, in other words, you have already decided on key fields. If you use a database, it has things called indexes that can search billions of rows for a key field in a fraction of a second.
If you don't use something with indexes then you can't do this.
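For anyone who hasn't seen one, here is a small sketch of an index at work, using SQLite because it ships with Python. The table, column and index names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (device_id TEXT, recorded_at TEXT, amount REAL)")
conn.execute("CREATE INDEX idx_events_device ON events (device_id)")


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> list:
    """Return SQLite's human-readable plan lines for a statement."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


plan = query_plan(conn, "SELECT * FROM events WHERE device_id = ?", ("phone-1",))
print(plan)  # the plan should mention idx_events_device rather than a full table scan
```

The exact wording of the plan varies between SQLite versions, but the point is the same in any SQL database: the lookup goes through the index instead of reading every row.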

Where has this idea that databases can't scale come from? The world runs on databases, for heaven's sake. Do you think when you take money out of an ATM, it's going to MongoDB? And yet there are millions of ATMs and you can take money out of your VISA account at almost all of them anywhere in the world. That is called scale.

Re:Database Scaleability. (4, Insightful)

cyber-vandal (148830) | about 6 months ago | (#46702867)

Where has this idea that Databases can't scale come from?

Salesmen

Re:Database Scaleability. (1)

Anonymous Coward | about 6 months ago | (#46702901)

Also known as "Scalesmen"

Re:Database Scaleability. (1)

korgitser (1809018) | about 6 months ago | (#46702945)

b.bb...but mongodb is webscale!

Re:Database Scaleability. (0)

Anonymous Coward | about 6 months ago | (#46702993)

"I'll need to be able to pull by date or by a number of key fields"

So, in other words, you have already decided on key fields. If you use a database, it has things called indexes that can search billions of rows for a key field in a fraction of a second.
If you don't use something with indexes then you can't do this.

Where has this idea that databases can't scale come from? The world runs on databases, for heaven's sake. Do you think when you take money out of an ATM, it's going to MongoDB? And yet there are millions of ATMs and you can take money out of your VISA account at almost all of them anywhere in the world. That is called scale.

What if you have to use PostgreSQL? I've seen no evidence that it can scale or run multi-master.

Re:Database Scaleability. (0)

Anonymous Coward | about 6 months ago | (#46703021)

You are misinformed [sourceforge.net] .

Re:Database Scaleability. (3, Informative)

Raumkraut (518382) | about 6 months ago | (#46703029)

MongoDB has indexes.
MongoDB also lets you store and query arbitrary data, in addition to any "key fields", without having to pre-define all the possible fields. Which it seems is what the submitter asked for.

Where has this idea that "NoSQL" means "not a database" come from?

Re:Database Scaleability. (5, Insightful)

janoc (699997) | about 6 months ago | (#46703191)

Databases don't scale for people who don't understand SQL, don't understand data normalization, indexing and want to use them as flat files. Unfortunately, a way too common anti-pattern :(

The second group are too-cool-to-learn kids using the latest development tool fad on the market to build yet another Facebook/Twitter/Instagram/whatever clone ...

Re:Database Scaleability. (1)

TheDarkMaster (1292526) | about 6 months ago | (#46703335)

We have a winner here. When I saw the number of buzzwords in the article, I already thought the worst too.

Re:Database Scaleability. (1)

wvmarle (1070040) | about 6 months ago | (#46703523)

I've mis-used databases just as you describe. And continue to do so. That's fine, I'm an amateur, and I never needed to handle databases larger than a couple thousand rows. I could probably get away with tens or hundreds of thousands of rows before running into problems.

Now if I were to develop something that needed a billion rows - that's a different story, and I do know my current approach won't work and I'd have to learn a lot about databases to pull it off. And submitter is obviously trying to do that (or at least something that needs a few rows and hoping it grows larger than Facebook and Google combined, so he needs scalability). Also I believe submitter doesn't really know what he's talking about.

If you really need to be able to handle that kind of data sets, and have even just a subset of the skills needed, you don't come to Slashdot for advice. You'd know who to ask - a friend or colleague who does just that.

So submitter may have big dreams, but he almost certainly doesn't have the skills to have even a fighting chance of making it. And by that I don't mean the actual database management skills, but the skills of knowing where your weaknesses are, knowing who can fill those gaps, and asking those people (maybe by having a discussion over a beer, or by hiring them outright).

Re:Database Scaleability. (1)

Anonymous Coward | about 6 months ago | (#46703673)

The ATM using MONGODB would explain why I never have any money in my account.

The Slashdot logo says... (0)

LookIntoTheFuture (3480731) | about 6 months ago | (#46702871)

Become a fan of Slashdot on Facebook

MariaDB (2, Insightful)

Anonymous Coward | about 6 months ago | (#46702881)

I would consider using the latest release of MariaDB.

You can use it as a standard MySQL server, but they also have Cassandra NoSQL as an engine for it now (since the release of 10)... So you would be easily able to play with things on different database types and see what suits your situation better.

MongoDB obviously... (1)

kryps (321347) | about 6 months ago | (#46702885)

... since it is web scale. ;-)

https://www.youtube.com/watch?v=b2F-DItXtZs [youtube.com]

Re:MongoDB obviously... (0)

Anonymous Coward | about 6 months ago | (#46703001)

hahahahahaha webscale, hahahahaha

Re:MongoDB obviously... (1)

jythie (914043) | about 6 months ago | (#46703637)

I was hoping someone would post that ^_^ always good for a laugh.

Elastic Search (1)

Anonymous Coward | about 6 months ago | (#46702897)

If you're going to need search at some point you should just opt for Elastic Search from the start. Yeah, it's a search engine, but it's also a rather good key/value store.

Re:Elastic Search (1)

beerbear (1289124) | about 6 months ago | (#46703053)

I second this. Easy to set up, easy to use.

Didn't know DICE owned stackoverflow too! (0)

Anonymous Coward | about 6 months ago | (#46702919)

Must be nice to be dice!

Don't ask this on Slashdot (0)

dr.Flake (601029) | about 6 months ago | (#46702921)

If your question is relatively simple and aimed at the general Slashdot crowd, then the answer is that you need to hire someone who knows database implementations.

If the question is complex enough for an experienced database implementer, he/she would know where to post that question. And it is not here.

As you can read above, the answer is simple. The problem is non-existent for experienced implementers.

Short Intro (5, Informative)

emblemparade (774653) | about 6 months ago | (#46702933)

It's a mistake to think that "NoSQL" is a silver bullet for scalability. You can scale just fine using MySQL (FlockDB) or PostgreSQL if you know what you're doing. On the other hand, if you don't know what you're doing, NoSQL may create problems where you didn't have them.

An important advantage of NoSQL (which has its costs) is that it's schema-free. This can allow for more rapid iteration in your development cycle. It pays off to plan document structures carefully, but if you need to make changes at some point (or just want to experiment), you can handle it at the code level. You can also support older "schemas" if you plan accordingly: for example, adding a version tag or something similar that can tell your code how to handle it. So, even ignoring the dubious potential of better scalability, NoSQL can still be beneficial for your project.
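A minimal sketch of the version-tag idea, using plain dictionaries in place of stored documents. The `schema_version` field and the two document shapes are invented for illustration.

```python
def upgrade(doc: dict) -> dict:
    """Return a copy of doc in the newest (version 2) shape."""
    version = doc.get("schema_version", 1)  # untagged documents are treated as version 1
    if version == 1:
        # Hypothetical v1 stored one "name" string; v2 splits it into two fields.
        first, _, last = doc["name"].partition(" ")
        return {"schema_version": 2, "first_name": first, "last_name": last}
    if version == 2:
        return dict(doc)
    raise ValueError(f"unknown schema_version: {version!r}")
```

Running every document through `upgrade` on read means old records never have to be migrated in bulk, at the cost of carrying the conversion code around for as long as old documents exist.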

More so than SQL, NoSQL databases are designed for different kinds of applications, and have different strengths:

MongoDB is a really good backend engine that gives programmers a lot of control over performance and its costs: if you need faster writes, you can allow for eventual integrity, or if you need faster reads, you can allow for data not being the absolute freshest. For many massive multiuser applications, not having immediately up-to-date data is a reasonable compromise. It also offers an excellent set of atomic operations, which from my experience compensate well for the lack of transactions. Furthermore, MongoDB is by far the most feature-rich of these, supporting aggregate queries and map-reduce, which again can make up for the lack of joins. It also offers good sharding tools, so if you do need to scale, you can. Again, I'll emphasize that you need a good understanding of how MongoDB works in order to properly scale. For example, map-reduce locks the database, so you don't want to rely on it too much. The bottom line is that MongoDB can offer similar features to SQL databases (though they work very differently), so it's good for first-timers.

Couchbase is very good at dispersed synchronization. For example, if parts of your database live in your clients (mobile applications come to mind), it does a terrific job at resynching itself and handling divergences. This is also "scalable," but in a quite different meaning of the term than in MongoDB.

I would also take a look at OrientDB: it's not quite as feature-rich as MongoDB (and has no atomic operations), but it can work in schema mode, and generally offers a great set of tools that can make it easy to migrate from SQL. Its query language, for example, looks a lot like SQL.

The above are all "document-oriented" databases, where your data is not opaque: the database actually does understand how your data is structured, and can allow for deep indexing and updating of your documents. Cassandra and Redis (and Tokyo Cabinet, and BerkeleyDB) are key-value stores: much simpler databases offering fewer querying features, where your data is simply a blob as far as the engine is concerned. I would be less inclined to recommend them unless your use case is very specific. Where appropriate, of course, simpler is better. With these kinds of databases, there are actually very few ways in which you can create an obstacle to scalability, simply because they don't do very much from a programming perspective.

There are also in-between databases that are sometimes called "column-oriented": Google and Amazon's hosted big data services are both of this type. Your data is structured, but the structure is flat. Generally, I would prefer full-blown "document-oriented" databases, such as MongoDB and OrientDB. However, if you're using a hosted service, you might not have a choice.

It's also entirely possible to mix different kinds of databases. For example, use MongoDB for your complex data and use Redis for a simple data store. I've even seen sophisticated deployments that very smartly archive data from one DB to another, and migrate it back again when necessary.

Re:Short Intro (1)

St.Creed (853824) | about 6 months ago | (#46703531)

Any relational database can also do "schemaless" models, by using the EAV (anti-)pattern. Mainly this conveys a lack of understanding of your data and a lack of planning and design in your datamodel, but hey, it happens. The fun thing is that you still get all those nice database features like parallel processing, concurrency, SQL, ACID transactions if you want them, security and maintenance tooling, etc.
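For anyone who hasn't seen EAV (entity-attribute-value) in practice, a minimal sketch in Python using the standard-library `sqlite3` module follows. The table name `eav`, the helper names, and the sample attributes are made up for illustration, and SQLite merely stands in for whatever relational database you actually run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One row per (entity, attribute) pair instead of one column per attribute.
conn.execute(
    "CREATE TABLE eav (entity_id INTEGER, attribute TEXT, value TEXT, "
    "PRIMARY KEY (entity_id, attribute))"
)


def save_entity(entity_id, fields):
    """Store arbitrary fields for an entity inside a single transaction."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO eav (entity_id, attribute, value) "
            "VALUES (?, ?, ?)",
            [(entity_id, name, str(value)) for name, value in fields.items()],
        )


def load_entity(entity_id):
    """Reassemble an entity's fields into a dict (all values come back as text)."""
    rows = conn.execute(
        "SELECT attribute, value FROM eav WHERE entity_id = ?", (entity_id,)
    )
    return dict(rows.fetchall())


# Two entities with different sets of attributes, no schema change needed.
save_entity(1, {"device": "iphone", "signal": -70})
save_entity(2, {"device": "android", "carrier": "example"})
print(load_entity(1))
```

Note the cost that makes this an anti-pattern: every value is stored as text, so types and constraints have to be enforced in application code.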

And if you use a modern database like SQL Server 2014 or Oracle's latest, you will get column-based compression (okay, it still sucks in SQL Server 2014, but it's a start), so the whole issue with extending sparse schemas is moot. If you use the 6th normal form it's not an issue anyway since that implements column-based compression by modeling it.

What you say is of course correct. It's just that for people who have a nice toolbox with all kinds of data models, relational databases go a lot further than most people think.

Re:Short Intro (3, Insightful)

Brian Nelson (3610471) | about 6 months ago | (#46703703)

And any text file can be transactional if you write your code right. We can keep going down this road about how you don't /need/ X technology, but nobody wins. It's really OK to see the good in different technologies.

JUST USE POSTGRES (0)

Anonymous Coward | about 6 months ago | (#46702937)

Seriously - JUST USE POSTGRES - there is virtually nothing that it can't do.

Re:JUST USE POSTGRES (1)

Tanaka (37812) | about 6 months ago | (#46703141)

I like Postgres, and I like MongoDB too. Both have their strengths. Best tool for the job I say.

The great thing about MongoDB is you can install two or three servers in different datacenters, and have redundancy out of the box. It's really simple. And you can scale horizontally if you need to without any downtime.

The last time I looked at Postgres, to do the same, you had to use third party solutions, and the client side drivers didn't support it. Is it any better now?

Re:JUST USE POSTGRES (1)

VortexCortex (1117377) | about 6 months ago | (#46703599)

The great thing about MongoDB is you can install two or three servers in different datacenters, and have redundancy out of the box. It's really simple. And you can scale horizontally if you need to without any downtime.

I've never had to use 3rd party solutions to implement horizontal scaling, replication, pooling, clustering, etc. with Postgresql. I have often had to demand changes of 3rd party vendor-lockin-ware, or add a kludge myself to fit a business's needs. RTFM application used to be far more common [catb.org] , but seems to have fallen out of fashion of late as more programmers and DBAs are increasingly discovered not to be hackers. Did you know Postgresql supports NoSQL features via HStore and JSON?

Much experience has shown that it's better to look well before leaping rather than hop on the buzz-wagon then try adding wings on the fly. The problem with one-size-fits-all methodology is that when one designs a system with everyone in mind, one has actually designed it for no one at all. What happens when that "simple" redundancy solution meets a more complex problem space is that you're left with folks who didn't understand the issue in the first place trying to fix the problems they've caused.

Re:JUST USE POSTGRES (1)

VortexCortex (1117377) | about 6 months ago | (#46703489)

Seriously - JUST USE POSTGRES - there is virtually nothing that it can't do.

Indeed. With its native JSON type and HStore key/value store it has NoSQL features. Given Postgresql's ability to cluster, pool, and replicate, it also scales quite well. IMO, it doesn't make sense to abandon all relational DB features in a NoSQL-only solution (especially right off the bat) when you can have both. Postgresql may just be the droids you are looking for.

Just Use SQL (5, Insightful)

Anonymous Coward | about 6 months ago | (#46702939)

I just felt I had to comment on this. So many developers start with the phrase "I need NoSQL so I can scale" and almost all of them are wrong. The chances are your project will never ever ever scale to the kind of size where the NoSQL design decision will win. It's far more likely that the NoSQL design choice will cause far more problems (performance, etc.) than the theoretical scaling issues.

Take, for example, two systems I've been involved with for managing WiFi access to large-scale networks (100,000+ concurrent users, 1000s of APs): one uses MongoDB, the other is based on PostgreSQL. The MongoDB-based solution has very real performance problems: its reporting takes a very long time to run and uses very large amounts of system RAM (24G in some cases), and that performance is only degrading as the system grows. There are also many other performance issues. These issues are not just Mongo issues; NoSQL is simply not well suited to the task. The system has been rewritten using an SQL backend and now works much better, but importantly, it's scaling better. Growth in the system is no longer degrading performance, and the points where we need hardware upgrades or extra servers are now much more predictable, so we can predict cost-base growth in relation to user growth.

NoSQL does not guarantee scaling; in many cases it scales worse than an SQL-based solution. Work out what your scaling problems will be for your proposed application, when they will become a problem, and whether you will ever reach that scale. Being on a bandwagon can be fun, but you would be in a better place if you really think through any potential scaling issues. NoSQL might be the right choice, but in many places I've seen it in use it was the wrong choice, and it was chosen based on one developer's faith that NoSQL scales better rather than on thinking through the scaling issues.

Avoid NoSQL for new projects. (0)

Anonymous Coward | about 6 months ago | (#46702949)

Fundamentally the single-key document store databases are built on the compare-and-swap primitive. This means that the data structure being implemented, i.e. the one that must support the application's write cases, must be designed up front and won't be amenable to incremental development. Not to mention that designing such a data structure is far more difficult than laying down some CREATE TABLE statements and figuring out what it is exactly that the application prototype is supposed to do.
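To make the compare-and-swap point concrete, here is a small Python sketch. `TinyStore` is a hypothetical in-memory stand-in, not the API of any real document store; it only shows the read-modify-write retry loop that every update has to be designed around when CAS on a single key is the only primitive available.

```python
import threading


class TinyStore:
    """Hypothetical in-memory stand-in for a single-key document store."""

    def __init__(self):
        self._data = {}  # key -> (version, value)
        self._lock = threading.Lock()

    def get(self, key):
        """Return (version, value); a missing key is version 0 with value None."""
        with self._lock:
            return self._data.get(key, (0, None))

    def compare_and_swap(self, key, expected_version, new_value):
        """Write only if nobody else has written since expected_version."""
        with self._lock:
            current_version, _ = self._data.get(key, (0, None))
            if current_version != expected_version:
                return False  # lost the race; the caller must re-read and retry
            self._data[key] = (current_version + 1, new_value)
            return True


def increment(store, key, amount):
    """Read-modify-write loop: retry until the swap succeeds."""
    while True:
        version, value = store.get(key)
        if store.compare_and_swap(key, version, (value or 0) + amount):
            return


store = TinyStore()
threads = [
    threading.Thread(target=increment, args=(store, "total", 1)) for _ in range(50)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(store.get("total"))
```

Even this trivial counter needs a retry loop; anything that touches more than one key has to be restructured so that a single swap makes the change visible, which is the up-front design work described above.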

But also avoid MySQL. It's not good at all. SQLite will also lead you astray.

Re:Avoid NoSQL for new projects. (0)

Anonymous Coward | about 6 months ago | (#46703069)

This means that the data structure being implemented, i.e. the one that must support the application's write cases, must be designed up front and won't be amenable to incremental development. Not to mention that designing such a data structure is far more difficult than laying down some CREATE TABLE statements and figuring out what it is exactly that the application prototype is supposed to do.

How exactly is "laying down some CREATE TABLE statements" not having to have your data structures "designed up front"?

Re:Avoid NoSQL for new projects. (0)

Anonymous Coward | about 6 months ago | (#46703119)

Good question. The difference is that SQL DDL doesn't specify the update semantics of foreign keys as CAS operations, where a working prototype NoSQL design would. It comes down to the fundamental operation, the one that's the reason why lock-free and wait-free algorithms are so tricky.

Why not use Sqlite? (0)

Anonymous Coward | about 6 months ago | (#46702965)

It's lightweight, fast and supports reasonably complicated queries. Not sure why you need a NoSQL database when you clearly need to Query by key fields.

PostgreSQL or Riak (1)

imbaczek (690596) | about 6 months ago | (#46703031)

Postgres might carry you further than you imagine with hstore and json extensions. I'd also try Riak if you really want NoSQL.

Are you sure you require a NoSQL solution? (0)

Anonymous Coward | about 6 months ago | (#46703059)

If you're going to be doing analysis and totalling, then a traditional SQL database may be the better option.

hyperdex (1)

fredan (54788) | about 6 months ago | (#46703115)

take a look at hyperdex if you are looking for a NoSQL DB: http://www.hyperdex.org/ [hyperdex.org]

Big mistake (5, Insightful)

msobkow (48369) | about 6 months ago | (#46703139)

Telecommunications data is eminently suitable to schema table storage in any relational database, which with a little work, will let you index by the keys you intend to query by.
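A minimal sketch of what that can look like, in Python with the standard-library `sqlite3` module. The table and column names (`events`, `received_on`, `device_id`, `duration`) are assumptions made up for the example, since the question doesn't say what the POSTed fields actually are, and SQLite is only a stand-in for a server database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE events (
        id          INTEGER PRIMARY KEY,
        received_on TEXT NOT NULL,      -- ISO date, e.g. 2014-04-08
        device_id   TEXT NOT NULL,
        duration    INTEGER NOT NULL
    );
    -- Index the fields you intend to query by.
    CREATE INDEX idx_events_date   ON events (received_on);
    CREATE INDEX idx_events_device ON events (device_id);
    """
)

sample_rows = [
    ("2014-04-07", "phone-a", 30),
    ("2014-04-07", "phone-b", 45),
    ("2014-04-08", "phone-a", 20),
]
with conn:  # one transaction for the whole batch
    conn.executemany(
        "INSERT INTO events (received_on, device_id, duration) VALUES (?, ?, ?)",
        sample_rows,
    )


def totals_by_day(start, end):
    """Trend report: number of events and summed duration per day."""
    cur = conn.execute(
        "SELECT received_on, COUNT(*), SUM(duration) FROM events "
        "WHERE received_on BETWEEN ? AND ? "
        "GROUP BY received_on ORDER BY received_on",
        (start, end),
    )
    return cur.fetchall()


def events_for_device(device_id):
    """Pull rows by one of the key fields."""
    cur = conn.execute(
        "SELECT received_on, duration FROM events WHERE device_id = ? ORDER BY id",
        (device_id,),
    )
    return cur.fetchall()


print(totals_by_day("2014-04-01", "2014-04-30"))
print(events_for_device("phone-a"))
```

Pulling by date, pulling by key field, and totals over time are each a single query here, which is most of what the original question asks for.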

NoSQL solutions are better for unstructured data that doesn't come in predictable formats or value sets.

You need to take a step back and look at the problem before you decide on a solution. Don't be one of those idiots who tries to use a hammer to drive a screw.

Re:Big mistake (0)

Anonymous Coward | about 6 months ago | (#46703461)

Excuse me; we have been driving screws with an impact device, (hammer), at the rate of a million per day. The proper thread geometry/material with the correct impulse twists the screw right in. Been doing it for a decade. It will set the screw 5x faster than using a torque.

SQLite (1)

jchevali (171711) | about 6 months ago | (#46703167)

SQLite

Which luxury yacht after my new project? (5, Funny)

BlackPignouf (1017012) | about 6 months ago | (#46703233)

"I'm working on a new independent project. It will soon become the new Facebook, and I'll be billionaire next quarter. The only problem is that I don't know which luxury yacht to buy with all this money. I've been looking at Lady Moura, Christina O, Pelorus, Venus and others. What do you recommend? What problems have you run into with the ones you've tried?"

Re:Which luxury yacht after my new project? (5, Funny)

coofercat (719737) | about 6 months ago | (#46703659)

Pff! All that soon-to-have money and yet no imagination, huh? Buy an old diesel Navy submarine and have it refitted. Maybe cut some windows into the hull - that'll mean you can only go down to maybe 50 metres instead of 350, but that's still plenty, and if you get lost you can just look out of the windows to see where you are without having to worry about using sonar.

I'd imagine surfacing your submarine in Monaco's marina will turn far more heads than your ridiculous yacht moored a mile offshore ;-) (besides, a submarine is phallically shaped, so works better in metaphorical dick measuring competitions)

Oh, and be sure to use Postgres or MySQL for your on-board systems - it'll scale plenty well for a long time before you need to go all 'web scale' with a NoSQL DB.

Redis (0)

Anonymous Coward | about 6 months ago | (#46703275)

Look at a disk-backed Redis configuration.

Two words (2)

ledow (319597) | about 6 months ago | (#46703377)

Premature Optimisation.

It's a TRAP (1)

Anonymous Coward | about 6 months ago | (#46703425)

Don't tell the NSA how to record calls into a database! I guess they've been typing them into Excel all this time.

Small problem (0)

Anonymous Coward | about 6 months ago | (#46703451)

You can't access any phone functions or text message functions via code on an iPhone. Unless you intend this for jailbroken phones, you're dead in the water. You're probably dead in the water anyway, as only an idiot would load an app that tracks calls. There's a very good reason Apple locked that stuff out...security.

Checklist / ArangoDB (0)

Anonymous Coward | about 6 months ago | (#46703515)

To find the right DB I wrote this checklist:
http://nosql-database.org/select-the-right-database.html

Nevertheless I love ArangoDB because of:
* K/V + JSON + Graph = 3 models available!
* Speaks server-side JavaScript with embedded V8 server!
* FOXX GUI can talk directly to Database
* Multicore ready
* Advanced indexing plus geo, skip-list, n-gram!
* Tunable durability + transactions
* AQL = SQL + JSONiq + CYPHER (I do not know of a better graph+SQL language out there...)
* quasi MVCC => SSD ready
* capped collections
* Replication + sharding
* management GUI
* and tons more

peer pressure (0)

Anonymous Coward | about 6 months ago | (#46703557)

not having query or joins or ACID is so cool . everyone is doing it

When to use NoSQL (0)

Anonymous Coward | about 6 months ago | (#46703663)

From your requirements
1) You require logging of information. If the 'back-end' system goes offline, what would you like to happen in the front-end? Using a filesystem for storage would remove the requirement of a 24/7 back-end database.
2) Using filesystem for storage would likely be a single file per POST. What will be the usage? If > 50k a week, you might want timestamped daily directory.
3) Trends. How often do you want these trends, immediate? More immediate, more likely move from file system to repository. And what value would you want your trend analysis to provide? As mentioned above, splunk is wonderful for basic trend reporting. Do you want deep statistical analysis, searching and querying?
4) For searching and querying I would suggest Postgres. For stats, how about R? This means you will need an ETL (extract, transform, load) step to separate SILOS (yeah, one day this will be solved, but not by NoSQL). If you do not know what you are collecting, or it will change often, now we might move to NoSQL. No Schema = NoSQL.
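Points 1 and 2 can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names `store_post` and `load_day`, the JSON file format, and the file-naming scheme are all made up here, and a temporary directory stands in for wherever the data would really live.

```python
import json
import tempfile
import uuid
from datetime import datetime, timezone
from pathlib import Path


def store_post(root, payload, received_at=None):
    """Write one POST body to its own file in a timestamped daily directory."""
    received_at = received_at or datetime.now(timezone.utc)
    day_dir = Path(root) / received_at.strftime("%Y-%m-%d")
    day_dir.mkdir(parents=True, exist_ok=True)
    # Time prefix keeps files roughly ordered; the UUID avoids name collisions.
    path = day_dir / f"{received_at.strftime('%H%M%S')}-{uuid.uuid4().hex}.json"
    path.write_text(json.dumps(payload), encoding="utf-8")
    return path


def load_day(root, day):
    """Read back every stored POST for one day (day is 'YYYY-MM-DD')."""
    day_dir = Path(root) / day
    return [
        json.loads(p.read_text(encoding="utf-8"))
        for p in sorted(day_dir.glob("*.json"))
    ]


root = tempfile.mkdtemp()  # stand-in for the real storage location
when = datetime(2014, 4, 8, 12, 0, 0, tzinfo=timezone.utc)
store_post(root, {"device": "phone-a", "value": 1}, when)
store_post(root, {"device": "phone-b", "value": 2}, when)
print(len(load_day(root, "2014-04-08")))
```

Nothing here needs a database to be up when the POST arrives; a later batch job can load the daily directories into Postgres, Splunk, or R.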
