Ask Slashdot: Which NoSQL Database For New Project?


272 comments

Posted by Soulskill on Wednesday April 09, 2014 @05:12AM from the mo-sql-mo-problems dept.

DorianGre writes: "I'm working on a new independent project. It involves iPhones and Android phones talking to PHP (Symfony) or Ruby/Rails. Each incoming call will be a data element POST, and I would like to simply write that into the database for later use. I'll need to be able to pull by date or by a number of key fields, as well as do trend reporting over time on the totals of a few fields. I would like to start with a NoSQL solution for scaling, and ideally it would be dead simple if possible. I've been looking at MongoDB, Couchbase, Cassandra/Hadoop and others. What do you recommend? What problems have you run into with the ones you've tried?"

This discussion has been archived. No new comments can be posted.


Do you need a database? (Score:3, Insightful)

by tubs ( 143128 ) writes: on Wednesday April 09, 2014 @05:17AM (#46702801)

Do you need a database to do what you're trying to do? Why not just write the information to a text file (CSV or tab-separated), and use other programs to query the data?

- Re:Do you need a database? (Score:5, Funny)
  
  by Anonymous Coward writes: on Wednesday April 09, 2014 @05:20AM (#46702811)
  
  Excel Spreadsheet, maybe?
  
- Re:Do you need a database? (Score:5, Insightful)
  
  by mwvdlee ( 775178 ) writes: on Wednesday April 09, 2014 @05:49AM (#46702927) Homepage
  
  Basically the question is: what's the expected volume of records, and of fields per record?
  A solution for 100 records a week with 4 fields each would be different from 1000 records per second with 30 fields each.
  1000 records/sec with 4 fields would be yet another solution.
  
  - Re: (Score:3, Informative)
    
    by DorianGre ( 61847 ) writes:
    
    We are looking at 99% incoming data, 10-12 fields, 1000-2000 per session per week, X as many users as we can get.
    - Re: (Score:3)
      
      by NatasRevol ( 731260 ) writes:
      
      So, 10-20 thousand data points, per customer, per week?
      Or, at 100 customers, 50-100 million data points per year?
      Get a real database. And some real horsepower.
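The estimate above can be checked with a few lines of arithmetic. This is only a sketch: it assumes "1000-2000 per session per week" means 1000-2000 records per customer per week, each with 10-12 fields, and the function name `yearly_data_points` is made up for illustration.

```python
# Rough volume estimate. Assumption: each customer sends 1000-2000 records
# per week, and each record carries 10-12 fields ("data points").
WEEKS_PER_YEAR = 52


def yearly_data_points(records_per_week, fields_per_record, customers):
    """Return the number of individual field values stored per year."""
    return records_per_week * fields_per_record * customers * WEEKS_PER_YEAR


low = yearly_data_points(1000, 10, customers=100)
high = yearly_data_points(2000, 12, customers=100)
print(f"{low:,} to {high:,} data points per year at 100 customers")
```

Under those assumptions the range lands in the same ballpark as the 50-100 million quoted above; counted as rows rather than field values, the figure is 10-12 times smaller.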
    - Re: (Score:2)
      
      by aoteoroa ( 596031 ) writes:
      
      We are looking at 99% incoming data, 10-12 fields, 1000-2000 per session per week, X as many users as we can get.
      
      Our company's accounting system uses Mongo on the backend. With about 30 users, and a database that is 7 GB Mongo performs well and sounds like it would fit your application.
      Having said that I agree with other posters who have suggested that if you want to plan for future growth you would be wise to consider a real database from the start. We are planning a migration to PostgreSQL this year.
- Re:Do you need a database? (Score:4, Interesting)
  
  by Richard_at_work ( 517087 ) writes: on Wednesday April 09, 2014 @06:00AM (#46702959)
  
  There's probably an element of multithreaded access that needs to be taken into consideration here: writing to a single text file may get you into trouble if the receiving webserver is multithreaded, meaning the threads will either have to queue for write locks or write to different files.
  Database engines don't have this issue, so while it may be overkill, there may be reasons to have one irregardless.
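To make the "queue for write locks" option concrete, here is a minimal sketch of several threads appending to one CSV file behind a shared lock. The helper names (`append_row`, `run_demo`) and the row layout are invented for the example, and a `threading.Lock` only protects writers inside a single process; a server that forks several worker processes would need an OS-level file lock instead.

```python
import csv
import os
import tempfile
import threading

write_lock = threading.Lock()  # one lock shared by every writer thread


def append_row(path, row):
    """Append one CSV row; the lock serialises writers within this process."""
    with write_lock:
        with open(path, "a", newline="") as handle:
            csv.writer(handle).writerow(row)


def run_demo(writers=8, rows_per_writer=50):
    """Have several threads append concurrently and return the rows read back."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    os.close(fd)

    def work(worker_id):
        for i in range(rows_per_writer):
            append_row(path, [worker_id, i, "payload"])

    threads = [threading.Thread(target=work, args=(n,)) for n in range(writers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

    with open(path, newline="") as handle:
        rows = list(csv.reader(handle))
    os.remove(path)
    return rows


rows = run_demo()
print(len(rows), "rows written; every row intact:", all(len(r) == 3 for r in rows))
```

Every request now waits its turn behind the same lock, which is exactly the bookkeeping a database engine already does for you.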
  
  - - Re: (Score:2, Informative)
      
      by Anonymous Coward writes:
      
      "Irregardless" is not a word.
      Merriam-Webster:
      irregardless
      adverb \ir-i-gärd-ls\
      Definition of IRREGARDLESS
      Usage Discussion of IRREGARDLESS
      Irregardless originated in dialectal American speech in the early 20th century. Its fairly widespread use in speech called it to the attention of usage commentators as early as 1927. The most frequently repeated remark about it is that “there is no such word.” There is such a word, however. It is still used primarily in speech, alth
      - Re:Do you need a database? (Score:5, Funny)
        
        by funwithBSD ( 245349 ) writes: on Wednesday April 09, 2014 @10:42AM (#46704627)
        
        You ain't supposed to use it.
        
- Re:Do you need a database? (Score:5, Insightful)
  
  by FyRE666 ( 263011 ) writes: on Wednesday April 09, 2014 @06:07AM (#46702979) Homepage
  
  Please don't do this (use a flat file) to store data for a web app that's likely to be accessed by more than one device at a time. Unless you implement your own file locking mechanism, you'll eventually end up with corrupt entries. Even if you do implement your own locking scheme, it's probably not going to be as efficient as using a DB. It's a 5 minute job to set up a new MySQL DB and associated query to push data in, then you can filter and report on it much more easily. It's something DBs are very good at!
  Unless you have a specific need to scale horizontally, it's generally better to stick with a SQL DB for web apps. I've used MySQL, PostgreSQL and Oracle for this. MySQL is by far the easiest to work with, hence its popularity. I don't actually know of any advantage to using PostgreSQL; it doesn't perform any better, and is (or at least used to be) much less user friendly.
  
  - Re: (Score:2)
    
    by Raumkraut ( 518382 ) writes:
    
    For storing and querying arbitrarily-structured data, which is what the submitter seems to be wanting, a traditional relational SQL database is not necessarily the best way to do it.
    And if anything, MongoDB is easier to start using than any relational database, IME. No need to create databases, schemas, or tables (collections) beforehand - you just install MongoDB, start writing data, and it gets stored.
    - Re:Do you need a database? (Score:5, Insightful)
      
      by Richard_at_work ( 517087 ) writes: on Wednesday April 09, 2014 @06:43AM (#46703099)
      
      I think many people get stuck thinking "one single database, that's it, my initial decision condemns me forever", when in fact there's no shame in having many databases.
      Stick the raw data into one database, choose the database that suits that.
      Transform the data from the raw database into something you can use day to day, that's well structured etc, choose the database for that.
      Transform the data from the day to day schemas into something that's more suitable for archiving and long term reporting, again choose the database for that.
      You don't have to have one single database type, every particular one has its strengths, so use them!
      
    - Re: (Score:3)
      
      by Jody Bruchon ( 3404363 ) writes:
      
      Create a table, get a POST, Insert contents of POST into table...I don't really see how this isn't the best way to do it.
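As a rough illustration of "create a table, get a POST, insert it", plus the pull-by-date and totals-over-time reporting the submitter asked for, here is a sketch using Python's built-in `sqlite3` module. The table name `readings` and the fields `device_id`, `recorded_on` and `steps` are hypothetical, since the submitter never said what the data looks like, and the POST body is assumed to be already decoded into a dict.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open the database and create the (hypothetical) readings table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS readings (
               id          INTEGER PRIMARY KEY,
               device_id   TEXT NOT NULL,
               recorded_on TEXT NOT NULL,   -- ISO date, e.g. 2014-04-09
               steps       INTEGER NOT NULL
           )"""
    )
    return conn


def store_post(conn, post):
    """Insert the fields of one decoded POST body as a row."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO readings (device_id, recorded_on, steps) "
            "VALUES (:device_id, :recorded_on, :steps)",
            post,
        )


def totals_by_day(conn, start, end):
    """Total of one field per day between two ISO dates (inclusive)."""
    cur = conn.execute(
        "SELECT recorded_on, SUM(steps) FROM readings "
        "WHERE recorded_on BETWEEN ? AND ? "
        "GROUP BY recorded_on ORDER BY recorded_on",
        (start, end),
    )
    return cur.fetchall()


conn = open_db()
for post in [
    {"device_id": "a", "recorded_on": "2014-04-08", "steps": 10},
    {"device_id": "b", "recorded_on": "2014-04-08", "steps": 5},
    {"device_id": "a", "recorded_on": "2014-04-09", "steps": 7},
]:
    store_post(conn, post)

daily = totals_by_day(conn, "2014-04-01", "2014-04-30")
print(daily)
```

The same three statements (CREATE TABLE, INSERT, SELECT ... GROUP BY) carry over almost unchanged to MySQL or PostgreSQL if the data outgrows a single file.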
    - Re: (Score:2, Informative)
      
      by Anonymous Coward writes:
      
      >For storing and querying arbitrarily-structured data, which is what the submitter seems to be wanting
      I dunno. I read TFS and it looks more like he wants rows of tabular data. Were this a STX site, I'd vote to close as too broad since he hasn't actually said anything useful about what he's storing.
      So default answer to "Which NoSQL database should I use?" is always "Don't use NoSQL."
    - Re: (Score:2)
      
      by oh_my_080980980 ( 773867 ) writes:
      
      Which is why he might as well use a flat file. If he has structure, then an RDBMS is what he should use. If he's not going to bother to organize the information, then a flat file would be perfect because all you are after is junk anyways.
  - Re: (Score:2)
    
    by Lennie ( 16154 ) writes:
    
    There are a whole lot of ways in which PostgreSQL used to be less user friendly, but they take their time and keep improving it in a consistent way. It has many, many features.
    Personally I really like PostgreSQL. It scales really well.
    And there may still be things missing that some people want.
    But I think you'll find they will be added in the next 3 releases. 9.4 is now in development:
    - upsert/merge in 9.4
    - basis of logical replication in 9.4 (has been available in out of tree tools for many years), upcoming v
  - Re: (Score:2)
    
    by jythie ( 914043 ) writes:
    
    No need to develop your own locking system, just use whatever logging functionality the server has.
  - Re: (Score:2)
    
    by K. S. Kyosuke ( 729550 ) writes:
    
    I've used MySQL, PostgreSQL and Oracle for this. MySQL is by far the easiest to work with, hence its popularity.
    What about Firebird? Actual transactions - even transactional lazy schema updates -, single-file databases, reasonable tools, almost invisible maintenance, everything virtually idiot-proof. Even LibreOffice wants to switch to embedded Firebird for its native database engine. I can't imagine MySQL being anything other than PITA compared to Firebird.
- Re:S3 better than files on disk (Score:3)
  
  by xelah ( 176252 ) writes:
  
  Now scale that. Or just lock it properly.
  If you want simple, scalable and low sysadmin overhead and all you need are key -> value lookups then Amazon's S3 can be an excellent choice. You don't need to manage it, you don't need to work out how to add servers and it's well proven at extremely large scales.
  However, like a lot of other posters, I'm very sceptical that NoSQL is the place to start. SQL databases can do a LOT for you, are very robust and can scale very considerably. As your requirements grow you
- Re: (Score:2)
  
  by Art3x ( 973401 ) writes:
  
  "Think of SQLite not as a replacement for Oracle but as a replacement for fopen()" --- About [sqlite.org]
- Re: (Score:3)
  
  by tubs ( 143128 ) writes:
  
  When I read the post the first thought that came to me was "log files" - you mention date & time, a "number" of fields and "few" fields for reporting. It still sounds like a log file from everything that is said. Indeed, just change from POST to GET and you can use the web server logs :-)
  But, why not build into the design that you may change the "backend" database without having to worry about what is at the backend?
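One way to keep the backend swappable is to have the application talk to a small storage interface rather than to a particular database. The sketch below is an assumption-laden illustration, not anyone's actual design: the names `EventStore`, `ListStore` and `SqliteStore` and the two-method interface are all invented for the example.

```python
import sqlite3
from typing import Protocol


class EventStore(Protocol):
    """What the application needs from storage, and nothing more."""

    def add(self, day: str, value: int) -> None: ...
    def total_for(self, day: str) -> int: ...


class ListStore:
    """Simplest possible backend: an in-memory list."""

    def __init__(self):
        self._rows = []

    def add(self, day, value):
        self._rows.append((day, value))

    def total_for(self, day):
        return sum(v for d, v in self._rows if d == day)


class SqliteStore:
    """Same interface, backed by SQLite."""

    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS events (day TEXT, value INTEGER)"
        )

    def add(self, day, value):
        with self._conn:
            self._conn.execute("INSERT INTO events VALUES (?, ?)", (day, value))

    def total_for(self, day):
        row = self._conn.execute(
            "SELECT COALESCE(SUM(value), 0) FROM events WHERE day = ?", (day,)
        ).fetchone()
        return row[0]


def record_and_report(store: EventStore) -> int:
    """Application code: it never learns which backend it was given."""
    store.add("2014-04-09", 3)
    store.add("2014-04-09", 4)
    store.add("2014-04-10", 9)
    return store.total_for("2014-04-09")


results = [record_and_report(ListStore()), record_and_report(SqliteStore())]
print(results)
```

Moving to another database later then means writing one more class with the same two methods, while `record_and_report` stays untouched.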
- - Re:Do you need a database? (Score:5, Informative)
    
    by DarkOx ( 621550 ) writes: on Wednesday April 09, 2014 @06:31AM (#46703047) Journal
    
    I disagree, he is concerned about scaling. The last thing in the world he should do is use a bunch of flat files, unless he really just needs to store the data, but he already said he needs to do reports and totals on it.
    Also he is working in Ruby. The smart thing for him to do IMHO is write his program against ruby/DBI. It isn't the prettiest database API, but it supports plenty of different backend options, and it does not sound like his program needs especially complex database operations or queries. He can start working with something like SQLite as the database "server", and move up to something else, perhaps PostgreSQL (which can be every bit as fast as the NoSQL solutions unless you are getting highly, highly custom), without needing to alter his program.
    
    - Re:Do you need a database? (Score:5, Insightful)
      
      by boristdog ( 133725 ) writes: on Wednesday April 09, 2014 @09:19AM (#46703839)
      
      As someone who is currently trying to convert a 20 year-old, multi-million-entry flat files DB into a real DB for a major corporation without bringing the corporation to its knees I heartily concur with NOT using flat files if there is ANY chance of this growing beyond a few hundred entries.
      By now hundreds of applications are using the old flat file DB, I have so much re-coding to do that I will probably retire before it is all complete.
      
    - Comment removed (Score:5, Insightful)
      
      by account_deleted ( 4530225 ) * writes: on Wednesday April 09, 2014 @09:21AM (#46703861)
      
      Comment removed based on user account deletion
      
      - Re: (Score:2)
        
        by oh_my_080980980 ( 773867 ) writes:
        
        Amen brother.
    - Re: (Score:2)
      
      by oh_my_080980980 ( 773867 ) writes:
      
      NoSQL is a flat file, so it's the same thing. He's not going to be organizing the data in any meaningful way with NoSQL, it's just a dumping ground.
    - Can you separate data collection from reporting? (Score:2, Interesting)
      
      by Anonymous Coward writes:
      
      If the goal really is just to amass data and then do offline reports on it (not completely clear from the question) then I can report that at my company we've been doing this at scale for over five years. Here's how:
      * A bunch of web servers accept data and append it to a local disk file.
      * Every hour, that "log" is pushed from each host into HDFS and a new log file started. (HDFS as in the Hadoop Distributed Filesystem)
      * Querying is done later, using Hive with a custom deserializer that natively understands
  - Re: (Score:2)
    
    by funwithBSD ( 245349 ) writes:
    
    Way overkill for the project, way underkill for the CV builder.
- - Re: (Score:3)
    
    by jythie ( 914043 ) writes:
    
    *gasp* a sensible solution using readily available mature tools? *faints*
Use PostgreSQL (Score:5, Informative)

by Anonymous Coward writes: on Wednesday April 09, 2014 @05:17AM (#46702803)

If you need to store less than a few hundred million rows just use PostgreSQL.
It supports JSON and transactions.

- Re:Use PostgreSQL (Score:5, Insightful)
  
  by Lennie ( 16154 ) writes: on Wednesday April 09, 2014 @05:37AM (#46702873)
  
  Yes, that is what I wanted to point out too.
  Also in PostgreSQL 9.4 it has jsonb which is, in certain tests less than a year ago, faster than MongoDB.
  
  - Re: (Score:3)
    
    by Lennie ( 16154 ) writes:
    
    Also if you want a key/value store, there is also http://symas.com/mdb/ [symas.com] from a company of some of the OpenLDAP developers.
    Which really seems to have the fastest read performance of them all.
- Re: (Score:2)
  
  by NatasRevol ( 731260 ) writes:
  
  Unfortunately not, going by the details the submitter gave later.
  Each customer is about 500k data points per year.
  Thousands of customers is a few hundred million rows, per year.
  - Re: (Score:2)
    
    by rtaylor ( 70602 ) writes:
    
    Right. So 5 years from requiring a NoSQL DB, and hardware/software advancements in that period will likely give another 3 years of easy growth with just a basic Pg installation.
    If it was 10m text/blob records per day, that would be a different animal; but it's probably 1/10th of that.
    - Re: (Score:2)
      
      by rycamor ( 194164 ) writes:
      
      A few hundred million rows is no trouble to PostgreSQL, if configured right. And if you go beyond that there are some great ways to deal with the problem:
      1. Partitioning [postgresql.org]: Make a large table composed of smaller subset tables. This is a great way to deal with what is primarily historical data, since you can partition by month, quarter, or whatever time period makes sense for your application. Then, when it comes time to archive or delete old data, all you have to do is migrate that month's table to the archiv
Sounds like you need a database (Score:5, Insightful)

by Anonymous Coward writes: on Wednesday April 09, 2014 @05:23AM (#46702825)

You might want to consider a SQL database.

- Re: (Score:2)
  
  by wiredlogic ( 135348 ) writes:
  
  But he needs something that's webscale. Probably will need sharding too.
Please specify a better scenario (Score:3)

by prefec2 ( 875483 ) writes: on Wednesday April 09, 2014 @05:25AM (#46702829)

Based on your information no one can give you solid advice. It highly depends on the load you expect and on the data model you will use. For a simple Twitter clone, you can use a log file or any NoSQL technology. If you only have a few transactions and not billions of entries, you could use PostgreSQL or even MySQL. However, PostgreSQL scales better. If you want to make complex interpretations on graph-like data you may consider Neo4j as a graph DB.

- Re:Please specify a better scenario (Score:5, Insightful)
  
  by OzPeter ( 195038 ) writes: on Wednesday April 09, 2014 @07:00AM (#46703147)
  
  Based on your information no one can give you solid advice.
  IMHO the question is deliberately designed to be vague. iPhones and Android devices, PHP and Ruby On Rails .. that is such a shotgun blast of specifications that are totally unrelated to the DB use on the back end that the entire question smells of click bait to me.
  
  - Re:Please specify a better scenario (Score:5, Insightful)
    
    by khchung ( 462899 ) writes: on Wednesday April 09, 2014 @08:07AM (#46703435) Journal
    
    Based on your information no one can give you solid advice.
    IMHO the question is deliberately designed to be vague. iPhones and Android devices, PHP and Ruby On Rails .. that is such a shotgun blast of specifications that are totally unrelated to the DB use on the back end that the entire question smells of click bait to me.
    Either that, or the OP simply has no idea how databases work at all.
    If the OP had any idea how a database (any database, not just relational) works, he would be talking about data and transaction volumes, access patterns, transactional requirements, data integrity constraints, retention and housekeeping requirements, etc.
    Instead, as you said, he talked about device platforms, communication protocols, language and runtime environment, which are all irrelevant to choosing a database. (OK, the last may be a bit relevant depending on which database is used.)
    
    - Re: (Score:2)
      
      by jythie ( 914043 ) writes:
      
      And here I am out of mod points.
      
      At first reading something seemed off about the question, and I think you summed it up nicely.
      
      To me it comes across a bit as the OP asking 'I need some vaguely authoritative sounding reasons for a sexy solution, look at my keywords and tell me what is "in" with that community'
  - - Re: (Score:2)
      
      by OzPeter ( 195038 ) writes:
      
      I have been a developer for almost 20 years now and can spin this up with a SQL database in under an hour.
      
      If you have been a developer for 20 years then you should know that people will be skeptical of any question that lets them play and win Buzzword Bingo from a single sentence.
    - Re: (Score:2)
      
      by oh_my_080980980 ( 773867 ) writes:
      
      Why don't I believe you? If you can spin this up with a SQL database in under an hour, then you have your answer. The fact that you repeat "scalability" and "reporting" leads me to believe you do not understand what databases, in particular SQL databases, can do.
NoSQL? (Score:5, Insightful)

by aaaaaaargh! ( 1150173 ) writes: on Wednesday April 09, 2014 @05:29AM (#46702839)

I would like to start with a NoSQL solution for scaling
And there it is, the proverbial premature optimization ...

- Re: (Score:3)
  
  by mwvdlee ( 775178 ) writes:
  
  Being able to scale from 1 billion records a day to 10 billion a day does not a premature optimization make.
  The simple fact is that there's not enough information to give any reasonable advice.
  - - Re: (Score:2)
      
      by NatasRevol ( 731260 ) writes:
      
      Reports on big dbs are always the choke point.
      Lots of people, DBAs included, seem to miss this.
  - - Re: (Score:2)
      
      by rjstanford ( 69735 ) writes:
      
      Are you reporting across customers? If not, then sharding totally takes care of your problem. If so, then a combination of sharding and some meaningful aggregation may.
      It really sounds like you've already decided on a solution and are looking for affirmation rather than advice. I've regularly inserted millions of rows into a simple 3-node MySQL cluster (unsharded) every day for years... if you don't like SQL, that's fine, but what you're asking for sure sounds like a problem that a halfway competently se
- Re:NoSQL? (Score:5, Insightful)
  
  by Sarten-X ( 1102295 ) writes: on Wednesday April 09, 2014 @08:00AM (#46703397) Homepage
  
  As an expert (relative to most of Slashdot) in NoSQL databases, with a significant amount of experience in Hadoop and HBase systems, I agree wholeheartedly.
  NoSQL solutions can be ridiculously fast and scale beautifully over billions of rows. Under a billion rows, though, and they're just different from normal databases in various arguably-broken ways. By the time you need a NoSQL database, you'll be successful enough to have a well-organized team to manage the transition to a different backend. For a new project, use a RDBMS, and enjoy the ample documentation and resources available.
  
  - Re: (Score:2)
    
    by tigersha ( 151319 ) writes:
    
    Thank you. Someone who talks sense around here.
  - Re: (Score:2, Insightful)
    
    by Anonymous Coward writes:
    
    As an expert (relative to most of Slashdot) in NoSQL databases, with a significant amount of experience in Hadoop and HBase systems, I agree wholeheartedly.
    NoSQL solutions can be ridiculously fast and scale beautifully over billions of rows. Under a billion rows, though, and they're just different from normal databases in various arguably-broken ways. By the time you need a NoSQL database, you'll be successful enough to have a well-organized team to manage the transition to a different backend. For a new project, use a RDBMS, and enjoy the ample documentation and resources available.
    Agreed. I used a NoSQL database on a project I'm working on at the moment, and stick by that decision even though I don't even have millions of rows, but my situation is somewhat different to the OP's: my data model is very difficult to map to SQL (I have hundreds of different entity types, each of which has different field storage requirements, and need to be able to associate between entities of different types according to a variety of rules, meaning that some entity types may have hundreds of different
  - - Re:NoSQL? (Score:5, Interesting)
      
      by Sarten-X ( 1102295 ) writes: on Wednesday April 09, 2014 @11:15AM (#46704925) Homepage
      
      "Why not" is because the cost/benefit analysis is not in NoSQL's favor. NoSQL's downsides are a steeper learning curve (to do it right), fewer support tools, and a more specialized skill set. Its primary benefits don't apply to you. You don't need ridiculously fast writes, you don't need schema flexibility, and you don't need to run complex queries on previously-unknown keys. Rather, you have input rates limited by an external connection, only a few entity types, and you know your query keys ahead of time.
      
- - Re:NoSQL? (Score:5, Funny)
    
    by gnoshi ( 314933 ) writes: on Wednesday April 09, 2014 @07:14AM (#46703187)
    
    Shards! It has shards!
    
    - Re:NoSQL? (Score:4, Funny)
      
      by VortexCortex ( 1117377 ) writes: <VortexCortex@pro ... m minus language> on Wednesday April 09, 2014 @08:09AM (#46703443)
      
      Shards! It has shards!
      Heal The Dark Crystal, Gelfling!
      Only then can the two be made one! [mypopescu.com]
      
  - Re: (Score:2)
    
    by Wootery ( 1087023 ) writes:
    
    It's true! [youtube.com]
MongoDB (Score:2)

by timkofu ( 2552496 ) writes:

These guys are committed, meaning mongo has a future. 2.6 that came out the other day has some nice new features and many bug fixes.
light (Score:4, Insightful)

by invictusvoyd ( 3546069 ) writes: on Wednesday April 09, 2014 @05:33AM (#46702853)

SQLite is a relational database management system contained in a C programming library. In contrast to other database management systems, SQLite is not a separate process that is accessed from the client application, but an integral part of it.

Database Scaleability. (Score:5, Insightful)

by tonywestonuk ( 261622 ) writes: on Wednesday April 09, 2014 @05:34AM (#46702855)

"I'll need to be able to pull by date or by a number of key fields"
So, in other words, you have already decided on key fields. If you use a database, it has things called indexes, which can search billions of rows for a key field in a fraction of a second.
If you don't use something with indexes, then you can't do this.
Where has this idea that Databases can't scale come from? - The world runs on databases, for heaven's sake. Do you think when you take money out of an ATM, it's going to MongoDB? - And yet there are millions of ATMs, and you can take money out of your VISA account at almost all of them, anywhere in the world. That is called scale.
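To see what an index changes, SQLite can be asked how it plans to run a query before and after one is created. This is a small sketch with invented names (`events`, `device_id`, `idx_events_device`); the exact wording of the plan text differs between SQLite versions, so the example only prints it rather than promising a particular string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events ("
    "id INTEGER PRIMARY KEY, device_id TEXT, recorded_on TEXT, value INTEGER)"
)
conn.executemany(
    "INSERT INTO events (device_id, recorded_on, value) VALUES (?, ?, ?)",
    [(f"dev{n % 100}", "2014-04-09", n) for n in range(10_000)],
)


def plan(sql, params=()):
    """Return the human-readable steps SQLite intends to use for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column is the detail text


query = "SELECT SUM(value) FROM events WHERE device_id = ?"

before = plan(query, ("dev7",))  # no index yet: every row is examined
conn.execute("CREATE INDEX idx_events_device ON events (device_id)")
after = plan(query, ("dev7",))   # the planner can now search via the index

print("before:", before)
print("after: ", after)
```

The query itself does not change; only the work the engine has to do to answer it does, which is why key fields decided up front are such a good fit for a database with indexes.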

- Re:Database Scaleability. (Score:5, Insightful)
  
  by cyber-vandal ( 148830 ) writes: on Wednesday April 09, 2014 @05:35AM (#46702867) Homepage
  
  Where has this idea that Databases can't scale come from?
  Salesmen
  
  - Re: (Score:3)
    
    by mbourgon ( 186257 ) writes:
    
    And Developers. Anything to keep those damn DBAs away.
    (Yes, I'm a DBA)
    - Re:Database Scaleability. (Score:5, Insightful)
      
      by Bacon Bits ( 926911 ) writes: on Wednesday April 09, 2014 @11:33AM (#46705089)
      
      God forbid someone make them think about their data structures and how the end user might need to query them with their own reports.
      
- Re: (Score:2)
  
  by korgitser ( 1809018 ) writes:
  
  b.bb...but mongodb is webscale!
- Re:Database Scaleability. (Score:4, Informative)
  
  by Raumkraut ( 518382 ) writes: on Wednesday April 09, 2014 @06:27AM (#46703029)
  
  MongoDB has indexes.
  MongoDB also lets you store and query arbitrary data, in addition to any "key fields", without having to pre-define all the possible fields. Which it seems is what the submitter asked for.
  Where has this idea that "NoSQL" means "not a database" come from?
  
- Re:Database Scaleability. (Score:5, Insightful)
  
  by janoc ( 699997 ) writes: on Wednesday April 09, 2014 @07:15AM (#46703191)
  
  Databases don't scale for people who don't understand SQL, don't understand data normalization, indexing and want to use them as flat files. Unfortunately, a way too common anti-pattern :(
  The second group are too-cool-to-learn kids using the latest development tool fad on the market to build yet another Facebook/Twitter/Instagram/whatever clone ...
  
  - Re: (Score:2)
    
    by TheDarkMaster ( 1292526 ) writes:
    
    We have a winner here. When I saw the number of buzzwords in the article, I already thought the worst too.
    - - Re: (Score:2)
        
        by TheDarkMaster ( 1292526 ) writes:
        
        Okay, I assume you are the original author of the topic. Looking at the whole situation, I guess your primary problem is the ability to handle a large number of simultaneous users, correct? Databases like Postgres support this type of work; only if you had an operation the size of Facebook would you begin to have problems. However, remember that the database is only part of the chain. You will also need the application itself to have high performance (Ruby and performance are mutually exclusive). As an example,
  - Re: (Score:2)
    
    by wvmarle ( 1070040 ) writes:
    
    I've mis-used databases just as you describe. And continue to do so. That's fine, I'm an amateur, and I never needed to handle databases larger than a couple thousand rows. I could probably get away with tens or hundreds of thousands of rows before running into problems.
    Now if I were to develop something that needed a billion rows - that's a different story, and I do know my current approach won't work and I'd have to learn a lot about databases to pull it off. And submitter is obviously trying to do that (
- Re: (Score:2)
  
  by rthille ( 8526 ) writes:
  
  Where has this idea that Databases can't scale come from?
  the CAP theorem [wikipedia.org]
  Consistency, Availability, Partition tolerance. Choose any two.
- Re: (Score:3)
  
  by Kjella ( 173770 ) writes:
  
  Where has this idea that Databases can't scale come from? - The world runs on databases, for heaven's sake. Do you think when you take money out of an ATM, it's going to MongoDB? - And yet there are millions of ATMs, and you can take money out of your VISA account at almost all of them, anywhere in the world. That is called scale.
  Of course you can with lots of money in hardware and software and top notch database administrators, architects and query designers but it's a lot of hard work and expensive. The sales pitch for NoSQL is that it's built for horizontal scale-out by design, just throw more servers at it - mainstream servers, not the extremely expensive high-end servers and it'll scale almost indefinitely without having to rework everything. There's a lot of people in the "when we go viral we must be ready for it" category, wi
- - Re: (Score:2)
    
    by CadentOrange ( 2429626 ) writes:
    
    What if you have to use PostgreSQL? I've seen no evidence that it can scale or run multi-master.
    Are you high? Instagram (200 million users) uses PostgreSQL. PostgreSQL is web scale :)
MariaDB (Score:2, Insightful)

by Anonymous Coward writes:

I would consider using the latest release of MariaDB.
You can use it as a standard MySQL server, but they also have Cassandra NoSQL as an engine for it now (since the release of 10)... So you would be easily able to play with things on different database types and see what suits your situation better.
MongoDB obviously... (Score:2)

by kryps ( 321347 ) writes:

... since it is web scale. ;-)
https://www.youtube.com/watch?v=b2F-DItXtZs [youtube.com]
- Re: (Score:2)
  
  by jythie ( 914043 ) writes:
  
  I was hoping someone would post that ^_^ always good for a laugh.
Short Intro (Score:5, Informative)

by emblemparade ( 774653 ) writes: on Wednesday April 09, 2014 @05:51AM (#46702933)

It's a mistake to think that "NoSQL" is a silver bullet for scalability. You can scale just fine using MySQL (FlockDB) or PostgreSQL if you know what you're doing. On the other hand, if you don't know what you're doing, NoSQL may create problems where you didn't have them.
An important advantage of NoSQL (which has its costs) is that it's schema-free. This can allow for more rapid iteration in your development cycle. It pays off to plan document structures carefully, but if you need to make changes at some point (or just want to experiment), you can handle it at the code level. You can also support older "schemas" if you plan accordingly: for example, adding a version tag or something similar that can tell your code how to handle it. So, even ignoring the dubious potential of better scalability, NoSQL can still be beneficial for your project.
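The version-tag idea can be sketched without any database at all: each stored document carries a tag, and the reading code upgrades older shapes on the fly. Everything specific here is hypothetical (the `schema_version` key, the `upgrade` function, and the made-up change from a single `name` field to `first_name`/`last_name`).

```python
CURRENT_VERSION = 2


def upgrade(doc):
    """Return a copy of a stored document in the current shape."""
    doc = dict(doc)  # never mutate the caller's copy
    version = doc.get("schema_version", 1)  # untagged documents count as v1
    if version == 1:
        # Hypothetical change: v1 stored a single "name"; v2 splits it in two.
        first, _, last = doc.pop("name", "").partition(" ")
        doc["first_name"], doc["last_name"] = first, last
        version = 2
    doc["schema_version"] = version
    return doc


old = {"name": "Ada Lovelace", "score": 3}
new = {"schema_version": 2, "first_name": "Alan", "last_name": "Turing", "score": 5}

upgraded = [upgrade(old), upgrade(new)]
print(upgraded)
```

The cost is that every past shape lives on in application code until the old documents are rewritten, which is the trade-off behind "it pays off to plan document structures carefully".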
More so than SQL, NoSQL databases are designed for different kinds of applications, and have different strengths:
MongoDB is a really good backend engine that gives programmers a lot of control over performance and its costs: if you need faster writes, you can allow for eventual integrity, or if you need faster reads, you can allow for data not being the absolute freshest. For many massive multiuser applications, not having immediately up-to-date data is a reasonable compromise. It also offers an excellent set of atomic operations, which from my experience compensate well for the lack of transactions. Furthermore, MongoDB is by far the most feature-rich of these, supporting aggregate queries and map-reduce, which again can make up for the lack of joins. It also offers good sharding tools, so if you do need to scale, you can. Again, I'll emphasize that you need a good understanding of how MongoDB works in order to properly scale. For example, map-reduce locks the database, so you don't want to rely on it too much. The bottom line is that MongoDB can offer similar features to SQL databases (though they work very differently), so it's good for first-timers.
Couchbase is very good at dispersed synchronization. For example, if parts of your database live in your clients (mobile applications come to mind), it does a terrific job at resynching itself and handling divergences. This is also "scalable," but in a quite different meaning of the term than in MongoDB.
I would also take a look at OrientDB: it's not quite as feature-rich as MongoDB (and has no atomic operations), but it can work in schema-mode, and generally offers a great set of tools that can make it easy to migrate from SQL. Its query language, for example, looks a lot like SQL.
The above are all "document-oriented" databases, where your data is not opaque: the database actually does understand how your data is structured, and can allow for deep indexing and updating of your documents. Cassandra and REDIS (and Tokyo Cabinet, and BerkeleyDB) are key-value stores: much simpler databases offering fewer querying features, where your data is simply a blob as far as the engine is concerned. I would be less inclined to recommend them unless your use case is very specific. Where appropriate, of course, simpler is better. With these kinds of databases, there are actually very few ways in which you can create an obstacle for scalability, simply because they don't do very much from a programming perspective.
There are also in-between databases that are sometimes called "column-oriented": Google and Amazon's hosted big data services are both of this type. Your data is structured, but the structure is flat. Generally, I would prefer full-blown "document-oriented" databases, such as MongoDB and OrientDB. However, if you're using a hosted service, you might not have a choice.
It's also entirely possible to mix different kinds of databases. For example, use MongoDB for your complex data and use REDIS for a simple data store. I've even seen sophisticated deployments that very smartly archive data from one DB to another, and migrate it back again when necessary.

- Re: (Score:2)
  
  by St.Creed ( 853824 ) writes:
  
  Any relational database can also do "schemaless" models, by using the EAV (anti-)pattern. Mainly this conveys a lack of understanding of your data and a lack of planning and design in your datamodel, but hey, it happens. The fun thing is that you still get all those nice database features like parallel processing, concurrency, SQL, ACID transactions if you want them, security and maintenance tooling, etc.
  And if you use a modern database like SQL 2014 or Oracle's latest, you will get column-based compression
  - Re: (Score:3, Insightful)
    
    by Brian Nelson ( 3610471 ) writes:
    
    And any text file can be transactional if you write your code right. We can keep going down this road about how you don't /need/ X technology, but nobody wins. It's really OK to see the good in different technologies.
    - Re: (Score:2)
      
      by St.Creed ( 853824 ) writes:
      
      I agree that that road isn't productive (otherwise we'd still write machine code since we can do everything in machine code), but the hint of "it's going to be on the internet so I can't use an RDBMS" in the original question is silly, and that's what I react to.
      Given 3 trillion users your options are pretty much limited to horizontal scaling, no SQL etc. but most people never get that far with their applications and in that case, storing the data in a noSQL database and then getting actionable information out
Just Use SQL (Score:5, Insightful)

by Anonymous Coward writes: on Wednesday April 09, 2014 @05:52AM (#46702939)

I just felt I had to comment on this. So many developers start with the phrase "I need NoSQL so I can scale" and almost all of them are wrong. The chances are your project will never ever ever scale to the kind of size where the NoSQL design decision will win. It's far more likely that the NoSQL design choice will cause more problems (performance etc.) than the theoretical scaling issues.
Take, for example, two systems I've been involved with for managing WiFi access to large-scale networks (100,000+ concurrent users, 1000s of APs): one uses MongoDB, the other is based on PostgreSQL. The MongoDB-based solution has very real performance problems: its reporting takes a very long time to run and uses very large amounts of system RAM (24G in some cases), and that performance is only degrading as the system grows. There are also many other performance issues. These issues are not just Mongo issues but simply that NoSQL is not well suited to the task. The system has been rewritten using an SQL backend and now works much better, and, importantly, it's scaling better. Growth in the system is no longer degrading performance, and the points where we need hardware upgrades or extra servers are now much more predictable, so we can predict cost base growth in relation to user growth.
NoSQL does not guarantee scaling; in many cases it scales worse than an SQL-based solution. Work out what your scaling problems will be for your proposed application, when they will become a problem, and whether you will ever reach that scale. Being on a bandwagon can be fun, but you would be in a better place if you really thought through any potential scaling issues. NoSQL might be the right choice, but in many places I've seen it in use it was the wrong choice, chosen based on one developer's faith that NoSQL scales better rather than on thinking through the scaling issues.

PostgreSQL or Riak (Score:2)

by imbaczek ( 690596 ) writes:

Postgres might carry you further than you imagine with hstore and json extensions. I'd also try Riak if you really want NoSQL.
hyperdex (Score:2)

by fredan ( 54788 ) writes:

Take a look at HyperDex if you are looking for a NoSQL DB: http://www.hyperdex.org/ [hyperdex.org]
Big mistake (Score:5, Insightful)

by msobkow ( 48369 ) writes: on Wednesday April 09, 2014 @06:58AM (#46703139) Homepage Journal

Telecommunications data is eminently suitable to schema table storage in any relational database, which with a little work, will let you index by the keys you intend to query by.
NoSQL solutions are better for unstructured data that doesn't come in predictable formats or value sets.
You need to take a step back and look at the problem before you decide on a solution. Don't be one of those idiots who tries to use a hammer to drive a screw.
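
As a rough illustration of indexing by the keys you intend to query by, here is a sketch using Python's built-in sqlite3 module. The table name, column names, and sample rows are all invented for the example; the original question does not say what the data looks like:

```python
import sqlite3

# In-memory database, so the sketch leaves nothing behind on disk.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE readings (
        id INTEGER PRIMARY KEY,
        device_id TEXT NOT NULL,
        recorded_at TEXT NOT NULL,  -- ISO 8601 text, which sorts chronologically
        amount INTEGER NOT NULL
    )
""")
# One index per access path: by date, and by key field plus date.
conn.execute("CREATE INDEX idx_readings_recorded_at ON readings (recorded_at)")
conn.execute("CREATE INDEX idx_readings_device ON readings (device_id, recorded_at)")

rows = [
    ("phone-a", "2014-04-08T09:00:00", 5),
    ("phone-b", "2014-04-08T17:30:00", 7),
    ("phone-a", "2014-04-09T08:15:00", 3),
]
conn.executemany(
    "INSERT INTO readings (device_id, recorded_at, amount) VALUES (?, ?, ?)", rows
)

# Trend report: total per calendar day.
by_day = conn.execute(
    "SELECT substr(recorded_at, 1, 10) AS day, SUM(amount) "
    "FROM readings GROUP BY day ORDER BY day"
).fetchall()

# Pull by a key field.
for_device = conn.execute(
    "SELECT recorded_at, amount FROM readings WHERE device_id = ? ORDER BY recorded_at",
    ("phone-a",),
).fetchall()
```

The same table definition and queries should carry over to PostgreSQL or MySQL with little change, though date handling would normally use a native timestamp type there.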

Which luxury yacht after my new project? (Score:5, Funny)

by BlackPignouf ( 1017012 ) writes: on Wednesday April 09, 2014 @07:24AM (#46703233)

"I'm working on a new independent project. It will soon become the new Facebook, and I'll be a billionaire next quarter. The only problem is that I don't know which luxury yacht to buy with all this money. I've been looking at Lady Moura, Christina O, Pelorus, Venus and others. What do you recommend? What problems have you run into with the ones you've tried?"

- Re:Which luxury yacht after my new project? (Score:5, Funny)
  
  by coofercat ( 719737 ) writes: on Wednesday April 09, 2014 @08:51AM (#46703659) Homepage Journal
  
  Pff! All that soon-to-have money and yet no imagination, huh? Buy an old diesel Navy submarine and have it refitted. Maybe cut some windows into the hull - that'll mean you can only go down to maybe 50 metres instead of 350, but that's still plenty, and if you get lost you can just look out of the windows to see where you are without having to worry about using sonar.
  I'd imagine surfacing your submarine in Monaco's marina will turn far more heads than your ridiculous yacht moored a mile offshore ;-) (besides, a submarine is phallically shaped, so works better in metaphorical dick measuring competitions)
  Oh, and be sure to use Postgres or MySQL for your on-board systems - it'll scale plenty well for a long time before you need to go all 'web scale' with a NoSQL DB.
  
Two words (Score:3)

by ledow ( 319597 ) writes: on Wednesday April 09, 2014 @07:57AM (#46703377) Homepage

Premature Optimisation.

Stock inventory? (Score:2)

by biodata ( 1981610 ) writes:

Is this for your stock inventory project? If you want to do anything that involves keeping track of any goods or money or anything of value, then NoSQL is not necessarily the way to go. NoSQL is designed to keep track of value-less things like Twitter messages and Facebook postings, where it doesn't matter if you lose a few thousand transactions here or there. People keeping track of things with actual monetary value usually use SQL for the transactions, from what I've seen.
HBase (Score:3)

by scorp1us ( 235526 ) writes: on Wednesday April 09, 2014 @09:20AM (#46703855) Journal

First, everyone who is pointing out your premature optimization is probably right. You can get a lot of scalability out of existing databases, particularly if you optimize your data schema with indexes. Even if you store all possible 9,999,999,999 phone numbers, the log base-2 of that is 34. So you'll need a b-tree 34 levels deep. That's big, real big, but b-trees are fast. Worst case you are reading 34 blocks from disk, which is ~16kB.
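The arithmetic above can be checked with a few lines of Python. The 512-byte block size and the fan-out of 256 keys per node are assumptions made for this sketch, not figures from the comment. A fan-out of 2 reproduces the 34-level estimate, while a wider fan-out, which is typical of real B-trees, gives a much shallower tree:

```python
KEYS = 9_999_999_999  # every possible 10-digit phone number, as in the comment


def levels(keys, fanout):
    """Smallest h such that fanout**h >= keys, using integer math only."""
    h, capacity = 0, 1
    while capacity < keys:
        capacity *= fanout
        h += 1
    return h


binary_levels = levels(KEYS, 2)    # fan-out of 2, i.e. log base 2 rounded up
wide_levels = levels(KEYS, 256)    # assumed fan-out of 256 keys per node

BLOCK_BYTES = 512                  # assumed disk block size
worst_case_bytes = binary_levels * BLOCK_BYTES
```

Under these assumptions the binary case reads 34 blocks, or 17,408 bytes, which is in line with the rough figure quoted above; with a fan-out of 256 only 5 levels are needed.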
Next, don't choose databases by name. Choose them by their features because you use features, not names. That said, HBase is probably what you want. It's a blend of distributable Hadoop and tables. It doesn't sound like you need atomicity, which is one thing you give up when leaving SQL behind.

Perhaps you should abstract your persistence model (Score:3)

by Assmasher ( 456699 ) writes: on Wednesday April 09, 2014 @09:31AM (#46703947) Journal

...so that you simply write an adapter for pushing/pulling data.
Then you don't have to worry so much about making what appears to be an extremely premature optimization.
In other words, have your backend web services (presuming you're using them and not manually POSTing from a socket yourself to your own socket server) instantiate an instance of iMyDBAdapter and use it.
Later, when you find out that you actually do need MongoDB, PostgreSQL, sharded MariaDB, whatever, you can simply write another adapter class that simply has to satisfy the iMyDBAdapter interface.
The reason this works so well is that it will force you to separate your business logic from your underlying DB implementation (which requires a lot of discipline to do otherwise, especially when you just want to get something 'done'.)
Also, as another poster pointed out, you're much more likely to suffer from other issues relating to scaling (and issues better solved elsewhere) than a modern database.
My advice, stick rigidly to the interface/adapter mechanism and implement an adapter for whichever DB you're most comfortable with right now.
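
A minimal sketch of the interface/adapter idea in Python. `MyDBAdapter` stands in for the `iMyDBAdapter` interface named above; the method names (`save`, `find_by_date`), the record fields, and both adapter classes are hypothetical, invented for illustration:

```python
import sqlite3
from abc import ABC, abstractmethod


class MyDBAdapter(ABC):
    """Python stand-in for the iMyDBAdapter interface; method names are assumptions."""

    @abstractmethod
    def save(self, record: dict) -> None: ...

    @abstractmethod
    def find_by_date(self, day: str) -> list: ...


class InMemoryAdapter(MyDBAdapter):
    """Simplest possible backend, handy for tests and prototypes."""

    def __init__(self):
        self._records = []

    def save(self, record):
        self._records.append(dict(record))

    def find_by_date(self, day):
        return [r for r in self._records if r["day"] == day]


class SQLiteAdapter(MyDBAdapter):
    """Relational backend behind the same interface."""

    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)
        self._conn.execute("CREATE TABLE IF NOT EXISTS records (day TEXT, payload TEXT)")

    def save(self, record):
        self._conn.execute(
            "INSERT INTO records VALUES (?, ?)", (record["day"], record["payload"])
        )
        self._conn.commit()

    def find_by_date(self, day):
        cur = self._conn.execute(
            "SELECT day, payload FROM records WHERE day = ?", (day,)
        )
        return [{"day": d, "payload": p} for d, p in cur.fetchall()]


def handle_post(adapter: MyDBAdapter, record: dict) -> int:
    """Business logic sees only the interface, never the concrete store."""
    adapter.save(record)
    return len(adapter.find_by_date(record["day"]))
```

Swapping backends then means passing a different adapter to `handle_post`; a MongoDB or PostgreSQL adapter would only need to implement the same two methods.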

Solution looking for a problem (Score:5, Insightful)

by luis_a_espinal ( 1810296 ) writes: on Wednesday April 09, 2014 @09:35AM (#46703991)

I would like to start with a NoSQL solution for scaling,

This is a solution looking for a problem. Or more precisely, you are looking for an excuse to use a piece of technology or paradigm. Don't get me wrong, your systems requirements might indeed be best served using a NoSQL solution, but what exactly has your analysis shown regarding this?
Scaling is not just a technical feature (NoSQL, SQL, Jedi mind-meld tricks). Scaling is a function of your architecture. You can NoSQL the shit out of your solution, but if your software and system architecture is not scalable, then having NoSQL will mean chicken poop as solutions go.
and ideally it would be dead simple if possible.
If you want simple, put a simple RDBMS schema (a properly normalized one) in place, and have your code use a simple, technology-agnostic persistence layer that maps your domain-level artifacts to database artifacts. If you ever have to replace the back-end, then you can do so with minimal changes to the API that domain-level artifacts use to persist themselves with the persistence layer.
Design your domain solution around domain-specific artifacts. Persistence technology is typically a low-level design/implementation detail, an important one obviously (and a critical one for some classes of systems).
But for what you are describing, the choice shouldn't even be coming into the picture without first having an architectural notion of your solution.

If you have to ask ... (Score:2)

by ehiris ( 214677 ) writes:

It means you don't have any big data requirements so you're better off sticking with MySQL or something easier to manage at a small scale.
If growth is high or you have a lot of data to analyze, you can look into importing data into Hadoop using sqoop and query it with Hive and HBase. But you most likely won't need that for at least a couple of years.
Files, flys and fries (Score:2)

by WaffleMonster ( 969671 ) writes:

Create a separate folder for each type of 'key' copying 'POST' data to files in these folders using filename as key for ... umm... lightning fast retrieval.
U should then totally think about creating other directories full of symbolic links rather than files enabling you to have many keys for reference or even generate materialized views without duplicating data.
Since you would be using a query language that is not SQL it is guaranteed to scale to infinity and beyond... (inodes sold separately)
Make it fast, don't marry first... (Score:2)

by NotesSensei ( 997996 ) writes:

and get to know it later :-). Fast here: your prototype creation, not primarily the database I/O. The general comments are right: there is no one-fits-all solution and the database might change. It looks very much like you also haven't decided on the server platform: Ruby, PHP... you could look at node.js or vert.x too - server side JavaScript is at least neat for prototyping (I'm not making a statement that it is *only* neat for prototyping - that's a completely different discussion). We did a number of supe
MongoDB (Score:3)

by GameMaster ( 148118 ) writes: on Wednesday April 09, 2014 @11:42AM (#46705181)

Use MongoDB, it's web-scale. They produce kick-ass benchmarks by piping all your data to /dev/null.

Depends on the situation (Score:3)

by samwhite_y ( 557562 ) * writes: <(moc.oohay) (ta) (spwerci)> on Wednesday April 09, 2014 @03:47PM (#46707603)

I have used Oracle, MySQL, and Mongo in prod situations. I have looked at Cassandra for evaluating it for potential usage in prod.

I can imagine situations where I could recommend any of the above. For example, if you are a large financial company with billions of rows, I would go with Oracle. If you have smarts but not money and didn't need somebody to sue if something went wrong, then maybe Postgres would do. If I were a simple web based app with simple form submits, I would go with MySQL. If I had complex unpredictable data blobs and unpredictable needs to do certain types of queries against the data, I might recommend Mongo. If I have large amounts of data on which I want to do analytics I would use Cassandra.

Cassandra wins when you have a lot of data and not a lot of complex real time queries against it. It is especially good at scaling up on cheap data storage (think 100s of terabytes). It also has an unreal "write" throughput (important for certain types of analytics which write out complex intermediate results) though that is not relevant for the case described.

The problem generally with NoSQL solutions is that they increase the amount of storage needed to store the equivalent amount of information. You are essentially redundantly storing schema design with each "record" that you store. This really matters more than some might suspect, because when you can put an entire collection into memory, the read performance is much higher. You usually need 1/5th to 1/10th as much RAM to do the job with a traditional relational database (especially since MySQL and its brethren handle getting in and out of memory better than Mongo). This isn't so much the case for Cassandra because of its distributed storage nature, but it really isn't usable for real time transactions.
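
The "schema stored with each record" point can be illustrated with a small Python comparison. The records and field names are made up, and JSON text is used only as a stand-in: real engines use binary formats such as BSON, plus padding and compression, so this shows the direction of the effect, not the 1/5th to 1/10th figure above:

```python
import json

# Hypothetical records; the field names are invented for the example.
records = [
    {"device_id": f"phone-{i}", "recorded_at": "2014-04-09T05:12:00", "amount": i}
    for i in range(1000)
]

# Document style: every record repeats its own field names.
as_documents = "\n".join(json.dumps(r) for r in records)

# Table style: field names are stated once, rows hold values only.
columns = ["device_id", "recorded_at", "amount"]
as_rows = json.dumps(columns) + "\n" + "\n".join(
    json.dumps([r[c] for c in columns]) for r in records
)

# Ratio above 1 means the document form is larger for the same information.
overhead = len(as_documents) / len(as_rows)
```

The gap grows with longer field names and shrinks with larger values, which is one reason short keys are a common habit in document stores.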

My recommendation, use a traditional database -- if in a Microsoft shop use SQL Server, otherwise I like Postgres or MySQL. If however, you have complex data storage needs that a NoSQL solution is perfect for, then I would go with that. If you are into back end analytics, copy the data as it comes in and put it into Cassandra (or one of its similar brethren) as well.

- Re: (Score:2)
  
  by beerbear ( 1289124 ) writes:
  
  I second this. Easy to set up, easy to use.
- Re: (Score:2)
  
  by Tanaka ( 37812 ) writes:
  
  I like Postgres, and I like MongoDB too. Both have their strengths. Best tool for the job I say.
  The great thing about MongoDB is you can install two or three servers in different datacenters, and have redundancy out of the box. It's really simple. And you can scale horizontally if you need to without any downtime.
  The last time I looked at Postgres, to do the same, you had to use third party solutions, and the client side drivers didn't support it. Is it any better now?
  - Re: (Score:2)
    
    by VortexCortex ( 1117377 ) writes:
    
    The great thing about MongoDB is you can install two or three servers in different datacenters, and have redundancy out of the box. It's really simple. And you can scale horizontally if you need to without any downtime.
    I've never had to use 3rd party solutions to implement horizontal scaling, replication, pooling, clustering, etc. with Postgresql. I have often had to demand changes of 3rd party vendor-lockin-ware, or add a kludge myself to fit a business's needs. RTFM application used to be far more common [catb.org], but seems to have fallen out of fashion of late as more programmers and DBAs are increasingly discovered not to be hackers. Did you know Postgresql supports NoSQL features via HStore and JSON?
    Much experience has sho
    - Re: (Score:2)
      
      by Assmasher ( 456699 ) writes:
      
      I would have to agree about PostgreSQL, it is surprisingly flexible and powerful. I've used it for small business systems and recently on a 'big data' (oh, that overused buzzword...) project (millions of devices reporting dozens of times per day) and it has been fantastic.
      Wish I'd gotten on the bandwagon 10 years ago.
      - Re: (Score:2)
        
        by WuphonsReach ( 684551 ) writes:
        
        Wish I'd gotten on the bandwagon 10 years ago.
        
        Mmm, 10 years ago you would have been using 7.3 or 7.4 [wikipedia.org]. Which was not all that fast unless heavily tuned. It wasn't until the 8.x series in 2006-2008 (roughly) where they started focusing a bit more on performance. These days it is quite powerful and a definite competitor to the high-end paid offerings.
        
        There was also the issue that 7.x was a PITA to run on top of a Microsoft Windows system. The 8.x and 9.x series run natively and integrate far better wi
- Re: (Score:2)
  
  by VortexCortex ( 1117377 ) writes:
  
  Seriously - JUST USE POSTGRES - there is virtually nothing that it can't do.
  Indeed. With its native JSON type and HStore Key/Value store it has NoSQL features. Given PostgreSQL's ability to cluster, pool, and replicate it also scales quite well. IMO, it doesn't make sense to abandon all relational DB features in a NoSQL only solution (especially right off the bat) when you can have both. PostgreSQL may just be the droids you are looking for.
- Re: (Score:3)
  
  by grcumb ( 781340 ) writes:
  
  CouchBase/CouchDB is probably the easiest and most available one out there. It's particularly well suited for app backends too, as both the backend and mobile apps can talk to the same database, in theory eliminating the need for the backend to handle data syncing.
  Those are good reasons, and it's also true that CouchDB will use a lot less resource overhead than a full-bore RDBMS under load. Depending on the use case, it might also prove decidedly easier to scale.
  But the place where NoSQL really shines is storing amorphous or heterogeneous data. Because you have no constraints about what goes into a given record, you can record more or fewer name/value pairs at your whim. As with Perl, though, freedom comes at the cost of potential disorder.
  But honestly, with the tiny
- Re: (Score:2, Informative)
  
  by Anonymous Coward writes:
  NoSQL is a good solution for horizontal scaling, CSV and SQL DB are not.
  I'd like to dispute this. Based on the OP's description of his application, two things come to mind:
  
  His application is mostly-write-only. He probably does not need instant query ability, but may need to be able to handle a very large number of inserts per second (assuming he's justified in his assertion that he needs scalability). For this kind of application, logging your incoming data to a plain text file (or sequentially-appended binary data file, or any other write-only plain file appr
