Cassandra and Voldemort Benchmarked

timothy posted more than 4 years ago | from the rifling-the-file-cabinet dept.

Databases 45

kreide33 writes "Key/Value storage systems are gaining in popularity, largely because of features such as easy scalability and automatic replication. However, there are several to choose from, and performance is an important deciding factor. This article compares the performance of two of the most well-known projects, Cassandra and Voldemort, using several different mixes of access types, and compares both throughput and latency."

No Winner (5, Informative)

WrongSizeGlass (838941) | more than 4 years ago | (#32140228)

Their conclusion was that there was "no clear winner". Not surprising. Both of these products are in their early stages of development (Voldemort v0.80.1, Cassandra 0.6.0-beta3), and their developers will surely work on optimization and performance once the products are stable.

I'd like to have seen them run MySQL, PostgreSQL or SQLite through the same tests so we could see how these NoSQL solutions compared.

Re:No Winner (0)

Anonymous Coward | more than 4 years ago | (#32140338)

I don't know that comparing an RDBMS vs key-value storage is meaningful. Unless they fucked up, the kv will be faster on benchmarks, but once you spend three days reimplementing joins, sorts, merges, indexes, and all that stuff that an RDBMS gives you for free, kv starts to fail.
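To make the "reimplementing joins" point concrete, here is a minimal sketch of a hand-written join and sort over two key/value maps. The data, the field names, and the function name are all hypothetical; plain Python dicts stand in for whatever store is actually used.

```python
# Hypothetical data: two "tables" kept as plain key/value maps.
users = {1: {"name": "alice"}, 2: {"name": "bob"}}
posts = {
    10: {"author_id": 1, "title": "First"},
    11: {"author_id": 2, "title": "Second"},
    12: {"author_id": 1, "title": "Third"},
}


def posts_with_author_names(posts, users):
    """Hand-written equivalent of:
    SELECT p.title, u.name FROM posts p
    JOIN users u ON u.id = p.author_id ORDER BY p.title
    """
    rows = []
    for post in posts.values():
        author = users.get(post["author_id"])  # one extra get per joined row
        if author is not None:                 # inner join: skip orphaned posts
            rows.append((post["title"], author["name"]))
    return sorted(rows)                        # the ORDER BY is also done by hand
```

Every lookup that SQL would plan and index for you becomes an explicit get in application code, which is the cost the comment is describing.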

Digg is going to no-sql, for example. They released some of their mysql schema/code and it was poorly designed (bad indexing, manual joins, braindead queries). They chose to go with no-sql because they're clueless retards.

Re:No Winner (1)

ThePhilips (752041) | more than 4 years ago | (#32140502)

Digg is going to no-sql, for example. They released some of their mysql schema/code and it was poorly designed (bad indexing, manual joins, braindead queries). They chose to go with no-sql because they're clueless retards.

Lazy to rephrase, so here it goes straight from rfc1925:

Some things in life can never be fully appreciated nor understood unless experienced firsthand. Some things in networking can never be fully understood by someone who neither builds commercial networking equipment nor runs an operational network.

In general I'm very very cautious when criticizing production code. After all it works.

Re:No Winner (2, Informative)

Anonymous Coward | more than 4 years ago | (#32140678)

I wouldn't have mentioned it if it weren't pure shit. 1.5 seconds for a query that should touch 3-4 disk blocks at most?

Re:No Winner (1)

mini me (132455) | more than 4 years ago | (#32145824)

The guys at digg fully admit that they could spend their days tuning MySQL to achieve the performance they need. What is important to realize is that it costs real money and time to perform that tuning. Time and money that could be better spent improving the user experience of the website.

Cassandra, on the other hand, performs optimally no matter what the developers throw at it, without the need to tune every last detail to squeeze every last bit of performance out of it. As the site grows, if the cluster is not able to handle the load, new (cheap) systems can be added in minutes.

Digg's move to Cassandra was a business decision to keep their operational costs low. The links you posted mention that a good DBA would solve all of digg's problems, which is probably true from a technology standpoint, but what they fail to mention is that a good DBA costs as much as, if not more than, it costs them to run their entire Cassandra cluster.

So yes, with the right tuning MySQL would be capable of powering digg. They admit it themselves. It just did not make sense from a business point of view because keeping MySQL running well is much more expensive than keeping Cassandra running well.

Re:No Winner (2, Informative)

Hognoxious (631665) | more than 4 years ago | (#32140740)

Production code works ... until it doesn't.

I've seen a situation where half of the bug reports in our system were down to one badly conceived and shittily implemented module. But when I suggested binning it and doing it again properly, the answer was "but it works!".

Re:No Winner (0)

Anonymous Coward | more than 4 years ago | (#32143208)

I've seen a situation where half of the bug reports in our system were down to one badly conceived and shittily implemented module.

You have modules? Can I have your job?

Re:No Winner (1)

Arancaytar (966377) | more than 4 years ago | (#32140802)

> After all it works.

Since they're abandoning MySQL, apparently their schema didn't work so great...

Re:No Winner (1)

DragonWriter (970822) | more than 4 years ago | (#32175130)

I don't know that comparing an RDBMS vs key-value storage is meaningful.

They are alternative approaches to implementing a backend store for an information system, and the decision between key/value and relational technology is in many cases a bigger decision, with greater risk in making the wrong choice, than the decision between particular key/value or particular relational options (converting between systems that share the same basic information model is cheaper than converting between systems that use different ones). So I think it would be more important to have comparisons between, e.g., key/value stores, SQL-based RDBMSs, and non-SQL-based RDBMSs (e.g., those based on D, the SQL-alternative relational language family, not the Digital Mars programming language) than to have comparisons between different members of the same family.

Re:No Winner (0)

Anonymous Coward | more than 4 years ago | (#32140532)

I would not give "early stages of development" as the reason. Cassandra, even with such a low version number, is already being used in production by some of the biggest players on the internet. And they are specifically interested in performance; if there were some clear optimizations to be made, they have surely contributed patches for them.

Re:No Winner (1)

mrmeval (662166) | more than 4 years ago | (#32140932)

I read the brief descriptions of each system, and if there is any text outside of legalese that is as cotton-mouthed, fuzzy and unclear, I've not seen it.

Re:No Winner (1)

greg1104 (461138) | more than 4 years ago | (#32142498)

I'd like to have seen them run MySQL, PostgreSQL or SQLite through the same tests so we could see how these NoSQL solutions compared.

That wouldn't have made any sense given the replication scheme used: "N=3 (replicas for each entry), R=2 (nodes to wait for on each read), W=2 (nodes to block for on each write)". It's hard to translate that into the sort of replication features available in the other databases you mentioned.
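For readers unfamiliar with the N/R/W notation, here is a toy in-memory sketch of what those three numbers mean. It is not how Cassandra or Voldemort implement replication: the class name, the `reachable` parameter, and the single global version counter are assumptions made purely for illustration.

```python
class QuorumStore:
    """Toy in-memory model of N/R/W quorum replication (illustrative only)."""

    def __init__(self, n=3, r=2, w=2):
        self.n, self.r, self.w = n, r, w
        self.replicas = [{} for _ in range(n)]  # one dict per replica node
        self.version = 0  # global counter standing in for real versioning

    def put(self, key, value, reachable=None):
        """Write to every reachable replica; fail unless at least W can ack."""
        nodes = list(range(self.n)) if reachable is None else list(reachable)
        if len(nodes) < self.w:
            raise RuntimeError("write quorum not met")
        self.version += 1
        for i in nodes:
            self.replicas[i][key] = (self.version, value)

    def get(self, key, reachable=None):
        """Ask R reachable replicas; return the newest value any of them holds."""
        nodes = list(range(self.n)) if reachable is None else list(reachable)
        if len(nodes) < self.r:
            raise RuntimeError("read quorum not met")
        answers = [self.replicas[i].get(key) for i in nodes[: self.r]]
        found = [a for a in answers if a is not None]
        if not found:
            return None
        return max(found, key=lambda a: a[0])[1]  # highest version wins
```

With N=3, R=2, W=2 we have R + W > N, so any two replicas that answer a read must include at least one that took the latest write. A single-master SQL setup has no direct knob for this, which is why the translation is awkward.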

Also, these tests focused on individual put/get operations, where a standard database is going to get creamed no matter what. You'd need to include something that had a higher-level query component to it than that to even approach fair. To use a social networking example, the benchmark run tested how long it might take for one of these databases to look up a piece of information about one of your friends. To get something where a comparison against a regular database would be more fair, you'd want queries like "get the list of all my friends" instead.

Re:No Winner (2, Informative)

inKubus (199753) | more than 4 years ago | (#32143044)

And what about memcached [memcached.org]? It's a simple key/value object database. What about an "associative array", isn't that basically a key/value database? I don't see what the hype is about.

Re:No Winner (0)

Anonymous Coward | more than 4 years ago | (#32154458)

I'd like to have seen them run MySQL, PostgreSQL or SQLite through the same tests so we could see how these NoSQL solutions compared.

... and MUMPS.

Drat (1)

sys.stdout.write (1551563) | more than 4 years ago | (#32140294)

Did anyone else read this as comparing Cassandra from King's Quest and Voldemort from Harry Potter?

Re:Drat (1, Funny)

Anonymous Coward | more than 4 years ago | (#32140324)

You mean "He-Who-Must-Not-Be-Named"

He-Who-Must-Not-Be-Named (-1, Flamebait)

Anonymous Coward | more than 4 years ago | (#32140606)

No that's Obambi. Or at least it is when the results of his policies are reported.

Re:Drat (5, Funny)

WWWWolf (2428) | more than 4 years ago | (#32140358)

Did anyone else read this as comparing Cassandra from King's Quest and Voldemort from Harry Potter?

I was expecting something about Cassandra producing a bunch of warnings in log files that no one ever bothers to read, and Voldemort having various problems managing the child processes in the cluster (mostly being unable to kill or reap them).

Re:Drat (1)

Arancaytar (966377) | more than 4 years ago | (#32140810)

Wouldn't the problem be rather that Voldemort would keep killing child processes randomly?

Re:Drat (3, Funny)

deniable (76198) | more than 4 years ago | (#32142624)

I hear Voldemort has a really good replication strategy.

Re:Drat (1)

mysidia (191772) | more than 4 years ago | (#32145014)

Maybe so, but Cassandra is sexier, and Voldemort is just plain evil.

Re:Drat (1)

Rennt (582550) | more than 4 years ago | (#32144652)

I always figured Cassandra was a reference to Red Dwarf.

Re:Drat (0)

Anonymous Coward | more than 4 years ago | (#32147712)

I always figured Cassandra was a reference to Red Dwarf.

I know, Arnold, because I know the rest of this conversation.

Let there be light (0)

Anonymous Coward | more than 4 years ago | (#32140496)

How about a comparison to other similar systems that don't reinvent the wheel? Yawn, wake me up when we're done reinventing technology we've had for decades. How about vs a graph/object DB? How about going into more detail on how the performance changes with scale-outs and different topologies? Fail.

Re:Let there be light (1)

OeLeWaPpErKe (412765) | more than 4 years ago | (#32140618)

I kinda wonder why it's not possible to use these projects as backends for mysql and postgres. Seems to me that shouldn't be that hard an exercise.

Or even having these as mountable volumes.

Re:Let there be light (0)

Anonymous Coward | more than 4 years ago | (#32140754)

I think you just invented NTFS.

Re:Let there be light (1)

DragonWriter (970822) | more than 4 years ago | (#32176028)

I kinda wonder why it's not possible to use these projects as backends for mysql and postgres.

You could, but as soon as you try to implement the features of SQL that they lack on top of them you'll end up making them perform far worse than existing backends that are designed from the ground up to provide these features, so what would be the point?

Key compression... (1)

shic (309152) | more than 4 years ago | (#32140588)

Are there ANY open source key/value stores that support prefix compression?

Re:Key compression... (-1, Offtopic)

Anonymous Coward | more than 4 years ago | (#32140960)

yes your mama's ass.

Silly question (3, Interesting)

Hognoxious (631665) | more than 4 years ago | (#32140822)

Is a key/value system a database with just one table that has one key field and one non-key field?

Re:Silly question (1)

CrashNBrn (1143981) | more than 4 years ago | (#32141224)

AFAIK it's akin to a Mapping/Hash (array), ie:
mArray[name] := ({ "Crash" }) or
mArray[stats] := ({ ({ "STR", 10 )}, ({ "DEX", 12 }) })

They could also be multi-tiered mappings:
mPlayer[data][name], mPlayer[data][stats]

DGD and LPmuds have done mapping/arrays for ~20 years. The underlying DGD core is C++ and the interpreted language is C-like. The underlying core of most other LPMuds is C, and their interpreted language is also C-like.

Mappings and Compiled (Data) Objects were extremely useful in DGD. Named arrays with decent access speed.
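For anyone who does not read LPC, roughly the same structures can be written as Python dicts. This is a loose translation for illustration only; the variable names are adapted from the snippet above, and the stats are stored as a nested mapping rather than an array of pairs.

```python
# Flat mapping: each key holds a value, which may itself be a list.
m_array = {
    "name": ["Crash"],
    "stats": [["STR", 10], ["DEX", 12]],
}

# Multi-tiered mapping: values are themselves mappings.
m_player = {
    "data": {
        "name": "Crash",
        "stats": {"STR": 10, "DEX": 12},
    }
}

# Lookups chain one key per tier, like mPlayer[data][stats] above.
dexterity = m_player["data"]["stats"]["DEX"]
```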

Re:Silly question (0)

Anonymous Coward | more than 4 years ago | (#32145556)

As an aging programmer, these double brackets are playing havoc on my poor old eyes. By the way, you have a bug, i.e. When you say...
({ ({ "STR", 10 )}, ({ "DEX", 12 }) }) ... it should read as ...
({ ({ "STR", 10 }), ({ "DEX", 12 }) })
which just goes to show how this syntax is over complex for what it needs to be. Which leads me to think, I wish *all* programmers could optimize all their thinking, so that when they define a language, they try to make life easier for other programmers. But then sadly I've found some programmers love over complexity. I don't know what the answer is. Perhaps we could just have the overly complex programmers dragged outside and shot, to effectively deselect them from the gene pool. Then over time, we would get more optimized solutions ;) ... It is after all, for the common good. ;)

(Disclaimer, for mods with no sense of humor, I was joking. Obviously I don't really want them shot. After all dealing with overly complex programmers seal clubbing style would be just as satisfying ;)

Re:Silly question (1)

legoburner (702695) | more than 4 years ago | (#32141256)

At the simplest level yes, but Cassandra (for example) is more like a multi-dimensional hashmap. E.g., Key-Value where Value points to another Key-Value and so on, so you can reference values such as: SomeApp.Users[UserID][username]=bob. The advantage of this is being able to sort by time, alpha, etc., and therefore handle sorted pagination from the key/value listings. The main advantage though is that you can literally just plug in more systems and have it scale horizontally without any extra work, unlike databases, which need sharding, bigger machines, redevelopment, etc. once you hit the limits of basic clustering.
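A rough sketch of the nested-hashmap idea with sorted pagination, using plain Python dicts. The `put` and `page` helpers and the column names are hypothetical and are not Cassandra's API; a real store would keep the names sorted itself rather than sorting on every call.

```python
# Hypothetical layout mirroring SomeApp.Users[UserID][username] from the comment.
store = {"SomeApp": {"Users": {}}}


def put(user_id, column, value):
    """Set one column in one user's row, creating the row if needed."""
    store["SomeApp"]["Users"].setdefault(user_id, {})[column] = value


def page(row, after="", limit=2):
    """Return up to `limit` (name, value) pairs whose name sorts after `after`."""
    names = sorted(name for name in row if name > after)[:limit]
    return [(name, row[name]) for name in names]
```

Fetching the next page means passing the last name of the previous page as `after`, which is the sorted pagination the comment refers to.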

Re:Silly question (1)

inKubus (199753) | more than 4 years ago | (#32143172)

That sounds like a tree. Like LDAP for instance, which has been doing this with extremely high performance, with replication, etc. for decades ;) These are all solved problems, new copies of the same come out every 10 years in a cycle, and all the new kids don't realize that it already existed, came to full maturity and was bought by IBM long ago. IBM has a product that will solve everyone's problems if you'd just call them. But the kids like to go it alone, as if the problem of indexing a few million web pages is anything hard like say, the U.S. Census or the NASDAQ trading system. I have a system I'm working on that has 400,000 users on MySQL and it was running on a single Pentium 4 processor with a single 7200 rpm SATA drive and ran fine. A lot of stuff was done with Perl and flat files to keep stuff responsive. I mean, in one core of a modern PC you have more processing than the entire IRS had in 1980, yet they still managed to do everyone's tax returns, year after year. As if your stupid "Social" interactions are critical to anything. As if anyone gives a fuck what you had for breakfast today. Jesus, stop caring so much and learn how to program instead of jumping on the fucking hype wagon of cargo-cult data storage and retrieval.

And yes, I did that in one paragraph on purpose.

Re:Silly question (1)

Hurricane78 (562437) | more than 4 years ago | (#32143858)

It’s not a tree. It’s a graph. A tree is a graph’s retarded incomplete brother.

I wish people would stop using trees, and use full ontologies instead. It only creates problems. In file systems. In OO class hierarchies, in categories and tags, etc.

Re:Silly question (1)

adamchou (993073) | more than 4 years ago | (#32142006)

When you say "database", I imagine you're referring to the traditional relational database. I've never used Cassandra or Voldemort but I have used memcachedb and tokyodb and the one major difference is that you can't select on ranges in a key/value system. You can't select all keys > 100 or keys 100 - 500, etc

Re:Silly question - Couchdb (1)

MrTrick (673182) | more than 4 years ago | (#32144286)

Try couchdb if you want to select ranges.

Its keys are stored in a B-tree, so selecting ranges of values is a core use case.
The view system also uses the same mechanism, so by having a cached view you can emit any key you like per record, and grab individual or ranges of values.

Nifty. :-)
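The difference between a pure hash-style store and one that keeps its keys ordered can be sketched in a few lines. This is an illustration only, not CouchDB's implementation: the `SortedKV` class is hypothetical, and a sorted Python list with `bisect` stands in for an on-disk ordered index.

```python
import bisect


class SortedKV:
    """Key/value store that also keeps its keys in sorted order."""

    def __init__(self):
        self._keys = []    # sorted list of keys
        self._values = {}  # key -> value for direct lookups

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)  # keep the key list ordered
        self._values[key] = value

    def get(self, key):
        return self._values.get(key)

    def range(self, start, end):
        """Return all (key, value) pairs with start <= key <= end."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_right(self._keys, end)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]
```

A store with only `get` and `put` over a hash cannot answer "keys 100 - 500" without scanning everything; once the keys are ordered, the same query is two binary searches and a slice.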

Well, Cassandra does better. (2, Funny)

Chas (5144) | more than 4 years ago | (#32141560)

Until Voldy pulls that whole Avada Kedavra thing...

Re:Well, Cassandra does better. (1)

blair1q (305137) | more than 4 years ago | (#32142440)

fawkes.hogwarts.edu # su - voldemort
Password:
2010-05-08 16:08:45 have you hugged your death eater today? alias avada_kedavra
kill -9
2010-05-08 16:08:45 have you hugged your death eater today?

Oracle/MySQL - Voldemort (1)

caluml (551744) | more than 4 years ago | (#32141892)

A very large company I know is moving from Oracle/MySQL to Voldemort for certain parts of their system. The two they evaluated were Cassandra and Voldemort.

File system? (1)

Bromskloss (750445) | more than 4 years ago | (#32142376)

Key-value storage? That sounds like the ordinary file system to me.

Re:File system? (1)

Xeriar (456730) | more than 4 years ago | (#32142568)

Not a particularly useful use of the inode table. The filesystem is great for a few hundred or even a few thousand records, but when you're dealing with billions of records, that adds up to a lot of wasted space.

And the winner is... (0)

Anonymous Coward | more than 4 years ago | (#32142602)

You know who.

Comparison Against Established Systems? (1)

RAMMS+EIN (578166) | more than 4 years ago | (#32160772)

I am in a bit of a rush, so I can't netgrep for it myself right now, but I am curious how these new contenders stack up against more established key-value stores such as Berkeley DB and GDBM. Has anyone run the benchmarks?
