Cassandra 0.7 Can Pack 2 Billion Columns Into a Row

timothy posted more than 3 years ago | from the but-only-if-they're-really-thin dept.

Databases 235

angry tapir writes "The cadre of volunteer developers behind the Cassandra distributed database have released the latest version of their open source software, able to hold up to 2 billion columns per row. The newly installed Large Row Support feature of Cassandra version 0.7 allows the database to hold up to 2 billion columns per row. Previous versions had no set upper limit, though the maximum amount of material that could be held in a single row was approximately 2GB. This upper limit has been eliminated."

235 comments

Typical applications? (3, Interesting)

oldhack (1037484) | more than 3 years ago | (#34900840)

What sorta applications need so many columns? Curious.

Re:Typical applications? (5, Funny)

Brummund (447393) | more than 3 years ago | (#34900896)

Any application developed by one or more Visual Basic developers, given enough time.

Re:Typical applications? (1)

jrumney (197329) | more than 3 years ago | (#34901090)

Any application developed by one or more Visual Basic developers, given enough time.

How could that possibly be true, MS Access only supports 255 columns.

Re:Typical applications? (3, Funny)

RobertM1968 (951074) | more than 3 years ago | (#34901548)

Any application developed by one or more Visual Basic developers, given enough time.

How could that possibly be true, MS Access only supports 255 columns.

And now you understand why Cassandra is so important! :-)

Re:Typical applications? (2)

bieber (998013) | more than 3 years ago | (#34901134)

In all seriousness, I'm horrified to see the potential abuses people will come up with for this.

"Still using MySQL? Man, you need to check out Cassandra! MySQL kept clashing with my every-user-gets-their-own-column architecture..."

Re:Typical applications? (1)

RobertM1968 (951074) | more than 3 years ago | (#34901564)

In all seriousness, I'm horrified to see the potential abuses people will come up with for this.

"Still using MySQL? Man, you need to check out Cassandra! MySQL kept clashing with my every-user-gets-their-own-column architecture..."

Wow, that is sloppy. I give each of my users their own table.

Re:Typical applications? (1)

goombah99 (560566) | more than 3 years ago | (#34901724)

Who the hell cares. I mean whup tee doo. So someone has a larger address space. Like wow. For all 12 people with such a bad design that they need 12 billion columns, I'm sure they already figured out how to have keyed indexes. Why is this on Slashdot?

Re:Typical applications? (2)

Musically_ut (1054312) | more than 3 years ago | (#34900914)

What sorta applications need so many columns? Curious.

From the article:

An open source database capable of holding such lengthy rows could be most useful to big data cloud computing projects and large-scale Web applications, the developers behind the Apache Software Foundation project assert.

So, basically, they don't know either but think (probably rightly so) that this is a pretty cool feature. So cool that they made this the heading of their article.

Re:Typical applications? (1)

feedayeen (1322473) | more than 3 years ago | (#34900948)

Chinese and Indian census data.

Re:Typical applications? (1)

Malcolm Chan (15673) | more than 3 years ago | (#34901136)

I know that was meant as a joke, but such data would be stored in the rows of the table, not the columns of the individual rows.

Re:Typical applications? (1, Funny)

adonoman (624929) | more than 3 years ago | (#34901178)

No no, one column for each resident, plus a column for the row header. Each row holds one item of information: Name, address, etc...
That way, adding a new data point to keep track of is as simple as inserting a new row.

Re:Typical applications? (2)

Sarten-X (1102295) | more than 3 years ago | (#34901242)

I don't know if that was sarcastic or not, but given that Cassandra is column-oriented, that's pretty much right (not so much with the header, but metadata is likely). Use a column family for each region, and you can process statistics in small chunks without a ridiculously-overpowered server. Only the requested column families need to be loaded into memory for processing.

Re:Typical applications? (5, Interesting)

gratuitous_arp (1650741) | more than 3 years ago | (#34900970)

Apparently the extra columns can be used to the effect of doing "more" than store data. A link in the article explains how lots of extra columns can be useful for querying data (Cassandra doesn't use SQL). http://maxgrinev.com/2010/07/12/do-you-really-need-sql-to-do-it-all-in-cassandra/ [maxgrinev.com]

So the primary reason for this doesn't seem to be that one's run-of-the-mill database needs more columns.

Re:Typical applications? (2, Funny)

RobertM1968 (951074) | more than 3 years ago | (#34901572)

Apparently the extra columns can be used to the effect of doing "more" than store data. A link in the article...

Not sure what that last word means....

Re:Typical applications? (1)

Anonymous Coward | more than 3 years ago | (#34901022)

I'm not sure, but I don't think that the summary said "hold up to 2 billion columns per row" quite enough. Having it in the headline and then repeating it in two consecutive sentences was a nice touch, but I really think they should have mentioned it a couple more times.

Re:Typical applications? (3, Interesting)

SQL Error (16383) | more than 3 years ago | (#34901026)

The main reason was that Cassandra prior to 0.7 didn't support secondary indexes. Your keys in a table ("columnfamily" in Cassandra-speak) were indexed, and the names of the columns in a row were indexed. And Cassandra is schemaless, so the columns in one row could be completely different to the columns in another.

So you'd use columns as sub-records to get the data structures you need.

With 0.7 and secondary indexes, that's going to be less important.
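A minimal sketch of the "columns as sub-records" pattern described above, written as a toy in-memory model in Python. It is not the Cassandra API: the class `ToyColumnFamily`, its methods, and the `order:` naming scheme are all hypothetical, and the only property it tries to mirror is that column names within a row are kept in sorted order so a range of them can be sliced out.

```python
from bisect import bisect_left, insort


class ToyColumnFamily:
    """Toy model: row key -> columns, with column names kept sorted per row."""

    def __init__(self):
        # row key -> (sorted list of column names, dict of name -> value)
        self._rows = {}

    def insert(self, row_key, column_name, value):
        # No schema: any row may hold any column names.
        names, values = self._rows.setdefault(row_key, ([], {}))
        if column_name not in values:
            insort(names, column_name)
        values[column_name] = value

    def slice(self, row_key, start, finish):
        """Return (name, value) pairs with start <= name < finish."""
        names, values = self._rows.get(row_key, ([], {}))
        lo = bisect_left(names, start)
        hi = bisect_left(names, finish)
        return [(name, values[name]) for name in names[lo:hi]]


cf = ToyColumnFamily()
cf.insert("jsmith", "email", "jsmith@example.com")
cf.insert("jsmith", "order:0002", "gadget")
cf.insert("jsmith", "order:0001", "widget")

# ";" is the character right after ":" in ASCII, so this range covers
# exactly the column names that start with "order:".
orders = cf.slice("jsmith", "order:", "order;")
print(orders)
```

Each `order:` column acts as a sub-record of the `jsmith` row, which is the kind of structure a secondary index would otherwise be needed for.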

Re:Typical applications? (5, Funny)

jrumney (197329) | more than 3 years ago | (#34901064)

What sorta applications need so many columns?

Facebook needs one column for every privacy violation.

Re:Typical applications? (0)

adamofgreyskull (640712) | more than 3 years ago | (#34901068)

I'm as clueless as you, perhaps more so, but the only thing I can think of is maybe for large amounts of raw data for Bio-Informatics or from a sufficiently large experiment, e.g. particle collider?

There's no way a sufficiently normalised data model for any normal, everyday application would require even 2,000 columns. This belief is normally tested whenever I go to TheDailyWTF though...

Re:Typical applications? (1)

oldhack (1037484) | more than 3 years ago | (#34901128)

Bio-informatics... OK, I see now. Hook this sucka up with some massive Perl codebase cooked up by postdocs and grad students, and you've got yourself the mother of all "hammer time".

Oh Yeah.

Re:Typical applications? (1)

TooMuchToDo (882796) | more than 3 years ago | (#34901592)

Nope. We use flat files for storing collider data from the LHC.

Re:Typical applications? (1)

Daniel Dvorkin (106857) | more than 3 years ago | (#34901650)

Speaking as a bioinformatician who does a lot of DB work (the only one in the lab who has professional DBA experience ...), I'll be the first to say that I can't see myself storing data this way. I'd be willing to be convinced, but as it stands, I don't see any use for this. IMO, YMMV, etc.

Re:Typical applications? (2)

Whiternoise (1408981) | more than 3 years ago | (#34901102)

I can only think of something where you might want to input something ridiculously large like an image (or similar matrix of information with millions of points) so you could perform statistical analysis on a per-pixel basis. The pixel example is for an image, but the same would apply if you wanted to store, say, some parameter at each grid point and compare those parameters across a load of different grids. It seems a very laborious way of doing things, but maybe if each point is storing a lot of data, it's easier to have a database where you can run "SELECT row1000col2000 FROM Things" (where row1000col2000 contains a blob or something) and get a long list instead of comparing a load of arrays.

In the example of an image, you could feasibly run into hundreds of millions of columns (assuming you want to store your data in one table and not a table per comparison object and of course for some obscure reason you're storing each pixel in a field) with astronomical cameras.

Failing that, never underestimate government and/or military databases. Heck, even someone like Google could probably find a use for a 2 billion column table.

Re:Typical applications? (2)

NNKK (218503) | more than 3 years ago | (#34901270)

Cassandra doesn't have "tables", and Cassandra's rows and columns have nothing to do with the rows and columns you're used to in SQL databases. Until you understand this, you will continue to be confused.

The "name" of a column is an arbitrary key -- you could have a row with a bunch of columns named things like "Country", or "Username", but you could also have columns named "jsmith", "jdoe", "12345", "USA", "Canada", etc., and you don't have to pre-define the column names.

Re:Typical applications? (1)

Anonymous Coward | more than 3 years ago | (#34901346)

Cassandra doesn't have "tables", and Cassandra's rows and columns have nothing to do with the rows and columns you're used to in SQL databases. Until you understand this, you will continue to be confused.

Then for the love of Pete, don't call them "columns and rows". You'll just confuse the hell out of us.

Re:Typical applications? (0)

Anonymous Coward | more than 3 years ago | (#34901238)

Facebook developed it to store per-user inverted indexes for their mailbox/feeds/whatever, among other things. Key is a user, column is a term, value is a list of documents.
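The layout this comment describes (key is a user, column is a term, value is a list of documents) can be sketched with nested dictionaries. This is only an illustration of the data shape in Python; the function name `build_inbox_index` and the sample messages are made up, and nothing here reflects Facebook's actual code.

```python
from collections import defaultdict


def build_inbox_index(messages):
    """Build user -> term -> [message ids] from (user, message id, text) tuples."""
    index = defaultdict(lambda: defaultdict(list))
    for user, message_id, text in messages:
        # set() so a term repeated within one message is indexed only once
        for term in set(text.lower().split()):
            index[user][term].append(message_id)
    return index


messages = [
    ("alice", 1, "lunch tomorrow"),
    ("alice", 2, "Lunch moved"),
    ("bob", 3, "lunch"),
]
index = build_inbox_index(messages)
print(index["alice"]["lunch"])
```

A user with a large mailbox ends up with one "column" per distinct term, which is how a single row can grow to a very large number of columns.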

Re:Typical applications? (1)

g4b (956118) | more than 3 years ago | (#34901244)

my first applicable use would be to have one row saving a domain in the first column (or other fixed data in a fixed number of additional columns) and change the db dimensions on the fly by adding multiple columns serially saving information like access time and ip address.
that would mean i can search by row to get the domain and log accesses easy

i would just try that and look if this is speedier than having two tables saving by row.

also comes in handy to add a column for each new user and a row for each new thread and save the views there

all depends on how fast database changes are in a nosql db, but i think pretty fast.

Nobody read "Jurassic Park"? (1)

Ken Hall (40554) | more than 3 years ago | (#34901264)

As I recall, one of the tasks given to Nedry in the design of the computer systems was to devise a database capable of holding a couple of billion fields to handle the sequencing of DNA strands.

About time (-1)

LukeWebber (117950) | more than 3 years ago | (#34900842)

I'm so sick of having to have multiple columns in my databases. I absolutely must have the freedom to cram all of my data into one huge row.

Seriously though, WTF?

Re:About time (1)

MichaelKristopeit401 (1976824) | more than 3 years ago | (#34900946)

you'd still have to have multiple columns...

Seriously though, WTF?

Re:About time (1)

LukeWebber (117950) | more than 3 years ago | (#34901430)

My bad. I meant "multiple rows".

Yeah boy. (0)

Anonymous Coward | more than 3 years ago | (#34900848)

Now I don't need that 2NF! Just use one column per customer!!!

Only 2 billion? (1)

Jeremi (14640) | more than 3 years ago | (#34900850)

They should have gone with the uint32_t counter, then they could support up to 4 billion!

Re:Only 2 billion? (5, Funny)

Anonymous Coward | more than 3 years ago | (#34900966)

You work for Gillette, don't you.

Re:Only 2 billion? (4, Funny)

zach_the_lizard (1317619) | more than 3 years ago | (#34901620)

He doesn't, otherwise it'd be uint64_t and a lather strip!

If you have more than 30 columns (1, Insightful)

loufoque (1400831) | more than 3 years ago | (#34900852)

... then you're doing it wrong

Re:If you have more than 30 columns (0)

Anonymous Coward | more than 3 years ago | (#34900872)

I don't know if that's *guaranteed* to be true, but a good rule of thumb.

Actually, can probably use a much smaller number as a rule of thumb, even...

Re:If you have more than 30 columns (0)

Anonymous Coward | more than 3 years ago | (#34900916)

I don't know... I store names in my name table as a column for each letter.

Re:If you have more than 30 columns (5, Informative)

ogrisel (1168023) | more than 3 years ago | (#34901044)

Not with column store databases such as Cassandra, HBase and BigTable.

Re:If you have more than 30 columns (-1)

Anonymous Coward | more than 3 years ago | (#34901358)

True. You're "doing it wrong" just by using them in the first place.

Re:If you have more than 30 columns (1)

butlerm (3112) | more than 3 years ago | (#34901648)

Cassandra, HBase and BigTable aren't traditionally what is meant by the term column store database (http://en.wikipedia.org/wiki/Column-oriented_DBMS) at all. They are much closer to hybrid "repeating group" databases like Adabas [wikipedia.org] and Pick [wikipedia.org].

True column store databases are almost unheard of for online transaction processing because they are optimized for streaming, unindexed data storage and subsequent column oriented analysis over large datasets with very low per row overhead. A bitmap index is the closest a traditional relational database comes to column storage, although at least two major relational databases have means of physically clustering related rows from different tables on the same page, which is more or less what Cassandra is described as doing here, except perhaps with more flexibility and more overhead to go along with it.

Re:If you have more than 30 columns (1)

Anonymous Coward | more than 3 years ago | (#34901060)

Just that you (and so many others) can't see a use case doesn't mean that there aren't any. I deal a lot with data from very lengthy questionnaires. There are usually several thousand columns, sometimes tens of thousands. I run into the column limits of conventional row based databases more often than not. That's why I tend to use fixed width text files rather than databases. Being able to easily convert this data into a very wide table that can be queried (be it SQL or otherwise) would definitely be useful.

I suppose careful redesign into a relational structure could reduce the number of columns, but there are new datasets every week, so there is no time for that. Also, customers are not used to relational data.

Re:If you have more than 30 columns (-1)

Anonymous Coward | more than 3 years ago | (#34901104)

No offence, but you're either trolling or you are too stupid to do your job properly. No offence.

Re:If you have more than 30 columns (1)

Mitchell314 (1576581) | more than 3 years ago | (#34901680)

Two words: normalization.

Re:If you have more than 30 columns (2)

mini me (132455) | more than 3 years ago | (#34901214)

If you are writing SQL, maybe. Cassandra is not a relational database.

Bah, this is silly. (1)

intellitech (1912116) | more than 3 years ago | (#34900854)

If this really matters at all, besides being slightly cool, it will just lead to more bad db design.

Re:Bah, this is silly. (1)

Musically_ut (1054312) | more than 3 years ago | (#34900956)

If this really matters at all, besides being slightly cool, it will just lead to more bad db design.

Of course not! They clearly state the importance of "creating so many columns that they are nearly unlimited" in the article:

The ability to create so many columns is valuable because it allows systems to create a nearly unlimited number of columns on the fly, Ellis explained in a follow-up e-mail.

So that's that.

Re:Bah, this is silly. (2)

Sarten-X (1102295) | more than 3 years ago | (#34901114)

on the fly

Like storing the contents of a web crawl. The row key is the URL, the column is the crawl timestamp, and the cell contains the page (or keywords). That's a column created on the fly. Another application off the top of my head is storing access logs, where each row is a date, each column is a person, and each cell contains a resource they accessed. Having two billion columns is hardly excessive (in theory) for a suitably-large application.

Cassandra, like BigTable and HBase, is not the same as a traditional RDBMS. It's also a column-oriented [wikipedia.org] DBMS. Since each group of columns is stored separately, there's no performance impact to having extra columns. Columns that aren't needed (like old crawls in the example above) simply aren't loaded into memory. What's bad design for an RDBMS is perfect for Cassandra or HBase.
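A rough sketch of the web crawl example above, with the row key as the URL and one column per crawl timestamp. It is a plain Python dictionary model under assumed names (`record_crawl`, `latest_crawl`, integer timestamps), not a client for Cassandra or HBase; it only shows that columns appear as data arrives and that a reader can fetch the newest one without touching the rest.

```python
def record_crawl(store, url, timestamp, page):
    """Add a column named by the crawl timestamp to the row for this URL."""
    store.setdefault(url, {})[timestamp] = page


def latest_crawl(store, url):
    """Return (timestamp, page) for the most recent crawl of this URL."""
    columns = store[url]
    newest = max(columns)
    return newest, columns[newest]


crawl_store = {}
record_crawl(crawl_store, "http://example.com/", 1295000000, "<html>old</html>")
record_crawl(crawl_store, "http://example.com/", 1295086400, "<html>new</html>")

print(latest_crawl(crawl_store, "http://example.com/"))
```

Nothing had to be declared in advance for the second timestamp: the column exists because a value was written under that name.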

Why? (3, Insightful)

Xoc-S (645831) | more than 3 years ago | (#34900870)

Only a completely de-normalized flat-file database would need anything like that number of columns. That would mean many duplicate pieces of information, and a complete maintenance nightmare. The only purpose I can see is to have views of existing normalized data for fast searching, but that would be read-only data.

This is a feature in need of an application and I can see very few applications.

Re:Why? (3, Funny)

Jeremi (14640) | more than 3 years ago | (#34900934)

This is a feature in need of an application and I can see very few applications.

I think you're right, but as long as we're adding features for the sake of having features... why limit the table to two dimensions? Perhaps the next version of Cassandra can support 3D-data-cubes, with each cell specified via a (row,column,level) triplet. And the version after that will allow hypercubes of data with any number of dimensions (up to 2 billion dimensions maximum, of course).

Re:Why? (0)

Anonymous Coward | more than 3 years ago | (#34901156)

Actually that wouldn't be an entirely bad way to handle versioning. With some sensible defaults, of course.

Re:Why? (1)

Sarten-X (1102295) | more than 3 years ago | (#34901196)

Disclaimer: I haven't used Cassandra personally, but I have used HBase which operates similarly.

Cassandra uses column families, which are groups of columns, and are individually selectable. If all families contain the same columns, you have 3D (family, column, row) storage! Now, with HBase, excessive column family creation and maintenance isn't the ideal route, but if you actually need 3D storage, it would work pretty decently.

Cassandra, BigTable, and HBase are designed for applications that need lots of rarely-accessed details, for relatively few rows. As an example, let's consider a forum. One row per thread, one column per post. A single query (usually in a manner not related to SQL) pulls out a single row, and all the columns associated with it. Since columns are created by the application at runtime, 2 billion isn't that excessive, but it is worth bragging about.

Re:Why? (2)

Daniel Dvorkin (106857) | more than 3 years ago | (#34901624)

As an example, let's consider a forum. One row per thread, one column per post.

Um, okay, but why would you set your database up that way in the first place? I really don't see the advantage of this over a more standard table having columns for, say, forum ID, thread ID, poster ID, timestamp, and content.

Re:Why? (2)

Dynedain (141758) | more than 3 years ago | (#34901702)

In the example you just made, I can see that the benefit is that you don't need another layer (PHP, stored queries, etc) to stitch the thread back together. The data structure inherently "knows" how the thread of posts are assembled.

Re:Why? (2)

Sarten-X (1102295) | more than 3 years ago | (#34901716)

Why not, if you expect to have several billion posts?

The more important issue in this architecture decision would be scaling needs and abilities. How many billion rows can a typical RDBMS handle on a $20,000 budget? If that budget goes to $40,000, will that capacity double? With a column-oriented database, only the needed column families are loaded into memory. For this forum example, you could have a family for each month of operation. Old threads would then be entirely in old column families, so they would remain untouched on disk, never even read until they're needed. The lower memory use leads to less expensive servers (since storage is cheap now), and linear scaling.

If your application doesn't need ridiculously large storage, go ahead and use a plain old RDBMS, just so you can avoid learning a new skill set. You're probably not going to hit a meaningful limit. If you're looking at a huge amount of data, or a moderate amount of data with a lot of processing, newer technology may be a better choice.

Cassandra can also run with Hadoop's MapReduce framework. Taking the forum example further, a periodic job could process all the posts, updating another table (or set of columns) with a map of keywords to posts that contain them. Scanning one thread at a time, each node in the Hadoop cluster could compute the index in parallel, allowing the index creation to be separated from the load of making a post. Again, it's not a big deal for a small application, but when you're dealing with something scaling up to the size of Facebook, StumbleUpon, or Google, new tools with new designs just work better.
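The periodic keyword-indexing job described above can be sketched as a map step and a reduce step in plain Python. This does not use the Hadoop API; the function names, the thread and post ids, and the whitespace tokenizer are all assumptions made for illustration. On a real cluster each thread row could be mapped on a different node, and here they are simply processed in a loop.

```python
from collections import defaultdict


def map_thread(thread_id, posts):
    """Map step: emit (keyword, (thread_id, post_id)) pairs for one thread row."""
    for post_id, text in posts.items():
        for word in set(text.lower().split()):
            yield word, (thread_id, post_id)


def reduce_keywords(pairs):
    """Reduce step: group the emitted locations by keyword."""
    index = defaultdict(list)
    for word, location in pairs:
        index[word].append(location)
    return {word: sorted(locations) for word, locations in index.items()}


# One row per thread, one column per post, as in the forum example.
threads = {
    "t1": {"p1": "Cassandra rocks", "p2": "cassandra scales"},
    "t2": {"p1": "HBase scales"},
}

pairs = [pair for thread_id, posts in threads.items()
         for pair in map_thread(thread_id, posts)]
keyword_index = reduce_keywords(pairs)
print(keyword_index["scales"])
```

Because `map_thread` only ever looks at one row, the indexing work is independent per thread, which is what lets it run separately from the load of making a post.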

Re:Why? (2)

maraist (68387) | more than 3 years ago | (#34901542)

There are many problem-sets where you might like to perform associative mapping. If the keys and/or values are large, you can easily hit the 2GB limit on a single primary key. Imagine if you felt that Cassandra could help you in CPU node mappings, or weather patterns. The associations can be in the billions, and while you may or may not have a primary key for each main node, the association list may approach N. In a traditional RDBMS, such large M:N association tables are impractical to traverse. An object oriented database might be a better fit, but the open-source ones I'm aware of aren't sufficient in horsepower. I'm not saying Cassandra fits the bill, but with TB sized total DBs this would be significantly faster than RDBMS with row-oriented storage (column store tables might do ok). And probably more to the point - the population of those large associations is what's going to kill traditional RDBMS M:N tables (or even proprietary blobs).

Re:Why? (0)

Anonymous Coward | more than 3 years ago | (#34901188)

Well that's exactly why databases like Cassandra exist, maybe you're sharper than you give yourself credit for being. Facebook developed it to hold an inverted index for each user's mailbox/feed. The key is the user, the column a term, and the cell contains a list of documents.

Re:Why? (1)

NNKK (218503) | more than 3 years ago | (#34901276)

Only a completely de-normalized flat-file database would need anything like that number of columns. That would mean many duplicate pieces of information, and a complete maintenance nightmare. The only purpose I can see is to have views of existing normalized data for fast searching, but that would be read-only data.

This is a feature in need of an application and I can see very few applications.

Um, a very common answer to Cassandra questions is "denormalize". This is not an RDBMS, stop treating it like one.

Re:Why? (0)

Anonymous Coward | more than 3 years ago | (#34901382)

Way to miss the point.

You don't even have to read the FA to find out that it is not about the column numbers and that there was no limit beforehand.

They created a 2 billion limit after removing a very limited 2GB~ limit on row size. Likely for closure/standards/testing reasons (system integrity is hard to prove on an infinite scale).
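For a sense of scale, here is a quick check of the numbers involved. It assumes, and this is only an assumption since neither the summary nor the article says so, that the "2 billion" figure corresponds to a signed 32-bit counter and that the old "approximately 2GB" row limit corresponds to 2^31 bytes.

```python
INT32_MAX = 2**31 - 1    # largest value of a signed 32-bit integer (assumed column limit)
UINT32_MAX = 2**32 - 1   # largest value of an unsigned 32-bit integer, for comparison
TWO_GIB = 2**31          # bytes in 2 GiB (assumed reading of the old row-size limit)

print(f"{INT32_MAX:,}")   # 2,147,483,647
print(f"{UINT32_MAX:,}")  # 4,294,967,295

# Under the old byte limit, a row of 1 KiB values would fill up at about
# two million columns, far short of two billion.
print(TWO_GIB // 1024)    # 2097152
```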

I can see many applications in the wild that require more than 2GB of data in a row.

It would be wise, if in future, when you can't see a plausible explanation; that you don't scream "GOD DID IT", and instead just presume you don't have enough information to form an intelligent opinion.

Re:Why? (1)

schnozzy (218978) | more than 3 years ago | (#34901502)

Failure to see applications is a failure of imagination. Also, Cassandra is highly optimized for extremely fast write rates at large scale, not read-only. Cassandra also has no SPOF, includes variable consistency guarantees, and many other great features. Specifically focusing on the fact that it can have 2 billion columns in a row glosses over a number of other things, but that hardly means denormalized or non-transactional databases have few applications.

Re:Why? (1)

maraist (68387) | more than 3 years ago | (#34901508)

Have you reviewed the BigTable architecture? The central idea is to store what would normally be normalized joined data instead as in-line column-families. Within a column-family, you have related columns that are effectively your name-value pairs. Each name in the name-value pair is called a column (which in an RDBMS would more likely be a table with 3 columns, foreign-key, name, value - but with the tremendous inefficiency of having to do the join). All this effectively means is that prior to this version, Cassandra only supported a logical collection of name-value pairs that were less than 2Gig. Now you're unlimited - or more correctly, I'm sure they're using a larger bit-value for some grouping thing.
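The contrast drawn above can be sketched side by side: a relational table with three columns (foreign key, name, value) that has to be queried to reassemble a record, next to the same name-value pairs stored in-line under the row key. The example uses Python's standard `sqlite3` module for the relational half and a nested dictionary for the in-line half; the table name `attrs` and the sample data are hypothetical, and the dictionary is only a stand-in for a column family.

```python
import sqlite3

pairs = [
    ("jsmith", "country", "USA"),
    ("jsmith", "email", "jsmith@example.com"),
    ("jdoe", "country", "Canada"),
]

# Relational layout: one (foreign key, name, value) record per attribute.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE attrs (row_key TEXT, name TEXT, value TEXT)")
conn.executemany("INSERT INTO attrs VALUES (?, ?, ?)", pairs)


def relational_lookup(connection, row_key):
    """Reassemble one logical record by querying the attribute table."""
    cursor = connection.execute(
        "SELECT name, value FROM attrs WHERE row_key = ? ORDER BY name",
        (row_key,),
    )
    return dict(cursor.fetchall())


# In-line layout: the name-value pairs live directly under the row key.
inline_store = {}
for row_key, name, value in pairs:
    inline_store.setdefault(row_key, {})[name] = value

print(relational_lookup(conn, "jsmith"))
print(inline_store["jsmith"])
```

Both lookups return the same record. The difference the comment points at is where the work happens: the relational version assembles it at read time, while the in-line version stores it already assembled.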

Re:Why? (0)

Anonymous Coward | more than 3 years ago | (#34901614)

Depending on what you sacrifice in ACID you can get decent speed. For example, Facebook usually sacrifices C. From one refresh to another you can get 2 totally different pages, even when no one updates anything that would be on your page. But it is not that big of a deal if you miss, for 10 minutes, that update of someone getting the high score in some game...

SQL statement from hell (0)

Anonymous Coward | more than 3 years ago | (#34900882)

What would the SQL statement look like if you wanted to select nearly all of those 2 billion columns except a few?

Re:SQL statement from hell (2)

Sarten-X (1102295) | more than 3 years ago | (#34901216)

Cassandra doesn't use SQL, and isn't even like a RDBMS in any way other than "it stores a table of data", so the SQL statement would be nonexistent.

Sorry... (0)

Nemyst (1383049) | more than 3 years ago | (#34900888)

I couldn't not link to this xkcd [xkcd.com] comic.

2 billion columns... (4, Funny)

aBaldrich (1692238) | more than 3 years ago | (#34900904)

ought to be enough for everybody

Re:2 billion columns... (1)

adamofgreyskull (640712) | more than 3 years ago | (#34900990)

Joke away but, going by some of the shit I've seen at TheDailyWTF [thedailywtf.com], that could well come back and bite you in the ass one day.

Re:2 billion columns... (1)

WGFCrafty (1062506) | more than 3 years ago | (#34901510)

Woooooooooooooooosh. WOOOSH wooosh woosh. Four woosh's ought to be enough for you.

Re:2 billion columns... (0)

Anonymous Coward | more than 3 years ago | (#34901462)

According to wikipedia [wikipedia.org], we need about 6.894 billion columns to have enough for everybody.

Awesome! (0)

Anonymous Coward | more than 3 years ago | (#34900938)

Now I can write that application I have been wanting to write forever. Just couldn't find a suitable database because none of them supported two billion columns per row. Oh happy day.

This is a triumph for hideously bad schema (4, Informative)

Sarusa (104047) | more than 3 years ago | (#34900940)

Well good on them for solving an interesting technical problem, but the use cases for this are all bad.

Obvious first use: boss will suggest we optimize the database by using only one gigantic row with two billion columns.

Re:This is a triumph for hideously bad schema (1)

teknopurge (199509) | more than 3 years ago | (#34901526)

Database? Psha - we only use Excel for our most critical data storage needs....

really... (1)

Bizzeh (851225) | more than 3 years ago | (#34900954)

...whats the point...

Thank goodness! (1)

wonkavader (605434) | more than 3 years ago | (#34900964)

Now I can finally shoe-horn my coworkers' Excel spreadsheets into a database.

for those that absolutely positively cannot RTFA (5, Informative)

Son of Byrne (1458629) | more than 3 years ago | (#34901012)

Cassandra appears to be a multi-dimensional datastore that does not store data in the same fashion as a typical RDBMS. It uses columns and rows both to store sets of data uniquely. If you're familiar with Big Table, then, apparently, it's kinda like that.

That just means that they've added even more storage vectors to it than before...not sure why it made slashdot front page...

Re:for those that absolutely positively cannot RTF (0)

Anonymous Coward | more than 3 years ago | (#34901272)

Cassandra appears to be a multi-dimensional datastore that does not store data in the same fashion as a typical RDBMS. It uses columns and rows both to store sets of data uniquely. If you're familiar with Big Table, then, apparently, its kinda like that.

That just means that they've added even more storage vectors to it than before...not sure why it made slashdot front page...

ah yes.... this will put my linear algebra class to good use!

Re:for those that absolutely positively cannot RTF (1)

maraist (68387) | more than 3 years ago | (#34901568)

I wonder if it's possible to represent a non-cartesian basis vector-space with a DB. Maybe one of the columns is sinusoidally looped - haha, every 32nd insert wraps around itself. Oh this could be a cool MLK holiday project.

Re:for those that absolutely positively cannot RTF (1)

Dahamma (304068) | more than 3 years ago | (#34901602)

Not knocking Cassandra, but basically it means that this metric of "2 billion columns", being completely different from the concept of RDBMS columns, really doesn't mean much from a comparative point of view...

It's kinda like saying "that army of ants will conquer all nations, they have 2 billion soldiers!" :)

This upper limit has been eliminated (0)

Anonymous Coward | more than 3 years ago | (#34901030)

By establishing an upper limit on a formerly unlimited limit.

Re:designer shoes online for less-cheap wholesale (0)

Relayman (1068986) | more than 3 years ago | (#34901078)

Wow, the first spam comment I have ever seen on /. And not one piece is authentic. I especially like how they made the security icons clickable but not the way they should be.

Cassandra (5, Funny)

tverbeek (457094) | more than 3 years ago | (#34901132)

I predict that bad things will come of this.

Not that anyone will believe me.

Re:Cassandra (1)

thewils (463314) | more than 3 years ago | (#34901160)

I believe you :) There's a subset of coders who don't see anything wrong with "Select *" all over the place and I have a feeling this construct might chew up available memory real quick if a table has anywhere near this number of columns...
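To make the memory worry concrete: the usual way around it is to read a very wide row in bounded slices instead of all at once. The sketch below is plain Python, not Cassandra client code; `iter_slices`, `page_size`, and the fake ten-column row are all made up for illustration.

```python
from itertools import islice

def iter_slices(columns, page_size):
    """Yield successive lists of at most page_size (name, value) pairs."""
    it = iter(columns)
    while True:
        page = list(islice(it, page_size))
        if not page:
            return
        yield page

# Hypothetical wide row, produced lazily so it is never fully held in memory.
row = ((i, f"value-{i}") for i in range(10))

# Only one page of columns is materialized at a time.
pages = list(iter_slices(row, 4))
```

With a page size of 4, the ten columns come back as three pages, and peak memory depends on the page size rather than on the width of the row.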

Re:Cassandra (0)

NNKK (218503) | more than 3 years ago | (#34901280)

I believe you :) There's a subset of coders who don't see anything wrong with "Select *" all over the place and I have a feeling this construct might chew up available memory real quick if a table has anywhere near this number of columns...

What's table?

(Seriously, Cassandra doesn't have tables. It's not an RDBMS, and doesn't use SQL.)

figured it out (2)

Bizzeh (851225) | more than 3 years ago | (#34901184)

I know why the developers thought this would be a good idea. A feature this mental would be sure to get them free publicity on Slashdot.

Re:figured it out (2)

mini me (132455) | more than 3 years ago | (#34901306)

A column in Cassandra is sort of, if you have to make a comparison, like a join in SQL. Using Slashdot as an example, the topic would be the row, and each comment within that topic would be a column. Wanting to store more than 2GB of column data doesn't seem mental at all.

Whether or not it is worthy of the front page is another question.
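A rough way to picture that mapping, as a plain-Python sketch rather than anything from a real Cassandra client: the `topics` dict, `add_comment`, and `comments_for` are hypothetical names, and the comment texts are placeholders.

```python
from collections import defaultdict

# Hypothetical stand-in for a column family: row key -> {column name: value}.
topics = defaultdict(dict)

def add_comment(topic_id, comment_id, text):
    """Store one comment as one column in the topic's row."""
    topics[topic_id][comment_id] = text

def comments_for(topic_id):
    """Return the row's columns ordered by column name (here, the comment id)."""
    return sorted(topics[topic_id].items())

# One row per topic; every new comment just adds another column to that row.
add_comment("cassandra-0.7", 34901306, "second comment")
add_comment("cassandra-0.7", 34900840, "first comment")
thread = comments_for("cassandra-0.7")
```

The point is that the row keeps growing sideways as comments arrive, which is why a cap on columns or bytes per row matters for a busy topic.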

Re:figured it out (1)

butlerm (3112) | more than 3 years ago | (#34901708)

Non-relational databases that do this have been around for decades. Adabas and Pick are the examples that come to mind. The pertinent difference here is that the developers of those databases were sane enough not to call repeating groups "columns".

WHat?! (1)

snizzle (1879832) | more than 3 years ago | (#34901236)

Welcome to the new online dating experience when we match you to someone else with up to 2 Billion traits!

2 billion columns? (1)

flimflammer (956759) | more than 3 years ago | (#34901252)

This sounds purely like marketing gibberish when you can't create enough meaningful features to boast about.

I can't even think of a reason why you would need 2 billion columns. If you did, I think the ability to store it is the least of your problems.

Re:2 billion columns? (1)

MrP- (45616) | more than 3 years ago | (#34901558)

I prefer to store my 2 billion records in 1 row with 2 billion columns, thank you very much!

Indexes (3, Informative)

Twillerror (536681) | more than 3 years ago | (#34901328)

Cassandra, like many of the "no sql" type databases, doesn't have classic indexes.

So instead of having an index, you typically have a separate table that acts as the index.

Imagine you have a users table. One of the fields is country. Now you want to know all the users for a particular country.

In standard RDBMS-type systems you just scan each row, or have an index that has done that "ahead of time" or as rows are inserted.

In Cassandra the rows of users are distributed, possibly among 100s of servers. So scanning for all users that have a particular country would require scanning all rows, which could take a long time.

Unlike in RDBMS-like systems, rows don't have a 2D structure and don't have a real limitation on the number of columns they can have. And columns can essentially be arrays/rows of objects.

So as you design/bang out your application you typically realize you need to know "users by country" for some stupid report. So you create a new table to hold these values. This has one row per country. As users are entered you append to this row. This essentially creates an array-like structure. You then look up the row for a particular country and you now know all the users for that particular country.

Sounds like Cassandra is getting rid of a limitation that could have caused a very large index to require multiple rows.
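The "users by country" pattern described above can be sketched in a few lines of plain Python. This is only an illustration with in-memory dicts, not Cassandra's API; `users`, `users_by_country`, `insert_user`, and `users_in` are assumed names, and the sample users are made up.

```python
from collections import defaultdict

users = {}                            # row key: user id -> that user's columns
users_by_country = defaultdict(dict)  # row key: country -> {user id: ""}

def insert_user(user_id, name, country):
    """Write the user row and append a column to the matching index row."""
    users[user_id] = {"name": name, "country": country}
    # The column name carries the data (the user id); the value is left empty.
    users_by_country[country][user_id] = ""

def users_in(country):
    """Read a single index row instead of scanning every user row."""
    return sorted(users_by_country.get(country, {}))

insert_user("u2", "Bob", "NO")
insert_user("u1", "Ann", "NO")
insert_user("u3", "Cho", "KR")
norwegians = users_in("NO")
```

Each index row gains one column per user in that country, so a popular country's row is exactly the kind of row that runs into a per-row size or column limit.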

Yes and the funniest thing about all this is (4, Insightful)

Giant Electronic Bra (1229876) | more than 3 years ago | (#34901606)

That we had all of this stuff 30 years ago. It was called 'network' databases, which were pretty much the standard sort of technology before RDBMS came along and everyone realized how incredibly much better relational algebra was for the vast majority of problems. As with many other things, older ideas eventually resurface with new names and a few more features. There are times when this kind of facility is useful. Nothing wrong with it. The vast majority of cases, though, where I've seen people using something like Cassandra or Big Table were ill-advised. A properly optimized RDBMS with a correctly designed schema can handle all but a few edge cases. Most of the hype these tools are generating is based on a lack of real understanding of how to properly use databases, combined with people believing myths about other technologies, and helped along by the industry's short memory span. The best part, though, is that when something turns into a giant mess, guys like me can make nice money fixing the mess. lol.

Introduction to Cassandra (1)

Fnord666 (889225) | more than 3 years ago | (#34901460)

Here [maxgrinev.com] is a link to an introduction to the Cassandra database system. One thing to realize is that Cassandra is one of the new "noSQL" DBMSs. These operate very differently from an RDBMS such as Oracle or DB2.

2 billion is too small... (1)

Anonymous Coward | more than 3 years ago | (#34901494)

That's less than one column per person!

Paging Microsoft (1)

Nom du Keyboard (633989) | more than 3 years ago | (#34901546)

Now if only Excel would follow.

SELECT * FROM TWO_BILLION_COLUMN_TABLE; (1)

timeOday (582209) | more than 3 years ago | (#34901550)

Man this is great! Now I only need one table and never have to JOIN again. Most of the rows won't use most of the columns but that's what NULL is for, am I right?

Finally! (2)

Compaqt (1758360) | more than 3 years ago | (#34901660)

I was having trouble making a table for my new Web 3.0 m-commerce application on lesser databases:

CREATE TABLE peeps(
peep1_first_name VARCHAR(255),
peep1_last_name VARCHAR(255),
peep1_address VARCHAR(255),
peep1_address2 VARCHAR(255),
peep1_address3 VARCHAR(255),
peep1_creditcard VARCHAR(255),
peep1_creditcard2 VARCHAR(255),
peep1_creditcard3 VARCHAR(255),
peep2_first_name VARCHAR(255),
peep2_last_name VARCHAR(255),
peep2_address VARCHAR(255),
peep2_address2 VARCHAR(255),
peep2_address3 VARCHAR(255),
peep2_creditcard VARCHAR(255),
peep2_creditcard2 VARCHAR(255),
peep2_creditcard3 VARCHAR(255), ...

509 Bandwidth Limit Exceeded

And Oracle supports EXABYTE sized databases (3, Interesting)

dirkdodgers (1642627) | more than 3 years ago | (#34901752)

So I can appreciate that this announcement sounds like News for Nerds, but can someone explain why it Matters that Cassandra can support 2 billion columns?

The article basically says "because you can't execute SQL you need lots of columns". OK, great, why would I want that? The article doesn't tell me. The Cassandra website sure doesn't tell me.

Oracle 11 supports up to 8 fucking EXABYTES of data in an RDBMS that I can execute SQL against. What Cassandra puts in columns, I put in rows.

I've scoured this thread like all the other ones on Cassandra for the killer feature, for the "you can do this with Cassandra that you can't do as well with an RDBMS" and I can't find it.

The best I can come up with is "I want to store lots of indexed data, I don't care about transactional integrity, and I don't want to pay Oracle". Is that it? That's fine if it's it, Oracle doesn't come cheap and that can be a deal breaker for new companies, but I just wish someone would spell out that this is the justification for Cassandra's existence.

Nice (0, Funny)

Anonymous Coward | more than 3 years ago | (#34901754)

Nice, but can it run Flash?
