
Database Bigwigs Lead Stealthy Open Source Startup

ScuttleMonkey posted more than 7 years ago | from the hope-it-isn't-vaporcorp dept.


BobB writes "Michael Stonebraker, who cooked up the Ingres and Postgres database management systems, is back with a stealthy startup called Vertica. And not just him, he has recruited former Oracle bigwigs Ray Lane and Jerry Held to give the company a boost before its software leaves beta testing. The promise — a Linux-based system that handles queries 100 times faster than traditional relational database management systems."


187 comments


Omg top 5 (-1, Offtopic)

tylerwylie (956331) | more than 7 years ago | (#18016550)

Am I first? I don't know what to say omg.

Re:Omg top 5 (3, Funny)

bob.appleyard (1030756) | more than 7 years ago | (#18017366)

You're 100 times faster than anyone else, obviously.

Partners (5, Informative)

stoolpigeon (454276) | more than 7 years ago | (#18016580)

The article mentions that redhat and hp are listed among their partners. i'm not surprised by red hat or informatica (another partner though they aren't mentioned in the article) but i was a little surprised by hp - since they have been trying to get the word out [hp.com] about their own data warehousing and bi stuff. i wonder what that indicates about how they regard this new player.
 
also interesting is the wikipedia article on Michael Stonebraker [wikipedia.org] if you aren't already familiar with him.

Re:Partners (4, Insightful)

AKAImBatman (238306) | more than 7 years ago | (#18016644)

i was a little surprised by hp - since they have been trying to get the word out about their own data warehousing and bi stuff.

It's called "hedging your bets". If the little company doesn't work out, no big deal. If it does, then HP is in a position to either benefit from contractual relations, acquire it, or squash it. Whichever happens to be their fancy.

Re:Partners (1)

stoolpigeon (454276) | more than 7 years ago | (#18017282)

based on stonebraker's history -- somebody acquiring it at some point seems a relatively safe bet.

Column oriented databases (2, Interesting)

Anonymous Coward | more than 7 years ago | (#18016582)

The article seems to describe the big advantage as being column oriented.

How does this differ from KX Systems' kdb (www.kx.com), which IIRC is similar in that way and is already in use at many if not most major financial institutions (see their customer list)?

Re:Column oriented databases (4, Informative)

georgewilliamherbert (211790) | more than 7 years ago | (#18016808)

KX is primarily in-memory. The competing column-oriented product is primarily Sybase IQ, which has been on the market for a while now.

How does this compare to SQLite's column store? (0)

Anonymous Coward | more than 7 years ago | (#18018184)

SQLite has offered high speed column storage for at least the past year. What's so good about Vertica's offering?

Re:Column oriented databases (1)

JimDaGeek (983925) | more than 7 years ago | (#18017936)

Well, kdb+ is proprietary and expensive. Maybe this product will be Open Source or at the very least kill kdb+ on price? There are not many real players in this market, the more the better IMO. What would be best is a competitive Open Source offering in this space. The Open Source product could steal away most of the market share, or at the very least, really drive down prices! :-)

Did it the right way? (1)

Kazrath (822492) | more than 7 years ago | (#18016598)

It appears this was made new from the ground up? I am so used to logical progression with specific technologies trying to squeeze out the last drop of performance with no real new innovation that this idea seems foreign. It is refreshing to actually see something that is potentially new. Hopefully it holds up to the quoted 100 times faster.

First - 100x faster (0)

Anonymous Coward | more than 7 years ago | (#18016602)

Wow- that was fast.

When Will This Be Ported? (4, Funny)

Anonymous Coward | more than 7 years ago | (#18016610)

The question is when will this be ported to a mainstream OS such as Windows?

Re:When Will This Be Ported? (1)

dfgchgfxrjtdhgh.jjhv (951946) | more than 7 years ago | (#18017184)

lol

Re:When Will This Be Ported? (2, Funny)

Mad Merlin (837387) | more than 7 years ago | (#18017748)

The question is when will this be ported to a mainstream OS such as Windows?

Where by mainstream, you mean useless?

Everyone, we are moving to ASP now (3, Funny)

varmittang (849469) | more than 7 years ago | (#18016616)

It was LAMP, now it's LAVA. Much cooler name.

You're bound to get some strange looks... (5, Funny)

Anonymous Coward | more than 7 years ago | (#18016956)

during the transition when you tell people your business runs on LAVA-LAMP technology.

Re:Everyone, we are moving to ASP now (1)

SatanicPuppy (611928) | more than 7 years ago | (#18017318)

If only you could get A on LA with V...The OSS version would be LAVM(ono).

Bah, it's no use...This system is already doomed like Postgres because it has no cool acronym.

it's fast, but can it penetrate enemy airspace? (1)

President_Camacho (1063384) | more than 7 years ago | (#18016618)

Michael Stonebraker, who cooked up the Ingres and Postgres database management systems, is back with a stealthy startup called Vertica ... The promise -- a Linux-based system that handles queries 100 times faster than traditional relational database management systems.

Yeah, but what does its radar signature look like?

Re:it's fast, but can it penetrate enemy airspace? (2, Funny)

varmittang (849469) | more than 7 years ago | (#18016686)

V

Re:it's fast, but can it penetrate enemy airspace? (2, Funny)

Aqua_boy17 (962670) | more than 7 years ago | (#18016710)

Yeah, but what does its radar signature look like?
Probably, a flock of seagulls.

I apologize in advance (1)

President_Camacho (1063384) | more than 7 years ago | (#18018134)

Yeah, but what does its radar signature look like?
Probably, a flock of seagulls.

That would make sense for a remote application. When ran, they're ran so far away.

Re:it's fast, but can it penetrate enemy airspace? (1)

misleb (129952) | more than 7 years ago | (#18017008)

I think in this case "stealthy" means that nobody has really heard of them before and nobody seems to care.

-matthew

Re:it's fast, but can it penetrate enemy airspace? (1)

ShieldW0lf (601553) | more than 7 years ago | (#18017078)

They've got 23 million in funding... apparently someone cares...

Re:it's fast, but can it penetrate enemy airspace? (1)

misleb (129952) | more than 7 years ago | (#18017170)

But it is monopoly money!

Re:it's fast, but can it penetrate enemy airspace? (2, Funny)

Gospodin (547743) | more than 7 years ago | (#18017788)

Microsoft is backing them?

Re:it's fast, but can it penetrate enemy airspace? (1)

Bastard of Subhumani (827601) | more than 7 years ago | (#18017512)

Well thanks for informing us that it doesn't refer to the company's radar cross section or other measure of its radiative/reflective detectability. It's a miracle we survived so far under such a delusion.

Re:it's fast, but can it penetrate enemy airspace? (2, Funny)

eclectro (227083) | more than 7 years ago | (#18017072)

Yeah, but what does its radar signature look like?

It's not bad, but the new startup synergistica that I'm working on is gonna be completely invisible.

buzzword enabled (3, Insightful)

hey (83763) | more than 7 years ago | (#18016620)

"grid-enabled, column-oriented relational database management system"
What does that mean?
If anything.

Re:buzzword enabled (0)

Anonymous Coward | more than 7 years ago | (#18016784)

imagine a Beowulf cluster of database servers that index every column. No, really.

Re:buzzword enabled (0)

Anonymous Coward | more than 7 years ago | (#18016894)

"grid-enabled, column-oriented relational database management system"
What does that mean?
Flat-text files on NFS shares :)

Re:buzzword enabled (1)

edittard (805475) | more than 7 years ago | (#18017720)

Flat-text files on NFS shares :)
Thingy and whatsisface, ia bar.

Re:buzzword enabled (5, Informative)

c0nst (655115) | more than 7 years ago | (#18017032)

Here you go:
Stonebraker, Mike; et al. (2005). C-Store: A Column-oriented DBMS [mit.edu] (PDF). Proceedings of the 31st VLDB Conference.
From the paper:
Among the many differences in its design are: storage of data by column rather than by row, careful coding and packing of objects into storage including main memory during query processing, storing an overlapping collection of column-oriented projections, rather than the current fare of tables and indexes, a non-traditional implementation of transactions which includes high availability and snapshot isolation for read-only transactions, and the extensive use of bitmap indexes to complement B-tree structures.
:-)
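For anyone who hasn't run into bitmap indexes, here is a toy sketch of the general idea (not C-Store's actual implementation, which the paper doesn't spell out at this level). It assumes a Python int serves as the bitset over row positions, and the `region` and `status` columns are made up for illustration. An AND of two equality predicates becomes a single bitwise AND.

```python
from collections import defaultdict

def build_bitmap_index(column):
    """Map each distinct value to an int whose bit i is set if row i holds that value."""
    index = defaultdict(int)
    for row_id, value in enumerate(column):
        index[value] |= 1 << row_id
    return dict(index)

def rows_from_bitmap(bitmap):
    """Decode a bitmap back into a sorted list of row positions."""
    rows = []
    row_id = 0
    while bitmap:
        if bitmap & 1:
            rows.append(row_id)
        bitmap >>= 1
        row_id += 1
    return rows

# Hypothetical columns, stored separately as a column store would
region = ["east", "west", "east", "north", "west", "east"]
status = ["open", "open", "closed", "open", "closed", "open"]

region_idx = build_bitmap_index(region)
status_idx = build_bitmap_index(status)

# WHERE region = 'east' AND status = 'open' becomes one bitwise AND
match = region_idx["east"] & status_idx["open"]
print(rows_from_bitmap(match))  # [0, 5]
```

Bitmaps like these are compact when a column has few distinct values, which is one reason they pair well with column storage in read-mostly warehouses.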

Re:buzzword enabled (0)

Anonymous Coward | more than 7 years ago | (#18017416)

What does that mean?

It means the next internet bubble is on its way. woot!!

Re:buzzword enabled (5, Funny)

Jherek Carnelian (831679) | more than 7 years ago | (#18017472)

"grid-enabled, column-oriented relational database management system"
What does that mean?

Uh, a spreadsheet?

Re:buzzword enabled (5, Informative)

perfczar (1064296) | more than 7 years ago | (#18017616)

Buzzwords, yes, but they have a little bit of meaning left. Grid-enabled means that it works on a "shared nothing" environment, that you can use a networked cluster of commodity computers if one isn't enough to hold the data, and so on. This is in contrast to using one big huge box (big computer, big storage array, or whatever). Of course many databases are similarly grid-enabled. Column-oriented means that data is stored on disk by column, this makes it fast to process a subset of columns that touch lots of rows, as is typical in data warehouse applications. This is a key architectural difference among databases; Oracle, DB2, etc., are "row stores", while Sybase IQ, Vertica, etc. are "column stores". Note: I work for Vertica Systems
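A rough sketch of what "shared nothing" means for a query, under loud assumptions: the "nodes" below are just Python lists in one process, rows are hash-partitioned on a hypothetical `customer` field, and nothing here reflects how Vertica actually distributes data. Each node aggregates only the rows it owns, and a coordinator merges the small partial results.

```python
from collections import Counter

def partition(rows, num_nodes):
    """Hash-partition rows so each simulated node owns its slice exclusively."""
    nodes = [[] for _ in range(num_nodes)]
    for row in rows:
        nodes[hash(row["customer"]) % num_nodes].append(row)
    return nodes

def local_aggregate(node_rows):
    """Work each node does independently: total amount per product."""
    totals = Counter()
    for row in node_rows:
        totals[row["product"]] += row["amount"]
    return totals

def coordinator(partials):
    """Merge the partial results shipped back from every node."""
    result = Counter()
    for partial in partials:
        result.update(partial)  # Counter.update adds counts
    return result

# Hypothetical sales rows
rows = [
    {"customer": "alice", "product": "widget", "amount": 10},
    {"customer": "bob", "product": "gadget", "amount": 5},
    {"customer": "carol", "product": "widget", "amount": 7},
    {"customer": "dave", "product": "gadget", "amount": 3},
]

totals = coordinator(local_aggregate(n) for n in partition(rows, 3))
print(sorted(totals.items()))  # [('gadget', 8), ('widget', 17)]
```

The point of the design is that only the small per-node totals cross the network, not the raw rows, so adding commodity boxes adds scan capacity.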

Re:buzzword enabled (4, Informative)

ChrisA90278 (905188) | more than 7 years ago | (#18017618)

Column oriented means it can read in data from one column on the disk without pulling in all the other bytes in the row. Possibly much less I/O bandwidth usage, depending on the query. (Kind of like if you turned the normal file structure sideways.)

Grid enabled - This means the DBMS can make use of a large distributed group of computers and potentially have access to a huge amount of computing power. The typical DBMS runs on, at best, a multi-processor server. This is kind of like a DBMS server running on a "seti at home" type network.

Going solely by the developer's reputation, this could be a big deal. He is not some random hacker. He is a well known university professor who has several times in the past led projects that have been revolutionary and turned the field around. His ideas are widely used. Still, "100X faster" is a big claim. Lots of smart people have been working on DBMSes for many years; a two order of magnitude improvement is an "I will have to see it to believe it" type claim.

I'm using PostgreSQL to handle some telemetry data right now. If my 45 minute run times can be reduced to seconds, I'll be happy.

Re:buzzword enabled (0)

Anonymous Coward | more than 7 years ago | (#18017876)

Depending on what you're crunching and your other needs, and how often you run them, you could try mysql. Yeah yeah, I know the fanboy crap. Seriously though, I look after a system that has to run on oracle, postgresql (very tuned) and mysql. Its reporting workload results in over 10,000 grouping queries each time it's run. mysql takes less than a minute using InnoDB tables. The others crawl home closer to 20 minutes. If the data happens to be cached, the run time drops to less than 10 seconds; oracle and postgresql don't show much improvement, even on dedicated systems. InnoDB has a longer start up time than the old and crappy MyISAM, but it's 2-3x faster after the first run on our volume and tables. I haven't bothered with MAXDB or any other engines within mysql, it's way in the lead already. We've had two guys tuning postgresql, and both failed to make any difference. The primary requirement is load it up and hammer the hell out of it.

If you want pure throughput, give it a whirl. It's been good enough for yahoo finance for the last 6 years or so.

Personally, I'm looking into faster alternatives now that a nasty dataset mining project has come my way. SQL access to memory based engines is where I'd like to go...

Column oriented? (1)

JLavezzo (161308) | more than 7 years ago | (#18016624)

A column oriented relational database? I'd like some more details on how that works. I don't suppose it's just a regular SQL db with Excel's Pivot Tables run on it...

Seriously, though, the target market for grid-based high volume data-warehousing type dbs is a lot smaller than the MySQL crowd. Not as big a deal as it seems, but it'd be nice to have if you needed it.

Re:Column oriented? (2, Insightful)

stoolpigeon (454276) | more than 7 years ago | (#18016758)

smaller in number - but i'm willing to bet much more profitable and growing rapidly. we've been looking at data warehousing options and frankly most of them suck in one way or another. if someone can do it right - they can make a killing.

Re:Column oriented? (1)

MrAnnoyanceToYou (654053) | more than 7 years ago | (#18016776)

Depends. Reporting and data warehousing are pretty important; Business Objects / Crystal Reports / etc. all seem to be slower than they could be. If you were able to throw in the rows as quickly as in MySQL or Postgres and then report on them with ten times the efficiency, you've got decent demand in store. If, say, Google or Amazon could run with 1/10th the overall servers, I have a feeling they would. Just a guess though. It's always possible a new approach to the old problems has resulted in a real performance increase. I have my doubts, but it's a DBMS, so no one REALLY gets validated either way for at least a couple of years.

Re:Column oriented? (2, Interesting)

stoolpigeon (454276) | more than 7 years ago | (#18017344)

info week just ran an article on hp [informationweek.com] getting into data warehousing and bi that had this paragraph pretty early on: Until sitting down with InformationWeek recently, the company has been mum on the initiative--not so much as a peep from its normally talkative marketing team. Indeed, it's an unlikely move into a sector where IBM, Oracle, SAS Institute, and Teradata have years of experience, well regarded products, and loyal customers. Those four vendors--along with Microsoft, which has muscled in on the strength of its SQL Server database--hold about 85% of the $5.2 billion-a-year data warehousing software market, a sector IDC projects will grow 9.5% annually through 2010.
 
so you are right - there's a lot of opportunity there, even for a small player.
 
on a side note, i thought the opening paragraph described the current situation pretty well
  For more than a decade, big companies and sophisticated data aggregators have adopted data warehouses, yet few have mastered them, and many have outright failed in the effort or have been scared off by the complexity. The goal is to give workers access to real-time data across departments and geographic units, but more often than not, data warehouses end up as costly clunkers with outdated, inconsistent, and missing information.
 

Don't forget to put the cover sheet, shitcock (-1, Troll)

Anonymous Coward | more than 7 years ago | (#18017776)

"Reporting and data warehousing are pretty important"


Bullshit. What they are is very pretty very fucking GAY, just like you.

Re:Column oriented? (1)

truthsearch (249536) | more than 7 years ago | (#18016788)

A lot of web sites that started out with small MySQL databases are now using replication. It can be a tough transition if not accounted for in the original development of the site. But if those sites started out with something that's "grid-based" maybe it would be much easier to grow (maybe). I have the feeling the market may be bigger than many people realize, especially if they start with something free.

Re:Column oriented? (4, Informative)

AKAImBatman (238306) | more than 7 years ago | (#18016880)

A column oriented relational database? I'd like some more details on how that works.

http://en.wikipedia.org/wiki/Column-oriented_DBMS [wikipedia.org]

It's basically an optimization of the current data access patterns. Databases have been row-oriented for decades, because they evolved from fixed width flat files. Once we eliminated COBOL-style accesses to databases, the full row data became less important. It became far more important to be able to scan a column as fast as possible. For example:

select * from names where lastname LIKE '%son'

The above query might have an index available to find what it needs. But it's just as likely that the database will need to do a table-scan. Since table-scans involve looking through every record in the database, you can imagine that it would be faster to just load the lastname column rather than loading every row in the database just to discard 90% of that data.
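A small sketch of the access pattern described above. It only models the idea with in-memory Python lists (no real disk I/O), and the `names` table contents and extra columns are made up for illustration. Both scans find the same rows, but the column scan only ever touches the `lastname` values.

```python
# Row layout: each record is stored together, so a scan walks whole tuples.
rows = [
    ("Anderson", "Ann", "Boston", 34),
    ("Smith", "Bob", "Denver", 41),
    ("Jackson", "Cy", "Austin", 29),
]

# Column layout: the same data split into one list per column.
columns = {
    "lastname": [r[0] for r in rows],
    "firstname": [r[1] for r in rows],
    "city": [r[2] for r in rows],
    "age": [r[3] for r in rows],
}

def row_scan(rows):
    # Iterates over complete tuples even though only lastname is tested.
    return [i for i, r in enumerate(rows) if r[0].endswith("son")]

def column_scan(columns):
    # Iterates over the lastname column alone, the equivalent of LIKE '%son'.
    return [i for i, name in enumerate(columns["lastname"]) if name.endswith("son")]

matches = column_scan(columns)
assert matches == row_scan(rows)

# Other columns are fetched only for the matching positions.
print([(columns["lastname"][i], columns["firstname"][i]) for i in matches])
```

With a leading wildcard a normal B-tree index on `lastname` doesn't help, which is why this query tends to end up as a scan in the first place.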

Re:Column oriented? (0)

Anonymous Coward | more than 7 years ago | (#18018168)

We eliminated COBOL-style accesses to databases? Someone needs to tell my boss quick.

Re:Column oriented? (1)

prog99 (319739) | more than 7 years ago | (#18016884)

Fairly certain Sybase IQ server is column orientated.

Re:Column oriented? (5, Insightful)

georgewilliamherbert (211790) | more than 7 years ago | (#18016910)

A column oriented relational database? I'd like some more details on how that works.

Column oriented is easy. Imagine a database as a set of tables, each of which has rows of data records, in organized columns (column 1 = "User name", column 2 = "User ID", column 3 = "Favorite slashdot admin", etc).

Normal row-oriented databases store records which have a row of the data: "User name", "User ID", "Favorite slashdot admin" for user row #12345.

Column oriented databases store records which have a column of the data: "User name" for user rows 1-100,000; "User ID" for user rows 1-100,000; etc.

Updates are faster with row-oriented: you access the last record file and append something, or access an intermediate record file and update one "row" across.

Searches are faster with column-oriented: you access the record file for "Favorite slashdot admin" and look for entries which say "Phred", and then output the list of rows of data which match. Instead of going through the whole database top to bottom for the search, you just search on the one column. If you have 100 columns of data, then you look through 1/100th of the total data in the search. To pull data out, you then have to look at all the column files and index in the right number of records, but that goes relatively quickly.

Indexes are useful, but column-oriented is more efficient in some ways. You don't have to maintain the indexes, and can just automatically search any column without having indexed it, in a reasonably efficient manner.

Column-oriented also lets you compress the data on the fly efficiently: all the records are the same data type (string, integer, date, whatever) and lists of same data types compress well, and uncompress typically far faster than you can pull them off disk, so you can just automatically do it for all the data and save both space and time...
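Run-length encoding is one simple scheme that shows why a single-type, sorted column compresses so well. This is a sketch of the general technique only; the thread doesn't say which encodings Vertica uses, and the admin names are hypothetical column values. Note that a count can be answered straight from the compressed form.

```python
from itertools import groupby

def rle_encode(values):
    """Collapse runs of identical adjacent values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]

def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original column."""
    out = []
    for value, count in pairs:
        out.extend([value] * count)
    return out

# Hypothetical "Favorite slashdot admin" column, kept sorted as a column store might
favorites = sorted(["Phred", "CmdrTaco", "Phred", "CmdrTaco", "Hemos", "Phred"])
encoded = rle_encode(favorites)
print(encoded)  # [('CmdrTaco', 2), ('Hemos', 1), ('Phred', 3)]

# Count matches without decompressing anything
print(sum(count for value, count in encoded if value == "Phred"))  # 3
```

Sorting is what makes the runs long; the trade-off is that inserting into the middle of a sorted, encoded column is expensive, which is the update penalty other posters mention.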

Re:Column oriented? (2, Insightful)

flyingfsck (986395) | more than 7 years ago | (#18017168)

Yup, it is all about making the individual files smaller and more regular. Kinda the opposite of XML.

Re:Column oriented? (1, Informative)

Anonymous Coward | more than 7 years ago | (#18016978)

I don't suppose it's just a regular SQL db with Excel's Pivot Tables run on it...

Essentially it is - take each column and put it in a file, sequentially by row number. Queries are really easy (read record n out of each column-file) but inserts are rather difficult. Searches are quite efficient (you can jam a lot of data in a data block without all those other columns in the way) but updates aren't so much. Data compresses better because a column tends to be consistent in format and repetitive, so you can pack even more information in each data block (and search even faster, but make updating even slower). It's cool, as long as you don't change much data.

I can't find anything to suggest it, but I suspect this group has some tricks to make updates less painful, or maybe they're just shooting for the warehouse market. It'll never take over the OLTP market but they may find a niche.
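The "one file per column, read record n out of each" layout can be sketched like this, assuming plain Python lists stand in for the column files. `ColumnTable` and its methods are hypothetical names for illustration, not any product's API. It shows both sides of the trade-off: a search reads one column, while a single insert has to write to every column.

```python
class ColumnTable:
    """Toy column store: one list per column, aligned by row position."""

    def __init__(self, column_names):
        self.columns = {name: [] for name in column_names}

    def insert(self, row):
        # One logical insert appends to every column "file".
        for name, col in self.columns.items():
            col.append(row[name])

    def get_row(self, n):
        # Reconstruct record n by reading position n from each column.
        return {name: col[n] for name, col in self.columns.items()}

    def find(self, name, value):
        # A search only reads the one column it filters on.
        return [i for i, v in enumerate(self.columns[name]) if v == value]


table = ColumnTable(["user", "uid", "favorite_admin"])
table.insert({"user": "alice", "uid": 1, "favorite_admin": "Phred"})
table.insert({"user": "bob", "uid": 2, "favorite_admin": "Hemos"})
table.insert({"user": "carol", "uid": 3, "favorite_admin": "Phred"})

for n in table.find("favorite_admin", "Phred"):
    print(table.get_row(n))
```

With real files on disk (and compression on top), each of those per-column appends is a separate write, which is why this layout suits bulk-loaded warehouses better than OLTP.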

Re:Column oriented? (1)

stoolpigeon (454276) | more than 7 years ago | (#18017130)

I can't find anything to suggest it, but I suspect this group has some tricks to make updates less painful,
 
or - they are doing it in an environment where data gets in via etl (or this streams stuff) and you aren't doing updates -- you are doing bi and reporting to make management's widgets do all kinds of nice things on their dashboards.
 
i think they are targeting the data warehouse market - not the transactional or general purpose market.

Awesome (2, Interesting)

Fyre2012 (762907) | more than 7 years ago | (#18016658)

This is totally what we need.

With commodity hardware getting faster and cheaper by the minute, having a system that can handle a higher than average load with optimized software is, imho, a winner.

I'm sure everyone here can add some anecdotal evidence to how they had a heavy-hardware, database serving machine die on them because of some software bug.
This is one of the reasons I've been looking forward to ZFS. Hopefully the DB gurus will take the best of what's good about software, drop the legacy crap and really deliver something that's going to handle the kind of load that a good slashdotting delivers with hardware that didn't require a lease to be affordable.

Re:Awesome (1)

Grinin (1050028) | more than 7 years ago | (#18017006)

I couldn't agree more. The OpenSource community always comes up short when it comes to taking on the big corporate names. Ultimately the more choices the consumers have, the lower the prices, the higher the standards, and that's what a mixed economy is all about.

open source? (1)

Anonymous Coward | more than 7 years ago | (#18016724)

how is this open-source?

Re:open source? (3, Informative)

perfczar (1064296) | more than 7 years ago | (#18017680)

Vertica is not open source. Not sure where the confusion came from.

Note: I work for Vertica.

Re:open source? (1)

Fyre2012 (762907) | more than 7 years ago | (#18017794)

I didn't mention that it was Open Source.

Which one of us is confused?
...or were you not actually replying to me?
Now i got myself all confused =\

Re:open source? (0)

Anonymous Coward | more than 7 years ago | (#18018110)

Vertica is not open source. Not sure where the confusion came from.

Note: I work for Vertica.


Perhaps the title of this slashdot article has something to do with it?:

"Database Bigwigs Lead Stealthy Open Source Startup"

I saw that title, read the summary and said to myself suurrreee... 100 times faster than existing RDBMSes AND open source? Fat chance.

I'm glad you showed up to clarify that and confirm my suspicion. If I were you, I'd be asking slashdot to correct their inaccurate article title.

But does it save the children? (1)

StikyPad (445176) | more than 7 years ago | (#18016728)

The promise -- a Linux-based system that handles queries 100 times faster than traditional relational database management systems... ...using the power of oxygen!

Perfect timing (3, Interesting)

defile (1059) | more than 7 years ago | (#18016736)

Loading a million random records out of a set of one hundred million records is an enormously difficult task for an RDBMS on commodity hardware (e.g. magnetic rotating disks). This is a more common task than you would think. ORM systems backed by an RDBMS, such as Ruby on Rails, Django, Hibernate, have exactly this requirement and will only demand more as these models become more mainstream. Think about what search engines have to do: find millions among billions, all to show a user a dozen.

These problems are solvable now, but there's a lot of duplication of effort going on that a smart database vendor could solve for us.

Good..If it works (1)

Gomer79 (43434) | more than 7 years ago | (#18016744)

Without any benchmarks of any kind and a lack of data I remain skeptical, but if it works this could be a huge breakthrough for database management as data storage amounts continue to skyrocket. I am curious if it will be ported to Windows or other proprietary systems and if so what effect it will have on the speed claims. Because if the speed claims are true and it stays Linux, I would think companies would have to consider moving to Linux to realize the speed gains.

Re:Good..If it works (1)

nuzak (959558) | more than 7 years ago | (#18017054)

It's not a breakthrough, it's simply a vertical database design, and it will accelerate SOME kinds of queries, and not do so well on others. It's great for the kind of data mining where you're going to vertically slice the data anyway, not so good for OLAP and decision support where you usually want the whole record at once. You replicate to one of these databases, you don't usually primarily enter data into it -- with trading data being one notable exception. Financial apps love using kx, which is blindingly fast and has a programming language drawing from APL, including its awesome terseness. I'm told that kx doesn't do so hot once you need to hit the disk though.

Oracle is able to do this with vertical partitions too, though partitions are a rather large-grained thing, so I imagine there are some limits to doing it that way.

I think I've karma-whored enough for one post :p

Re:Good..If it works (0)

Anonymous Coward | more than 7 years ago | (#18017206)

I think I've karma-whored enough for one post :p

No, I appreciate the detailed explanations you and others have provided here! It makes wading through stuff like the GP's "It's fasterer because it runs on Lunix so everyone will have to use Lunix!" worthwhile.

Re:Good..If it works (1)

I!heartU (708807) | more than 7 years ago | (#18017816)

According to Wikipedia, column oriented is better for reads, meaning it suits OLAP (data warehousing), and slower on writes, so not so good for OLTP. The idea being that since columns are stored in different files, and queries usually don't want every column, you only have to look at some of the files and not all of them.

Re:Good..If it works (1)

nuzak (959558) | more than 7 years ago | (#18017918)

Yikes, I actually meant to say OLTP, since I contrasted it with data warehousing. I just choked on my alphabet soup a little. Thanks :)

it won't work 100 times faster - I'll take bets (0)

tota (139982) | more than 7 years ago | (#18017094)

You'll be lucky if it is 2 to 3 times faster (and even then, I'll believe it when I see it).

On the subject I've just published a new benchmark [devloop.org.uk] .
And the largest margin of all the tests that we ran is around 4 times in multi-threaded tests in favour of MySQL.


This is just marketing, nothing concrete to see - move along.

Re:it won't work 100 times faster - I'll take bets (1)

georgewilliamherbert (211790) | more than 7 years ago | (#18017158)

A) Is your benchmark a data warehouse type app benchmark or transactional? Column oriented is slower for transactional typically but much faster for data warehouse. I don't care how many frames per second you measure if I'm buying a LAMP web server system.

B) Your benchmark data doesn't show that you've tried to run Sybase IQ or C-store column-oriented databases against the workload.

Are you really sure that you want to be so sure about this, given that you may not be testing the right thing, and haven't tested the comparable things? 8-)

Re:it won't work 100 times faster - I'll take bets (2, Insightful)

Splab (574204) | more than 7 years ago | (#18017396)

Uhm... wtf?

Seriously, you tested MySQL vs. other databases with "out of the box" setups? MySQL isn't a real database when running the MyISAM engine, you simply cannot compare that with anything else. And on top of that, try doing a proper insertion in MySQL, one single transaction with a few million rows, and see how well that does. Oh and did you ever stop to think about _why_ MySQL does perform so much faster on that test? Try doing it on an InnoDB table with standard setup, even at 600k rows it slows to a crawl. (Easily fixable, but requires some optimizations)

Seriously, the reason big vendors have a clause in their EULA telling people NOT to do benchmarks is exactly people like you: you have no idea what you are comparing, and just figured that setting up something out of the box will give a good insight into the speed. Sheesh.

Ohh and the 100 fold increase in speed is very much likely to happen - on certain types of queries. With horizontal representation you can do sequential scan only on the part of the data that you need, not the entire set, which should be very very fast.

you read 40 pages in under 4 mins - you're fast? (1)

tota (139982) | more than 7 years ago | (#18017562)

> people like you, you have no idea about what you are comparing, just figured that setting up something out of the box will give a good insight into the speed.

I guess you didn't read the first page, or the second?

As stated (multiple times), the purpose of this report is to compare various aspects with "out of the box" performance, with all the caveats that it implies.
And FYI I will be comparing MySQL InnoDB next time around.

> Ohh and the 100 fold increase in speed is very much likely to happen

> With horizontal representation you can do sequential scan only on the part of the data that you need..

A scan is still a scan is still a scan.
And even with horizontal representation you shouldn't be too far off indexed data access speed, so the "100 times" figure is still unrealistic.

But, heh, you don't know me, so keep talking...

Re:you read 40 pages in under 4 mins - you're fast (1)

Splab (574204) | more than 7 years ago | (#18017910)

Right, so you've got a table with, let's say, 5 million rows spanning perhaps 2 GB of data. Now if you want to find out how many of the rows contain a specific value, let's say foo_bar is below 50, with vertical representation you need to scan your entire dataset, that is 2 GB. With horizontal representation you only need to dig through 5 million entries; let's assume they're 32-bit integers and you are down to 20 MB of data. Of course you can fix a bit of this by using an index covering just that column, but on very large datasets it just isn't an option.
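The arithmetic in that example is worth spelling out, because it lands on the headline figure. This counts only bytes read for one single-column predicate (the column-wise layout, which this comment calls the horizontal representation) and ignores caching, indexes and compression. Treating "2 GB" as 2 * 10**9 bytes is an assumption.

```python
ROWS = 5_000_000
TABLE_BYTES = 2 * 10**9  # the ~2 GB table from the comment (decimal GB assumed)
INT32_BYTES = 4          # foo_bar stored as a 32-bit integer

row_bytes = TABLE_BYTES // ROWS      # average width of a full row
column_bytes = ROWS * INT32_BYTES    # bytes scanned when reading only foo_bar

print(row_bytes)                     # 400
print(column_bytes)                  # 20000000, i.e. 20 MB
print(TABLE_BYTES // column_bytes)   # 100
```

So a roughly 100x reduction in I/O is plausible for a narrow predicate over wide rows; a query that needs most of the columns would see little or none of it.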

And about me not knowing you - I don't need to, a quick scan of your paper told me you had a lot to learn. If you already know your test is flawed, why the hell do you keep it online?

You would lose (1)

acidmonth (767616) | more than 7 years ago | (#18017528)

I've worked with another of this type of system - Alterian. For query intensive work, as compared to well designed / indexed Oracle, it was easily 100s of times faster. Sure, LOADING sucks, but given the goal was occasional loads with lots and lots of queries, it worked, and worked well.

Re:Good..If it works (0)

Anonymous Coward | more than 7 years ago | (#18017128)

Stonebraker and co. recently presented an academic paper with some benchmarks in it. Here's the link: http://nms.csail.mit.edu/~stavros/pubs/osfa.pdf [mit.edu] See section 3, which starts on page 2.

Re:Good..If it works (1, Informative)

Anonymous Coward | more than 7 years ago | (#18017204)

Personally, I think the breakthrough for managing data warehousing volumes of data with real-time response is going to come from NitroSecurity's NitroEDB [nitrosecurity.com] . I saw a demo they gave running on a single commodity laptop which delivered query responses thousands of times faster than Oracle, on a data set with billions of records. They're working with MySQL [mysql.com] creating an interface to use NitroEDB as a storage engine as well.

Re:Good..If it works (0)

Anonymous Coward | more than 7 years ago | (#18018240)

Without any benchmarks of any kind and a lack of data I remain skeptical, but if it works this could be a huge breakthrough for database management as data storage amounts continue to skyrocket. I am curious whether it will be ported to Windows or other proprietary systems and, if so, what effect that will have on the speed claims. If the speed claims are true and it stays Linux-only, I would think companies would have to consider moving to Linux to realize the speed gains.


In my experience MySQL is considerably faster (~4x) on Windows than on the FreeBSD, QNX and Slackware installations I have tested it on.

I expect the motivation to port it to Windows will be strong considering the speed gains that would be realised.

Doesn't "stealthy" require some stealth anymore? (3, Insightful)

georgewilliamherbert (211790) | more than 7 years ago | (#18016774)

Vertica's website has had all the details about what they're doing for months. They've had a Wikipedia article for a long time.

This is some new Network World definition of "Stealthy", apparently...

Re:Doesn't "stealthy" require some stealth anymore (2, Funny)

drinkypoo (153816) | more than 7 years ago | (#18016846)

Vertica's website has had all the details about what they're doing for months. They've had a Wikipedia article for a long time. This is some new Network World definition of "Stealthy", apparently...

Network World is a trade rag. To them, anything not advertised is stealthy. Especially since they want to motivate people to think "oh no, I don't want to be stealthy, that means unknown! quick buy some advertising!"

Sounds great but.. (0)

randomErr (172078) | more than 7 years ago | (#18016796)

This sounds great, but will it work with Windows applications? How proprietary is their system? Do they have a suitable set of signed ODBC drivers that will let my legacy applications talk to their system? Do they have .NET-enabled database connectors so I can drop it into my project? How well has their DB been tested against chatty network environments, like a mix of Windows and Macs, or weird routing? What is their DB management system like? Is it CLI or GUI?

I can claim my custom-written DOS database system is 20X faster than anything on the market (which it is), but if it can't easily work in a Windows and/or Linux environment (which it can't), then it's worthless as a marketable product. (But you should see what it can do on a serial network.)

MOD PARENT UP! (1)

Dysfnctnl85 (690109) | more than 7 years ago | (#18016960)

It's hard for something like this to be relevant if it cannot interface with existing systems.

Re:Sounds great but.. (0)

Anonymous Coward | more than 7 years ago | (#18017424)

I can claim my custom-written DOS database system is 20X faster than anything on the market (which it is)

I'm pretty sure they have more documentation to back up their claims than you do.

Re:Sounds great but.. (4, Informative)

perfczar (1064296) | more than 7 years ago | (#18017784)

The Vertica business model is to sell a database engine (software to store and query data). Clearly use of standard interfaces is important, otherwise nobody would be able to make use of the product (which really ends up being a component of a larger system or strategy) without going to a heap of trouble. So of course Vertica has:

  • A JDBC driver
  • An ODBC driver
  • An interactive SQL client
  • A growing list of tested integrations with other software

Note: I work for Vertica

Best of luck (5, Insightful)

140Mandak262Jamuna (970587) | more than 7 years ago | (#18016836)

I don't want to rain on their parade. But typically, whenever people start with a spec like "100 times better than what they can do", they assume the incumbents will stay at current levels while the newcomers take years to develop and mature their new technology. In the real world, the traditional methods improve too, and unless the newcomers can maintain a 100x lead continually, the new technology flops.

What happened to gallium arsenide replacing silicon? What happened to solid-state memory completely replacing magnetic disks? The technology field is littered with such fiascos.

Re:Best of luck (1)

georgewilliamherbert (211790) | more than 7 years ago | (#18016932)

Sybase IQ already shows that class of speedups on lots of datasets. Proof of concept is out there...

Patent Problems (2)

IflyRC (956454) | more than 7 years ago | (#18016860)

Watch...they'll run into patent problems with patents held by Oracle, Sybase, and MS.

Re:Patent Problems (1)

kfg (145172) | more than 7 years ago | (#18018266)

...they'll run into patent problems with patents held by Oracle, Sybase, and MS.

The principal patent base of RDBMSs is actually held by IBM, and many of those patents have expired. In any case, there is a known solution to most patent issues. We call it "money."

Linux is only free if your time is worthless

Windows TCO is only lower if your data is worthless.

KFG

open source? (1)

oohshiny (998054) | more than 7 years ago | (#18016862)

Where does it say that Vertica is going to be open source?

In any case, if people wonder how they get 100x speedups, it's probably related to Stonebraker's previous company called Streambase [streambase.com] .

never mind (1)

oohshiny (998054) | more than 7 years ago | (#18017134)

There wasn't much information on the web site, but everything is in Wikipedia (look under C-Store, the BSD-licensed open source version). It really is just a column-oriented database.
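For readers unfamiliar with the term, here is a minimal Python sketch of what "column-oriented" means for storage: the same records kept as one list per column instead of one tuple per row. The table and column names are made up for illustration; real column stores such as C-Store add compression and other machinery that is not shown here.

```python
from typing import Any, Dict, List, Tuple

Row = Tuple[Any, ...]

def to_columns(column_names: List[str], rows: List[Row]) -> Dict[str, List[Any]]:
    # Transpose row tuples into one list per column (the column-store layout).
    columns: Dict[str, List[Any]] = {name: [] for name in column_names}
    for row in rows:
        for name, value in zip(column_names, row):
            columns[name].append(value)
    return columns

def row_at(columns: Dict[str, List[Any]], column_names: List[str], position: int) -> Row:
    # Reassemble one full row by reading the same position from every column.
    return tuple(columns[name][position] for name in column_names)

names = ["id", "region", "amount"]
rows = [(1, "east", 120), (2, "west", 75), (3, "east", 40)]
cols = to_columns(names, rows)

# A query touching only "amount" reads just that one list.
print(sum(a for a in cols["amount"] if a > 50))  # 195
print(row_at(cols, names, 1))                    # (2, 'west', 75)
```

The trade-off is visible even at this size: aggregates over one column touch a single list, while fetching a whole row means one lookup per column.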

Why does a company promising Linux solutions... (2, Interesting)

WindBourne (631190) | more than 7 years ago | (#18016934)

Re:Why does a company promising Linux solutions... (2, Interesting)

Mad Merlin (837387) | more than 7 years ago | (#18017704)

Look again...

$ curl -I www.vertica.com
HTTP/1.1 200 OK
Date: Wed, 14 Feb 2007 23:00:26 GMT
Server: Apache/1.3.33 (Unix)
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Expires: Sun, 19 Nov 1978 05:00:00 GMT
Pragma: no-cache
X-Powered-By: PHP/4.4.4
Set-Cookie: PHPSESSID=488de093f5b89a78277a234e1e9886a6; expires=Sat, 10 Mar 2007 02:33:46 GMT; path=/
Last-Modified: Wed, 14 Feb 2007 23:00:26 GMT
Content-Type: text/html; charset=utf-8

Re:Why does a company promising Linux solutions... (0)

Anonymous Coward | more than 7 years ago | (#18018092)

Speculation (5, Informative)

cartman (18204) | more than 7 years ago | (#18016998)

I noticed that Stonebraker is the company founder. Stonebraker has contributed extensively to database research over the years.

He's known for advocating the "shared-nothing" approach to parallel databases. The shared-nothing approach means that nodes in the parallel database don't attempt memory or cache synchronization, and each node has its own commodity disk array. In a shared-nothing parallel database, the data is "partitioned" across servers. So, for example, rows with ids 1-10 would be on the first server, 11-20 on the second server, etc. Executing the SQL query "select * from table where id < 1000" would send requests to multiple commodity servers and then aggregate the results. The optimizer is modified to take into account network bandwidth, latency, etc.
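A toy Python sketch of the range partitioning and scatter-gather described above: ids 1-10 on the first node, 11-20 on the second, and so on, with a query for `id < N` sent only to nodes whose range can contain a match and the partial results merged afterwards. The `Node` class, the helper functions and the partition size are hypothetical; a real optimizer would also weigh network cost, as the comment notes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

PARTITION_SIZE = 10  # ids 1-10 on node 0, 11-20 on node 1, ...

@dataclass
class Node:
    low: int   # smallest id this node owns (inclusive)
    high: int  # largest id this node owns (inclusive)
    rows: Dict[int, str] = field(default_factory=dict)

    def scan_less_than(self, limit: int) -> List[int]:
        # Local work on one shared-nothing node: it only sees its own rows.
        return sorted(i for i in self.rows if i < limit)

def build_cluster(num_nodes: int) -> List[Node]:
    return [Node(low=n * PARTITION_SIZE + 1, high=(n + 1) * PARTITION_SIZE)
            for n in range(num_nodes)]

def insert(cluster: List[Node], row_id: int, value: str) -> None:
    # Route each row to the node whose id range covers it.
    cluster[(row_id - 1) // PARTITION_SIZE].rows[row_id] = value

def select_ids_less_than(cluster: List[Node], limit: int) -> Tuple[List[int], int]:
    # Scatter: skip nodes whose range cannot match (partition pruning).
    targets = [node for node in cluster if node.low < limit]
    # Gather: ranges are disjoint and ordered, so concatenation keeps id order.
    result: List[int] = []
    for node in targets:
        result.extend(node.scan_less_than(limit))
    return result, len(targets)

cluster = build_cluster(4)
for i in range(1, 41):
    insert(cluster, i, f"row-{i}")

ids, nodes_contacted = select_ids_less_than(cluster, 15)
print(len(ids), ids[0], ids[-1], nodes_contacted)  # 14 1 14 2
```

Here only two of the four nodes are contacted; a predicate that doesn't line up with the partitioning key would have to go to every node.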

My guess on what they're doing: they're working on a shared-nothing parallel RDBMS with an in-memory client similar to Oracle TimesTen.

There are a few drawbacks to the shared-nothing approach: 1) the RDBMS software is more difficult to implement; 2) since the data is partitioned, any transaction that updates tuples on more than one database node requires a two-phase distributed commit, which is much more expensive; and 3) some queries are more expensive because they require transmitting large amounts of data over the network rather than a memory bus, and in rare cases that network overhead cannot be eliminated by the optimizer.
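Drawback 2) is easiest to see in code. Below is a minimal, single-process Python sketch of two-phase commit, assuming a hypothetical `Participant` class standing in for each database node: the coordinator first asks every node to prepare and vote, and only if all vote yes does it tell them to commit; otherwise everyone rolls back. Real implementations also log each step durably and handle timeouts and coordinator failure, all of which this omits.

```python
from typing import Dict, List

class Participant:
    """One database node holding part of the data (hypothetical stand-in)."""

    def __init__(self, name: str, can_commit: bool = True):
        self.name = name
        self.can_commit = can_commit
        self.committed: Dict[str, int] = {}
        self.pending: Dict[str, int] = {}
        self.state = "idle"

    def prepare(self, writes: Dict[str, int]) -> bool:
        # Phase 1: stage the writes and vote. A "yes" is a promise to commit.
        if not self.can_commit:
            self.state = "aborted"
            return False
        self.pending = dict(writes)
        self.state = "prepared"
        return True

    def commit(self) -> None:
        # Phase 2 (success): make the staged writes visible.
        self.committed.update(self.pending)
        self.pending = {}
        self.state = "committed"

    def abort(self) -> None:
        # Phase 2 (failure): discard anything staged.
        self.pending = {}
        self.state = "aborted"

def two_phase_commit(writes_by_node: Dict[Participant, Dict[str, int]]) -> bool:
    participants: List[Participant] = list(writes_by_node)
    # Phase 1: every node must vote yes; all() stops at the first "no".
    ok = all(p.prepare(writes_by_node[p]) for p in participants)
    # Phase 2: a second round of messages to every node, whichever way it went.
    for p in participants:
        if ok:
            p.commit()
        else:
            p.abort()
    return ok

a, b = Participant("node-a"), Participant("node-b")
print(two_phase_commit({a: {"x": 1}, b: {"y": 2}}))  # True
c, d = Participant("node-c"), Participant("node-d", can_commit=False)
print(two_phase_commit({c: {"x": 1}, d: {"y": 2}}))  # False
print(c.committed)  # {}
```

Even in this toy, a cross-node transaction costs two message rounds to every participant instead of one local commit, and a single "no" vote undoes work already staged elsewhere.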

The advantage, of course, is linear scalability by adding commodity hardware. No more need for $3M+ boxes.

Re: Shared-Nothing Architecture (0)

Anonymous Coward | more than 7 years ago | (#18018058)

Gee, I don't know anyone who's been successfully doing this for years [ibm.com]... or getting crazy performance [ittoolbox.com] with partitioned databases, or anything...
Caveat: I work for the folks who make this product... but nobody pays me for PR or anything.

I've been waiting for something like this ... (2, Insightful)

Qbertino (265505) | more than 7 years ago | (#18017058)

... for a long time.
Classic RDBMSes are crutches. A forced-upon necessity we have to put up with so our app models can latch on to real-world hardware and its limitations. A historically grown mess with an overhead so huge it's insane, built around a database language with 30+ dialects dating from back in the days when we flew to the moon using a slide rule as the primary means of calculation.
If what they claim is true, these guys are probably finally ditching the omnipresent, redundant, n-fold layers of user and connection management in favour of a lean system that at last does away with the distinction between filesystem, database and data access layer. Imagine a persistence layer with no SQL, no extra user management, no extra connection layer, no filesystem under it and native object support for any PL you wish to compile in.
I tell you, finally ditching classic RDBMSes is *long* overdue; they're basically all the same ancient pile of rubble, from MySQL up to Oracle. If these guys are up to taking on this deed (or part of it), and they get finished when solid-state storage finally relieves our current super-slowpoking spinning metal disks on a broad scale, we'll feel like we're in heaven compared to the shit we still have to put up with today.
I wish these guys all the best. They appear to have the skills to do it and the authority to emphasise that today's RDBMSes and their underlying concepts are a relic of the past.
My 2 cents.

Doesn't sound like a big deal (0)

Anonymous Coward | more than 7 years ago | (#18017306)

As relational databases aren't known for their astonishing efficiency, I wouldn't get too excited about them comparing their vaporware to RDBMS performance.

Given that... (4, Informative)

CodeShark (17400) | more than 7 years ago | (#18017312)

MonetDB [monetdb.cwi.nl] is similarly column-oriented AND open source, and appears to clean the clock of most of the major commercial and open source databases on queries over huge data sets (see the benchmarks at axyana.com [axyana.com] for an example). So where is Vertica's market advantage supposed to be?


What I am asking is this: while Vertica is obviously well researched and well funded as a startup, MonetDB is well researched, already benchmarked and available now. So why would I wait to invest my time, energy and $$ in a proprietary future product, rather than spending that time and energy developing market leadership in my chosen corporate area in the present?

Re:Given that... (5, Informative)

perfczar (1064296) | more than 7 years ago | (#18018116)

Here are a few of the technical reasons one might choose Vertica over Monet; I'll not get into business issues.


Vertica is designed for large amounts of data, and is optimized for disk-based systems. Monet does benchmarks against TPC-H Scale Factor 5 (30 million records, an amount which would fit in main memory) running on Postgres; Vertica does TPC-H Scale Factor 1000 (6 billion records) against commercial row stores tuned by people who do such work to make a living.

Vertica runs on multi-node clusters, allowing the cluster to grow as the amount of data grows, while Monet doesn't scale to multiple machines.

There are numerous differences in the transaction systems, update architecture, tolerance of hardware failure, and so on, that make Vertica better suited to the enterprise DW market.


Note: I work for Vertica

No scoop. PR being sneaked by Vertica! (0)

Anonymous Coward | more than 7 years ago | (#18017718)

This is the second time that Vertica has managed to sneak into Slashdot on non-news, the previous time on the strength of some shallow whitepaper. What Vertica is doing has been done by multiple other companies (kx's kdb, Sybase IQ), and it is bound to run up against patents held by them. I also have to wonder about Vertica and the company Stonebraker keeps there. These fluffy submissions are not the mark of a company which takes pride in its originality or intellectual heft.

Re:No scoop. PR being sneaked by Vertica! (1)

georgewilliamherbert (211790) | more than 7 years ago | (#18018132)

It takes balls to say things like that about Michael Stonebraker in the database field... ...and a lack of brains or historical clue...

Google uses this approach (3, Informative)

russryan (981552) | more than 7 years ago | (#18017774)

See http://en.wikipedia.org/wiki/Bigtable [wikipedia.org] for a description of Google's column oriented database.

More Scalability (1)

Doc Ruby (173196) | more than 7 years ago | (#18017964)

How about a database with the exact same query API (not just "but it's all SQL") as, say, Oracle or MS-SQL, or even Postgres, that allows any number of parallel query servers to work against a single datastore?

In other words, instead of yet another incompatible database, how about one that we could just switch to from an existing one, that is arbitrarily scalable against shared data. If you're going to get clever and act like you can solve hard problems, why not give people what we need, and not just what you think you can give us?

sybase IQ? (0)

Anonymous Coward | more than 7 years ago | (#18018022)

Wait how is this different than Sybase IQ server?

This is a commercial version of MIT C-Store (4, Informative)

ramakant (256472) | more than 7 years ago | (#18018052)

This looks like it will be a commercial version of C-Store, the column-oriented database developed by Michael Stonebraker and MIT:
- Web site: http://db.lcs.mit.edu/projects/cstore/ [mit.edu]
- Wikipedia Entry: http://en.wikipedia.org/wiki/C-Store [wikipedia.org]
They distribute the source with a fairly liberal license, so this looks like something the open source community could pick up and run with.