Slashdot: News for Nerds

Web Analytics Databases Get Even Larger

CmdrTaco posted more than 5 years ago | from the who-watches-the-oh-never-mind dept.

Databases 62

CurtMonash writes "Web analytics databases are getting even larger. eBay now has a 6 1/2 petabyte warehouse running on Greenplum — user data — to go with its more established 2 1/2 petabyte Teradata system. Between the two databases, the metrics are enormous — 17 trillion rows, 150 billion new rows per day, millions of queries per day, and so on. Meanwhile, Facebook has 2 1/2 petabytes managed by Hadoop, not running on a conventional DBMS at all, Yahoo has over a petabyte (on a homegrown system), and Fox/MySpace has two different multi-hundred terabyte systems (Greenplum and Aster Data nCluster). eBay and Fox are the two Greenplum customers I wrote about last August, when they both seemed to be headed to the petabyte range in a hurry. These are basically all web log/clickstream databases, except that network event data is even more voluminous than the pure clickstream stuff."
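Taking the summary's figures at face value, a quick back-of-envelope sanity check is possible (this assumes decimal petabytes and that the 17 trillion rows span both eBay warehouses combined; both are assumptions, since the summary doesn't break the numbers down):

```python
# Back-of-envelope check on the summary's eBay numbers.
PB = 10**15  # bytes per petabyte (decimal, assumed)

total_bytes = (6.5 + 2.5) * PB   # Greenplum + Teradata warehouses
total_rows = 17 * 10**12         # "17 trillion rows"
rows_per_day = 150 * 10**9       # "150 billion new rows per day"

bytes_per_row = total_bytes / total_rows
days_to_double = total_rows / rows_per_day

print(f"~{bytes_per_row:.0f} bytes per row")    # ~529 bytes per row
print(f"~{days_to_double:.0f} days to double")  # ~113 days at the quoted ingest rate
```

So the quoted figures are mutually consistent with a few hundred bytes per row and a row count that doubles roughly every four months.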

62 comments

Web Analytics Databases Get Every Larger? (1, Redundant)

eldavojohn (898314) | more than 5 years ago | (#27771261)

"Web analytics databases are getting every larger. eBay now has a 6 1/2 petabyte ...

Um, was there a major development in the English language while I was sleeping last night?

Re:Web Analytics Databases Get Every Larger? (3, Funny)

jez9999 (618189) | more than 5 years ago | (#27771333)

Yesy. It mighty take a whiley to get used to, but I thinky it's quite a plusy overall.

Re:Web Analytics Databases Get Every Larger? (1)

koutbo6 (1134545) | more than 5 years ago | (#27771713)

Yesy. It mighty take a whiley to get used to, but I thinky it's quite a plusy overally.

I sure hopey that people checky their grammary more ofteny in the future.
There, fixedy that for you.

Re:Web Analytics Databases Get Every Larger? (1)

ivucica (1001089) | more than 5 years ago | (#27776593)

yay

Re:Web Analytics Databases Get Every Larger? (0)

Anonymous Coward | more than 5 years ago | (#27778615)

meeesa jar jar binks!

Re:Web Analytics Databases Get Every Larger? (0, Offtopic)

Rosco P. Coltrane (209368) | more than 5 years ago | (#27771357)

Actual, yes there was. It's a very subtly new rule on the properly use of adverbs and adjectives.

"Every larger"? (1, Redundant)

dtmos (447842) | more than 5 years ago | (#27771279)

What's "every larger"? Can I get one, too?

Re:"Every larger"? (-1, Offtopic)

Anonymous Coward | more than 5 years ago | (#27771415)

I'd prefer "every lager" - but then again, I may be a bit strange.

Re:"Every larger"? (-1, Offtopic)

Maclir (33773) | more than 5 years ago | (#27771533)

No - I want "every lager".

Re:"Every larger"? (-1, Offtopic)

value_added (719364) | more than 5 years ago | (#27771565)

What's "every larger"?

The sum total of "each larger"?

Seriously, kids, if we're going to call a typo ("every" instead of "ever") a grammatical error, I'd suggest a critique of the rest of the submission is in order. Bonus points for finding real errors. And the usual mod points for everyone else contributing hand-wavy cliches like "language evolves" to justify things.

Me, I'm still choking on the "enormous metrics" construct. I wonder how well complimenting a woman on the size of her metrics would go over.

Re:"Every larger"? (1)

thePowerOfGrayskull (905905) | more than 5 years ago | (#27774997)

I love that some poor sot used up all of his mod points except one to mod down this entire conversation.

Wait... one left. Oh, shit...

from the who-edits-the-oh-never-mind dept. (0, Redundant)

Speare (84249) | more than 5 years ago | (#27771293)

Databases "get every larger"? WTF? Maybe it's a second language for the poster, but as far as I know, CmdrTaco is a plain white-bread murriken who has had a couple of decades to practice the language.

Re:from the who-edits-the-oh-never-mind dept. (-1, Offtopic)

Rosco P. Coltrane (209368) | more than 5 years ago | (#27771429)

CmdrTaco is a plain white-bread murriken who has had a couple of decades to practice the language.

So are you saying he was mute (or spoke native Klingon) until 13 years of age? No wonder he has a hard time with English today...

Re:from the who-edits-the-oh-never-mind dept. (0, Offtopic)

fbjon (692006) | more than 5 years ago | (#27771455)

CmdrTaco is a plain white-bread murriken

It's a little known fact that he's actually multi-grain.

The good news... (5, Funny)

Yoozer (1055188) | more than 5 years ago | (#27771323)

At least these won't get out in the open that easily because someone copied them to an USB drive and lost it somewhere.

Re:The good news... (2, Funny)

Jurily (900488) | more than 5 years ago | (#27771665)

At least these won't get out in the open that easily because someone copied them to an USB drive and lost it somewhere.

Imagine a Beowulf cluste- OW! OW!

Re:The good news... (1)

jollyreaper (513215) | more than 5 years ago | (#27771759)

At least these won't get out in the open that easily because someone copied them to an USB drive and lost it somewhere.

No, that's what firewall holes are for.

Re:The good news... (0)

Anonymous Coward | more than 5 years ago | (#27776445)

Just wait until SanDisk or Kingston releases a 16 petabyte USB drive...

Every Larger? (-1, Redundant)

rivendahl (220389) | more than 5 years ago | (#27771327)

Really?

Looks like grammar is getting every worse... (1, Insightful)

ActusReus (1162583) | more than 5 years ago | (#27771331)

For shame, Taco...

Re:Looks like grammar is getting every worse... (-1, Redundant)

Anonymous Coward | more than 5 years ago | (#27771571)

I don't see the problem. Why don't they just move onto ale if they've already got every lager?

Sure, they get every larger... (2, Funny)

lunchlady55 (471982) | more than 5 years ago | (#27771431)

...but do they move every zig?

Re:Sure, they get every larger... (1)

ivucica (1001089) | more than 5 years ago | (#27776625)

they no have to!!!!1 they have chance to survive, they make their time!!!1

FAIL (0, Offtopic)

ZwJGR (1014973) | more than 5 years ago | (#27771465)

Slow news day alert!

The topic in question is blindingly obvious to anyone who has heard of this newfangled "Internet" thing, and frankly is not worth an article in the first place.
Furthermore, such a blatant error in the headline and summary is simply ridiculous. Do the submitters or editors not reread text prior to submission? This is sloppy /. reporting at it's finest... For shame.

Re:FAIL (0, Offtopic)

hoover (3292) | more than 5 years ago | (#27773549)

OTOH, you fail at "its" and "it's". ;-)

All your Gramar checker are belong to us! (-1, Offtopic)

edwardd (127355) | more than 5 years ago | (#27771475)

Take off every 'ZIG'!!

Seriously, did anyone read this before posting?

Analytic DBs are also not cheap (-1, Offtopic)

Anonymous Coward | more than 5 years ago | (#27771485)

Teradata is getting a run for its money from Vertica - started by the guy who wrote PostgreSQL.

I accidentally the every larger database... (1, Offtopic)

magic_fyodor (1453365) | more than 5 years ago | (#27771493)

is this ok?

Re:I accidentally the every larger database... (1)

LordKane (582228) | more than 5 years ago | (#27771735)

Not the every larger database!

Re:I accidentally the every larger database... (1)

ZERO1ZERO (948669) | more than 5 years ago | (#27779983)

What is the origin of this 'I accidentally the ....' ? I see it around a lot.

Re:I accidentally the every larger database... (1)

f()rK()_Bomb (612162) | more than 5 years ago | (#27799279)

it started on 4chan. http://encyclopediadramatica.com/I_accidentally_X [encycloped...matica.com]

Re:I accidentally the every larger database... (1)

ZERO1ZERO (948669) | more than 5 years ago | (#27799685)

sweet, dude. thanks.

Hmmmm (-1, Offtopic)

Zouden (232738) | more than 5 years ago | (#27771497)

Slashdot Editors Get Every Lazier.

Every breath you take... (-1, Offtopic)

Anonymous Coward | more than 5 years ago | (#27771567)

Every search you take
Every click you make
Every connection you break
Every ad you take
I'll be logging you

By the cyber-police, of course.

Can I get a copy? (0)

Anonymous Coward | more than 5 years ago | (#27771627)

If I asked them nicely, can they copy it to a floppy and send it to me?

Hopefully it'll compress nicely.

Another win for PostgreSQL... (3, Insightful)

tcopeland (32225) | more than 5 years ago | (#27771769)

...since that's the database on which Greenplum is based. PostgreSQL 8.4 is coming out soon and looks like it's got a lot of improvements [postgresql.org]. Too bad replication didn't make it in... hopefully in 8.5.

One of the improvements that looks good is the parallelized restore; RubyForge's upgrade from PostgreSQL 8.2 to 8.3 [blogs.com] took 30 minutes to restore the db and it seems like this feature will speed that up considerably.

Re:Another win for PostgreSQL... (1)

koutbo6 (1134545) | more than 5 years ago | (#27771779)

fp after first on topic post

Recursive queries too (3, Interesting)

coryking (104614) | more than 5 years ago | (#27771865)

These little puppies [postgresql.org] , i.e. recursive queries, look pretty cool too. Sounds like a good tool for threaded comment systems or finding related items in a table:


Recursive queries are typically used to deal with hierarchical or tree-structured data. A useful example is this query to find all the direct and indirect sub-parts of a product, given only a table that shows immediate inclusions:

WITH RECURSIVE included_parts(sub_part, part, quantity) AS (
        SELECT sub_part, part, quantity FROM parts WHERE part = 'our_product'
    UNION ALL
        SELECT p.sub_part, p.part, p.quantity
        FROM included_parts pr, parts p
        WHERE p.part = pr.sub_part
    )
SELECT sub_part, SUM(quantity) as total_quantity
FROM included_parts
GROUP BY sub_part

... It will take a while to wrap my brain around this new concept though. That doesn't look like a normal query I'm used to reading!

They'll get replication some day soon. But there is a lot of cool, very useful stuff with every new release. I usually feel like a kid in a candy store wondering what's new that I can exploit.
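The same WITH RECURSIVE construct also works in SQLite, so the threaded-comment use case mentioned above can be sketched without a PostgreSQL install (the table and column names here are made up for illustration):

```python
import sqlite3

# In-memory toy schema: each comment optionally points at a parent comment.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE comments (id INTEGER PRIMARY KEY, parent_id INTEGER, author TEXT);
INSERT INTO comments VALUES
    (1, NULL, 'eldavojohn'),
    (2, 1,    'jez9999'),
    (3, 2,    'koutbo6'),
    (4, NULL, 'dtmos');
""")

# Walk one thread top-down: seed with the root comment, then repeatedly
# join children onto rows already found, tracking nesting depth.
rows = conn.execute("""
WITH RECURSIVE thread(id, author, depth) AS (
        SELECT id, author, 0 FROM comments WHERE id = 1
    UNION ALL
        SELECT c.id, c.author, t.depth + 1
        FROM comments c JOIN thread t ON c.parent_id = t.id
)
SELECT id, author, depth FROM thread ORDER BY depth
""").fetchall()

for comment_id, author, depth in rows:
    print("  " * depth + f"#{comment_id} {author}")
```

Comment #4 never appears because it isn't reachable from the seed row; that's the whole trick of the recursive term.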

Re:Another win for PostgreSQL... (2, Interesting)

TooMuchToDo (882796) | more than 5 years ago | (#27772363)

I have to say, I love postgresql. We use it to store hundreds of gigabytes of metadata for our 17 petabyte disk/tape storage system at my day gig.

Re:Another win for PostgreSQL... (1)

InsurrctionConsltant (1305287) | more than 5 years ago | (#27772435)

17 petabyte! Good grief, who are you working for?!

Re:Another win for PostgreSQL... (1)

TooMuchToDo (882796) | more than 5 years ago | (#27772525)

Someplace in the US handling data from the Large Hadron Collider =)

Re:Another win for PostgreSQL... (2, Funny)

pfleming (683342) | more than 5 years ago | (#27774591)

Does the black hole effect help with compression?

Re:Another win for PostgreSQL... (1)

ivucica (1001089) | more than 5 years ago | (#27776875)

CREATE TABLE lhc_data (i INT, c CHAR(10)) ENGINE = BLACKHOLE;
INSERT INTO lhc_data VALUES (1, 'whoosh');

Oops, wrong DBMS.

Re:Another win for PostgreSQL... (2, Informative)

greg1104 (461138) | more than 5 years ago | (#27773327)

And Aster nCluster is PostgreSQL based [intelligen...rprise.com] . Yahoo's "homegrown system" also started with PostgreSQL [toolbox.com] .

The list of offtopic posts (0)

Anonymous Coward | more than 5 years ago | (#27771867)

get every larger

Help with the Fractions (0, Offtopic)

bigdaisy (30400) | more than 5 years ago | (#27771979)

2/12 can be expressed more simply as 1/6.

Re:Help with the Fractions (1)

bigdaisy (30400) | more than 5 years ago | (#27827969)

Sure, it's "-1: Offtopic" now that they fixed the article summary. Before that it was "+1: Special Needs Assistance".

They are watching YOU (0, Offtopic)

aggles (775392) | more than 5 years ago | (#27772003)

If you have ever touched one of their Web sites and caught their cookie, your tracks can be followed into unexpected places. This data is a gold mine for them, if they can figure out how to sell it without pissing off users with how much they know.

2/12? (1)

N3Roaster (888781) | more than 5 years ago | (#27772839)

2/12? Most people would just write that as 1/6, but I guess that doesn't sound as impressive?

Re:2/12? (1)

Exawatt (1463719) | more than 5 years ago | (#27773299)

I was wondering what that number was supposed to be. Perhaps 2 1/2? This is why I prefer decimal points.

Re:2/12? (0)

Anonymous Coward | more than 5 years ago | (#27774715)

Read the article linked for Facebook. In it they say 2 1/2 is the amount of data being stored.

Google? (2, Interesting)

wiedzmin (1269816) | more than 5 years ago | (#27773499)

Who cares about eBay and MySpace... tell me about the major players! What is Google running?

MySQL and Bigtable (1)

Wee (17189) | more than 5 years ago | (#27774575)

They use MySQL for storing adwords data [typepad.com] and Google Analytics for web site metrics (which itself stores data in Bigtable [google.com] ).

Bigtable holds a mind-bogglingly huge amount of information. The amount of stuff in their MySQL clusters is merely "absurdly large" by comparison.

-B

Re:MySQL and Bigtable (1)

bami (1376931) | more than 5 years ago | (#27776131)

Google Analytics is dog slow. It usually takes up to 70% of the time to load a page here (might be some shoddy ISP routing issues, but most of Google's stuff loads fast, so I doubt that), so I adblocked/point it to 127.0.0.1 for the whole domain. Same for most analytics websites.

Sorry, analytics is fun and all, but if you insist doing everything in javascript, at least make sure the page behind it is capable of giving enough bandwidth or something.
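The "point it to 127.0.0.1" approach described above is usually done with a hosts-file override; a minimal sketch (the hostnames are the commonly used Google Analytics domains, and the Windows path may vary by install):

```
# /etc/hosts  (on Windows: C:\Windows\System32\drivers\etc\hosts)
127.0.0.1    www.google-analytics.com
127.0.0.1    ssl.google-analytics.com
```

The browser then resolves the tracker to localhost and the page stops waiting on the analytics request.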

Re:MySQL and Bigtable (0)

Anonymous Coward | more than 5 years ago | (#27776949)

Not only slow, but badly inaccurate. When I compare the AdSense reports to my Apache logs, it's obvious that Google doesn't register most hits. Their data handling is absolutely incompetent. Of course they have an incentive to lose data that would cost them money.

Re:MySQL and Bigtable (0)

Anonymous Coward | more than 5 years ago | (#27784519)

It's in Google's best interest to count the most clicks, since the advertiser pays more for a click than Google pays out to publishers. It's in the advertiser's interest that the (rampant) click-bot spam on the web is not counted, or else they'd take their business to another ad network. It's in your interest, as a publisher, to ignore the spam issue when looking at your own logs, because you'd like to be paid for as many clicks as possible.

Any system where Google pays for the same number of clicks as they charge advertisers could be said to be "balanced". If you are contending that it is not balanced, you should join together and start a class-action lawsuit. Otherwise, you are just complaining about where the threshold lies in the balance. Keep in mind that without happy advertisers, you wouldn't have any ad inventory (and thus no revenue) in the first place.

Welcome to the economics of the modern web; there is more than one side to things.

Storing the atoms of a human body (1)

MBoffin (259181) | more than 5 years ago | (#27773987)

This astounds me. These numbers only represent a few companies. Consider that it would take about 5,790 yottabytes* to store a 150lb human body (at a byte per atom). Now consider that people keep in their pocket more storage than existed on the planet 30 years ago. So in another 30 years.... wow. Just think about that for a minute.

* giga tera peta exa zetta yotta
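The byte-per-atom figure above checks out roughly, under an assumed average atomic mass of about 7 g/mol for a human body (it's mostly hydrogen, oxygen, and carbon; the 7 g/mol average is my assumption, not from the post):

```python
# Rough check of the "5,790 yottabytes per 150 lb body" figure.
AVOGADRO = 6.022e23            # atoms per mole
GRAMS_PER_POUND = 453.6

mass_g = 150 * GRAMS_PER_POUND # 150 lb body in grams
avg_atomic_mass = 7.0          # g/mol, assumed body-wide average

atoms = mass_g / avg_atomic_mass * AVOGADRO
yottabytes = atoms / 1e24      # one byte per atom, 1 YB = 10**24 bytes

print(f"~{yottabytes:.0f} yottabytes")  # same order as the 5,790 quoted
```

The exact answer depends on the assumed elemental mix, but any reasonable mix lands in the same few-thousand-yottabyte ballpark.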

Database? (0)

Anonymous Coward | more than 5 years ago | (#27774429)

This is running on MS Access, right?

Um... (0)

Anonymous Coward | more than 5 years ago | (#27774555)

torrent?

Still get lame recommendations (4, Funny)

se7en11 (833841) | more than 5 years ago | (#27774717)

With all that user data, you'd think they would know me better by now. But I still get these lame recommendations.

"You might be interested in action DVDs because you bought one in the past" - BRILLIANT!!

Adcrawler Analytics (0)

Anonymous Coward | more than 5 years ago | (#27774769)

I wonder how much of that is my adcrawler bot with its referer and user agent randomizer. You guys all clear cookies after every unique domain visit, right?

Greenplum? Really? (1)

reginaldo (1412879) | more than 5 years ago | (#27777351)

These articles make me believe that Greenplum has some good PR working, because in all the analytics I have done, people tend to scoff at Greenplum.

Hadoop clusters are more scalable, more flexible, and strangely more supportable than Greenplum. When I worked with Greenplum, we could easily bring down the server by executing simple 'select * from table' queries.

Netezza, which is strangely not mentioned, is much better for doing distincts, which is used quite often in analytics. Greenplum chokes on correlating the data sets.