PageRank-Type Algorithm From the 1940s Discovered

samzenpus posted more than 4 years ago | from the oldest-google-juice dept.

Google 108

KentuckyFC writes "The PageRank algorithm (pdf) behind Google's success was developed by Sergey Brin and Larry Page in 1998. It famously judges a page to be important if it is linked to by other important pages. This circular definition is the basis of an iterative mechanism for ranking pages. Now a paper tracing the history of iterative ranking algorithms describes a number of earlier examples. It discusses the famous HITS algorithm for ranking web pages as hubs and authorities developed by Jon Kleinberg a few years before PageRank. It also discusses various approaches from the 1960s and 70s for ranking individuals and journals based on the importance of those that endorse them. But the real surprise is the discovery of a PageRank-type algorithm for ranking sectors of an economy based on the importance of the sectors that supply them, a technique that was developed by the Harvard economist Wassily Leontief in 1941."
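
The "important if linked to by important pages" loop in the summary can be sketched as a short power iteration. This is a minimal illustration and not Google's implementation: the function name `pagerank`, the toy link graph, the damping value of 0.85, and the even redistribution of rank from pages with no outgoing links are all assumptions made for the example.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Iteratively rank pages; `links` maps each page to the pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform guess
    for _ in range(max_iter):
        # Rank held by pages without outgoing links is spread evenly (assumption).
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            if not targets:
                continue
            # Each page passes an equal share of its current rank to every target.
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        change = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if change < tol:  # stop once the ranks no longer move
            break
    return rank


# Hypothetical four-page web: "c" is linked to by every other page.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
```

In this toy graph "c" ends up on top because every other page links to it, and "a" comes second because its only inbound link is from the highly ranked "c". That is the circular definition at work.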

108 comments

Good advice for all developers (4, Insightful)

ls671 (1122017) | more than 4 years ago | (#31178378)

Well, this is actually pretty good advice for any developer; Don't reinvent the wheel. Look around, search for what's been done before and adapt it to suit your needs. Of course, as a last resort, one can design something new once he has done his homework and made sure nothing that has been done before may be re-used.

Throughout my life, I have seen an amazing amount of work done in vain, because it yielded poor results and something that did the same job better already existed anyway.

Don't get me wrong here, once you have made sure that nothing already existing suits your needs or can be reused, it is fine to innovate and create real new stuff. Just don't get caught trying to reinvent the wheel unless you reinvent it better ;-)

Also, an exception to that principle could be allowed for trivial tasks that are really quick to implement, where searching for an existing solution might cost more than implementing it yourself. But be really careful applying that exception; it is an open door that sometimes leads to trying to reinvent the wheel ;-))

Re:Good advice for all developers (0)

Anonymous Coward | more than 4 years ago | (#31178448)

Or spend less effort and come up with a non-original idea and get the dumb yahoos at the patent office to approve it. Laziness = profit

Re:Good advice for all developers (0)

Anonymous Coward | more than 4 years ago | (#31179314)

Are we sure Microsoft hasn't invented a time machine, gone back to 1941 and planted this book just to try to tarnish Google? I mean really ... can it be a coincidence that this was discovered right on the heels of Buzz?

Re:Good advice for all developers (-1, Redundant)

commodore64_love (1445365) | more than 4 years ago | (#31178534)

>>>Don't reinvent the wheel. Look around, search for what's been done before and adapt it to suit your needs.

You just described my engineering job in a nutshell. I no longer create anything myself, but instead just search the internet for already-existing circuits, or scour my company's back designs, since the project has most likely been done already.

As for ranking pages by links to other important pages, that's rather flawed? It would mean that foxnews.com and glennbeck.com would be ranked much, much higher than they deserve to be. [There that should earn me some bonus points and boost my damaged karma.]

Re:Good advice for all developers (2, Interesting)

ls671 (1122017) | more than 4 years ago | (#31179068)

> As for ranking pages by links to other important pages, that's rather flawed?

I hinted that it was OK to innovate or even re-invent. You have to know what you are doing, though.

Actually, I totally agree with you on the quoted phrase but I am still looking for a solution (Holy Grail?) to supplant Google implementation. ;-))

Seriously, I really spent some time thinking about this...

Re:Good advice for all developers (1, Interesting)

commodore64_love (1445365) | more than 4 years ago | (#31179966)

I'm beginning to suspect I'm being targeted. Nearly every one of my posts has been demodded -1 point (which drops me down to (0) score).

And there's nothing "offtopic" about agreeing with the previous poster's statement: "Don't reinvent the wheel. Look around, search for what's been done before and adapt it to suit your needs." That really is what I do in my day-to-day routine as an engineer.

Re:Good advice for all developers (1)

paganizer (566360) | more than 4 years ago | (#31181542)

If that sort of thing happens relatively often, it does sound like one of your "foes" might be targeting you. I don't see anything that could be considered a troll in your post.
In reference to your OP, though, have you looked in to the original infoseek formula?

Re:Good advice for all developers (1)

JWSmythe (446288) | more than 4 years ago | (#31183716)

    If you pay attention to the moderation scheme on here, that isn't very likely.

    "moderators" are any average user. They get 10 points at a time to spend as they'd like. If they write a comment to a particular story, they can't moderate on that story.

    "Metamoderation" lets you re-evaluate moderations, but you get limited information on the comment, basically the comment itself. Only 10 of these really count. More metamoderations can be done, but don't count as high.

    In your first post, it was insightful, until you said that it should help your karma. Trolling for karma doesn't really help you out. Write good comments on a regular basis, and it will help you out.

    I've seen your posts before, and from what I recall, they've been good ones. If you feel like commenting, then do it. Don't worry about your karma score. Maybe you've picked up an enemy or two. You can't satisfy everyone all the time. For example, I'm sure this will be scored down, because it is off-topic. Oh well, shit happens. For me, it's heavily outweighed by the number of good on-topic posts that I do.

    Even my good posts get scored wildly. It's funny to look at the moderation that can send the score fluctuating from a -1 to a 5, but in the end, good posts end up with high scores, because so many people are moderators. I don't lose any sleep over it, and neither should you.

Re:Good advice for all developers (0, Offtopic)

commodore64_love (1445365) | more than 4 years ago | (#31179992)

P.S. And the reason I'm being targeted (I believe) is because of my comments yesterday that I consider Linux to be inferior to Windows and Mac (as far as ease-of-use by the average John Q. Customer). Heaven forbid you criticize the Holy Bible, Sacred Koran, or the Beloved Ubuntu.

The zealots will stone you.

Re:Good advice for all developers (5, Funny)

cyphercell (843398) | more than 4 years ago | (#31178670)

It would have been a pretty exhaustive search without google.

Re:Good advice for all developers (3, Informative)

commodore64_love (1445365) | more than 4 years ago | (#31178712)

They could have used one of the other search engines in existence in 1997-98, like Altavista or Lycos or Magellan or WebCrawler or Yahoo or Excite or Hotbot.

Re:Good advice for all developers (4, Funny)

pipatron (966506) | more than 4 years ago | (#31178792)

I can see that you must be young enough never to have used the search engines you list, if you suggest that you would have been able to find anything useful.

Re:Good advice for all developers (1, Interesting)

Anonymous Coward | more than 4 years ago | (#31178850)

You must be the one who hasn't. While they can't be compared to what we have today, they were useful. I particularly liked Infoseek. Oh and GOOGLE SUCKED at first, I remember this VERY WELL, i tried it when it started and it wasn't any good. I guess it took a while to gather enough data to really show their algorithms' potential, now i always use google of course...

Re:Good advice for all developers (1)

PCM2 (4486) | more than 4 years ago | (#31182206)

I particularly liked Infoseek.

Infoseek was my flavor, too. Then Disney bought it, changed the name to Go.com, brought in a terrible new design, and all of a sudden the search results went to shit. Hope it was worth the money, Disney.

Re:Good advice for all developers (2, Insightful)

Midnight Thunder (17205) | more than 4 years ago | (#31179150)

I can see that you must be young enough never to have used the search engines you list, if you suggest that you would have been able to find anything useful.

Well the results were there, somewhere between all the adverts.

Re:Good advice for all developers (2, Insightful)

commodore64_love (1445365) | more than 4 years ago | (#31179938)

Remember the internet was a LOT smaller in 1997, so while my preferred engine at that time (altavista) may not have been as good as google today, there were also a lot fewer websites to search through.

>>>I can see that you must be young enough never to have used the search engines you list

I'm familiar with the slashdot practice of not RTFA (reading the frakking article), but when did people stop reading user names? You think I'm young? My first computer was a Commodore and my second an Amiga. I was making music, producing primitive videos, and "surfing" the internet before the web even existed.

Re:Good advice for all developers (1)

rusl (1255318) | more than 4 years ago | (#31180900)

I would use Yahoo primarily and Altavista occasionally. But I don't know why you claim that you're so old to have done that stuff. I was young then. I remember the Commodore64. What is your definition of old then? I'm nostalgic about that too but I think of that as being because I was (and still am sorta) young and I get nostalgic - whereas an older person would be less sentimental because that wasn't the first computer and also because it wasn't that great. really.

Re:Good advice for all developers (1)

Canazza (1428553) | more than 4 years ago | (#31182320)

I'm 24, I had my Commodore 64 when I was 4, I wasn't surfing the web until about 1997 either. I don't consider myself old (Sometimes I do, I just don't "get" this modern music nonsense). I can remember using AltaVista and Yahoo search. Sometimes searching through Geocities directories back in the day.

I've forgotten what my point was.

Anyway. I'm not old.

Re:Good advice for all developers (1)

theaveng (1243528) | more than 4 years ago | (#31182944)

I can understand C64_love's frustration. When some mods purposely damage your reputation/karma, you lose your ability to post for 1-2 days time. You feel that you are being muzzled like a Chinese citizen instead of a free person.

NOBODY should set out to destroy another slashdotter like that.

Re:Good advice for all developers (1)

b4dc0d3r (1268512) | more than 4 years ago | (#31180006)

In that era, search engines were like mini databases. You had to put in exactly the right query to get results.

I was famous for being able to find anything, and I mean anything, using AltaVista, when others couldn't. It was a lot like programming in a way, which explains why I was good.

I used to be able to be king of Google until they made it much more natural-language tolerant. Now my 10 year old neighbor can find things when I can't.

On the plus side, I can now type in "When is the damn stupid bowl mutherfucker" and Google shows the first hit to NFL.com, with the teams listed next to the date and time, each linking to the team's home page. Not the sponsored result, and clearly optimized for the day, but the goal of cataloging the world's data is on track.

Re:Good advice for all developers (0)

Anonymous Coward | more than 4 years ago | (#31180950)

Perhaps that is why the world is going down the road towards idiocracy. Google can find anything, regardless of how intelligent you are. Then you pass off your google search results to your boss, who gives them to his boss, who gives them to his boss, who does a google search and finds that everything is running fine. Except for the fact that no one has done any real research or reading and has just done a couple quick phrase matching techniques. Eventually the company goes under, and you begin your job search on google, with the search reading, "I can google." Only a couple people will realize, that no, no you can't. In fact, most people can't do much of anything. I'd be surprised if by the end of the decade, there will be anyone alive who can still read. I'm not being cynical. Google searches showed me most people say the world will end by 2012, so it must be true.

Re:Good advice for all developers (2, Funny)

MobileTatsu-NJG (946591) | more than 4 years ago | (#31180828)

I can see that you must be young enough never to have used the search engines you list, if you suggest that you would have been able to find anything useful.

What are you talking about? I found useful results all the time! I just wish my interest in being a porn connoisseur could be turned into a career.

Re:Good advice for all developers (3, Insightful)

Weirsbaski (585954) | more than 4 years ago | (#31178858)

Well, this is actually pretty good advice for any developer; Don't reinvent the wheel. Look around, search for what's been done before and adapt it to suit your needs, and patent it.

Re:Good advice for all developers (3, Interesting)

Junior J. Junior III (192702) | more than 4 years ago | (#31178908)

"don't reinvent the wheel" is kindof dumb advice when you think about it.

If I didn't already have a wheel, it would take me a really long time to traverse the world in search of a wheel to see if it had been invented yet. If it has, and it's got sufficient penetration into the market that I know about them already, then, sure, it's a no brainer not to reinvent it. On the other hand, if no one in my immediate vicinity has ever heard of the wheel, then inventing one -- quickly -- is a lot smarter than traversing the known world until I run into a culture that already has wheels. Especially if they might exploit their superior technology to subjugate and enslave my people. Better to have a home-brewed shitty wheel to start off with, and upgrade quickly if I discover that there are other friendly cultures that have better wheels already, and have at least something if I don't discover anyone else, or discover hostiles who already have them.

As long as the cart is loosely integrated with the wheels I have, upgrading to better wheels when they are found to be available should be easy. And I might just learn something about wheels while studying them that applies to other problems, or could even possibly improve the existing state of the art with respect to wheels.

Re:Good advice for all developers (1)

ls671 (1122017) | more than 4 years ago | (#31178934)

Re:Good advice for all developers (2, Insightful)

Junior J. Junior III (192702) | more than 4 years ago | (#31179036)

Reinventing the wheel is a phrase that means to duplicate a basic method that has long since been accepted and even taken for granted. [wikipedia.org]

K, so how is Brin and Page developing PageRank reinventing the wheel, when the obscure economics paper was published at Harvard in 1941 and only re-discovered in 2010?

Would Page and Brin have gotten there faster if they'd rooted through Harvard's economics library in the hopes that the best-yet algorithm for search results ranking would be there, somewhere? Was the Harvard paper "long accepted and taken for granted" or was it "forgotten and ignored"? Is PageRank a "duplication of a basic method"?

Personally, I think google got there quicker by re-inventing the wheel, if that's what this was. In my opinion, if something is only recognized as reinvention of the wheel after the fact, it ipso facto is not reinvention of the wheel in the sense described in the wikipedia article you cited.

who says it was rediscovered in 2010? (1)

SuperBanana (662181) | more than 4 years ago | (#31179952)

K, so how is Brin and Page developing PageRank reinventing the wheel, when the obscure economics paper was published at Harvard in 1941 and only re-discovered in 2010?

Who says it was first re-discovered in 2010, and not in the 90's, by two guys from Stanford?

There are a million possible ways Brin and Page could have come across that paper. A friend who was studying economics, maybe a magazine like the Economist mentioned it, etc.

Pretty funny claiming to be the first to re-discover something, though. What if you didn't 'discover' the other mentions in the, oh, SIXTY YEARS since the paper was published?

Also, I don't know which is worse. The idea of them stumbling across the paper and using the idea (and thus having no original idea except the application of the concept to ranking of web pages, which isn't nearly as impressive) or the idea that the PageRank concept is something of an obvious idea?

Re:who says it was rediscovered in 2010? (1)

Nazlfrag (1035012) | more than 4 years ago | (#31180324)

It's an obvious idea that they most likely implemented independently. I say this because the only similarity between the models is a trivial feedback loop summed up as 'x is important if it is endorsed by important x'; the methodology is completely different in each case.

Re:who says it was rediscovered in 2010? (2, Informative)

TerranFury (726743) | more than 4 years ago | (#31180598)

Dude; it's just Jacobi iteration applied to a particular, reasonably intuitive model. This isn't to knock it -- just to say that it was probably easier to reinvent it than to find some obscure paper --- especially one which probably isn't in the electronic databases.
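
For readers who have not met it, Jacobi iteration can be sketched in a few lines. This is a generic illustration on a made-up 2x2 system, not the PageRank computation itself; the function name `jacobi` and the example matrix are assumptions. One way to read the comment: the damped PageRank equations form a linear system, and when no page links to itself the diagonal of that system is all ones, so the Jacobi update reduces to "new rank = constant + damped sum of inbound shares".

```python
def jacobi(A, b, iterations=100):
    """Approximate the solution of A x = b by Jacobi iteration.

    Assumes A is strictly diagonally dominant so the iteration converges.
    """
    n = len(b)
    x = [0.0] * n  # initial guess
    for _ in range(iterations):
        # Every component is recomputed from the previous iterate only.
        x = [
            (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)
        ]
    return x


# Hypothetical system: 4x + y = 9 and 2x + 5y = 9, whose solution is x = 2, y = 1.
A = [[4.0, 1.0], [2.0, 5.0]]
b = [9.0, 9.0]
solution = jacobi(A, b)
```

Each pass uses only the values from the previous pass, which is why the method maps so naturally onto "recompute every page's score from the last round of scores".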

Re:who says it was rediscovered in 2010? (1)

Kevin Stevens (227724) | more than 4 years ago | (#31184326)

They didn't reinvent anything, the analogy is poor. If anything, they took the idea and found a new application for it, then created a functional product based on it.

I was in college around the time Google was just getting up and running, and it was widely known that the quality of a paper could be measured in how many times it was cited by other people, in fact IIRC, some journal search engines would even show the number of citations when you did a search. So the idea that important documents were "linked to" more often and more relevant than those that are not, was not new. What was new, is taking that concept, applying it to web pages. Not only that, but running the analysis on a very large dataset and making it instantly searchable. This mental leap was not trivial, the implementation even less so. Even after the page rank algorithm became publicly known, it took years for the other search engines to catch up, and search engines have always been big business, not some underfunded niche that just hadn't gotten proper attention from the commercial or academic community.

I would say most innovation these days is driven by taking either an abstract idea or an idea used in one application, and then applying it to an area that no one has before. And never underestimate the huge gap between "we could use X to do Y and Z better" and "we have implemented X and now we do Y and Z better."

Re:Good advice for all developers (2, Insightful)

b4dc0d3r (1268512) | more than 4 years ago | (#31180272)

Wow, +5 Pedantic.

Reinventing the wheel implies that they set out to find a measure of importance. So they would have had to decide to make a search engine, have it produce relative results, decide that relative importance is the key, and then go searching for ways to find relative importance. There's a major gap there, a leap in thinking that simply wasn't present at the time. How important a page is indicates what order it should be in results. That makes sense, but it wasn't obvious at the time. Previous results were based on things like the number of times the page mentions your word, or what order the pages were added. People tried to figure out: what is that quality which makes a page more relevant than others? And they failed.

The key was settling on Importance, or we could call it Charisma. When you mention the name "Brad", do more people think of the guy you work with first or Brad Pitt? Brad Pitt is more relevant (technically relevant to more people), but why? More people know of him, and more people speak of him. More importantly, more important people speak of him.

Scientific papers have been measured for their influence factor this way (but that might be a false correlation: http://www.physorg.com/news165950992.html [physorg.com] "... papers published early in a field receive citations essentially regardless of content because they are the only game in town.") If they had looked to science to see what makes something influential, they would have seen the same concept. But they didn't know they needed to find "influential", just "relevant".

The breakthrough Google made was deciding on a quality which made things more relevant, which is roughly equivalent to notoriety. Not just the number of references to a page, but the weight of those references in relation to who references them. That's where link farming sprouted, and they had to figure a way to cancel that effect out.

How many people know this page, as opposed to that page? And then they had to figure a way to find the pages, process the data, calculate notoriety, and serve it up quickly, and create a revenue stream from all of that. It's not about whether you're going to arrive more quickly by re-inventing the wheel. Often times you can, especially if it's in a language where you don't know all of the built-in functions. You can write a linked list with sorting faster than searching for how the language implements it under certain circumstances.

It's about whether you will arrive better, at a better solution in other words. The point was to look elsewhere for implementation, but the part you missed was you have to have inspiration to know where to look.

Google wanted to get something to lots of people. They might have had stuff in baskets. They could re-invent the wheel to make the baskets easier to transport, or they could build up a farm, attracting farm hands and their families and gradually build up a town, making the people come to Google instead. Once you know you need a transportation solution, it's easier to copy an existing idea. It just so happens that once you decide the solution to your problem is relative importance, there's a description of how to do it in a book from the 1940's.

Re:Good advice for all developers (1)

LoztInSpace (593234) | more than 4 years ago | (#31182338)

What I reckon they did was invent a square wheel (i.e. sub-optimal search algorithm) then googled ranking algorithms, stumbled upon the aforementioned papers and then didn't re-invent it all over again.
A side effect of this was the introduction of the term 'dogfooding' which was then re-invented by Microsoft.
Maybe.

Re:Good advice for all developers (1)

c0lo (1497653) | more than 4 years ago | (#31178962)

Don't reinvent the wheel. Look around, search for what's been done before and adapt it to suit your needs.

What if I need a cog or a chassis? Does Scrapheap Challenge [wikipedia.org] ring a bell? On most occasions, isn't it better to build the chassis yourself than to struggle to get one for free and work furiously to adapt it?

Re:Good advice for all developers (2, Interesting)

Hurricane78 (562437) | more than 4 years ago | (#31179076)

There is a second rule of advice that goes with this, but that unfortunately usually is forgotten:

Don’t imitate. Innovate!

Yes, it is a good idea to not reinvent the wheel. But it’s even better to invent an airplane! (You know: Thinking outside the box. “Inside the box” would be a square wheel. ;)

Re:Good advice for all developers (0)

Anonymous Coward | more than 4 years ago | (#31180776)

Yes, it is a good idea to not reinvent the wheel. But it’s even better to invent an airplane!

An airplane without wheels? How do you get off the ground? =P

Re:Good advice for all developers (3, Informative)

twosat (1414337) | more than 4 years ago | (#31179206)

Reminds me of the invention of Turbo Codes in the early Nineties for forward error correction for communication networks. It was later discovered that in the early Sixties, low density parity check (LDPC) coding was developed that performed a similar function but was not used because of the lack of computer power and memory back then. The LDPC patents had expired by then, so now there are two technologies doing the same thing in a different way but one is patent-free. In a similar vein, I read some years ago of a company in the UK who search through expired and current patents looking for inventions that meet their customers' needs. They would often find solutions in a totally different field to the area being researched and a lot of it was stuff that was ahead of its time and its technology had been abandoned.

Re:Good advice for all developers (2, Funny)

Ben1220 (1503265) | more than 4 years ago | (#31181996)

This is exactly why I would rather be a Computer Scientist working at a university or industrial research lab than a software developer. Because I want to create real new stuff.

Re:Good advice for all developers (1)

invalid_user (253723) | more than 4 years ago | (#31182340)

Ha ha, then you won't be able to publish any paper.

The research world is now filled with researchers who simply don't read others' works, and just keep writing. Even if it's been done before (or even if it doesn't make any sense at all), eventually they will be able to find a venue that will accept the paper, because no reviewer can know everything.

And worry not about citations! That work will be cited for exactly the same reasons, or even worse, because the party who cites it has performed work that has been superseded by other more advanced studies, and that shallow paper is the only one that can justify his (equally shallow) work.

Remember, in the end, it's the guy who publishes the most that wins.

Re:Good advice for all developers (1)

Fantom42 (174630) | more than 4 years ago | (#31183620)

Well, this is actually pretty good advice for any developer; Don't reinvent the wheel.

That is some good folksy wisdom for your intrepid young developer. Also:

"Don't count your chickens before they're hatched!"
"A little too late is much too late!"
"A small leak will sink a great ship!"
"Lost time is never found!"
"A chain is as strong as its weakest link!"
"A bad broom leaves a dirty room!"

Anyway, my point is not that your advice is bad. But are you seriously suggesting that Brin, when he had this idea, should have gone straight for the library until he found this 1941 Economics paper? I wonder how long that would take? How about this advice:

"Go ahead and reinvent the wheel. If its useful enough, no one will care."

Very primitive (0)

Anonymous Coward | more than 4 years ago | (#31178436)

The algorithm ran on pigeons that were the size of entire rooms, and with less processing power than today's pigeons which fit into the palm of your hand.

Patent? (1)

gmuslera (3436) | more than 4 years ago | (#31178462)

So it could be used as previous art to invalidate Google's patent?

Re:Patent? (3, Funny)

Jorl17 (1716772) | more than 4 years ago | (#31178500)

If you hate Google: Yes. If you don't: No. If you want Bananas: Get them.

Re:Patent? (1, Insightful)

commodore64_love (1445365) | more than 4 years ago | (#31178592)

CHECKLIST:

Is Google a megacorp? Check.
Did Google threaten to move to India if Obama raises corporate taxes? Check.
Does Google spy on users and collect data? Check.
Did Google receive monetary assistance from the Taxpayer's Public Treasury? Not sure.

Okay. I'll let them slide and not try to invalidate their pagerank patent, but that likely won't stop Microsoft from making the attempt.

Re:Patent? (-1, Offtopic)

commodore64_love (1445365) | more than 4 years ago | (#31180012)

The person who marked this post -1 redundant needs to DEVELOP A FRAKKING *SENSEOFHUMOR*. Jeez. Loosen up. You're wound tighter than a girl at a middle school dance.

Re:Patent? (1)

bertoelcon (1557907) | more than 4 years ago | (#31180244)

Commodore, quit egging them on. Posting complaints about the way you were modded only gives people more targets to mod down.

Re:Patent? (5, Informative)

Meshach (578918) | more than 4 years ago | (#31178512)

So it could be used as previous art to invalidate Google's patent?

From my read of the linked article it seems that Sergey and Larry cited the previous art in their publications. So it looks like there was no plagiarism, just building a new idea using the tools provided by an earlier idea.

Knew about this before I first used google (1)

dbIII (701233) | more than 4 years ago | (#31182918)

A librarian described the new google thing to me back then as being like the science citation index only applied to the web in general instead of published papers.

Re:Patent? (3, Informative)

nedlohs (1335013) | more than 4 years ago | (#31178542)

No, since the one from 1941 didn't say "on the internet" or "with a computer".

Re:Patent? (1)

Theaetetus (590071) | more than 4 years ago | (#31178952)

No, since the one from 1941 didn't say "on the internet" or "with a computer".

Neither did Google's PageRank patent. You shouldn't use Slashdot as your source for patent law, particularly when search engines like Google exist to help you find out facts.

Re:Patent? (1, Insightful)

Anonymous Coward | more than 4 years ago | (#31179470)

Patent?

I think the way Google maintains its search superiority has more to do with massive banks of machines and keeping their algorithms secret rather than sending lawyers after anyone. The PageRank algorithm is little more than a useful application of Markov Chains... hardly seems patentable. (Of course, RSA doesn't seem like it should have been patentable either...)

Just more proof... (2, Interesting)

pizza_milkshake (580452) | more than 4 years ago | (#31178492)

Nil novi sub sole

Re:Just more proof... (0)

Anonymous Coward | more than 4 years ago | (#31178612)

Nihil?

Re:Just more proof... (0)

Anonymous Coward | more than 4 years ago | (#31178638)

semper ubi sub ubi

Re:Just more proof... (0)

Anonymous Coward | more than 4 years ago | (#31178980)

Iam perdes ludum.

NetCraft confirms it... (0)

Anonymous Coward | more than 4 years ago | (#31179138)

Latin is dead

linearity (3, Interesting)

bcrowell (177657) | more than 4 years ago | (#31178626)

What really shocked me when someone first described page rank to me was that it was linear. I felt that this just had to be wrong, because it didn't seem right for a *million* inbound links to have a *million* times the effect compared to a single inbound link. Maybe this is just the elitist snob in me, but I don't feel that the latest American Idol singer is really a thousand times better than Billie Holiday, just because a thousand times more people listen to him than to her. If it was me, I'd have used some kind of logarithmic scaling. I think people do usually describe page ranks in terms of their logarithms, but that's taking the log on the final outcome. I'm talking about taking logs at each step before going on to the next iteration.

To me, this has an intuitive connection to the idea that the internet used to be more interesting and quirky, and it was more about individuals expressing themselves, whereas now it's more like another form of TV.

Of course that's not to say that I want to go back to the days before page rank. God, search engine results were just horrible in those days.

From an elitist snob point of view, one good thing about page rank is that it doesn't let you just vote in a passive way, as Nielsen ratings do for TV. In order to have a vote, you have to do something active, like making a web page that links to the page you want to vote for.

Re:linearity (5, Insightful)

Ibiwan (763664) | more than 4 years ago | (#31178756)

From a naive, off-the-cuff armchair analysis, it seems to me that PageRank only serves as a way to provide ordering of search results. Funny thing... sorting on positive values will always yield the same ordering as a sort on those values' logs.
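A quick sketch of that point, using made-up scores (the names and values are assumptions, not real PageRank output). Because the logarithm is strictly increasing on positive numbers, both sorts should come out the same:

```python
import math

# Made-up positive scores, purely for illustration.
scores = {"a": 0.5, "b": 0.002, "c": 0.12, "d": 0.0375}

# Sort page names by raw score, then by the log of the score.
by_score = sorted(scores, key=lambda k: scores[k], reverse=True)
by_log = sorted(scores, key=lambda k: math.log(scores[k]), reverse=True)

print(by_score == by_log)
```

This only covers taking the log of the final values; applying a log inside each iteration is a different algorithm.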

Re:linearity (1)

blahplusplus (757119) | more than 4 years ago | (#31179384)

I think the best search engine is the one that can tailor its algorithms to the specific user's tastes. I'm certain that custom algorithms would be the way to go.

When people search for stuff, they usually have a good idea of the class of information they are looking for. Either way, it would be an interesting project.

Re:linearity (1)

bcrowell (177657) | more than 4 years ago | (#31179450)

Funny thing... sorting on positive values will always yield the same ordering as a sort on those values' logs.

Yes, but if you look back at my GP post, I explained that I'm not talking about taking the log of the final result (which would be irrelevant for sorting).

Re:linearity (4, Informative)

martin-boundary (547041) | more than 4 years ago | (#31178770)

What really shocked me when someone first described page rank to me was that it was linear. I felt that this just had to be wrong, because it didn't seem right for a *million* inbound links to have a *million* times the effect compared to a single inbound link. Maybe this is just the elitist snob in me,

The algorithm is not at all linear in the effect of inbound links. Two inbound links don't have the same effect, instead their effects are first weighted by the PageRank of each node of origin.

Now the distribution of PageRank among nodes is approximately power-law distributed on the web. Intuitively, this means that when a particular node has a high number of inbound links, 99% of them have practically no effect on the rank of that node, exactly as you probably thought in the first place. More precisely, you can expect a Pareto (or similar) distribution for the influence of incoming nodes, which is not unexpected since these sorts of distributions occur a lot in social sciences.

That said, the PageRank algo is actually linear, but only in the sense of being a linear operator on weight distributions. If you normalize the weights after each iteration, the algo is actually affine (on normalized distributions) rather than linear.
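To make the linear-versus-affine distinction concrete, here is a minimal sketch on a hypothetical three-page web. The graph, the damping value of 0.85, and the function names are all assumptions for illustration. One update step, applied to arbitrary weight vectors, respects linear combinations; once the weights are constrained to sum to 1, the teleport term becomes the constant (1 - D) / N, which is what makes the map affine on normalized distributions.

```python
# Hypothetical 3-page web: links[q] lists the pages that q links to.
links = {0: [1, 2], 1: [2], 2: [0]}
N = len(links)
D = 0.85  # damping factor; a commonly quoted value, assumed here


def step(w):
    """One update on an arbitrary weight vector w (not necessarily normalized)."""
    total = sum(w)
    # Teleport term: proportional to the total weight, so still linear in w.
    new = [(1 - D) * total / N for _ in range(N)]
    # Each page q passes a damped, equal share of its weight to its outlinks.
    for q, outs in links.items():
        for p in outs:
            new[p] += D * w[q] / len(outs)
    return new


def combine(a, x, b, y):
    """Return the linear combination a*x + b*y, element by element."""
    return [a * xi + b * yi for xi, yi in zip(x, y)]


x = [0.2, 0.3, 0.5]
y = [0.6, 0.1, 0.3]

# Linearity check: step(2x + 3y) should equal 2*step(x) + 3*step(y).
lhs = step(combine(2.0, x, 3.0, y))
rhs = combine(2.0, step(x), 3.0, step(y))
is_linear = all(abs(l - r) < 1e-12 for l, r in zip(lhs, rhs))
print(is_linear)
```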

Re:linearity (3, Informative)

Shaterri (253660) | more than 4 years ago | (#31178844)

The reason why PageRank 'has to be' linear is essentially mathematical; treating importance as a linear function of the importance of inbound links means that the core equations that need to be solved to determine importance are linear and the answer can be found with (essentially) one huge matrix inversion. If you make importance nonlinear then the equations being solved become computationally infeasible.

What's interesting to me is how close the connections are between PageRank and the classic light transfer/heat transfer equations that come up in computer graphics' radiosity (see James Kajiya's Rendering equation); I wonder if there's a reasonable equivalent of 'path tracing' (link tracing?) for computing page ranks that avoids the massive matrix inversions of the basic PageRank algorithm.

Re:linearity (4, Insightful)

Jerry Talton (220872) | more than 4 years ago | (#31179622)

*sigh*

You understand neither how the parent post is using the word "linear" nor the PageRank algorithm itself. You can rewrite the eigenproblem at the heart of PageRank as the solution to a linear system, but very few people do. Moreover, this is not the correct intuition to employ to understand what's going on: there are no "massive matrix inversions" here, just a simple iterative algorithm for extracting the dominant eigenvector of a matrix.

Furthermore, you've got it exactly backwards regarding the "connection" between PageRank and light transfer. Since the Markov process used to model a web surfer in the PageRank paper is operating on a discrete domain with an easily-calculable transition function, the stationary distribution (or ranking) can be determined exactly. In rendering, you have a continuous problem for which Markov chain Monte Carlo techniques provide one of the most efficient ways to approximate the solution...but you have to actually simulate a Markov chain to get it (see, for instance, Veach's seminal paper on Metropolis Light Transport). Computing PageRank is an "easy" problem, by comparison.
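For readers who want to see the "simple iterative algorithm" spelled out, here is a minimal power-iteration sketch. The toy graph, the damping factor of 0.85, and the tolerance are assumptions chosen for illustration, and the sketch assumes every page has at least one outlink. It is not a description of any production system.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=1000):
    """Power iteration on a small link graph.

    links maps each page to the list of pages it links to. This sketch
    assumes every page has at least one outlink (no dangling pages).
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from the uniform distribution
    for _ in range(max_iter):
        # Every page gets the same small teleport share...
        new = {p: (1.0 - damping) / n for p in pages}
        # ...plus a damped, equal share of the rank of each page linking to it.
        for q, outs in links.items():
            share = damping * rank[q] / len(outs)
            for p in outs:
                new[p] += share
        change = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if change < tol:
            break
    return rank


# Hypothetical four-page web.
toy_web = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

No matrix is ever inverted here; each pass is just a sweep over the links.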

Re:linearity (0, Redundant)

Foolicious (895952) | more than 4 years ago | (#31181722)

Why the *sigh*? By your answer it's clear what you believe (about yourself and the topic). The *sigh* adds no real value, except perhaps as a sort of additional marker of how wrong you think this was.

Re:linearity (2, Insightful)

phantomfive (622387) | more than 4 years ago | (#31178874)

Maybe this is just the elitist snob in me, but I don't feel that the latest American Idol singer is really a thousand times better than Billie Holiday, just because a thousand times more people listen to him than to her.

You are measuring the wrong thing. Google isn't measuring who is 'better,' it is trying to measure what page is more interesting to a web surfer, and pages tend to be more popular because they are more interesting to more people. Thus if you do a search for Britney, you are more likely to find Britney Spears' fan club than you are an academic study of why her beautiful innocence was so popular (and oh yes was it beautiful!). People who are looking for more specific academic things learn to add extra search terms to their query, like "Britney analysis" or "Why is Britney popular?"

The way to solve this problem better is for Google to get to know you and your preferences: if Google knows that you are mainly interested in academic sorts of things, then it can automatically return that sort of thing when you do a search for Britney. This is convenient, but drives certain people crazy because of privacy issues.

Re:linearity (1)

TheLink (130905) | more than 4 years ago | (#31180212)

> The way to solve this problem better is for Google to get to know you and your preferences
> if Google knows that you are mainly interested in academic sorts of things, then it can automatically return that sort of thing when you do a search for Britney.

What would be better is if you could choose an "Aspect" or "Point of View" or "Stereotype" for a search. After all, I could be doing a search on behalf of someone else. Or I could be interested in different things at different times.

So say I pick the "Thirsty Joe Sixpack" POV then type "beer", I should get a bunch of stuff about beer that's more related to what a thirsty beer drinker would want e.g. nearest places to buy beer, online beer ordering and delivery etc.

Whereas if I select the "doing highschool homework" Aspect and type beer, I would probably get history of beer, making beer etc.

Of course that's just a crude/bad example. I'm too lazy to think of better examples.

Currently Google too often lists a lot of mailing lists for some of my searches (which would be fine, but they don't have answers to the questions), and sometimes even "link spam" sites.

Even worse is when Google lists pages in journals that I cannot read. Which is rather hypocritical of Google since they penalized BMW Germany for showing Google's spider bot different content from what the users will see.

Re:linearity (2, Informative)

j1m+5n0w (749199) | more than 4 years ago | (#31178916)

it didn't seem right for a *million* inbound links to have a *million* times the effect compared to a single inbound link

Pagerank isn't just a citation count; it's defined recursively, such that a link from a page with a high pagerank is worth more than a link from a page with low pagerank. Similarly, a link from a page with many outlinks is worth less than a link from a page with the same rank but few outlinks.
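For reference, the recursive definition is usually written along the following lines. The symbols are not from this thread: d is a damping factor, N is the total number of pages, B(p) is the set of pages linking to p, and L(q) is the number of outlinks on page q. This is the commonly quoted normalized form, given here as a sketch rather than the exact formula of any particular paper or patent.

```latex
PR(p) = \frac{1 - d}{N} + d \sum_{q \in B(p)} \frac{PR(q)}{L(q)}
```

The PR(q) in the numerator is why a link from a high-rank page is worth more, and the L(q) in the denominator is why a link from a page with many outlinks is worth less.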

It does turn out to be more of a popularity contest than a quality metric, though. I think you're absolutely right about that.

Re:linearity (1)

justleavealonemmmkay (1207142) | more than 4 years ago | (#31182512)

Maybe this is just the elitist snob in me, but I don't feel that the latest American Idol singer is really a thousand times better than Billie Holiday

It would be a problem if your search for "Billie Holiday" would return links to American Idol. Pagerank is not the WHERE clause, it's the ORDER BY clause.
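A toy sketch of that split, with a made-up index and made-up rank values (everything here is hypothetical, including the naive word-matching rule): the filter decides which pages are candidates at all, and rank only orders the survivors.

```python
# Hypothetical mini-index: (title, text, rank). All values are made up.
pages = [
    ("American Idol finale recap", "american idol winner sings", 0.90),
    ("Billie Holiday biography", "billie holiday jazz singer", 0.40),
    ("Billie Holiday discography", "billie holiday recordings", 0.25),
]


def search(query, index):
    terms = query.lower().split()
    # "WHERE": keep only pages whose text contains every query term.
    matches = [p for p in index if all(t in p[1].split() for t in terms)]
    # "ORDER BY": rank only decides the order of what matched.
    return sorted(matches, key=lambda p: p[2], reverse=True)


results = [title for title, _, _ in search("Billie Holiday", pages)]
print(results)
```

The highest-ranked page in the index never shows up, because it fails the filter.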

Let me be the first to say (0)

Anonymous Coward | more than 4 years ago | (#31178652)

Who gives a rats ass ...

indeed it is not new (1, Insightful)

Anonymous Coward | more than 4 years ago | (#31178734)

In many different fields, people count the number of times their research papers have been referenced and quoted, with different "points" depending on where the citation appeared. For example, someone working in medicine who has his work referenced in The Lancet gets more credit than for a reference in local-hillbilly-news.
I believe there are sources that collect this information. Sorry for being so vague, but I can't remember the specifics. (But hey, isn't that just what we do in Slashdot comments?)

Markov Chains (0)

Anonymous Coward | more than 4 years ago | (#31178736)

Markov Chains were introduced in 1906 according to Wikipedia. That's the origin of PageRank. People have also been using these tools for ages to rank the impact of journals, etc.

additional ranking algorithm in the 1940s paper (4, Funny)

commodoresloat (172735) | more than 4 years ago | (#31178846)

allowed pages to be ranked and categorized according to whether it was "insightful," "interesting," "informative," "funny," "flamebait," or "troll."

Vistas in Information Handling, Spartan Press 1962 (2, Interesting)

ArmchairAstronomer (724678) | more than 4 years ago | (#31178992)

The most amazing computer book ever. It has Doug Engelbart's first description of “augmenting the human intellect” using computers. It describes what we know now as windows (generic) with pointing devices. It has an early linear document retrieval system using page ranks based on word co-occurrences and it has an early language translation system (Russian to English with examples of translating Soviet missile papers). What a preview of things to come.

It is worth a read just to get into the heads of some of the computing pioneers.

Another required reading book for all aspiring CS students should be John von Neumann’s “The Computer and the Brain.” Dated, but again this is what they were thinking.

We have a lot to be humble about given the hardware and compilers they had to work with. Not to mention primitive development environments, a.k.a. the card punch.

Quite a difference from theory to practice! (1)

BillKaos (657870) | more than 4 years ago | (#31178994)

I guess previous work had the idea right, but actually building a system which can handle millions of links and reply in no time is no small feat.

This reminds me of the discussion we had previously about the gap from research prototype transistors to having factories actually deliver them.

I did the same thing, albeit not in the 40s (1)

BoberFett (127537) | more than 4 years ago | (#31179028)

Back in the late 90s I created a relevance ranking system for my employer to rank the output of our legal research system. Similar to a hyperlink, legal documents have unique references. Sometimes they're created by the publisher, such as West (now Thomson) and their Federal Supplement, and other times, for unpublished documents, it's a docket number from the court. Long story short, the documents were indexed, and at run time, using a combination of hit density and the number of times a document was referred to by other documents, we had a fairly accurate relevance engine. I even took it a step further: for the documents that referenced the found document, I looked to see if the original search term was present in the linking document. If so, we assumed that it was linking to the found document for reasons related to the search rather than for some other reason, as court cases often are referenced for reasons outside of their main ruling.
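A rough sketch of how such a score might be put together. This is not the original system: the function name, the weights, the single-word query, and the sample documents are all hypothetical. It combines hit density with a per-citation bonus, and adds extra credit when the citing document also contains the search term.

```python
def relevance(doc_id, query, texts, cited_by,
              w_density=1.0, w_cite=0.1, w_cite_match=0.3):
    """Score one document for a single-word query (weights are made up)."""
    term = query.lower()
    words = texts[doc_id].lower().split()
    # Hit density: fraction of the document's words that match the term.
    density = words.count(term) / len(words) if words else 0.0
    score = w_density * density
    for citing in cited_by.get(doc_id, []):
        score += w_cite  # every inbound reference counts a little
        # Extra credit if the citing document also mentions the term,
        # suggesting the reference is related to the search.
        if term in texts[citing].lower().split():
            score += w_cite_match
    return score


# Made-up documents and reference graph.
texts = {
    "case1": "negligence ruling on negligence and damages",
    "case2": "contract dispute citing precedent",
    "case3": "negligence claim relies on earlier ruling",
    "case4": "tax appeal",
}
cited_by = {"case1": ["case2", "case3"], "case4": ["case2"]}

ranked = sorted(texts,
                key=lambda d: relevance(d, "negligence", texts, cited_by),
                reverse=True)
print(ranked)
```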

Prior art? (1)

ridgecritter (934252) | more than 4 years ago | (#31179042)

Could some /. member who is an IP attorney comment on whether this might constitute prior art that could open up relevant Google patent(s) to an invalidation attack based on obviousness? Which, in my limited understanding, would go something like: "Well, a person skilled in the art as of the date of Google's patent application would have known of the Leontief work (published, knowledge therefore presumed) and it would have been obvious to implement the Leontief work on a computer." And for extra interest, could anyone with "standing" (which could be any of us who use Google) file a petition for re-examination of the patents with the USPTO?

Re:Prior art? (1)

BZ (40346) | more than 4 years ago | (#31179304)

Two things:

1) Google does not hold a patent on PageRank. Stanford does.
2) If you read the patent, it's more than just the ranking system. At least as far as I can tell. I am not a patent lawyer, of course.

Re:Prior art? (1)

StripedCow (776465) | more than 4 years ago | (#31182968)

It is still a quite relevant question, since in the wikipedia article it says:

Google has exclusive license rights on the patent from Stanford University.

other prior work (1)

pydev (1683904) | more than 4 years ago | (#31179046)

The mathematics of PageRank go back a century. There have been many different applications since then, including to hypertext and the web. From Wikipedia:

PageRank has been influenced by citation analysis, early developed by Eugene Garfield in the 1950s at the University of Pennsylvania, and by Hyper Search, developed by Massimo Marchiori at the University of Padua (Google's founders cite Garfield's and Marchiori's works in their original paper[5]).

As such, the algorithm wasn't new, but Google was the first to build a working, large-scale search engine around it.

I can show you one from the ~10s or ~20s (1)

sbeckstead (555647) | more than 4 years ago | (#31179140)

It's called being a Movie star. Your importance is ranked by how many people really like you. And it can be gamed just like the Google one.

Re:I can show you one from the ~10s or ~20s (1)

rubycodez (864176) | more than 4 years ago | (#31179612)

actually, could think of better ranking systems for movie stars: ranking factor for each role (lead=1000, other starring role=100, minor speaking role = 10, extra = 1) summed for all movies. maybe even scale by gross receipts for each movie GNP-deflator normalized to 1970 dollars.
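A literal-minded sketch of that scoring scheme, for what it's worth. The role weights come from the comment above; the function name, the sample credits, and the choice to scale by simple multiplication with a gross figure already converted to 1970 dollars are assumptions.

```python
# Role weights as proposed in the comment above.
ROLE_POINTS = {"lead": 1000, "starring": 100, "minor": 10, "extra": 1}


def star_score(credits, scale_by_gross=False):
    """Sum role points over a list of (role, gross_in_1970_dollars) credits.

    The gross is assumed to be deflated to 1970 dollars already; scaling
    by plain multiplication is just one possible reading of the idea.
    """
    total = 0.0
    for role, gross in credits:
        points = ROLE_POINTS[role]
        total += points * gross if scale_by_gross else points
    return total


# Made-up filmography: one lead role and one walk-on.
credits = [("lead", 2.0), ("extra", 5.0)]
print(star_score(credits))
print(star_score(credits, scale_by_gross=True))
```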

there is nothing new under the sun (1)

skoony (998136) | more than 4 years ago | (#31179604)

hey, i thought of that. just didn't patent it. we now are own you regards, mike(hunkering down in the frozen north )

creativity always builds on the past (0)

Anonymous Coward | more than 4 years ago | (#31179810)

creativity always builds on the past -- okay, redundant using subject line in body, my bad

re-inventing the wheel or searching for the wheel (0)

Anonymous Coward | more than 4 years ago | (#31180068)

some would like to dev
some would like to search

just no one knows all in all
and why
patents blockage
humans blockage

so no improvement overall

Journalists Finally Put Two and Two Together (1)

kmoser (1469707) | more than 4 years ago | (#31180390)

The concept of rating something based on the weighted reputation of those entities that endorse it ("endorse" being used in the general sense) has been around, well, forever. People do it all the time when they decide who to trust. TFA should have been titled, "Brin and Page Rediscover Leontief-Type Algorithm from the 1940s" with the subtitle, "Journalists Finally Put Two and Two Together."