Interest Still High In the Netflix Algorithm Competition

Soulskill posted more than 5 years ago | from the if-sandler-then-quit dept.

Math 77

circletimessquare brings us an update to the status of the million-dollar Netflix competition to develop a better algorithm for movie recommendations. We've discussed aspects of the competition since it started two years ago, but the New York Times has a lengthy overview of where it stands now. "The Netflix competition is still going strong, with a vibrant, competitive roster of some 30,000 programmers around the globe hard at work trying to win the prize. The Times provides a look at some of the more obsessive searchers, such as Len Bertoni, a semi-retired computer scientist near Pittsburgh who logs 20 hours a week on the problem, oftentimes with the help of his children. There's also Martin Chabbert in Montreal: 'After the kids are asleep and I've packed the lunches for school, I come down at 9 in the evening and work until 11 or 12.' The article gets into the history of the search algorithm Netflix currently uses, and explores the hot commodity called 'singular value decomposition' that serves as the basis for most of the algorithms in competition."
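For readers wondering what "singular value decomposition" looks like in this setting: the rough idea is to describe every user and every movie by a few numbers (latent factors) whose dot product approximates the rating. The sketch below is a toy gradient-descent factorization in plain Python, not any contestant's actual method. The ratings, the factor count, the learning rate and the regularization constant are all made-up assumptions.

```python
import random

def factorize(ratings, n_users, n_movies, n_factors=2,
              lr=0.01, reg=0.02, epochs=2000, seed=0):
    """Learn user and movie factor vectors from (user, movie, rating) triples.

    Toy stochastic gradient descent; every hyperparameter here is illustrative.
    """
    rng = random.Random(seed)
    # Small random starting factors for each user (P) and movie (Q).
    P = [[rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for _ in range(n_movies)]
    mean = sum(r for _, _, r in ratings) / len(ratings)
    for _ in range(epochs):
        for u, m, r in ratings:
            pred = mean + sum(pu * qm for pu, qm in zip(P[u], Q[m]))
            err = r - pred
            for f in range(n_factors):
                pu, qm = P[u][f], Q[m][f]
                # Nudge both factors toward a smaller error, with shrinkage.
                P[u][f] += lr * (err * qm - reg * pu)
                Q[m][f] += lr * (err * pu - reg * qm)
    return mean, P, Q

def predict(model, u, m):
    """Predicted rating: global mean plus the user/movie factor dot product."""
    mean, P, Q = model
    return mean + sum(pu * qm for pu, qm in zip(P[u], Q[m]))

# Hypothetical data: 4 users, 4 movies, 12 known ratings (user, movie, stars).
RATINGS = [
    (0, 0, 5), (0, 1, 4), (0, 2, 1),
    (1, 0, 4), (1, 1, 5), (1, 3, 1),
    (2, 2, 5), (2, 3, 4), (2, 0, 1),
    (3, 2, 4), (3, 3, 5), (3, 1, 2),
]
model = factorize(RATINGS, 4, 4)
```

Calling `predict(model, 0, 3)` then gives a guess for a movie that user 0 never rated, which is the whole point of the exercise.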

Netflix (5, Interesting)

boyter (964910) | more than 5 years ago | (#25856515)

It's actually not that hard to build an algorithm which works well. Following a demonstration at TechEd I built my own implementation using Python in about 2 hours or so (using a vector space algorithm), with reasonable results. The problem is that it is very difficult to win the prize.

The best thing about it is that you get a lot of data to play with. If you are interested in parallel algorithms and large data sets, give it a go. It's surprisingly interesting and sucks you in. In fact I might go play with it now.
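For anyone curious what a quick vector space approach might look like, here is a minimal sketch, assuming each user is a sparse vector of ratings and that similar users are found by cosine similarity. It is not the TechEd demo or the implementation described above; the user names and ratings are invented for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse rating vectors (dict: movie -> stars)."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[m] * b[m] for m in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

def predict(user, movie, ratings):
    """Similarity-weighted average of other users' ratings for `movie`."""
    num = den = 0.0
    for other, vec in ratings.items():
        if other == user or movie not in vec:
            continue
        sim = cosine(ratings[user], vec)
        num += sim * vec[movie]
        den += sim
    return num / den if den else None

# Hypothetical users and ratings.
RATINGS = {
    "alice": {"Lethal Weapon": 5, "Miss Congeniality": 2, "Garfield 2": 1},
    "bob":   {"Lethal Weapon": 4, "Miss Congeniality": 1, "Napoleon Dynamite": 5},
    "carol": {"Lethal Weapon": 1, "Miss Congeniality": 5, "Napoleon Dynamite": 2},
}
guess = predict("alice", "Napoleon Dynamite", RATINGS)
```

Here alice's tastes line up more with bob's than with carol's, so the guess lands closer to bob's 5 stars than to carol's 2. On the real data set the hard part is doing this for millions of ratings without it taking all week.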

Re:Netflix (4, Funny)

gardyloo (512791) | more than 5 years ago | (#25856555)

It's surprisingly interesting and sucks you in. In fact I might go play with it now.

This week on Life of Geeks: What not to say on slashdot.

Re:Netflix (5, Funny)

boyter (964910) | more than 5 years ago | (#25856701)

You and your dirty mind. Then again large amounts of data is pretty sexy...

Re:Netflix (0)

morgan_greywolf (835522) | more than 5 years ago | (#25856963)

It's surprisingly interesting and sucks you in. In fact I might go play with it now.

Next week on Life of Geeks: More things not to say on slashdot.

Re:Netflix (0)

Anonymous Coward | more than 5 years ago | (#25929131)

Boyter, where is this TechEd demonstration? Can't find it.

Gotta hand it to the article's author (4, Funny)

gardyloo (512791) | more than 5 years ago | (#25856543)

Each new algorithm takes on average three or four hours to churn through the data on the family's "quad core" Gateway computer.

Anyone who puts "quad core" in quotes like that is either clueless, or---when talking about Gateways---astoundingly ironic. It's kudos either way!

Re:Gotta hand it to the article's author (1)

rookworm (822550) | more than 5 years ago | (#25861245)

Pragmatic Theory -- two French-Canadian guys in Montreal

Residents of the province of Quebec in Canada are ineligible to participate.

looks like these guys are SOL :(

Re:Gotta hand it to the article's author (0)

Anonymous Coward | more than 5 years ago | (#25861881)

Clueless.
She also refers to the teams participating in the challenge as "hackers". So "hackers" are building predictive models.

Re:Gotta hand it to the article's author (1)

gardyloo (512791) | more than 5 years ago | (#25863977)

She also refers to the teams participating in the challenge as "hackers". So "hackers" are building predictive models.

Well, there she could be using the original meaning of hackers (which is not in quotes in the article). She does it without explaining some difference between a hacker and a cracker, or white-hats and black-hats, or whatever the nomenclature du jour is. That, I quite like.

Algorithm or Human inaccuracy? (4, Interesting)

cjfs (1253208) | more than 5 years ago | (#25856567)

When Bertoni runs his algorithms on regular hits like Lethal Weapon or Miss Congeniality and tries to predict how any given Netflix user will rate them, he's usually within eight-tenths of a star

Makes me wonder how accurate my own ratings would be. The difference between clicking 3 or 4 stars is often very minor and arbitrary. At the end of a movie I might rate it something totally different than 20min later. Sounds like they're doing pretty good so far.

There's a sort of unsettling, alien quality to their computers' results ... But many categorizations are now so obscure that they cannot see the reasoning behind them. Possibly the algorithms are finding connections so deep and subconscious that customers themselves wouldn't even recognize them.

Realizing the program you wrote out-performs you and you can't explain why is a rather odd feeling.
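A note on what "within eight-tenths of a star" can mean in practice: the contest scores entries by root-mean-square error (RMSE) between predicted and actual ratings. The snippet below is a minimal sketch of that calculation; the two rating lists are invented, not contest data.

```python
import math

def rmse(predicted, actual):
    """Root-mean-square error between two equal-length rating lists."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))

# Hypothetical ratings: what a user clicked vs. what a model guessed.
actual = [4, 3, 5, 2, 4]
predicted = [3.6, 3.9, 4.2, 2.5, 3.1]
error = rmse(predicted, actual)
```

Because the errors are squared before averaging, one badly missed rating hurts more than several near misses, which is part of why the last few percent are so hard to squeeze out.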

Re:Algorithm or Human inaccuracy? (1)

interstellar_donkey (200782) | more than 5 years ago | (#25856595)

Yes. I've been told that I would really like Garfield 2, and I don't know why. It's spooky, really, since all I've rated is obscure French films from the 50s and 60s.

Re:Algorithm or Human inaccuracy? (5, Funny)

davester666 (731373) | more than 5 years ago | (#25856757)

Jim Davis performed in obscure French porn in the 60's.

Re:Algorithm or Human inaccuracy? (1)

mpeskett (1221084) | more than 5 years ago | (#25858409)

The lesson here is that everyone likes Garfield 2

Don't try to deny it people, you know that it's true.

Re:Algorithm or Human inaccuracy? (0)

Anonymous Coward | more than 5 years ago | (#25860123)

Simple. The French don't like the English (particularly the English upper class) and the English upper class is portrayed rather harshly in Garfield 2.

Yes, I admit it, I did like Garfield 2. I wonder if Netflix will tell those who rate it highly that they'll also like A Tale of Two Cities?

Re:Algorithm or Human inaccuracy? (0, Flamebait)

Boronx (228853) | more than 5 years ago | (#25861999)

Netflix definitely adds weight to movies nobody wants. If you've got Garfield 2 out, that's one slot they didn't have to fill with a good movie.

Re:Algorithm or Human inaccuracy? (1)

kestasjk (933987) | more than 5 years ago | (#25856797)

Realizing the program you wrote out-performs you and you can't explain why is a rather odd feeling.

It really is, but it's very satisfying :-)

Re:Algorithm or Human inaccuracy? (3, Funny)

jacquesm (154384) | more than 5 years ago | (#25856853)

Second that. I once wrote a chess program just for kicks and it beat me on the second game and I was like 'wtf ?'

6502 assembler long long long ago...

Re:Algorithm or Human inaccuracy? (2, Insightful)

lysergic.acid (845423) | more than 5 years ago | (#25857887)

well, i think that's one of the interesting things about social sciences. it's very difficult--nigh impossible in some cases--to accurately predict the behavior of a single individual. however, it is possible to predict the collective behavior of a large group of people.

this seems counter intuitive at first, but it's kinda like not being able to predict how a particular dice roll will land but still being able to predict the statistical average of 100 dice rolls.

the capriciousness of individuals eventually balances itself out if you use a large enough data set. likewise, fringe opinions also balance each other out when dealing with statistical averages.
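The dice analogy is easy to check with a few lines of Python. This is only an illustrative simulation with a fixed seed, nothing to do with the contest data.

```python
import random

def average_of_rolls(n, rng):
    """Mean of n fair six-sided dice rolls."""
    return sum(rng.randint(1, 6) for _ in range(n)) / n

rng = random.Random(42)  # fixed seed so the run is repeatable

single = rng.randint(1, 6)               # one roll: anything from 1 to 6
small = average_of_rolls(100, rng)       # 100 rolls: usually lands near 3.5
large = average_of_rolls(100_000, rng)   # many rolls: lands very near 3.5
```

The single roll is unpredictable, but the averages crowd around 3.5, and the more rolls you take the tighter they crowd.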

Wow! Think about how many free man-hours Netflix (3, Insightful)

rolfwind (528248) | more than 5 years ago | (#25856589)

got from this, even when it has to pay out the prize it will be very cheap against any going rate.

Re:Wow! Think about how many free man-hours Netfli (2, Insightful)

Alterion (925335) | more than 5 years ago | (#25856657)

exactly. this is spec work, plain and simple; anyone entering this kind of competition is selling themselves short

Crowdsourcing (0)

Anonymous Coward | more than 5 years ago | (#25856695)

Is it really slavery if you can leave whenever you want?

Re:Crowdsourcing (0)

Anonymous Coward | more than 5 years ago | (#25857599)

exploitation is exploitation, self imposed or not; as well, contests are, by their very nature, one-sided, there must be a loser, if there is to be a winner, therefore, competition is only valuable within the framework of co-operation, otherwise, yes, it is exploitative

see? oh well probably not....

Re:Crowdsourcing (5, Insightful)

try_anything (880404) | more than 5 years ago | (#25859579)

exploitation is exploitation, self imposed or not; as well, contests are, by their very nature, one-sided, there must be a loser, if there is to be a winner, therefore, competition is only valuable within the framework of co-operation, otherwise, yes, it is exploitative

see? oh well probably not....

I think you don't understand the concept of "fun." Read the article and the comments and tell me that the people "working" at this competition aren't getting paid handsomely. If money is the only compensation that means anything to you, you must be an economist. Congrats, you're doing your part to keep up economics' reputation as the "dismal science."

If it's exploitative for a company to provide enjoyment and intellectual stimulation to a lot of people and benefit financially as a result, then I guess publishing companies don't deserve my support, either. And the movie studios... theater companies... restaurants... and of course any bands that get paid for gigs are just a bunch of ruthless exploiters.

I guess the only commercial entertainment that's okay is what I can enjoy completely passively, without any mental effort at all. That way I'm not being exploited, right? Because work is an awful, awful thing ;-)

Wake up; it's not the nineteenth century or even the twentieth century. Everyone has a natural appetite for work, and unlike our unfortunate ancestors, ours is not overwhelmed and sickened by the work required for mere survival. You're addressing a relatively privileged group of people; we earn enough to support ourselves on less work than we have an appetite for. As a result, we don't have to regard all work as a curse imposed on us by necessity. Work freely done and enjoyed is a blessing.

If you insist that our entire appetite for work be channeled through grim-faced contract negotiations, then that blessing is ruined. What's the point of ruining our fun? So you can save us from the misery suffered by our great-grandparents?

Obviously none of what I said applies to call center employees, game company employees, and technical support employees. They should pay careful attention to what you say ;-)

Re:Crowdsourcing (0)

Anonymous Coward | more than 5 years ago | (#25859891)

Yes, I indeed understand fun. I'm having a ball reading your so called arguments. If fun were enough, NetFlix wouldn't need the prize, would they now? Clearly they are appealing to greed, and not fun. Of course, you won't agree. By the way, economics hardly limits itself to cash transactions. I am not an economist, but I understand enough to know that you can't maintain an economy on contests.

Yes, it is exploitative for any enterprise to pay less than fair compensation in any exchange, and yes, I'd agree with your examples, as most modern enterprises take more from the community than they deserve or return, imho.

I'm not against work, nor did I say so in any fashion, but you'd like to distract from the real discussion here, is all I can see from your words.

LOL. If anyone is living in the past it's people like you who think exploitation is business. Sadly, what we call business is, in many cases, exploitation of the workers by the owners. Boy, that's something you don't have the honesty to consider or admit may be true.

Your arguments are inadequate and specious and seem to only serve to buttress a weak belief system. Indeed, all I see is typical right wing rhetoric, pure and simple.

Re:Crowdsourcing (2, Insightful)

try_anything (880404) | more than 5 years ago | (#25860297)

it is exploitative for any enterprise to pay less than fair compensation in any exchange

I only demand to be compensated for some pain or loss on my part. I don't need to be compensated for pleasure. Should I be jealous of the benefit another party gets when I engage in a mutually beneficial exchange? If I give a gift to someone, should they be angry if I enjoy giving it more than they enjoy receiving it?

Ah, but you said "enterprise." I do believe that companies are inherently more prone to evil than the people they comprise, and they need careful watching and manipulation (by consumers and by regulators) to make sure they don't abuse their economic power or legal status (which is important when dealing with special legal entities such as corporations.) However, when assessing a mutually beneficial exchange with another party, I don't see any need to add a special penalty if the other party is a business entity instead of a person.

most modern enterprises take more from the community than they deserve or return

You are playing a subtle game with words here -- if "taking" is bad then you must be referring to some loss or cost incurred by another party. There is surely nothing wrong with "taking" pleasure from someone without incurring any cost or displeasure for them, so you are strictly using the word "take" to refer to the case where one party gets something and as a result the other party loses something.

In this case Netflix may get something from the work around the prize, but what do they take, and who do they take it from? Do they "take" the time and effort of the participants? The participants don't seem to feel that their time and effort are being "lost" in any way, so I don't see how that is possible.

They may "take" some right to any IP generated by the winner, but they will pay one million dollars for it, and any participant who feels that is unfair is free to keep his algorithm to himself instead of submitting his results and claiming the prize.

Re:Wow! Think about how many free man-hours Netfli (5, Insightful)

Animaether (411575) | more than 5 years ago | (#25856831)

Why is it considered selling yourself short if you do work for free for a commercial entity... but not when you contribute to, say, FireFox, ThunderBird, Apache, the Linux kernel, and so forth and so on?

In both cases you are typically doing work for absolutely zilch as far as cash or prizes go. You may get a fuzzy warm feeling on the inside, you may simply enjoy doing the work (similar to the fuzzy warm feeling), maybe you enjoy the popularity it gives you. On rare occasions, maybe the work you do there lands you a job further down the line but that's not really something you can bank on. In both cases, you are also doing work somebody else -could- have been doing, for actual pay. I won't get into an argument of whether doing work for free means you're 'stealing jobs' - fact simply is that Netflix -is- getting a lot of work done practically for free that they would otherwise have had to hire somebody for; you would have to agree as otherwise "selling themselves short" would not apply.

So yes, you're doing work that should be landing you some cold hard cash when you...
- devise a matching algorithm for Netflix
- create a video for Radiohead
- submit photos to a Canon photo competition that they are then free to use in any and all marketing material aka ads regardless of whether your photo actually won that competition.

But isn't that pretty much the status quo that many here -want- to go to? Those making their money with proprietary programming, creating arts, etc. are dinosaurs in dying business models, no?

(only semi-flamebait)

Re:Wow! Think about how many free man-hours Netfli (4, Insightful)

Kryptikmo (1256514) | more than 5 years ago | (#25856885)

It's not selling yourself short to work on FOSS for a very simple reason. Work on FF, or Thunderbird, or open-sourcing a script that I wrote to convert music is free at the point of delivery. That is, anyone can use it without paying. Freely given, and freely distributed.

However, in this case the user of the algorithm is paying Netflix. Netflix takes the work that I have done, and closes it off from other people. My work goes not to benefit the community, but merely to benefit one company - a company that has paid me (cheaply) for my work. Since companies by definition only care about the bottom line, their intent is not to benefit the community, but to benefit themselves. You are effectively working for them for cheap, selling yourself short.

If netflix were to give away the algo for use by anyone else too, then it would be very generous and then you may be able to make a comparison with FOSS. I have no idea if they will do that or not. However, if I were a shareholder, I would not want them to give away a potentially killer feature for which they paid $1m.

Saying that, if you enjoy playing with this, go ahead! Just be honest with yourself about it. If you still want to do it, wallow in it. But it's an extremely pernicious thing to do to link this with working on something that is done to benefit everyone. It simply is not the same thing.

Re:Wow! Think about how many free man-hours Netfli (5, Informative)

Spy Hunter (317220) | more than 5 years ago | (#25857009)

Actually Netflix closes nothing off. In fact, in order to receive the prize, the winner must publish their algorithm to the public. The winner could easily open-source the entire thing, or OTOH they're also free to patent it out the wazoo and start pimping it out. The only condition Netflix imposes is that Netflix gets a non-exclusive license to use the algorithm in exchange for the prize money, which is eminently reasonable.

Re:Wow! Think about how many free man-hours Netfli (2, Interesting)

Kryptikmo (1256514) | more than 5 years ago | (#25857121)

That's remarkably reasonable. If I was LOVEFiLM or Amazon I'd be cackling with glee. I'm not though, so I'll just be depressed that one could hope to patent an algorithm. Not hardware that carries out an algorithm, but just an algorithm.

Although if I were a netflix shareholder I'd be pissed off that the company were giving away my funded research for free, when they could probably get it closed off and reap the rewards. Mind you, the amount of publicity that they have received - I know about Netflix now and I don't watch DVDs or live in the USA! - is probably more than worth it...

Re:Wow! Think about how many free man-hours Netfli (0)

Anonymous Coward | more than 5 years ago | (#25860007)

Actually Netflix closes nothing off. In fact, in order to receive the prize, the winner must publish their algorithm to the public. The winner could easily open-source the entire thing, or OTOH they're also free to patent it out the wazoo and start pimping it out. The only condition Netflix imposes is that Netflix gets a non-exclusive license to use the algorithm in exchange for the prize money, which is eminently reasonable.

In order for an invention to be patentable it must be new as defined in the patent law, which provides that an invention cannot be patented if: "(a) the invention was known or used by others in this country, or patented or described in a printed publication in this or a foreign country, before the invention thereof by the applicant for patent," or "(b) the invention was patented or described in a printed publication in this or a foreign country or in public use or on sale in this country more than one year prior to the application for patent in the United States." http://www.uspto.gov/web/offices/pac/doc/general/index.html#whatpat

Re:Wow! Think about how many free man-hours Netfli (1)

Spy Hunter (317220) | more than 5 years ago | (#25863211)

I'm sorry, what is your point? Nothing prevents you from patenting an algorithm *before* using it to win the prize. In fact, according to the section you quoted, you could even wait and file for the patent a year *after* winning the prize and publishing the algorithm.

Re:Wow! Think about how many free man-hours Netfli (2, Informative)

morgan_greywolf (835522) | more than 5 years ago | (#25857023)

Saying that, if you enjoy playing with this, go ahead! Just be honest with yourself about it. If you still want to do it, wallow in it. But it's an extremely pernicious thing to do to link this with working on something that is done to benefit everyone. It simply is not the same thing.

Exactly. Working on FOSS is a magnanimous thing to do. You are giving freely to the entire world -- anyone who needs done what your particular code does. It's volunteerism.

When you participate in the Netflix competition, you might not be getting paid, but the work you're doing benefits only Netflix and you -- if you win the $1 million prize, that is. There are side benefits even if you don't win the million dollar prize -- you increase your own abilities in the areas of programming, mathematics, critical thinking, etc.

But that's the only place where there are similarities in working on FOSS, and it's where the similarities end. At the end of the day, doing the Netflix competition is spec work at best. If you exclude the $1m -- which is very cheap, BTW -- the sole beneficiary is Netflix.

But if you write something that benefits others in the world who share your problem and distribute that freely to the world, the beneficiary is the entire world, or at least some portion of it.

Re:Wow! Think about how many free man-hours Netfli (1)

Cylix (55374) | more than 5 years ago | (#25857061)

I don't know which company you work for, but despite how much I might save or earn for my company I never receive a million dollars compensation.

Any openings in your uncle penny bags org?

Re:Wow! Think about how many free man-hours Netfli (1)

maxume (22995) | more than 5 years ago | (#25857757)

Congratulations on winning the Netflix prize.

Or are you talking about a hypothetical million dollars? At least your company pays you for the work you do incrementally, rather than for the best answer.

Re:Wow! Think about how many free man-hours Netfli (2, Insightful)

msromike (926441) | more than 5 years ago | (#25858427)

Oops. One small problem that invalidates most of what you said.

First, a company is not a "self." Second, a company is run to profit the people that put capital at risk (i.e. the owners of the company.) Those are the "selves" that are being rewarded. Why shouldn't people that put capital at risk be rewarded for their effort? Third, the company is providing a benefit to everyone. They are providing a good product at a fair price. If they weren't, then it wouldn't be a successful company.

I still contend that the basics are no longer taught in school.

Re:Wow! Think about how many free man-hours Netfli (1)

Eighty7 (1130057) | more than 5 years ago | (#25860413)

Why is it considered selling yourself short if you do work for free for a commercial entity... but not when you contribute to, say, FireFox, ThunderBird, Apache, the Linux kernel, and so forth and so on?

Duh, because they also open source their (paid for) improvements? It's quid pro quo.

Re:Wow! Think about how many free man-hours Netfli (1)

jlarocco (851450) | more than 5 years ago | (#25862957)

Yeah, there's also this line in the article:

They also knew that, as Reed Hastings, the chief executive of Netflix, told me recently, getting to 10 percent would certainly be worth well in excess of $1 million to the company.

If somebody were really smart, they'd develop the algorithm, skip the contest, and sell it to Netflix for quite a bit more than $1 million.

Re:Wow! Think about how many free man-hours Netfli (2, Insightful)

ai3 (916858) | more than 5 years ago | (#25856823)

That's what I was thinking. This shows how a company with good management can really save money. Instead of the standard outsourcing-to-cheap-country thing, you get 30,000 people, many of them very bright and motivated, working on your problem for free, and you only have to pay when they are successful per your definition. What a bargain!

Re:Wow! Think about how many free man-hours Netfli (1)

myxiplx (906307) | more than 5 years ago | (#25857483)

Not only that, but they're getting free advertising off the back of it too. Slashdot have run the story at least twice, and I'll bet the other tech magazines covered it a few times too.

Right now Netflix execs are laughing all the way to the bank, and their competitors are kicking themselves that they never thought of this.

Somebody thoroughly deserves their bonus for thinking of this one.

Ohh, ohh, I've got one... (1, Offtopic)

CarpetShark (865376) | more than 5 years ago | (#25856709)

How about they stop with the silly DRM and/or geoip limitations, and let everyone use the service who wants the service. That way they might get more balanced recommendations from more than just a small part of the world's population.

Re:Ohh, ohh, I've got one... (1)

dmneoblade (848781) | more than 5 years ago | (#25856731)

Licensing issues. I'm sure Netflix would LOVE to do that, but good luck talking the studios into letting them go international with streaming service. Plus, it's pretty expensive to go from servicing the US, to the US and Europe.

age and gender would boost it way over 10 percent (0)

Anonymous Coward | more than 5 years ago | (#25856717)

Since they don't have any personal information about the customer, this makes it very difficult. They could try to estimate both categories by looking at their previous rentals but what if they have very few pieces of data?

A teenaged girl is going to like NAPOLEON DYNAMITE much better than a 55 year old man.

So Netflix is purposely making this very difficult.

Re:age and gender would boost it way over 10 perce (3, Funny)

Smauler (915644) | more than 5 years ago | (#25857293)

In the same vein, a 55 year old man is going to like a teenaged girl much better than NAPOLEON DYNAMITE.

Re:age and gender would boost it way over 10 perce (1)

Metasquares (555685) | more than 5 years ago | (#25857425)

In Soviet Russia, NAPOLEON DYNAMITE likes YOU? :)

Re:age and gender would boost it way over 10 perce (2, Insightful)

MikeURL (890801) | more than 5 years ago | (#25857923)

This was saved for the very last page of the story

"Hastings is even considering hiring cinephiles to watch all 100,000 movies in the Netflix library and write up, by hand, pages of adjectives describing each movie, a cloud of tags that would offer a subjective view of what makes films similar or dissimilar."

That is exactly what I was thinking is needed. It would remove all the complexity that is currently being used to come up with, likely, inaccurate movie characteristics. There is a reason that so many websites are allowing users to not only rate items but to tag them too (including /.). Having tags removes the need to guess at the first approximation of the relevant attributes of the item.

Netflix is going to need to add relevant metadata if they really want to improve recommendations and tags would be a good start. In fact, it would probably be helpful to have people rate the top 3 tags for each movie rather than one rating for the entire film.
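To make the tag idea concrete, here is a minimal sketch that scores how alike two movies are by the overlap of their tag sets (Jaccard similarity). The tags are invented for illustration; nothing here reflects how Netflix or anyone in the contest actually handles metadata.

```python
def jaccard(tags_a, tags_b):
    """Share of tags two movies have in common: |A & B| / |A | B|."""
    a, b = set(tags_a), set(tags_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def most_similar(title, tags):
    """Title whose tag set overlaps most with the given title's tags."""
    others = [t for t in tags if t != title]
    return max(others, key=lambda t: jaccard(tags[title], tags[t]))

# Hypothetical hand-written tags.
TAGS = {
    "Lethal Weapon": {"action", "cop", "comedy"},
    "Miss Congeniality": {"comedy", "undercover", "cop"},
    "Garfield 2": {"comedy", "family", "talking animals"},
}
closest = most_similar("Lethal Weapon", TAGS)
```

With these tags, Lethal Weapon and Miss Congeniality share two of four distinct tags (0.5), while Lethal Weapon and Garfield 2 share only one of five (0.2). The catch, of course, is that somebody has to write all those tags.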

Damn you, slashdot. (-1, Offtopic)

Fumus (1258966) | more than 5 years ago | (#25856747)

I opened up the RSS in tabs, saw "Incest Still High In t..." and couldn't resist immediately clicking.

And a small thing that bugs me. Why does the "In" start with a capital letter, but "the" doesn't?

Re:Damn you, slashdot. (1, Interesting)

nonewmsgs (1249950) | more than 5 years ago | (#25856801)

I opened up the RSS in tabs, saw "Incest Still High In t..." and couldn't resist immediately clicking.

And a small thing that bugs me. Why does the "In" start with a capital letter, but "the" doesn't?

i usually do all my code comments like that. it's the same system that book titles use, where the first word and all important words start with a capital letter. in german every noun is capitalized, and that's an interesting system too.

Re:Damn you, slashdot. (1)

Fumus (1258966) | more than 5 years ago | (#25856969)

I understand this principle, but I'm asking why is a preposition more important than a determiner?

Re:Damn you, slashdot. (0)

Anonymous Coward | more than 5 years ago | (#25858117)

I understand this principle, but I'm asking why is a preposition more important than a determiner?

"The" is an article not a determiner nor a decider, Mr. Bush. ;-)

Re:Damn you, slashdot. (1)

Fumus (1258966) | more than 5 years ago | (#25858613)

Articles are a subgroup of determiners. [wikipedia.org]

Re:Damn you, slashdot. (0)

Anonymous Coward | more than 5 years ago | (#25857783)

You capitalize most of the words in your comments??? I hate you. I really really really hate you, and all the people like you that do the same thing. Comments are there for humans to read. If it is worth explaining, it is worth writing a real actual sentence about it. Most people don't write normal sentences with most of the words capitalized for one reason: It's freakin' hard to read, especially when all the other sentences are that way! So for the sake of the sanity of those who have to maintain your code (and read your comments) after you're gone, stop it!!!

It's fundamentally flawed (1)

Tyrannicalposter (1347903) | more than 5 years ago | (#25856987)

My wife and I don't share much in common in movies. She likes academy award winners, I like "ridiculous stupid movies that I'm too old to be watching". Since we share a netflix account, our incompatible likes and dislikes mess up the predictions.

Re:It's fundamentally flawed (2, Informative)

boyter (964910) | more than 5 years ago | (#25857043)

It doesn't make a difference. If you are using the same account for scoring then you are using the same account for the recommendations. So if the algorithm suggests something your wife will like but you don't it is still successful because for the account in general it gave a good match. Besides, you can actually look into the data more deeply and find accounts like this (not too difficult) and vary your scoring weights to improve accuracy for other people.
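One crude way to go looking for accounts like this, sketched under heavy assumptions: compare each rating with the movie's average and flag accounts whose deviations are unusually spread out. The threshold, the movie averages and the account data below are all invented, and a wide spread can just as well mean one person with strong opinions, so treat it as a starting heuristic only.

```python
import statistics

def residual_spread(account_ratings, movie_means):
    """Population std. dev. of (rating - movie average) for one account."""
    residuals = [r - movie_means[m] for m, r in account_ratings.items()]
    return statistics.pstdev(residuals)

def flag_mixed_accounts(ratings, movie_means, threshold=1.5):
    """Accounts whose residual spread exceeds an (arbitrary) threshold."""
    return [acct for acct, vec in ratings.items()
            if residual_spread(vec, movie_means) > threshold]

# Hypothetical movie averages and two hypothetical accounts.
MOVIE_MEANS = {"A": 3.5, "B": 3.0, "C": 3.5, "D": 3.0}
RATINGS = {
    "household": {"A": 5, "B": 1, "C": 5, "D": 1},  # swings hard both ways
    "solo":      {"A": 4, "B": 3, "C": 4, "D": 3},  # tracks the averages
}
flagged = flag_mixed_accounts(RATINGS, MOVIE_MEANS)
```

Accounts that get flagged could then be given different weights when scoring, which is the kind of adjustment described above.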

Re:It's fundamentally flawed (0)

Anonymous Coward | more than 5 years ago | (#25857299)

You should try the profiles [hackingnetflix.com] feature. It solves that exact problem.

Re:It's fundamentally flawed (0)

Anonymous Coward | more than 5 years ago | (#25857387)

Didn't they just get rid of that feature unless you paid more or paid for 2 accounts?

Re:It's fundamentally flawed (1)

TimTucker (982832) | more than 5 years ago | (#25857675)

On the other hand, assuming that the algorithm works properly, eventually it may start to hit on movies that you BOTH like.

Re:It's fundamentally flawed (2, Funny)

Alpha830RulZ (939527) | more than 5 years ago | (#25862625)

It doesn't matter. The algorithms are focused on what the 'account' will like. They will find that your account likes both Oscar winners and Mel Brooks films.

almost impossible to really win (4, Informative)

mlwmohawk (801821) | more than 5 years ago | (#25856997)

The problem with the Netflix prize, and I myself am working on it :-) is that it is pretty darn near impossible to do better than what they have.

It is based on user ratings and how close you can come to actual user ratings. For instance, their record set is frozen at a point in time; your job is to create a system that will accurately predict what another person will rate a movie in the future.

It doesn't take much psychology to understand that these are very subjective values. If you watch a movie on a "good" date, you'll rate it higher than if you watch the same movie with a "bad" date. Then there's the level of drunkenness under which you watch the movie. The day you had at work. How much money you lost in the stock market, etc.

In aggregate, you can come close, but the percentage of variability in the data suggests that Netflix chose their numbers well enough to never have to pay the prize.

Also, the "data" is nothing more than movie titles and obfuscated user ratings. Any sort of contextual or meta data about the movies you have to go find yourself.

It is a fun project on which to work, but I'm dubious of the end prize. I'll keep working on it because it's fun, but I have my doubts as to the winability of the contest based on the criteria for success.

Re:almost impossible to really win (4, Interesting)

Cylix (55374) | more than 5 years ago | (#25857053)

If I recall correctly, the last person I remember winning a milestone used an additional data source for rating. (which is fine by their rules)

It's probably going to take an additional data source to improve ratings.

Hey if you do it at least you get a mil ;) It sounds like a worthy hobby in my book.

Re:almost impossible to really win (2, Insightful)

Garse Janacek (554329) | more than 5 years ago | (#25857665)

You recall incorrectly. None of the top teams currently use external data sources (some have tried, but it doesn't help that much once you get up towards the top 10). The last team that (probably, not official yet) won a milestone used the combined predictions of the first and second place teams, interpolated to improve the final score, but nothing external. Same thing in the previous year, the winning team used only computations on the data given.
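
To make "combined predictions ... interpolated" concrete, here is a minimal sketch of blending two sets of predictions with a single weight chosen on held-out ratings. The toy ratings, the names `team_a` and `team_b`, and the grid search over the weight are all assumptions for illustration; the actual teams' blending method is not described in this thread.

```python
import math

def rmse(preds, actual):
    """Root mean squared error between predictions and true ratings."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(preds, actual)) / len(actual))

def blend(preds_a, preds_b, w):
    """Linear interpolation: w * A + (1 - w) * B for each rating."""
    return [w * a + (1.0 - w) * b for a, b in zip(preds_a, preds_b)]

def best_weight(preds_a, preds_b, actual, steps=100):
    """Pick the weight in [0, 1] that minimizes RMSE on held-out data."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda w: rmse(blend(preds_a, preds_b, w), actual))

# Hypothetical held-out ratings and two hypothetical teams' predictions.
actual = [4, 2, 5, 3, 1]
team_a = [3.5, 2.5, 4.0, 3.5, 2.0]
team_b = [4.5, 1.5, 4.5, 2.0, 1.5]

w = best_weight(team_a, team_b, actual)
blended = blend(team_a, team_b, w)
print(w, rmse(team_a, actual), rmse(team_b, actual), rmse(blended, actual))
```

Because the candidate weights include 0 and 1, the blend can never score worse than the better of the two inputs on the data used to pick the weight; whether that carries over to unseen ratings is a separate question.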

IAITTT ;)

Re:almost impossible to really win (1)

MikeURL (890801) | more than 5 years ago | (#25857995)

I have not looked into it but can you be certain that the top teams are not using additional metadata on the movies? My guess is that the winning team will come up with some really ingenious method for tagging films with useful predictive info.

It may wind up being something not intuitive (like release month/year, production company, gap score of economic state during release year vs current, or something like that).

Re:almost impossible to really win (2, Informative)

Garse Janacek (554329) | more than 5 years ago | (#25859725)

I have not looked into it but can you be certain that the top teams are not using additional metadata on the movies?

Pretty sure. IAITTT = I Am In The Top Ten ;)

The winning progress prize entry from 2007 had to publish the full details of their algorithm, and they don't use anything. I don't use anything. PragmaticTheory [blogspot.com] even wrote a blog post about how they don't use anything. Others have said the same thing. It's impossible to say that no one will ever come up with a useful way to use metadata, but so far the "metadata" produced by the algorithms themselves is far more accurate than that generated by human observers on the same data.

It may wind up being something not intuitive (like release month/year, production company, gap score of economic state during release year vs current, or something like that)

Well, that's beyond just counterintuitive to actually demonstrably unhelpful -- it seems a priori unlikely that someone's rating would depend on the production company, for example, but even if it was, that would be much more easily detected by the actual movie average (i.e. if a particular production company gets good ratings, then we will know that just because the movie has a lot of good ratings, and the company becomes superfluous). On the other hand, if you're suggesting that specific people have varying opinions of particular companies, well that again seems odd, but again it's irrelevant -- if such a correlation exists, SVD will find it, and so some of the dimensions of user-movie vectors will correlate to production company.

Similarly with the other properties you mention: since SVD is already finding *all* of the (linear) correlations in the data, it's not very helpful to try to come up with a huge list of farfetched ones yourself hoping one of them will work out...
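
As a rough illustration of the "SVD will find it" argument, the sketch below builds a tiny, fully observed ratings matrix in which taste secretly follows a studio split, then recovers that split from the ratings alone. The data and the studio story are made up, and this uses plain dense SVD from NumPy; the contest data is sparse, so real entries would need a variant that copes with missing ratings.

```python
import numpy as np

# Hypothetical toy data: 4 users x 4 movies, every rating observed.
# Movies 0-1 are from "studio A", movies 2-3 from "studio B".
# Users 0-1 prefer studio A, users 2-3 prefer studio B.
ratings = np.array([
    [5.0, 5.0, 1.0, 1.0],
    [4.0, 4.0, 2.0, 2.0],
    [1.0, 1.0, 5.0, 5.0],
    [2.0, 2.0, 4.0, 4.0],
])

global_mean = ratings.mean()
centered = ratings - global_mean

# Thin SVD: centered = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(centered, full_matrices=False)

# Keep only the strongest factor and rebuild the ratings from it.
k = 1
approx = global_mean + (U[:, :k] * s[:k]) @ Vt[:k, :]

# The first movie factor separates the two studios by sign,
# even though no studio label was ever supplied.
movie_factor = Vt[0]
print(np.round(s, 3))
print(np.round(movie_factor, 3))
```

In this toy case a single factor reproduces every rating, so adding an explicit "studio" column would contribute nothing the decomposition has not already extracted.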

Re:almost impossible to really win (1)

MikeURL (890801) | more than 5 years ago | (#25859889)

But isn't farfetched what you're looking for? When prepping the data for analysis it would probably be helpful to know if the moon phase impacts ratings. If it were found that a correlation existed between moon phases and how certain users rate then that would help prep the data before SVD is run. Perhaps rating would be adjusted so that everyone has a "moon adjustment".

I'd be trying to control for every large externality possible before even beginning to clean the data for self-contained anomalies. http://aa.usno.navy.mil/data/docs/MoonPhase.php [navy.mil]

Re:almost impossible to really win (2)

Garse Janacek (554329) | more than 5 years ago | (#25860161)

But isn't farfetched what you're looking for?

Well, no, "accurate" is what you're looking for ;)

But part of what I was getting at is that it doesn't matter how farfetched it is, SVD has no preconceptions (or at least, only the minimum needed for accuracy), it just finds the correlations themselves. If a correlation isn't detectable in the data, then it doesn't matter how plausible or farfetched something is, it won't be useful. If it is detectable in the data, then again, farfetched or not doesn't matter. Time is better spent modeling different ways for functions to correlate and then applying SVD to that than it is in coming up with as many hard-coded correlations as possible...

First bit here (2, Funny)

Anonymous Coward | more than 5 years ago | (#25857107)


#define MADE_BY_RIAA_MEMBERS 0x1  /* hypothetical flag bit; the post never defines it */

int get_rating( int movie_flags )
{
        int rating;

        rating = 0;

        if ((movie_flags & MADE_BY_RIAA_MEMBERS) == 0)
        { /* todo */
        }

        return rating;
}

Multi discipline rating (5, Interesting)

Coolhand2120 (1001761) | more than 5 years ago | (#25857219)

I've used netflix on video over the internet for a year or two now. The way to solve the problem is to break the star ratings up into a few different categories. You can always leave an "overall" rating for the lazy people, but if someone really wants netflix to "get to know them" they need to be more specific about what they like in the movie.

Right now netflix tries to infer what it was in the movie you liked by looking at other movies. Why not just ask what they liked about the movie?

For instance, I'm very concerned about the production quality in a movie. The movie may have the best plot ever and great actors but it was shot on a home VHS camera. I would give the movie a 1 star because the production quality was so bad, on the other hand someone who likes plots may have rated it a 5 star. Now netflix will never know if I rated it 1 star because I don't like the genre or don't like the acting or the cinematography. It just sees I rated the whole movie as a 1 and any movies that have similar elements then lose their importance on my personal ratings. If I could tell netflix: don't show me movies shot on a VHS camera (e.g.: production 1 star) then I could tell netflix I love the genre, love the plot hate the production.

A good example is BloodRayne - this movie absolutely sucks (insert government sponsored movies jab here), but I like the genre - now if I give this one star, as it deserves, netflix will think I really don't like the... whatever, it's most likely going to be wrong about it because it's pure conjecture.

I'm not a big movie nerd so I wouldn't be the best person to come up with the rating categories, but I'll give it a shot since this will never occur:

1. Production Quality
2. Plot
3. Directing
4. Acting
5. Genre

Of course this will never happen because netflix will not change their system to conform to my random idea on slashdot. And by this sentence I've just about exhausted all my interest in the subject.

One last comment: Why are all the online netflix movies so craptastic? Really, if it wasn't made 15 years ago, and it's in the "watch instantly" section, then it must really suck. They had a movie on there called "merc force" .... OMG! The special FX were done with PBRUSH, and they used the microphone that was built into the director's handy cam the whole time. Yes, it was that crappy, I actually had to show this movie to other people so they would believe me. I'm not a producer or anything, but I could shit on a paper plate and kick it against a clean white wall, and that would make a better movie. Merc force.... I will never forget you.

Spirit of a movie (1)

AlpineR (32307) | more than 5 years ago | (#25857615)

I agree that there's a big factor in ratings beyond genre, plot, and actors. Some of the quirky movies they mentioned like Napoleon Dynamite, Life Aquatic, and Lost in Translation have a certain something that makes them good movies (to me) beyond those factors.

Maybe you could link some movies by who mixed the audio or designed the costumes. Maybe you should give more weight to where a director is in their career than to who the director is. Or maybe it's something else that defines the spirit of a movie.

There are probably some customers who are strongly in tune with that factor and other customers who will love a movie with an interesting plot regardless of how crappy the flow and mood are. I reckon that discerning between those customers will help with the Napoleon Dynamite problem. But it would be hard to address it with multidimensional stars if we don't even know what to call it.

Re:Multi discipline rating (0)

Anonymous Coward | more than 5 years ago | (#25858373)

Do you think somehow you're unique? Odds are you're not the only one who views films the way you do, or who likes the same qualities in their comedy/romcom/action/whatever. Just marking a film down as 1 doesn't mean "I hate this genre", just "I think this film was crap". Odds are rather high that others hold the same preferences as you, so the algorithm will see that others who gave that film 1 star also gave x and y of the genre 5 stars, and still recommend them: "those who hated z loved x!" Only if you start hating all the films in the genre will the ratings drop off, as your synchronicity with other fans fades.

Re:Multi discipline rating (1)

mveloso (325617) | more than 5 years ago | (#25858563)

You mean rating a movie on the different metadata dimensions. That's tough, because that information may not exist. It's a data problem, not a math problem. That's probably why ebay asks all those extra questions now when you buy something.

If the Napoleon Dynamite movies are such outliers (like Buckaroo Banzai), they should just special-case them and move on. With a data set this big, there's no point in having a catch-all algorithm, especially if the outliers are such a big problem. Instead, the code should recognize the outlier and just flip a (virtual) coin as to the rating (or shunt it off somewhere else).

Re: the problem with outliers (2, Informative)

Anonymous Coward | more than 5 years ago | (#25860365)

The outliers are a major problem, but you can't just ignore them and move on. Collectively they add up to most of the error.

The training data set includes 116,362 user ratings of Napoleon Dynamite; the distribution is:

  • 1: 13,365 = 11.5%
  • 2: 15,790 = 13.6%
  • 3: 27,216 = 23.4%
  • 4: 31,115 = 26.7%
  • 5: 28,876 = 24.8%

The weighted average of these ratings is 3.4, and the math works out that when you only guess one value, the RMSE minimizes at the average. So in this case, a guess of 3.4 on all of those ratings gives you a 1.3025 RMSE for the data shown above. Most movies have an RMSE below 1.1.
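
The figures above can be checked directly from the distribution. This is just a sketch of the arithmetic; the function name `rmse_for_guess` is made up for the example, and the counts are copied from the list above.

```python
import math

# Rating counts for Napoleon Dynamite, as quoted above.
counts = {1: 13365, 2: 15790, 3: 27216, 4: 31115, 5: 28876}
total = sum(counts.values())

def rmse_for_guess(guess):
    """RMSE if the same single value is predicted for every rating."""
    sse = sum(c * (r - guess) ** 2 for r, c in counts.items())
    return math.sqrt(sse / total)

mean = sum(r * c for r, c in counts.items()) / total

print(total, round(mean, 2), round(rmse_for_guess(mean), 4))
# Any constant guess other than the mean does worse:
print(round(rmse_for_guess(3.0), 4), round(rmse_for_guess(4.0), 4))
```

The reason the mean wins is that the squared error of a constant guess g equals the variance plus (mean - g)^2, so any move away from the mean only adds error.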

Now suppose we try to refine our guess by using a coinflip method. In this model, we can look at the split 12/345 and assign ratings of 25%@1.54 and 75%@4.02. But what happens when we apply these without having any knowledge of which category each person falls? We end up doing worse! The problem is that even though you're only giving a 1.54 a quarter of the time, 3/4 of that 1/4 you're guessing a 1.54 for someone that actually ranked it a 3, 4, or 5. The error for 5 is especially bad, since 5 - 1.54 = 3.46, and then you have to square that! Overall, across the distribution, a guess of 1.54 ends up having an RMSE of 2.27, and the guess of 4.02 has an RMSE of 1.44. Applied together at 25% and 75% respectively you'll get sqrt(25% * 2.27^2 + 75% * 1.44^2) = 1.69 RMSE. Alternately we could use the 123/45 split: 48.5%@2.25 and 51.5%@4.48, but that turns out worse still since you'll end up with sqrt(48.5% * 1.74^2 + 51.5% * 1.69^2) = 1.71 RMSE.
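
Here is a sketch of the same coin-flip calculation, assuming the coin lands on the "low" guess with probability equal to the low group's share and is independent of what each person actually rated. The helper names are invented for the example.

```python
import math

# Rating counts for Napoleon Dynamite, as quoted earlier in this comment.
counts = {1: 13365, 2: 15790, 3: 27216, 4: 31115, 5: 28876}
total = sum(counts.values())

def mse_for_guess(guess):
    """Mean squared error of one constant guess over the whole distribution."""
    return sum(c * (r - guess) ** 2 for r, c in counts.items()) / total

def coinflip_rmse(low_group):
    """Expected RMSE when guessing the low-group mean or the high-group mean
    at random, in proportion to group size, without knowing who is in which."""
    low_n = sum(c for r, c in counts.items() if r in low_group)
    high_n = total - low_n
    guess_low = sum(r * c for r, c in counts.items() if r in low_group) / low_n
    guess_high = sum(r * c for r, c in counts.items() if r not in low_group) / high_n
    share_low = low_n / total
    mse = share_low * mse_for_guess(guess_low) + (1 - share_low) * mse_for_guess(guess_high)
    return math.sqrt(mse)

mean = sum(r * c for r, c in counts.items()) / total
single_guess = math.sqrt(mse_for_guess(mean))
split_12_345 = coinflip_rmse({1, 2})
split_123_45 = coinflip_rmse({1, 2, 3})

print(round(single_guess, 3), round(split_12_345, 3), round(split_123_45, 3))
```

Both splits come out close to the 1.69 and 1.71 quoted above (small differences are rounding), and both are clearly worse than the single guess at the mean.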

The qualifying set asks for 10,551 guessed ratings of Napoleon Dynamite out of 2,817,131 guessed ratings total. So if you can't figure out anything else about the ratings and have to go with the mean vote, your error will include 10,551 * (1.3025)^2 = ~17,900 SSE (sum of squared error) from Napoleon Dynamite alone. The coin flip methods mentioned above would give over 30,000 SSE.

To put this in greater perspective: To win $1e6, you need to get below (0.8563)^2 * 2,817,131 = 2,065,660 SSE. The current leader has (0.8616)^2 * 2,817,131 = 2,091,310 SSE, and the 10th place team has (0.8677)^2 * 2,817,131 = 2,121,027 SSE. Thus the leader is only 25,650 SSE away from the prize, and the 10th place team is only 29,717 behind that at 55,367 SSE away.

So if the leaders were all using 3.4 as their guess for Napoleon Dynamite, and then they suddenly figured out a way to reduce the RMSE of their guesses for that one movie to 0.86, they'd be able to knock off 10,000 points of SSE -- just for the one movie. That's why they're so interested in "solving" the problem with outliers. However, odds are that they're already guessing in the 0.95 to 1.05 RMSE range for Napoleon, based on connections they've deduced about how each individual rated other movies.
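
The SSE bookkeeping in the last few paragraphs can be reproduced with a few lines. This is only a sketch of the arithmetic using the RMSE values and counts quoted in this comment; the variable names are made up.

```python
# Sizes quoted in this comment.
QUALIFYING_RATINGS = 2_817_131   # total guesses requested
NAPOLEON_RATINGS = 10_551        # guesses requested for Napoleon Dynamite

def sse(rmse, n):
    """Sum of squared error implied by an RMSE over n ratings."""
    return rmse ** 2 * n

prize_target = sse(0.8563, QUALIFYING_RATINGS)
leader = sse(0.8616, QUALIFYING_RATINGS)
tenth_place = sse(0.8677, QUALIFYING_RATINGS)

leader_gap = leader - prize_target
tenth_gap = tenth_place - prize_target

# Napoleon Dynamite alone: constant guess at the mean vs. a hoped-for 0.86 RMSE.
napoleon_now = sse(1.3025, NAPOLEON_RATINGS)
napoleon_saved = napoleon_now - sse(0.86, NAPOLEON_RATINGS)

print(round(prize_target), round(leader), round(tenth_place))
print(round(leader_gap), round(tenth_gap))
print(round(napoleon_now), round(napoleon_saved))
```

Working in SSE rather than RMSE is convenient here because squared errors add up across movies, so one movie's contribution can be compared directly with the gap to the prize.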

Where are the Google super-brains? (1)

Bearhouse (1034238) | more than 5 years ago | (#25857513)

I'm sure they'd be interested in this - after all, a similar problem exists for YouTube/GVideo rankings, (which are not great, based on my experience). Of course, if they have/ever do crack it, they'd hardly hand it over to Netflix.

Maybe it's the 'Napoleon Dynamite' factor mentioned in the article - perhaps humans are too strange to be completely understood after all. Good news if you've been wearing a tinfoil hat ever since seeing 2001 or Terminator...

Re:Where are the Google super-brains? (1)

RAMMS+EIN (578166) | more than 5 years ago | (#25858475)

``perhaps humans are too strange to be completely understood after all.''

I'm sure of that, but the beauty is that you don't necessarily need to understand everything to come up with a working algorithm.

It's now a competition among corporations (0)

Anonymous Coward | more than 5 years ago | (#25857651)

As one individual, I spent a lot of work on the contest at the beginning, but I'm now thankful I didn't waste more time, given that there are corporate entries such as ATT to compete against (with all their resources and database staff to assist in the contest). It must be nice to be employed and working on something interesting like this during "work" hours while the rest of us have to do normal (i.e., boring) IT work for our employers.

Re:It's now a competition among corporations (1)

Garse Janacek (554329) | more than 4 years ago | (#25867051)

I'm now thankful I didn't waste more time, given that there are corporate entries such as ATT to compete against (with all their resources and database staff to assist in the contest).

The ATT people are an extreme case, but even so there are multiple amateurs within close range of them -- as of a few days ago, at least one amateur team (PragmaticTheory) was beating them.

YA lets code for the mpaa (0, Flamebait)

CHRONOSS2008 (1226498) | more than 5 years ago | (#25858047)

losers

Re:YA lets code for the mpaa (0)

Anonymous Coward | more than 5 years ago | (#25858271)

I can't code but I like to play with mah balls doot doot doot!