Frequent Slashdot contributor Bennett Haselton writes "A 2006 paper by Matthew Salganik, Peter Dodds and Duncan Watts, about the patterns that users follow in choosing and recommending songs to each other on a music download site, may be the key to understanding the most effective form of "censorship" that still exists in mostly-free countries like the US. It also explains why your great ideas haven't made you famous, while lower-wattage bulbs always seem to find a platform to spout off their ideas (and you can keep your smart remarks to yourself)." Read on for the rest of Bennett's take on why the effects of peer ratings on a music download site go a long way towards explaining how good ideas can effectively be "censored" even in a country with no formal political censorship.
In a country where you're free to say almost anything in the political arena, I think the only real censorship of good ideas is what you could call "censorship by glut". If you had a brilliant, absolutely airtight argument that we should do something -- indict President Bush (or Barack Obama), or send foreign investment to Chechnya, or let kids vote -- but you weren't an established writer or well-known blogger, how much of a chance do you think your argument would have against the glut of Web rants and other pieces of writing out there? Especially if your argument required people to read it and think about it for at least an hour? Perhaps your situation could be compared to that of a brilliantly talented band submitting a song for Matthew Salganik's experiment.
What Salganik and his co-authors did was recruit users through advertisements on Bolt.com (skewing toward a teen demographic) to sign up for a free music download site. Users would be able to listen to full-length songs and then decide whether or not to download the song for free. Some users were randomly divided into eight artificial "worlds" in which, while a user was listening to a song, they could see the number of times that the song had been downloaded by other users in the same world -- but only by other users within their own world, not counting the downloads by users in other worlds. The test was to see whether certain songs could become popular in some worlds while languishing in others, despite the fact that all groups consisted of randomly assigned populations that all had equal access to the same songs. The experiment also attempted to measure the "merit" of individual songs by assigning some users to an "independent" group, where they could listen to songs and choose whether to download them, but without seeing the number of times the song had been downloaded by anyone else; the merit of the song was defined as the number of times that users in the independent group decided to download the song after listening to it. Experimenters looked at whether the merit of the song had any effect on the popularity levels it achieved in the eight other "worlds".
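To make the bookkeeping of that design concrete, here is a minimal Python sketch. It is not the researchers' code: the event format, the "independent" label, and the function names are all assumptions made for illustration. It tallies downloads per world, treats the independent group's tally as "merit", and shows a listener only the counts from their own world.

```python
from collections import Counter, defaultdict

INDEPENDENT = "independent"  # hypothetical label for the no-feedback group


def tally_downloads(events):
    """Group download events by world.

    events: iterable of (world, song) pairs, one pair per download.
    Returns a mapping {world: Counter({song: downloads})}.
    """
    by_world = defaultdict(Counter)
    for world, song in events:
        by_world[world][song] += 1
    return by_world


def merit(by_world):
    """Merit as the essay describes it: downloads in the independent group."""
    return by_world[INDEPENDENT]


def visible_count(by_world, world, song):
    """Count shown to a listener: own world only, nothing for the independent group."""
    if world == INDEPENDENT:
        return None
    return by_world[world][song]


# Made-up events: song "a" does well with independent listeners,
# but song "b" happens to get the early downloads in world 1.
events = [(INDEPENDENT, "a"), (INDEPENDENT, "a"), (INDEPENDENT, "b"),
          (1, "b"), (1, "b"), (2, "a")]
tallies = tally_downloads(events)
print(merit(tallies))
print(visible_count(tallies, 1, "b"), visible_count(tallies, 2, "b"))
```

Comparing the ranking in `merit(...)` against the ranking inside each numbered world is the comparison the experiment was built to make.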
The authors summed it up: "In general, the 'best' songs never do very badly, and the 'worst' songs never do extremely well, but almost any other result is possible." They also noted that in the "social influence" worlds where users could see each other's downloads, increasing download numbers had a snowball effect that widened the difference between the successful songs and the unsuccessful: "We found that all eight social influence worlds exhibit greater inequality -- meaning popular songs are more popular and unpopular songs are less popular -- than the world in which individuals make decisions independently." Figures 3(A) and 3(C) in the paper show that the relationship between a song's merit and its success in any given world -- while not completely random -- is tenuous. And if you're a talented musician and you want to get really depressed about your prospects of hitting the big time, Figures 3(B) and 3(D) show the relationship between a song's measured merit and its actual number of sales in the real world. (Although those graphs may cheer you up if you're a struggling musician who hasn't made it big yet -- maybe it's not you, it's just the roll of the dice.)
As Richard Thaler and Cass Sunstein put it in their all-around fascinating book Nudge, where I first read about the Salganik study:
In many domains people are tempted to think, after the fact, that an outcome was entirely predictable, and that the success of a musician, an actor, an author, or a politician was inevitable in light of his or her skills and characteristics. Beware of that temptation. Small interventions and even coincidences, at a key stage, can produce large variations in the outcome. Today's hot singer is probably indistinguishable from dozens and even hundreds of equally talented performers whose names you've never heard. We can go further. Most of today's governors are hard to distinguish from dozens or even hundreds of politicians whose candidacies badly fizzled.
Is the blogosphere, or the "marketplace of ideas" in general, any different? If a random sample of bloggers were rated based on some independent measure of merit -- for example, independent ratings from a random sampling of blog readers, who were looking at the bloggers' writing samples for the first time, analogous to users in Salganik's "independent" world -- and those ratings were then correlated with the bloggers' traffic or some other measure of success, it's not hard to imagine the results would be similar to those of the 8-worlds experiment: the best often rise to the top, the very worst rarely do, but success in the vast middle would be close to random. In fact, while music listeners would have no logical reason to like a song just because others did, users in the blogosphere and other public forums would have several rational reasons to cluster around writers who are already popular: (1) errors are more likely to have been spotted and pointed out by someone else; (2) as an extension of that, others are more likely to have provided comments and other value-added content; (3) if you are the first person to spot an error, it's more important on a popular blog to point out the error and stop the misinformation from spreading, than on a minor blog that nobody has ever heard of. So the "snowball effect" of popularity in the blogosphere would be even more pronounced.
Then why do so many people believe in what Thaler and Sunstein call the "inevitability" of success based on merit, in domains like music, politics, and writing? I think it's because the belief is what scientists call an unfalsifiable one -- if the "best" acts are assumed to be the ones that end up on the top of the pile, then the marketplace has always sorted the "best" content to the top, by definition. Since the definition is circular, the premise could never be disproved by any amount of counter-evidence -- even if an act that used to be popular, suddenly falls under the radar, that could be seen as "proof" that they lost whatever magic touch they used to have, not as evidence of the arbitrariness of the market! The only disproof would be an artificial experiment like Salganik's, showing that once you get beyond a certain threshold of quality, commercial success has little relationship to independently measured merit -- but such experiments, which in Salganik's case required the cooperation of over 14,000 users, don't come along very often. And as long as most people don't realize how arbitrary the existing marketplaces are, there isn't enough demand to justify building a system that could work better -- indeed, to even justify asking the question of whether a system could be designed that would work better.
And that, I think, is how "censorship by glut" really works. It's not just the sheer amount of written content that censors small voices -- if you happen to know about a particular writer that you consider a fount of wisdom, then the existence of a billion other Web pages won't stop you from reading that writer's content. And it's not as if there aren't plenty of people who realize that success can be highly arbitrary. The problem is that as long as most people assume that the existing marketplace of ideas does a good job of sorting the best content to the top, then they'll be more inclined to stay with the most popular news sites and blogs, and even the minority who know that it's largely a lottery, will have no effective way of finding the best content among everything else, so they'll end up sticking with the most popular sites as well. Worse, as a secondary effect, most people with something useful to contribute won't even bother, if they don't already have a large built-in audience. I know plenty of people who could write insightful essays about social and technological issues, essays that would give most readers a new perspective such that they would definitely say afterwards: "That was worth my time to read it." But it wouldn't be worth it to the writers, because they know that their content isn't going to get magically sorted into its deserved place in the hierarchy.
(My own favorite blog that nobody's ever heard of is Seth Finkelstein's InfoThought, which is usually logical and insightful and is only about 25% of the time about how "nobody ever reads this blog, so what's the point". His Guardian columns are also good and usually don't have that subtext, perhaps because it's considered impolite to use a newspaper's column-inches (column-centimeters?) to complain that you have no voice.)
So can this problem be avoided, or are inequality and arbitrariness just a permanent part of the marketplace for content and ideas? You could create an artificial world that would sort user-submitted content according to some other algorithm -- and even if it didn't give good writers the fame that they theoretically deserved in the larger world, it might still provide them with enough of an audience within the artificial universe, to make it worth their time to keep writing. One option would be to use Salganik's "independence" world model, where users would read content without being able to see the ratings that other people had given to it, or without even seeing recommendations from similarly-minded friends within the system. The trouble is that without any information about what other readers liked, without any starting point to sort good content from bad content, it may not be worth the reader's time to read through all the dreck to find the occasional buried treasure. I believe about as strongly as a person can believe, that the existing marketplace for content is far from meritocratic, for example that there are probably thousands of songs on iTunes that I've never heard of but would nonetheless love -- but even I don't spend time listening to the 30-second clips of random songs on iTunes, because it takes too long to find the stuff I would like.
But I submit there is a solution -- a variant of an argument that I've suggested for stopping cheating on Digg, or building Wikia search into a meritocratic search engine, or helping the best writers rise to the top on Google Knol. The solution is sorting based on ratings from a random sample of users. The remainder of this speculation will be very theoretical, and will at times seem like a Rube-Goldberg approach to what should be a simple problem. But at each juncture, the complications to the algorithm are motivated by an argument that anything simpler would not work. At many points along the way, it will be tempting to throw up one's hands and say, "Why go to all this trouble, the existing system works well enough." But this statement is hard to quantify with any actual evidence -- unless you're just using the circular definition above, that whatever rises to the top is automatically the "best".
For music listeners, the gist of the algorithm is: When an artist submits a new song, in the alt-rock category for example, the song is distributed to a random sample of 20 users who have indicated an interest in that genre. If the average rating from those users is high enough, the song gets recommended to all of the site's users who are interested in alt-rock. If the average rating is not high enough, then the artist receives a notification, perhaps with a list of comments from the listeners suggesting what to improve. As long as the initial random sample of users is large enough that the average rating is indicative of what the rest of the site's alt-rock fans would think, the good content will get to be enjoyed by all of the site's alt-rock customers, while the bad content would fizzle after only wasting the time of 20 people. If it turns out that a random selection of 20 users are typically too lazy to rate the songs that are submitted to them, you could even make artists submit $10 to have their songs rated by the focus group, and pay each of the 20 raters $0.50 for their trouble. Artists can't withhold payment as revenge for a bad rating, so the average ratings should still reflect the song's actual quality.
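The gist of that algorithm fits in a few lines of Python. This is only a sketch under stated assumptions: the 1-to-5 rating scale, the 3.5 cutoff, and the names `review_submission` and `rate` are invented for illustration, while the group size of 20 and the $10 fee come from the description above.

```python
import random
from statistics import mean

FOCUS_GROUP_SIZE = 20     # sample size suggested in the essay
SUBMISSION_FEE = 10.00    # optional artist fee from the essay, in dollars
PROMOTE_THRESHOLD = 3.5   # hypothetical cutoff on an assumed 1-5 scale


def review_submission(genre_fans, rate, rng=random):
    """Poll a random focus group and decide whether to promote a song.

    genre_fans: list of user ids who said they like the song's genre.
    rate: callable mapping a user id to that user's rating of the song.
    rng: the random module or a random.Random instance (for repeatability).
    """
    group = rng.sample(genre_fans, FOCUS_GROUP_SIZE)  # sampled without replacement
    average = mean(rate(user) for user in group)
    return {
        "focus_group": group,
        "average": average,
        "promoted": average >= PROMOTE_THRESHOLD,
        # The fee is split evenly, so no rater's pay depends on their rating.
        "payout_per_rater": SUBMISSION_FEE / FOCUS_GROUP_SIZE,
    }


# Usage with a stand-in rating function in which every listener gives a 4.
fans = list(range(1000))
result = review_submission(fans, rate=lambda user: 4, rng=random.Random(0))
print(result["promoted"], result["payout_per_rater"])
```

Because the raters are drawn at random by the site rather than chosen by the artist, the artist has no say in who hears the song first, which is the whole point of the scheme.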
At this point, you might object that this system suffers from the same unfalsifiable, circular reasoning as the belief that the marketplace rewards the "best" content, if the best content is the content that wins in the marketplace. If I define the "best" content to be the content that gets the highest average score in a random focus group, then of course this algorithm sorts the best content to the top, because that's how "best" was defined! But this system does actually have a non-trivial property: If you implement the system in multiple separate "worlds" (similar to those that Salganik created), then provided your focus groups are large enough to provide representative random samples, the same content should rise to the top in each of the worlds, unlike the results in Salganik's experiment.
This actually wouldn't be the case if the initial focus groups were not big enough -- then random variations in a few voters' opinions could cause many songs to succeed in one world and fail in another. So it's a non-trivial property that is not automatically true, and would not be true if you made an error in designing the system, like making the focus groups too small. But the larger the random sample, the smaller the variance of the average of their ratings, and the greater the agreement you would expect between the results from different worlds.
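A toy simulation can illustrate the claim about sample size. Everything in it is an assumption made for the sake of the sketch: each song has a fixed "true" quality, each rating is that quality plus independent Gaussian noise, and a song is promoted when its focus-group average clears a fixed threshold. Under those assumptions the standard deviation of an average of n ratings shrinks in proportion to 1/sqrt(n), so verdicts in separate worlds should agree more often as the groups grow.

```python
import random
from statistics import mean


def promoted_set(true_quality, group_size, threshold, noise, rng):
    """Songs whose focus-group average clears the threshold in one world."""
    winners = set()
    for song, quality in true_quality.items():
        ratings = [quality + rng.gauss(0, noise) for _ in range(group_size)]
        if mean(ratings) >= threshold:
            winners.add(song)
    return winners


def agreement(true_quality, group_size, worlds=8, threshold=3.5, noise=1.0, seed=0):
    """Fraction of songs that receive the same verdict in every world."""
    rng = random.Random(seed)
    results = [promoted_set(true_quality, group_size, threshold, noise, rng)
               for _ in range(worlds)]
    unanimous = sum(
        1 for song in true_quality
        if len({song in winners for winners in results}) == 1
    )
    return unanimous / len(true_quality)


# 100 hypothetical songs with true quality spread evenly from 2.0 to 5.0.
songs = {f"song{i}": 2.0 + 3.0 * i / 99 for i in range(100)}
print("groups of 5:  ", agreement(songs, 5))
print("groups of 500:", agreement(songs, 500))
```

Songs whose quality sits right at the threshold will still flip between worlds at any group size; larger groups only narrow the band of songs for which that happens.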
As Salganik pointed out to me, this system does under-reward songs that might require repeated listenings over time to gain an appreciation of their qualities. But even this, strictly speaking, can be modeled in exchange for cash -- I'll pay 20 users $2 each if they listen to my song once today, once in three days, and once again a week after that (the site could stream the song to them to provide at least some likelihood that the users weren't cheating). This assumes some things, such as that repeated exposure has the same growing-on-you effect even if the exposure is forced -- but in the real world, songs often grow on you from repeated listenings that are "forced" anyway, if they're played in the doctor's office or on the radio when you don't bother to change the channel. And this might be more complicated than necessary -- often when a song grows on you, it at least interests you enough the first time you hear it, that you'd give it a positive rating on the first listen, which is all that the site requires for the song's success.
However, if you try to adapt this trick to a meritocracy for written content, you run into different problems. With a song, if you poll a random sample of users, the odds are very small that anyone being polled will have a vested interest in the success of the song, like one of the band members or one of the song's producers (assuming the population of users is large enough, and the song's producers have not been able to create a huge number of "sockpuppet" accounts to manipulate the voting). So you can assume the ratings will be free of any prior bias. But with a political post, for example, if you write a pro-Bush or anti-Bush essay, it's quite likely that among a random sample of users, there will be people who are biased to vote up (or vote down) any post that has anything good to say about the President. The essays voted to the top may not be the best-written ones, but simply the ones that pander to the most popularly held opinions.
But if the "best" essays are not the ones that receive the highest percentage of positive votes, even when polling a random sample of independent users -- which I was advocating as the gold standard for measuring merit -- then how do you define what makes the "best" essays, anyway? There are many possible answers, but I suggest: A necessary condition for being among the "best" essays would be to convince the most people of something that they didn't believe before, without resorting to tricks such as blatantly fabricating statistics or attributing made-up quotes. This is not a sufficient condition for merit -- maybe the point of view that you're convincing people of, is still wrong -- but I submit that if you're not at least changing some people's minds, then there's no point. An essay that changes a lot of people's minds in a random focus group, is usually worth reading, if only to see why it has that effect.
Unfortunately, this doesn't suggest a better way to poll users about the merit of an essay, because if you ask users, "Were you a Bush supporter before reading this essay?" and "Were you a Bush supporter afterwards?", Bush supporters are eventually going to figure out that the way to give the essay a high score on the mind-changing scale, would be to (falsely) say that they were not a Bush supporter before reading the essay, but they were one afterwards. So you'd still end up rewarding the essays that reinforce pre-existing opinions instead of the ones that change people's minds.
From here the counter-measures and counter-counter-measures get increasingly complicated. For each category of essays that a user wants to rate, such as Bush opinion pieces, you could require new users to enter their current opinion: either pro-Bush or anti-Bush. Then if they were asked to rate a pro-Bush essay, they would only be able to vote that the essay "changed their minds" by switching their registered opinion from "anti-Bush" to "pro-Bush". But Bush supporters could sign up initially as anti-Bush, just in the hopes of being part of a random focus group so they could cast their mind-changing vote for a Bush essay by changing their registration to "pro-Bush"! However, each user would only be able to do that once -- or do you allow users, after they've switched from anti-Bush to pro-Bush, to "reload" by spontaneously switching back to anti-Bush for no reason at all, so they're all set to cast a mind-changing vote for the next pro-Bush essay? Or would they only be allowed to switch back to anti-Bush, by casting a mind-changing vote as part of a random focus group for an anti-Bush essay -- thus giving a boost to an anti-Bush screed, as part of the price they pay for the next vote they cast for a pro-Bush piece? Then users could still game the system, by switching to "anti-Bush" when casting a vote for a very poorly written anti-Bush essay that they don't think anybody else will vote for anyway, and then switching back to "pro-Bush" only for the good essays that have a shot, hoping that their votes will coalesce around the decently-written pro-Bush essays and push them to the front page...
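One of the variants above, in which a registered opinion can only change by casting a mind-changing vote, can be written down as a small Python sketch. The class and method names are hypothetical, and the code covers only that one rule, not a complete defense. The usage at the bottom replays the "reload" loophole just described.

```python
class OpinionRegistry:
    """Tracks each user's registered stance on a single topic."""

    def __init__(self, stances=("pro", "anti")):
        self.stances = set(stances)
        self.registered = {}  # user id -> current stance

    def register(self, user, stance):
        """Record a new user's starting opinion; it cannot be re-registered."""
        if stance not in self.stances:
            raise ValueError(f"unknown stance: {stance}")
        if user in self.registered:
            raise ValueError("stance only changes through a mind-changing vote")
        self.registered[user] = stance

    def cast_mind_changing_vote(self, user, essay_stance):
        """Return True if the vote counts, flipping the user's registration."""
        if essay_stance not in self.stances:
            raise ValueError(f"unknown stance: {essay_stance}")
        if self.registered[user] == essay_stance:
            return False  # the user already agreed, so no mind was changed
        self.registered[user] = essay_stance
        return True


# A committed supporter who signed up as "anti" in order to game the vote.
registry = OpinionRegistry()
registry.register("alice", "anti")
votes = [
    registry.cast_mind_changing_vote("alice", "pro"),   # counts: anti -> pro
    registry.cast_mind_changing_vote("alice", "pro"),   # rejected: already pro
    registry.cast_mind_changing_vote("alice", "anti"),  # "reload" on a weak anti essay
    registry.cast_mind_changing_vote("alice", "pro"),   # counts again
]
print(votes)
```

The rule does put a price on each repeat vote, but as the last two calls show, a user can pay that price on an essay they expect to lose anyway.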
Am I over-thinking this? I submit this is an area where there's been too much under-thinking. Haven't we all been tempted to believe that the marketplace of ideas -- not to mention bands, blog posts, and business ventures -- efficiently sorts content to the place in the hierarchy of rewards that it deserves, without having any real evidence for this, except the circular definition of "quality" as being proportional to success? And the more people believe this, the more that marginalized voices will effectively be censored, even when they have something brilliant to contribute. We should at least think about ways that we could do better. Or else, prove logically that it can't be done (a logical proof can only approximate the real world, but it could show that such a pure meritocracy would be very improbable, or wouldn't work well). However, I think the ideas above make it seem unlikely that a meritocracy is logically impossible. Maybe they're a step in the right direction. Maybe someone else's ideas would be better. The important thing is that a meritocratic algorithm be judged by something other than a circular definition, which simply decrees by fiat that the winning content is the best.