Frequent contributor Bennett Haselton writes "With the announcement of Verizon's 'six strikes plan' for movie pirates (which includes reporting users to the RIAA and MPAA), and content companies continuing to sue users en masse for peer-to-peer downloads, I think it's inevitable that we'll see the rise of p2p software that proxifies your downloads through other users. In this model, you would not only download content from other users, but you would also use other users' machines as anonymizing proxies for the downloads, which would make it impossible for third parties to identify the source or destination of the file transfer. This would hopefully put an end to the era of movie studios subpoenaing ISPs for the identities of end users and taking those users to court." Read below for the rest of Bennett's thoughts.
Now, I'm not advocating the creation of software that enables piracy. And I don't mean that in a nudge-wink kind of a way, I'm serious: I think people should reward movie studios for making content that they like, if only because that means studios will make more of that type of content. For my last cross-country flight I paid an honest-to-God four dollars to download a movie from Amazon Unbox to watch on the plane, even though I fondly like to think of myself as smart enough that I could have figured out how to find and download the movie for free. (Well, not all that smart; the movie was Lockout.)
However, the idea of users anonymizing each other's downloads is so elementary that I literally mean it's inevitable that we will see the rise of such software. Whether I'm in favor of it or not, it's going to happen. In fact, under certain assumptions, there's really only one logical direction that it can evolve in.
First, some background. Under the current BitTorrent protocol -- with no built-in support for anonymization -- some server S makes a large file available for download. When the first downloader, say user D1, requests a copy of the file, they have to begin the process of downloading it from S. But when the next downloader, say user D2, requests a copy of the same file while user D1 is still downloading, the BitTorrent server S tells D2 to start downloading the file from D1 instead of from S directly. (D1 is required at this point to share out the file for download, in order to earn enough "credits" to continue downloading from S.) Subsequent downloaders are similarly told to download from other downloaders instead of from the original server S. In this way, the server S avoids incurring massive bandwidth charges (since S only actually served the file one time), and each user on average only has to share out the file once in return for downloading it themselves.
Note that this still means that in order to initiate the download, the server S has to serve out the whole file at least once, to the first downloader -- and if the file is being distributed without the copyright owner's permission, then the operators of server S can be taken to court. This legal pressure was the reason that the Pirate Bay switched from serving BitTorrent files to serving magnet links, which enable users to download content purely from each other, without the Pirate Bay ever actually serving the content themselves. But with both BitTorrent and magnet links, users who are downloading content from other users can see those other users' IP addresses -- and they know that those other users are serving the content from files stored on their own hard drives. This means that if you're the copyright owner of that content, you can subpoena the identities of the users behind those IP addresses, and take them to court for unauthorized possession and distribution of copyrighted material.
So what would a protocol look like with built-in support for anonymization? In my first draft of an idea, I thought that each download could take place using one intermediate user as a proxy, so that instead of server S telling D2 to download from D1, the server would tell D2 to use downloader D3 as a proxy, and tell D3 to proxy the connection from D1. (As with BitTorrent, the downloader D3 would be required to allow their machine to be used as a proxy, in order to earn credits to continue with their own download.) So D1 would not be able to see the IP address of user D2 downloading from them, and D2 would not be able to see the IP address of user D1 that they were downloading from. Both of them would be able to see the IP address of user D3 which is acting as the proxy between them, but as long as it's not against the law to simply proxy a connection for someone else, that would not be grounds to subpoena the user D3's identity. And D3 would be able to see the IP addresses of D1 and D2, but if D1 and D2 are communicating using a shared encryption key, then D3 would have no idea what content is flowing between D1 and D2, even as it proxies the connection between them. So even if one of D1, D2 or D3 were an "adversary" (i.e. a copyright holder intent on suing illegal file sharers), none of the three would be able to see the IP address of another user that they knew was either downloading particular content, or serving it out.
Of course you could also argue that if D3 is among the users that server S is making available to others as an anonymizing proxy, then that constitutes proof that D3 must be downloading something else from S (otherwise, D3 wouldn't need to earn credits by acting as an anonymizing proxy), and if either D1 or D2 is an adversary, they can see D3's IP address and reason that D3 must be guilty of some copyright violation. Similarly, if D3 is the adversary, they can see D1 and D2's IP addresses and reason that both of them are probably guilty of some copyright infraction, even if D3 can't actually see what they're trading. Basically, anybody could be considered "guilty by association" simply by virtue of being in the community of users being coordinated by server S. But (1) that accusation could be deflected if some of the files being served by S were in fact legal and being distributed with the copyright holder's permission; and (2) in any case, the Digital Millennium Copyright Act requires you to claim that your specific copyrighted content is being distributed by a user, before you can unmask that user's identity; it's not enough to claim that the user is part of a network that distributes "some" copyrighted content illegally. D3 may be proxying a connection between D1 and D2 in order to earn credits so that D3 can download some content for themselves, but even though D1 and D2 can both see D3's IP address, there's no way for them to know what D3 could be downloading.
Unfortunately, this three-user-chain idea is not secure, because an adversary could still create a large number of users coordinated through server S, and sooner or later, a chain would arise where both the proxy and the downloader are controlled by the adversary, and at that point, they would know the IP address of the user serving out the copyrighted content. In other words, eventually you'll get a situation where D2 is downloading content from D1 by going through proxy D3 -- but where D2 and D3 are both controlled by the adversary. So D2 knows the content that's being downloaded via D3, and D3 knows the IP address of D1 that's actually serving out the content -- at which point they can subpoena the identity of user D1, and sue them.
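To put a rough number on "sooner or later", here is a minimal sketch. It assumes something the scheme above does not specify: that server S picks the proxy for each download independently and uniformly at random from all users. The function name and parameters are hypothetical, chosen only for illustration.

```python
def exposure_probability(adversary_fraction: float, downloads: int) -> float:
    """Chance that at least one of `downloads` adversary-initiated downloads
    is routed through a proxy that the adversary also controls.

    ASSUMPTION (not from the article): server S picks each proxy
    independently and uniformly at random from all users, so each download
    lands on an adversary proxy with probability `adversary_fraction`.
    """
    if not 0.0 <= adversary_fraction <= 1.0:
        raise ValueError("adversary_fraction must be between 0 and 1")
    if downloads < 0:
        raise ValueError("downloads must be non-negative")
    # Complement of "every single download got an honest proxy".
    return 1.0 - (1.0 - adversary_fraction) ** downloads


if __name__ == "__main__":
    # An adversary running 1% of the users, starting more and more downloads.
    for n in (1, 10, 100, 1000):
        print(n, round(exposure_probability(0.01, n), 3))
```

Under that assumption, even an adversary who controls a small share of the users only has to start enough downloads before one of them unmasks a real source.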
So consider this idea instead: When user D1 sends a request to server S to download a file, server S gives them the IP address of another user, D2, from which they can download the file. Now, 40% of the time, user D2 actually does have the file on their hard drive and is serving it to user D1, with no proxying. The other 60% of the time, user D2 is told by S to proxy the connection from D1 and connect to a third user, D3. Now in 40% of these cases, D3 actually does have the file and is serving it out directly; the other 60% of the time, D3 is proxying the connection for yet another user, D4...
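The server's side of this can be sketched in a few lines. This is only an illustration of the 40/60 rule described above, not a real protocol implementation; `build_chain` and `P_SERVE` are made-up names, and the sketch ignores how S would actually pick which user fills each slot.

```python
import random

P_SERVE = 0.4  # chance that a contacted node serves the file itself (from the text)


def build_chain(rng: random.Random, p_serve: float = P_SERVE) -> list[str]:
    """Return the roles of the nodes reached from the downloader, in order.

    Every node but the last is a "proxy"; the last one is the "source".
    The length of the returned list equals the number of transfers.
    """
    roles = []
    while rng.random() >= p_serve:  # 60% of the time: this node only relays
        roles.append("proxy")
    roles.append("source")          # 40% of the time: this node has the file
    return roles


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(5):
        print(" <- ".join(["D1"] + build_chain(rng)))
```

Each printed line reads from the downloader D1 on the left to the node that actually holds the file on the right.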
So you end up with chains of varying length, with longer chains having a progressively smaller probability of forming:
40% of chains will be of length 1 (one user downloads directly from another)
60% x 40% of chains (24%) will be of length 2
60% x 60% x 40% of chains (14.4%) will be of length 3
60% x 60% x 60% x 40% of chains (8.64%) will be of length 4 etc.
These proportions of course sum to 1, and a little math shows that the average chain contains 3.5 nodes. The number of downloads in a chain -- the connections between users -- is one less than the number of nodes in the chain, so this means that to complete one download, the content will have to be transferred an average of 2.5 times -- compared to being transferred only once, when one user downloads from another directly. In order to ensure that users contribute as much to the system as they take from it, that means that in order to download a file, users would be required to provide enough "proxying" to support the equivalent of 2.5 full downloads of that same file.
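The "little math" is the mean of a geometric distribution. A chain of length k needs k - 1 proxy decisions (60% each) followed by one serve decision (40%), and the expected length is 1 / 0.4 = 2.5 transfers, which means 3.5 nodes once you count the downloader. The sketch below checks the closed form against a direct sum; the function names are hypothetical.

```python
P_SERVE = 0.4  # probability that a node serves the file itself (from the text)


def length_probability(k: int, p: float = P_SERVE) -> float:
    """Probability that a chain consists of exactly k transfers."""
    if k < 1:
        raise ValueError("a chain has at least one transfer")
    return (1 - p) ** (k - 1) * p


def mean_transfers(p: float = P_SERVE) -> float:
    """Expected number of transfers per download (geometric mean, 1/p)."""
    return 1 / p


def mean_nodes(p: float = P_SERVE) -> float:
    """Expected number of nodes in a chain: one more than the transfers."""
    return mean_transfers(p) + 1


if __name__ == "__main__":
    for k in range(1, 5):
        print(f"length {k}: {length_probability(k):.4f}")
    # Truncated sum of k * P(k); the tail beyond k = 200 is negligible.
    approx = sum(k * length_probability(k) for k in range(1, 201))
    print("mean transfers:", mean_transfers(), "numeric check:", round(approx, 6))
    print("mean nodes:", mean_nodes())
```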
These chains have a useful property: any time you're downloading content "from" another user, there's only a 40% chance that user is serving content off of their own hard drive, and a 60% chance that they're proxying the connection from somewhere else (another node that may in turn be proxying the connection from yet another node, etc.). So even if the adversary controls three nodes D1, D2, and D3, and D1 is downloading from D2 who is downloading from D3 who is downloading from D4 (and D4 is not controlled by the adversary), from the adversary's point of view there's only a 40% chance that D4 is actually originating the content. This is always true no matter how many nodes in the chain the adversary controls -- in the end, if they want to nail someone for serving out copyrighted content, they have to download the content from some node that they don't control, and there will only be a 40% chance that that user is actually serving the content from their hard drive.
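That claim is easy to check with a quick simulation. The sketch below makes two simplifying assumptions that are mine, not part of the scheme: each node in a chain is controlled by the adversary independently with some fixed probability, and the serve-or-proxy decision does not depend on who controls the node. All names are hypothetical.

```python
import random


def first_honest_is_source(rng, adversary_fraction, p_serve=0.4):
    """Walk one chain outward from an adversary-controlled downloader.

    Returns True if the first honest node reached is serving the file,
    False if it is only proxying, and None if the chain ends at an
    adversary-controlled source before any honest node is reached.

    ASSUMPTION: control and role are drawn independently for each node.
    """
    while True:
        is_adversary = rng.random() < adversary_fraction
        is_source = rng.random() < p_serve
        if not is_adversary:
            return is_source   # first node the adversary does not control
        if is_source:
            return None        # chain ended inside the adversary's own nodes
        # adversary-controlled proxy: keep following the chain


def estimate(adversary_fraction, trials=100_000, seed=0):
    """Estimated chance that the first honest node is the real source."""
    if not 0.0 <= adversary_fraction < 1.0:
        raise ValueError("adversary_fraction must be in [0, 1)")
    rng = random.Random(seed)
    outcomes = (first_honest_is_source(rng, adversary_fraction) for _ in range(trials))
    observed = [o for o in outcomes if o is not None]
    return sum(observed) / len(observed)


if __name__ == "__main__":
    for f in (0.0, 0.3, 0.7):
        print(f"adversary controls {f:.0%} of nodes: {estimate(f):.3f}")
```

Under those assumptions the estimate should stay close to 0.4 whatever share of nodes the adversary controls, because each serve-or-proxy decision is made independently of the ones before it.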
And the 40% number was deliberately chosen in order to weaken the adversary's legal grounds for subpoenaing the identity of the user they're downloading from -- even if they can show that they downloaded content from another user's IP address, it's more likely than not that the other user was not actually hosting the content. (Of course, there might be other details in context that render that probability calculation useless. For example, if the server S only links to one downloadable file, then all users coordinated by that server S are presumably downloading that same file, and anybody that server S connects you to, can be presumed guilty of downloading and sharing that file, 40% figure be damned.)
At this point you might also wonder: Why not just connect over a protocol like Tor, which provides secure anonymity for all transactions, and then use BitTorrent or some other file-sharing system on top of that? The answer is that Tor's connection is likely to be much slower, for at least two reasons. First, Tor servers are a limited resource, and the more people use them (especially for large file trading), the slower they are likely to become. (By contrast, in the peer-to-peer proxying model outlined above, every new downloader can also be made to act as a proxy for other users, so additional users don't slow down the system because they contribute as much as they take out of it.) Second, Tor always routes your connection through multiple servers to guarantee secure anonymity, which means it would be slower on average than the variable-length chains described above, where only about 22% of chains (60% x 60% x 60%) are of length 4 or more.
The key difference is that Tor provides true anonymity whereas the protocol above only provides plausible deniability. In high-risk settings where Tor is often used, it would not be acceptable if there were a 40% chance of your IP address being revealed to your adversary. But for file sharing, the 40% figure might be acceptable if it's just low enough to stave off a subpoena. This trade-off makes it possible to use shorter chains, resulting in faster downloads and less total bandwidth consumption.
You also already have the option today of using a VPN service to download files through an anonymous third-party connection, which renders the rest of these issues moot. But users have to jump through several hoops (and pay some money) to set this up as an option, which means that most users will not be using VPNs any time soon, leaving plenty of naive users for the RIAA and MPAA to go after. The use of peer-proxying links would mean that all users downloading through the system would be protected.
At the moment, the major impediment to a peer-proxying system like this would be that the chained downloads would still consume an average of 2.5 times as much bandwidth as direct peer-to-peer downloads. Even with today's high-speed connections, this increase in inconvenience is great enough that some users might just prefer to use plain old BitTorrent to download files directly from peers, and run the (admittedly small) risk of getting in trouble. But as bandwidth speeds continue to grow literally exponentially, eventually the difference in inconvenience will be so small that users would be foolish not to use proxified downloads if it provided free legal protection.
Note that the viability of this system does depend on the ISP's attitude towards it. In particular, if your ISP only goes after pirates because of legal pressure from content holders, and the ISP's users are using this peer-proxying protocol instead of a direct download protocol like BitTorrent, then the ISP can quite truthfully claim that they don't have any hard evidence to disconnect any particular users or turn over their identities (because the ISP doesn't know which users are actually storing pirated files and which users are just acting as proxies). On the other hand, if your ISP sincerely wants to stop piracy because your ISP is also a content company (Comcast, for example), then they might also try to squelch the use of any protocol that enables piracy, even if they can't prove that any particular users are using it for anything illegal. Thus Comcast might try to slow the use of the peer-proxy protocol. But in that case they could be forced by Net Neutrality regulations to stop throttling it, in the same way that the FCC ordered Comcast to stop throttling BitTorrent.
As long as those conditions hold true -- content owners continue cracking down on file sharers, but proxying remains legal, bandwidth keeps getting cheaper, and ISPs are restrained from blocking the protocols themselves -- I think that p2p will have to evolve into something like the chained-download system described above, to provide plausible deniability to users, without resorting to the long chains (and consequently slower downloads) of full-anonymity systems like Tor.
But again, I'm just saying it's inevitable, not that it's right. I actually do wish that people would pay the studios' prices for the movies that they watch; part of it is that I think most blockbusters are actually pretty good and deserve to make money. When you refuse to pay for movies, you're casting a vote against fun, big-budget movies that are made for the purpose of getting lots of people to come see them and enjoy them, and instead voting in favor of excruciatingly boring low-budget films that are made primarily so that the director could whine that the cheese-puff-snarfing American public wouldn't know great art if it bit them on their big bloated behind and subsequently didn't even buy enough tickets for the director to pay off the lien he took out on his Honda Civic to get the movie produced. Forget prosecution and civil suits; just make movie pirates sit through The Brown Bunny.