Three days ago, a small group called Interhack was featured in an AP wire story about some curious data transmission they'd found. The company receiving the data, Coremetrics, tracks unique visitors through its clients' corporate websites, and promises those clients "seamless performance," because: "data tags load invisibly as small transparent gifs, and information is encrypted to appear invisible to your customer." The customer is you, the user. The GIFs are web bugs. The information can be personally identifying, which most of its clients' privacy policies fail to mention. But -- importantly -- the company promises that "Any data Coremetrics tracks and reports is owned solely by our customers and we are contractually precluded from reselling or using this data." Is that enough? Emmett and I talked both to Coremetrics and to the hackers who put the spotlight on them.
Emmett Interviews Interhack
Slashdot: For those uninitiated, what's interhack all about?
Basically, we're a firm of hackers interested in pushing technology forward through research, making computing apply to people by developing custom products and consulting for folks who want to put the technology to use, and helping people understand exactly what the ramifications of these systems are. That's a pretty broad way of saying that we're all about the Internet and making it work.
Slashdot: When did you start researching this story, and how long did it take to put the pieces together?
Sometime in May, someone sent us a tip about Coremetrics and what it's doing. We took a quick look over their web site to see their advertised services and then started to look at how the service is actually implemented on various client sites. We examined several sites, most of which very clearly stated in their privacy policies that they're using Coremetrics for site monitoring and provided links necessary for people who don't like it to opt out of the system. Most of the sites with clear, full disclosure policies weren't even sending Coremetrics personally-identifiable information like names and addresses.
The more interesting part of our find was in the sites that did send personal information to Coremetrics, particularly those that carried the TRUSTe privacy seal. Over the course of about three weeks, we performed an investigation of these sites, gathering as much information as possible from them. We reverse-engineered the system by reading the sites' code, reading through the obfuscation, and comparing logs of our network's activity with the activity that would be perceived by an end user.
What we found was a clear difference in user expectations and what was actually happening, as well as a clear difference between what Coremetrics says it offers and what its eLuminate service makes technically feasible. After writing drafts of our report and press release, we decided to take a wait-and-see approach to the release. Specifically, we wanted to ensure that sites that just started to use the Coremetrics service had adequate time to update their policies and to have an accurate idea of what was happening with the system after having been in production.
After waiting and watching for more than a month, we decided to release our findings. So, on Monday morning, we sent a pre-release copy of our report to Richard Smith and some folks at Zero Knowledge Systems. In addition, we contacted each of the firms named in our report and Coremetrics so that if the failure to disclose or the ability to profile people across web sites was unintentional, there would be time for some investigation and a decision about how to fix the problem. After the end of business Monday, we released our report.
Slashdot: What needs to change? In a perfect world, how do we deal with this?
This is a very interesting question. In my perfect world, detailed levels of profiling would not take place at all. There would be no such thing as persistent cookies. In general, I'm just not comfortable with the level of privacy that the industry as a whole has given up for the sake of a little convenience.
How big of a deal, really, is it to have to enter your password when you login to a web site? Don't forget that the reason why we have passwords in the first place is so that you'll have to do something at the beginning of the session to prove who you are.
Web browsers also need to be more intelligent. That is, they need to be able to identify things like dependencies on third parties so the user can know whether those images should be fetched or ignored. Right now, browsers -- for the most part at least -- just aren't very defensive. The model of parsing everything you're given worked fine in the Old Days for which some of us long so much but the fact of the matter is that you really can't blindly trust anyone on the Internet.
I'm not suggesting becoming a luddite. I'm suggesting that folks take a sort of "trust, but verify" approach a la Ronald Reagan. Right now, there's a lot of trust and almost no way to verify.
Slashdot: This all comes down to trust. How many policies are just there so people will shut up about personal information so they'll start buying stuff online?
I couldn't say. Policies are almost always written by lawyers. That probably speaks to the covering-one's-posterior-position value of privacy policies.
Slashdot: Since we can't trust written policies, what should people be doing before they start conducting business with these websites?
Verify everything. As I said earlier, though, we're severely lacking in tools that are accessible to most people that can help in that regard. I think Zero Knowledge Systems' Freedom network is a huge step in the right direction. Tools like Muffin (muffin.doit.org) also help, but it would be cooler for that kind of functionality to live right in the browser itself. There are opportunities for eager hackers on this front.
It's also important to stress that tools alone won't do it -- there is no silver bullet. People are going to have to have some understanding of what's happening in order to use these tools effectively.
Companies that don't fix their problems don't take your privacy seriously, no matter how much lip service they pay. So don't go to their sites. Don't buy their stuff. Tell them why you're not buying their stuff. Tell their competitors why you shop where you do, lest the new places you shop get the bright idea to try to hide something.
Jamie Talks to Coremetrics
Here's the service Coremetrics provides to corporate websites:
Many companies demand accurate knowledge of how their sites are being used: what sections are popular, what paths visitors take through the site, where people click over from, and so on. It's like web log analysis but more specialized for large shopping sites.
Since these demands are very much the same, and the code to do the analysis is similar, outsourcing happens. From a CEO's viewpoint, Coremetrics fiddles with the website to do better-quality tracking than the company could do on its own, and then makes the resulting statistics available over SSL.
But from your viewpoint and mine, that "fiddling" results in cookie-carrying web bugs all over the sites we visit -- web bugs which usually send back to the Coremetrics servers a unique visitor tag, like any other cookie, but one that sometimes includes your name, email address or other personally identifying information.
Coremetrics promises that this information remains private. When DoubleClick collects data from <img> cookies across multiple websites, they do so with the stated intention of tracking you personally; this is part of their business plan.
According to Coremetrics, they do things very differently. Data is not cross-correlated between their client websites, they say, because their contracts with their clients prohibit this. In fact, their contract forbids them from doing much of anything with that data except statistical analysis.
I gave the Coremetrics PR person I talked to a chance to explain, using the example of their client Toys 'R' Us:
"Coremetrics is merely an agent that collects this data on behalf of an individual customer, for that individual's sole use only. We do not collect data, as was inferred very incorrectly by Interhack, across multiple unrelated websites, with any intention of selling it to third parties -- or even distribution to third parties. That's because we, as the agent, do not own that data, nor do we have any rights to that data. Toys 'R' Us, and Toys 'R' Us only, is the sole owner of that data. So legally, we cannot do any of the possibilities that Interhack had alluded to in their report."
But here's the interesting thing.
If I'm browsing my favorite website, Coremetrics is clearly a third party. They have a special contractual relationship to keep my data private, which we shouldn't ignore. But nevertheless -- a third party.
So why do some of their clients' privacy policies not mention this?
Coremetrics lists twenty clients; I tried to contact seventeen of them for comment, with marginal success by press time. Three reported that they had not yet activated Coremetrics or had decided not to use the service at all. One (guru.com) reported not sending any personal information -- presumably, only tracking visitors with a non-identifying unique ID.
(Last-minute update -- just before press time, Altrec.com clarified that they are "sending unique ID (unique to Altrec.com) and city, state and zip. No other personally identifiable information is being sent to Coremetrics.")
Bravanta.com bounced me between different people until I got to leave voicemail that wasn't returned by press time. Their policy says they "do not and will not sell, trade or rent the personal information of our customers or gift recipients to any third parties."
(Update two hours later: Bravanta reports that they also have decided not to use Coremetrics' service, and are not currently using it.)
Mall.com didn't get back to me either, and their policy reads "We will NEVER release your name and personal information to a third party..."
Getplugged.com has a rather confusing privacy statement that begins, "Any personally identifiable information GetPlugged.com collects will be used solely for the purposes stated within this Privacy Statement" and wanders around from there. I'm not sure what to make of it, frankly.
All these polices may indeed be correct, if the sites are stingy with personal data. Like guru.com (and altrec.com), they may be using the Coremetrics service only with non-personal IDs. But, as with Toys 'R' Us, that may also not be the case.
(fusion.com, getplugged.com, and altrec.com also happen to be TRUSTe licensees, but TRUSTe wasn't able to comment by press time. In the AP wire story on Monday, they had harsh words but were speaking hypothetically; no comment since then.)
It's hard enough to read privacy policies already. Most of them are designed to protect companies legally, and mostly manage to confuse users. The distinction between Coremetrics as a third party; or affiliate; or agent, is a little too fine for the average consumer, and needs to be spelled out in each policy, as Coremetrics itself recommends.
But is all this a tempest in a teapot? If a signed contract forbids a company from misusing data, is that all we need to know?
I don't think so. In the first place, at the very least, companies like Toys 'R' Us need to disclose such things in their privacy policies. That's just common sense.
In fact, according to Coremetrics privacy advisor Dave Farber, they plan contractually to require such disclosure with future clients. (The company could not confirm or deny this at this time.)
More importantly, we as consumers are being asked to trust a third party whose reputation we know nothing about. In fact, 99% of us will never even have heard of them and might not understand what they do. We're told that a contract protects us, but we're still being asked to trust something we can't see. And when evidence of policy violations is turned up by a group of hackers, that erodes our trust.
After speaking at length with Coremetrics' PR, I get a general feeling of trust from them. (Of course that's a large part of their PR staff's job, earning reporters' trust.) More importantly, Dave Farber is well-respected, and his confidence carries weight -- with me at least.
Still, as Interhack says, our motto should be "trust but verify." That's why I proposed, to Coremetrics, that they publicly post, on their website, the paragraphs from their clients' contracts which assure that our private data remains private. If the actual legal words that protect our data are up there for us to see, we don't have to trust anyone.
When I mentioned this to Coremetrics' PR person, he promised to consider it; Dave Farber thought it was "a very good idea." It's unusual for corporations to make contracts public, even in part, but in this case it would do a great deal to put everyone's fears to rest.