The erosion of online anonymity

Auren Hoffman · September 29, 2010 · Short URL: https://vator.tv/n/122e

(And how to restore it)

One of the most important principles of individual privacy is the ability to act anonymously. When people drive to a store or read a book at home, they reasonably expect that nobody is monitoring their behavior and attaching it to their name and address.

The same should be true on the internet: when you are online, there should be a presumption of anonymity. Nobody (including websites, ad networks, ad exchanges, widgets, outside analytics services, and the like) should know who you are and what you do unless you sign up or log in.

In a better world with sufficient online anonymity, your search history and the sites you visit would not be matched to personally identifiable information (such as your name, address, or email), so that history could not be stolen, used to discriminate against you, or subpoenaed by the government.

In online advertising, there are various standards for what constitutes sufficient online anonymity. But unless companies adhere to the highest standard and raise consumer awareness of it, internet users may assume their browsing behavior is being tied to their identity. They may then sharply reduce their internet use and become less likely to experiment with new online services. In short, a lack of available anonymity could stifle the online economy and all the innovation happening on the web.

What Anonymity Means

The key to protecting anonymity is to make it technically impossible, not just contractually prohibited or difficult, to tie internet users to their name and address when they are not explicitly logged in.

This doesn't mean that websites and third-party services can't know something interesting about you. They might know that you are a woman who lives in the New York area, plays tennis, enjoys Settlers of Catan, is in the market for a trip to Italy, and drives a hybrid. This is good, because they can use this data to give you a more personalized experience: content you like, better customer service, more targeted ads, and less spam. But they should not know that it is you.

Of course, once you log in, or link to your name or Twitter profile, sites may know it is you. But it is important that when you first arrive at a site, nobody knows exactly who you are unless you explicitly log in.

Prescriptions to improve online privacy and anonymity

Here are some prescriptions that online services should use to raise the bar on online privacy:

1. Eliminate the collection and analysis of "Machine ID"

A Machine ID is like your computer's fingerprint: it can usually identify you uniquely. It is the information your computer may send to the sites you visit, such as your IP address, browser configuration, "clock skew" (the millisecond difference between the clock on your machine and the clock on the server), and more.
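
To see how these signals add up to a fingerprint, here is a minimal Python sketch that hashes a few request attributes into one stable identifier. The function name `machine_id` and the specific inputs (screen size, time zone, and a clock-skew value in milliseconds) are hypothetical stand-ins for "browser configuration" and the other signals above; real fingerprinting systems combine many more signals and match them more loosely.

```python
import hashlib


def machine_id(ip: str, user_agent: str, screen: str, timezone: str, clock_skew_ms: int) -> str:
    """Hash request attributes into a short, stable identifier (hypothetical, illustrative only)."""
    # Join the attributes in a fixed order so the same machine always produces the same string.
    raw = "|".join([ip, user_agent, screen, timezone, str(clock_skew_ms)])
    # Hashing hides the raw values, but the result still works as a unique key across visits.
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:16]


# Two visits from the same machine, days apart, with no cookie involved.
monday = machine_id("203.0.113.7", "Mozilla/5.0 (X11; Linux x86_64)", "1920x1080", "America/New_York", 42)
friday = machine_id("203.0.113.7", "Mozilla/5.0 (X11; Linux x86_64)", "1920x1080", "America/New_York", 42)
print(monday == friday)  # True
```

Because no cookie is involved, clearing cookies does not reset this identifier, which is what makes a Machine ID harder for users to control than a cookie.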

One reason sites and third parties collect Machine IDs is to customize the online experience based on a machine's past behavior. But a Machine ID can uniquely identify a person, because people largely use the same computers; Machine IDs can therefore potentially be looked up and traced back to an individual (there is a marketplace today for matching IP addresses to physical addresses). In fact, an IP address alone can be traced back to 30% of households today.
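
One way a site might act on this prescription is to drop fingerprinting fields from a request record before anything is written to logs or analytics storage. This minimal sketch assumes requests arrive as Python dicts; the field names in `MACHINE_ID_FIELDS` and the function `strip_machine_id` are hypothetical.

```python
from typing import Any

# Hypothetical names of request fields that together form a "Machine ID".
MACHINE_ID_FIELDS = frozenset({"ip", "user_agent", "screen", "timezone", "clock_skew_ms"})


def strip_machine_id(request: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the request record with fingerprinting fields removed."""
    return {key: value for key, value in request.items() if key not in MACHINE_ID_FIELDS}


request = {
    "ip": "203.0.113.7",
    "user_agent": "Mozilla/5.0 (X11; Linux x86_64)",
    "clock_skew_ms": 42,
    "path": "/travel/italy",
    "segment": "in-market: Italy trip",
}
print(strip_machine_id(request))
# {'path': '/travel/italy', 'segment': 'in-market: Italy trip'}
```

The server still sees the IP address while the connection is open, since it needs it to send the response; the point of the sketch is that the address is never stored or analyzed alongside browsing behavior.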

2. Store audience data in browser cookies

While browser cookies have recently received a lot of attention, they are one of the most privacy-centric ways to personalize services for consumers. Today, nearly every site you go to uses cookies every time you visit. This is generally good for consumer privacy, since a person browsing the internet can entirely control their cookies through browser settings. However, many companies don't do a good job of anonymizing cookies.

Many firms store a unique ID in a user's browser cookie and ping cloud servers with this unique ID to "see" the data associated with the cookie. Storing unique cookie IDs has a lot of benefits, since it lets the information associated with a cookie be updated quickly and analyzed more easily.
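
The unique-ID model can be sketched in a few lines of Python. The in-memory `SERVER_STORE` dict stands in for the cloud servers mentioned above, and the function names are hypothetical; what matters is that the cookie itself holds nothing but a random key.

```python
import uuid

# Stand-in for the server-side database: cookie ID -> audience segments.
SERVER_STORE: dict[str, list[str]] = {}


def issue_id_cookie() -> str:
    """Create a random unique ID; the browser stores it and sends it back on every visit."""
    cookie_id = uuid.uuid4().hex
    SERVER_STORE[cookie_id] = []
    return cookie_id


def add_segment(cookie_id: str, segment: str) -> None:
    """Update the profile on the server; the cookie value itself never changes."""
    SERVER_STORE.setdefault(cookie_id, []).append(segment)


def lookup(cookie_id: str) -> list[str]:
    """Return the data associated with a cookie by querying the server with its ID."""
    return SERVER_STORE.get(cookie_id, [])


cid = issue_id_cookie()
add_segment(cid, "plays tennis")
add_segment(cid, "drives a hybrid")
print(lookup(cid))  # ['plays tennis', 'drives a hybrid']
```

Updating a profile is cheap here because only the server-side entry changes. The catch is that `cid` is stable and unique, so anyone holding a table that maps it to a name or email address can join on it.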

But using unique IDs also means people may no longer be anonymous. A more privacy-centric solution is to store all of a person's audience segments directly in the cookie. The data can be encrypted and secured so that only the cookie-placer can access it.

Changing the cookie system from unique-ID-centric to segment-centric is a large technical challenge and might take some sites, ad networks, and widgets many months to complete. But it would be great if, by this time next year, all companies stored data in cookies in a more pro-consumer way.
 
3. Make it impossible to identify an individual using anonymous data segments

But storing the data directly in the cookie is only part of the challenge. The data also needs to be anonymized appropriately. Simply stripping personally identifiable information out of a cookie is not enough to make it anonymous. Recently, Netflix had to cancel the sequel to its million-dollar Netflix Prize recommendation contest as a result of an FTC inquiry into its anonymization practices.
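
A toy example shows why stripping names is not enough. In this hypothetical Python sketch, a data set with names removed is joined against a public directory on the attributes the two share; any record whose combination of attributes matches exactly one directory entry is re-identified. All people, companies, and field names are invented for illustration.

```python
from collections import defaultdict

# Hypothetical "anonymized" release: names stripped, other attributes kept.
released = [
    {"company": "ExampleCo", "title": "CEO", "interest": "foreign policy"},
    {"company": "ExampleCo", "title": "Engineer", "interest": "soccer"},
    {"company": "ExampleCo", "title": "Engineer", "interest": "Settlers of Catan"},
]

# Hypothetical public directory (think professional profiles) that does include names.
directory = [
    {"name": "Alice", "company": "ExampleCo", "title": "CEO"},
    {"name": "Bob", "company": "ExampleCo", "title": "Engineer"},
    {"name": "Carol", "company": "ExampleCo", "title": "Engineer"},
]


def reidentify(released, directory, keys=("company", "title")):
    """Join two data sets on shared attributes; a unique match reveals the person."""
    candidates = defaultdict(list)
    for person in directory:
        candidates[tuple(person[k] for k in keys)].append(person["name"])
    matches = []
    for record in released:
        names = candidates.get(tuple(record[k] for k in keys), [])
        if len(names) == 1:  # exactly one person fits this combination
            matches.append((names[0], record["interest"]))
    return matches


print(reidentify(released, directory))
# [('Alice', 'foreign policy')]
```

The two engineers stay hidden only because they share a combination of attributes; the CEO does not, so the "anonymous" record leaks her reading habits.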

If there is data on me that says my company is "Rapleaf" and my title is "CEO," it is not anonymous, because I am the only person who matches the combination of both of those attributes. A more appropriate description would be a company of "technology start-up" and a title of "executive." That leaves room to add other criteria, like lives in the "SF Bay Area," plays "soccer," and reads lots of books on "foreign policy," without anyone knowing it is me. Many people fit all of those characteristics.
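
The fix described here, replacing specific values with broader categories until many people share each combination, is closely related to what the privacy literature calls k-anonymity (a term the author does not use). This minimal Python sketch illustrates the idea; the generalization tables and every company name other than Rapleaf are hypothetical.

```python
from collections import Counter

# Hypothetical generalization tables: specific value -> broader category.
COMPANY_CATEGORY = {
    "Rapleaf": "technology start-up",
    "ExampleCo": "technology start-up",
    "SampleSoft": "technology start-up",
}
TITLE_CATEGORY = {"CEO": "executive", "CTO": "executive", "VP Sales": "executive"}


def generalize(record: dict[str, str]) -> dict[str, str]:
    """Replace specific company and title values with coarser categories."""
    return {
        "company": COMPANY_CATEGORY.get(record["company"], "other"),
        "title": TITLE_CATEGORY.get(record["title"], "other"),
    }


def is_k_anonymous(records: list[dict[str, str]], k: int) -> bool:
    """True if every combination of attribute values is shared by at least k records."""
    counts = Counter(tuple(sorted(r.items())) for r in records)
    return all(count >= k for count in counts.values())


raw = [
    {"company": "Rapleaf", "title": "CEO"},
    {"company": "ExampleCo", "title": "CTO"},
    {"company": "SampleSoft", "title": "VP Sales"},
]
print(is_k_anonymous(raw, k=2))                           # False: each record is unique
print(is_k_anonymous([generalize(r) for r in raw], k=3))  # True: all three now look alike
```

In practice k would be far larger than 3, and every added attribute (region, sport, reading interests) splits the groups further, so the check has to run over the full combination of attributes that will be stored.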

These are just three of many prescriptions that companies should implement to help ensure the presumption of anonymity. Adopting these changes will require a short-term sacrifice from websites and third parties, but in the long term they are the right decisions for companies to make.

Giving technologists a better appreciation of why privacy, and anonymity in particular, is so important is not an easy task. Most Silicon Valley companies start from the perspective that their technology is sacrosanct. As an engineer, I admit that we started my company, Rapleaf, with that approach. However, years of engagement with our web users, customers, partners, privacy experts, and advocates (including our own privacy advisory board) have made it clear that investing in a safe infrastructure, where users have the presumption of anonymity, will ensure that the internet continues to grow and stay vibrant.

Special thanks to Michael Hsu, Joel Jewitt, Jeremy Lizt, Travis May, and others for their help and edits.


Auren Hoffman

CEO SafeGraph. fmr CEO at LiveRamp. GIS nerd.
