Thursday, December 13, 2007

Andrew Tomkins on search data privacy

Yesterday Andrew Tomkins gave a talk at CMU. He touched on social media, but a large part of the talk was about how hard it is to "anonymize" search data. Using the AOL scandal as an example, he broke that data down and showed some surprising ways you can identify people in it.

One point he brought up was "person attacks". "Trace attacks", such as finding credit card numbers or SSNs in the queries, are dangerous enough, but an attacker might instead decide to target one particular person. For instance, if they know that you vacationed in Tahiti and recently bought a Honda minivan, they can look through the data and pick out anyone who has queries on both Honda minivans and Tahiti vacation packages. You could probably do well with this even without narrowing the pool to people who have searched for, say, "* in Pittsburgh PA". Or, if an attacker knows the victim and is at his house for a party or something, the attacker might ask to use the victim's computer and type in a unique term. When the attacker later obtains the "anonymized" data, he can search for that term and find that person. Once he knows what the victim has searched for, for instance "AIDS clinics" or some adult term, the attacker could use it for blackmail.
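To make the person attack concrete, here is a rough sketch of how I imagine it working. This is not code from the talk: the function name find_candidates, the (anon_id, query) log layout, and the toy data are all my own assumptions, and the matching is a naive keyword check.

```python
from collections import defaultdict


def find_candidates(log, known_facts):
    """Return the anonymized IDs whose query history touches every known fact.

    log: iterable of (anon_id, query) pairs (an assumed layout, not the real file format).
    known_facts: list of keyword lists; a fact counts as matched when a
    single query contains all of that fact's keywords.
    """
    matched = defaultdict(set)  # anon_id -> indexes of the facts matched so far
    for anon_id, query in log:
        words = set(query.lower().split())
        for i, keywords in enumerate(known_facts):
            if all(k in words for k in keywords):
                matched[anon_id].add(i)
    # Keep only the IDs that matched every fact at least once.
    return {uid for uid, facts in matched.items() if len(facts) == len(known_facts)}


# Made-up toy log; the IDs and queries are invented for illustration.
toy_log = [
    (101, "honda minivan prices"),
    (101, "tahiti vacation packages"),
    (102, "honda civic review"),
    (102, "tahiti weather"),
    (103, "used honda minivan"),
    (104, "cheap tahiti vacation"),
    (104, "honda minivan dealers"),
    (104, "zxqv purple walrus 8841"),  # a "planted" unique term
]

# Attack 1: the attacker knows two facts about the victim.
candidates = find_candidates(toy_log, [["honda", "minivan"], ["tahiti", "vacation"]])

# Attack 2: the attacker typed a unique term on the victim's computer.
planted = find_candidates(toy_log, [["zxqv", "8841"]])
```

In the toy log, the two known facts narrow eight queries down to two IDs, and the planted term narrows it to one. Each extra fact the attacker knows shrinks the candidate set further, which is why stripping usernames alone doesn't buy much.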

It makes me want to play with the AOL search data-- I have some ideas for trend analysis. I'd downloaded it at some point but never got around to doing anything with it.
