Thursday, December 13, 2007

Andrew Tomkins on search data privacy

Yesterday Andrew Tomkins gave a talk at CMU. He touched on social media, but a large part of his talk was about how tough it is to "anonymize" search data. Using the AOL scandal as an example, he basically broke that data down at a fine granularity and showed, in surprising ways, that you can identify people.

One point he brought up was "person attacks". While "trace attacks" such as finding credit card information or SSNs in the queries are dangerous enough, an attacker might decide to target one specific person. For instance, if they know that you vacationed in Tahiti and recently bought a Honda minivan, they can look through the data and identify anyone who has queries on both Honda minivans and Tahiti vacation packages-- you could probably do well with this even without narrowing the search to people who have searched for, say, "* in Pittsburgh PA". Or, if an attacker knows the victim and is at his house for a party or something, the attacker might ask to use the victim's computer and put in a unique term-- then when the attacker obtains the "anonymized" data later he can find that person. With knowledge of what the victim has searched for, for instance "AIDS clinics" or some adult term, the attacker could use blackmail.
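To make the Honda/Tahiti example concrete, here's a rough sketch of what that kind of lookup might look like. This is my own illustration, not anything from the talk: the function name `users_matching_all`, the `(user_id, query)` log layout, and the toy log itself are all made up, and the matching is just naive substring search.

```python
from collections import defaultdict


def users_matching_all(log, term_groups):
    """Return the user IDs whose query history hits every term group.

    log         -- iterable of (user_id, query) pairs (hypothetical layout)
    term_groups -- list of word tuples; a group is "hit" when a single
                   query contains every word in that group
    """
    # Collect each pseudonymous user's queries, lowercased for matching.
    queries_by_user = defaultdict(list)
    for user_id, query in log:
        queries_by_user[user_id].append(query.lower())

    matches = set()
    for user_id, queries in queries_by_user.items():
        # Every known fact about the victim must show up somewhere
        # in this user's history.
        if all(
            any(all(word in q for word in group) for q in queries)
            for group in term_groups
        ):
            matches.add(user_id)
    return matches


# Made-up toy log; the user IDs stand in for "anonymized" identifiers.
toy_log = [
    (101, "honda minivan prices"),
    (101, "tahiti vacation packages"),
    (102, "honda civic review"),
    (102, "tahiti weather"),
    (103, "used honda minivan"),
    (103, "pizza in pittsburgh pa"),
]

# What the attacker knows: a Honda minivan purchase and a Tahiti trip.
candidates = users_matching_all(toy_log, [("honda", "minivan"), ("tahiti",)])
print(candidates)
```

In the toy log only user 101 matches both facts. Each additional thing the attacker knows is another term group, and each one shrinks the candidate set further.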

It makes me want to play with the AOL search data-- I have some ideas for trend analysis. I'd downloaded it at some point but never got around to doing anything with it.

Emailing prospective schools

FemaleScienceProfessor has a post on pre-accepted students writing professors for advice on grad school, queries regarding availability, etc.

In the end, I don't think that e-mailing professors in such a manner actually helps your application. What does help is *really* getting your foot in the door, such as doing REUs or other available summer programs, or getting a job as a research programmer (this is good if you want to take a year "off" between degrees anyway). Asking professors about this sort of availability is a good idea-- the programs aren't always easy to find on department main pages. Of course, you have to do that a year before your apps are due.

Also, going to a conference while you're an undergrad, if you have the opportunity, is helpful-- you get more of a chance to demonstrate you know what you're talking about. And if nothing else you can at least talk to other grad students, who are less intimidating and more likely to ask you to join them for a beer.

Friday, December 7, 2007

Consequences of geographic distance and social networks

A well-known phenomenon here at CMU SCS is the NewellSimon-Wean barrier. There are several sub-departments of SCS-- including Computer Science, Machine Learning, Language Technologies, Robotics, Human-Computer Interaction, Software Research, and probably others I've forgotten. CSD, MLD, and ISR are in Wean; LTI, HCII, and Robotics are in NSH. (Then there are students with offices in Doherty or the CIC, etc.) There is a covered bridge about 20m long connecting the two buildings.

And yet somehow I know disproportionately more students in CSD, MLD, and ISR than in the others, even though LTI and Robotics have more overlap with my department in terms of research interests. I think this has to do with socializing factors. The NSH departments have their own lounges, whereas all the departments in Wean share a lounge (ISR and MLD are both fairly small). Each department has its own social organization to some extent, but the all-SCS social organization, Dec/5, is mostly CSD and ISR people (with growing MLD representation). That's even though all of our events happen in Newell-Simon, and I believe our happy hours are well attended by people from both buildings.

Of course, anecdotal evidence suggests that Dec/5 participation has a lot to do with personal connections. It is a time commitment, after all, and it's very easy to flake out on volunteer organizations because any given graduate student is "too busy". It's not so easy to do that if your best buddy is in the organization too and will have to pick up the slack. While we get a lot of great volunteers toward the beginning of the semester, once November/April hits it becomes very difficult to put on a TG (happy hour) and for the most part only people in the central "clique" sign up to help out-- and usually out of peer pressure. I also recognize that if I'm not friends with the people I'm volunteering with, even if I like them as people, I'm going to get kind of bored.

This makes me think that the key to retaining Dec/5 volunteers is to integrate them quickly through separate social activities. If they can become friends with existing committed folks, they're more likely to become committed themselves.