Tuesday, July 29, 2008

Scandal sells

Since starting at Live Labs I've gotten to play with a lot of data, including the political Usenet and crawled memeorandum hourly data (since mid-September 2005, following Katrina). Today I came across something less-than-surprising.

Top 10 links on memeorandum, ranked by number of 'discussion' links: that is, the number of discussions (usually blog posts) related to a parent story (usually news).

"For McCain, Self-Confidence on Ethics Poses its Own Risk" [McCain and scandals] 219

"Spitzer is Linked to Prostitution Ring" 178

"Embattled Attorney General Resigns" [Gonzales and scandals] 170

[Text of Obama's race speech] 158

"NSA has massive database of Americans' phone calls" 129

"The Long Run-Up" [McCain and scandal] 119

"Craig Arrested, Pleads Guilty Following Incident in Airport Restroom" 116

"U.S. Web Archive Is Said to Reveal a Nuclear Primer" [Iraq and Nukes] 115

"Digging Out More CNN/Youtube Plants" [Youtube politics and staged debates] 115

"Dark Suspicions About the NIE" [Iran and Nukes] 107

So, for the most part, what sells is sex and violence.
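
For anyone curious how a ranking like the one above could be produced, here is a minimal sketch. It assumes the crawl has already been boiled down to (parent story, discussion link) pairs; the function name `top_stories` and the toy data are made up for illustration and are not the actual Live Labs pipeline. Because hourly snapshots would see the same discussion link repeatedly, the sketch counts each pair only once.

```python
from collections import Counter

def top_stories(discussion_links, n=10):
    """Rank parent stories by how many distinct discussion links they have.

    discussion_links: iterable of (parent_story, discussion_url) pairs.
    A pair seen in several hourly snapshots is counted only once.
    """
    unique_pairs = set(discussion_links)
    counts = Counter(parent for parent, _ in unique_pairs)
    return counts.most_common(n)

# Toy, made-up data (hypothetical identifiers, not real crawl output).
links = [
    ("story-a", "blog-1"),
    ("story-a", "blog-2"),
    ("story-a", "blog-2"),  # same discussion seen again in a later snapshot
    ("story-b", "blog-3"),
]
print(top_stories(links, n=2))  # [('story-a', 2), ('story-b', 1)]
```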

Sunday, June 8, 2008

Book: Beyond Fear, by Bruce Schneier

I read this a couple months ago and failed to take it with me to Seattle, so I've lost the notes I took on it, but it at least bears mentioning.

He proposes looking at a security problem/solution using the following steps:
1. What assets are you trying to protect?
2. What are the risks to these assets?
3. How does the proposed security solution mitigate those risks?
4. What other risks does the solution cause?
5. What trade-offs and costs does the solution impose?

It's a good introduction to some of the principles and key terms in security (at least, from what I can tell, as someone who knows very little about the field). He uses examples of national security throughout the book, essentially telling readers that terrorism isn't as much of a threat as everyday dangers like heart disease and car accidents, and that the current solutions do not mitigate the risks well. What I liked most about it was that he can frame anything in terms of a security problem and explore it in depth (including a lot of things I wouldn't normally have thought of in that way, such as maintaining a population of honeybees), which puts it in the category of "books that help you learn to think differently". If I were put in the position to teach an undergrad-level course on computer security I would make it required reading in the first couple of weeks, just to get students in the right frame of mind to think about security problems and solutions.

Tuesday, June 3, 2008

E. coli: not just for health scares

Today MSR had Carl Zimmer visiting to give a talk on his latest book, Microcosm: E. coli and the New Science of Life, following a pre-talk backyard burger grilling (not really). I watched over the live-streaming video. Zimmer addressed how E. coli has been used in the past for scientific experiments, and some new directions that microbiology is taking.

E. coli has been used in bioengineering to make synthetic insulin, jet fuel, and cancer treatments, to name a few. Some students even found a way to make it "take pictures". E. coli has around 2,000 "core" genes, while the entire genome (across all strains of E. coli) has nearly 10,000 that have been found so far (for comparison, humans have 30,000). Some scientists believe that the "bare minimum" of genes necessary for its survival is around 200. Venter and company have already been working with a different, smaller-genomed species, and "keep knocking out genes, to see if it still lives." Their count is down to 350. Potential experiments are to take these O(100) genes and begin adding more to create "new life" specialized for some purpose, which is very futuristic-sounding.

Other interesting experiments involve finding bacteria that are already suited for human needs. For instance, a teenager in Canada already isolated bacteria that eat plastic bags. These sorts of experiments could solve a lot of problems. I wonder if there are bacteria that turn lead into gold. :-)

Sunday, June 1, 2008

Newsflash: Flying is Frustrating

Via The Consumerist, Americans are flying less because it's such a frustrating process, according to the Travel Industry Association. Detailed survey results are here (PDF).

Oddly enough they don't say anything about fuel costs, which I imagine have a much larger impact. For one, people are also driving less, and presumably this is not a reaction to the fact they're just sick and tired of having to fight their neighbor for the armrest.

For two, people have a greater tendency to grin and bear it when they're paying less for something (just ask any Southwest Airlines customer*). But when flights start costing more, whether on the ticket or by new-and-improved fees ("Now you want $15 to lose my bag, a service that used to be free?"), people expect a better experience, even if logically they know the cash is just getting pumped into the fuel tanks.

Perhaps I'm missing something. I haven't paid much attention to flight prices over the past year; I'm just guessing they've increased. (And if they haven't, that might explain why airlines can't get their stuff together enough to satisfy their customers.) Does anybody have solid data on this? Better, does anybody have solid data on how many people actually fly, not just what a consumer survey says?

*- I kid, but SWA flight attendants have been known to say during the pre-flight recitation, "Please do not tamper with the lavatory smoke detectors, as the penalty for disabling a smoke detector is up to $2000. And we know that if you had $2000, you'd be flying American."

Started at MSR/LL

I'm in Bellevue, WA now, and just finished my first week as an intern at Microsoft Live Labs. I'm working with Matt Hurst on some social media stuff. So far MSFT has been a fun place to work; everyone seems really happy.

One of the things I'm most excited about is the puzzle culture. I did PuzzleQuest, sponsored by MSFT, once a while back and really enjoyed it. I hear there is an intern puzzle day as well as a weekend-long "The Game" (not to be confused with The Game that I just lost). The latter is apparently invite-only, so I will have to get more details later.

Other notes:

-We found out that some recent work with Leman and Christos was accepted to KDD, so I will be in Las Vegas at the end of August. With my trusty free Microsoft Research nalgene bottle, so as not to dehydrate.
-As I tend to do when I travel, I've done an unusual (for me) amount of non-work-related reading in the past couple months. Will update later with some notes.

Thursday, April 17, 2008

How to make time for literature review

Answer: just wait until you're completely unmotivated to do anything else. Sunny days with perfect weather are really the only times I get a chance to do any significant literature reviews. This afternoon, when I was unable to get myself to stay in my windowless office, I (finally) sifted through the WSDM proceedings that I'm most interested in, and read a couple papers on trust/distrust propagation. I'm getting better at adding papers to my bibsonomy [rss]. The top 10 or so should be what I covered today.

Also a fun article: via Physics Arxiv Blog, To How Many Politicians Should Government Be Left? The article looks at the "efficacy" of a government compared to its cabinet size, and makes a rather nifty model of how opinions are formed in small networks. Another interesting bit is that while cabinet size ranged from 5 to 54, not a single government of the nearly 200 surveyed had a cabinet of size 8-- apparently it is common knowledge that that is bad luck, or something.

I also discovered that Jure was smart enough to submit last year's SDM paper to ArXiv, which yielded a citation. That has prompted me to register so I can post other publications.

This is related to a recent pet peeve of mine-- the fact that it's difficult to get conference proceedings. The ACM/Citeseer folks don't always have things from workshops and the like that I'm interested in. Most authors have the sense to post their papers on their websites, but I much prefer being able to get a conference all in one place. Of course, professional organizations don't like to do that. I find it hard to believe that they really make money off of conference proceedings, so I can only guess that it has to do with publisher/copyright/legal rules outside their control. Maybe someday CC/GPL will be able to wrest away some control.

Sunday, April 6, 2008

ICWSM, semi-supervised learning

Returned from ICWSM, and was inspired to perhaps start blogging again, but we'll see how long that lasts.

The tutorial at ICWSM went well (pdf slides available at that link, ppt available by emailing me). I will be giving it again at NESCAI. There were a lot of great talks and posters at ICWSM; a lot more toward the text/sentiment mining side of things than last year, but still a great variety of concepts.

While in Seattle I missed the 10-601 class lectures on semi-supervised learning, and had to prepare a recitation anyway. So as part of that preparation I came across a good survey paper by Xiaojin Zhu. It has an entire section devoted to graph-based methods, some of which I hadn't heard of, so this was useful to me beyond giving me interesting things to talk about in recitation. It might be of use to try some of these algorithms on community detection in networks.
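
To give a flavor of what a graph-based method looks like, here is a minimal sketch of iterative label propagation, one of the simplest ideas in this family. It is my own toy version in plain Python, not necessarily how the survey formulates it: a few seed nodes are clamped to 0 or 1, and every other node repeatedly takes the average score of its neighbours. The function name `propagate_labels`, the toy graph, and the choice of 100 iterations are all assumptions made for illustration.

```python
def propagate_labels(adjacency, seeds, num_iters=100):
    """Iterative label propagation on an undirected, unweighted graph.

    adjacency: dict mapping node -> list of neighbouring nodes
    seeds:     dict mapping the few labelled nodes -> 0.0 or 1.0
    Returns a dict node -> score in [0, 1]; seed nodes stay clamped.
    """
    # Unlabelled nodes start at the uninformative midpoint.
    scores = {node: seeds.get(node, 0.5) for node in adjacency}
    for _ in range(num_iters):
        updated = {}
        for node, neighbours in adjacency.items():
            if node in seeds or not neighbours:
                updated[node] = scores[node]  # clamp seeds, skip isolated nodes
            else:
                updated[node] = sum(scores[m] for m in neighbours) / len(neighbours)
        scores = updated
    return scores

# Toy graph: two triangles (a, b, c) and (d, e, f) joined by the edge c-d.
graph = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
    "d": ["c", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}
scores = propagate_labels(graph, seeds={"a": 0.0, "f": 1.0})

# Thresholding at 0.5 splits the nodes into two groups.
group_one = sorted(n for n, s in scores.items() if s >= 0.5)
print(group_one)  # ['d', 'e', 'f']
```

With one seed per triangle, thresholding the scores recovers the two triangles as communities, which is the kind of use on networks I have in mind. Real graphs would need edge weights and more care about how the seeds are chosen.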