The Dataphiles: literature


Thursday, August 14, 2008

Preferential Installation of Facebook Apps [SIGCOMM WOSN]

I'm reading over the proceedings of SIGCOMM's Workshop on Online Social Networks (WOSN), which is in Seattle next Monday.

Minas Gjoka, Michael Sirivianos, Athina Markopoulou, and Xiaowei Yang, a team of authors at UCI, wrote a paper, Poking Facebook: Characterization of OSN Applications, which looks at data from Facebook application installation and use.

First, they seem to have gotten a pretty successful crawl, which is saying something since Facebook is pretty selfish with data. Here is a PDF of application installation, both according to Facebook's stats and their crawled dataset, which match up pretty well.

They also modeled the histogram of installed-apps-per-user as preferential, running a simulation with "users as bins" and different apps as "different colored balls", iteratively assigning balls to bins. For instance, saying that 100 users have installed the Zombies application would translate to "gray balls appear in 100 bins".

For each iteration, one goes through each "ball" (application installation), starting with the "most popular color" (application with the most installations). For each ball one then assigns an additional "bin that doesn't already contain that color" (picks a new user that hasn't already installed the app) according to a probability:

Where balls(i) is the number of applications user i has installed, and B is the set of users that haven't already installed the application. init is a parameter to moderate the initial activity, and rho is the preferential exponent, chosen in simulations to be 1.6. In the end you get a sort of heavy-tailed behavior, with most users installing a couple of apps and a few who go nuts with application installs. It fits pretty well.
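
A minimal sketch of the balls-and-bins process described above, in Python. This is not the authors' code. It assumes the probability of picking user i is proportional to balls(i)^rho + init, normalized over the users in B; that exact form is an assumption, as is processing each app's installs all at once in order of popularity. The function name simulate_installs and the install counts in the usage lines are made up for illustration.

```python
import random
from collections import Counter


def simulate_installs(app_installs, num_users, rho=1.6, init=1.0, seed=0):
    """Assign app installations ("balls") to users ("bins") preferentially.

    app_installs: list with the number of installations of each app.
    Returns a list where entry i is balls(i), the apps installed by user i.
    ASSUMPTION: weight of user i is balls(i)**rho + init.
    """
    rng = random.Random(seed)
    balls = [0] * num_users

    # Start with the most popular "color" (app with the most installations).
    for installs in sorted(app_installs, reverse=True):
        if installs > num_users:
            raise ValueError("an app cannot have more installs than users")

        # B: users who have not installed this app yet.
        eligible = list(range(num_users))
        for _ in range(installs):
            # init > 0 keeps every weight positive, even when balls(i) == 0.
            weights = [balls[u] ** rho + init for u in eligible]
            idx = rng.choices(range(len(eligible)), weights=weights, k=1)[0]
            user = eligible.pop(idx)  # this user now has the app
            balls[user] += 1

    return balls


# Hypothetical install counts for five apps across 200 users.
balls = simulate_installs([100, 60, 30, 10, 5], num_users=200)
histogram = Counter(balls)  # maps apps-per-user to number of users
```

Plotting histogram on log axes is one way to eyeball the tail; raising rho should concentrate installs on fewer users, while a larger init flattens the early advantage.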

One of the fun parts of this sort of data is the outliers-- the users who go nuts on something. (Netflix users rating thousands of movies, etc.) It looks like in the crawled data there are a few users with 500 apps installed!

In the paper there is also a fit to the "coverage of applications"-- that is, how many of the ranked apps we need to go through before every user has at least one of those apps installed. The simulation appears to reach full coverage a little too quickly, so perhaps the most popular applications are taking too many users in the simulation.

What's somewhat surprising to me is that this isn't at all based on the behavior of a user's friends, but on that of the entire Facebook network at large. I suspect that in reality friends' behavior does govern what a user installs, but for large-scale patterns one can overlook it. This might be different for actually modeling how an application catches on. (Using other features like network effects is listed as "future work" for the authors in refining the model.)

Thursday, April 17, 2008

How to make time for literature review

Answer: just wait until you're completely unmotivated to do anything else. Sunny days with perfect weather are really the only times I get a chance to do any significant literature reviews. This afternoon, when I was unable to get myself to stay in my windowless office, I (finally) sifted through the WSDM proceedings that I'm most interested in, and read a couple papers on trust/distrust propagation. I'm getting better at adding papers to my bibsonomy [rss]. The top 10 or so should be what I covered today.

Also a fun article: via Physics Arxiv Blog, To How Many Politicians Should Government Be Left? The article looks at the "efficacy" of a government compared to its cabinet size, and makes a rather nifty model of how opinions are formed in small networks. Another interesting bit is that while cabinet size ranged from 5 to 54, not a single government of the nearly 200 surveyed had a cabinet of size 8-- apparently it is common knowledge that that is bad luck, or something.

I also discovered that Jure was smart enough to submit last year's SDM paper to ArXiv, which yielded a citation. That has prompted me to register so I can post other publications.

This is related to a recent pet peeve of mine-- the fact that it's difficult to get conference proceedings. The ACM/Citeseer folks don't always carry things from workshops and the like that I'm interested in. Most authors have the sense to post their papers on their websites, but I much prefer being able to get a conference all in one place. Of course, professional organizations don't like to do that. I find it hard to believe that they really make money off of conference proceedings, so I can only guess that it has to do with publisher/copyright/legal rules outside their control. Maybe someday CC/GPL will be able to wrest away some control.

Friday, February 15, 2008

Large-scale visualization reading group

Independent of the social media reading group (though I imagine some folks will participate in both), a visualization group has been founded by Peter Landwehr and Anita Sarma. The first group meeting is on graph visualization (Thursday at 12:30 in the gradlounge). I'm stoked.

For the schedule and to subscribe to the mailing list, visit their wiki page.

Thursday, January 10, 2008

Bad Computer Science Writing

I just now ran across a 1997 writing of Jonathan Shewchuk: Three Sins of Authors in Computer Science and Math. They are:

1. Grandmothering. That is, writing an introduction that does not tell what the paper is really about, often making it both inaccessible to newbies and obvious and irrelevant to experts.
2. A paragraph-long table of contents in the introduction. (e.g. "In section 2 we survey related work. In section 3 we go over some preliminaries...")
3. Essentially copy-pasting the introduction into the conclusions.

I've been guilty of at least the last two simply out of the oral tradition of CS folk. Oops. I did always think the Table of Contents thing, while logical for book introductions, was a little silly for an 8-page paper where you're already worried about space. As for #3, I suspect it's to make sure reviewers do have a "takeaway" message, in case they're too lazy to go back and read your introduction. However, if you have to re-state all your major findings on the last page in order for people to figure out what you've done, then the rest of your paper must have been poorly written.

I am rather ashamed at how my writing skills have slipped since I changed majors five years ago. I could probably still pass freshman comp (and I have it easier than many of my fellow grad students since I get to write in my native language), but it's nothing like what I could do in the heyday of my high school journalism career.

Wednesday, July 18, 2007

Fiction as an exaggeration of inner fiction

An interesting post in Overcoming Bias. It suggests that our biases about reality lean in the direction of fiction. That is, (successful) fiction is simply a further exaggeration of things we already tend to overestimate. I think the suggestion is that our biases cause such fiction to be well-written and well-received, not that exposure to fiction causes the biases. Hanson then suggests that we "Find ways in which fiction tends to deviate from reality, and then move your estimates of reality in the other direction."

A few possible human biases that this "fix" would identify:
-Your boss at the office probably isn't as socially inept and ignorant as you think.
-Your adversaries or competitors are not as evil and immoral as you think.
-Solutions cannot be wrapped up as quickly as you think.
-Serial killers are not as interesting as you think.
-People don't have nearly as much sex as you think.

This is related to the idea that everybody is their own protagonist.
