Data Mining the National Science Foundation’s Scholarship Program

Fall is in the air.  Undergraduates have returned to their roost.  All this reminds me of the annual trials and tribulations of the scholarship cycles.  One of the particularly memorable ones was the National Science Foundation’s (NSF) Graduate Research Fellowship.  I applied, but ended up on a different NSF grant for my work at NYU. […]

Wolfram Alpha and Social Networks

Stephen Wolfram posted recently on an update to Wolfram Alpha.  You can now type in ‘facebook report’ at the Wolfram Alpha prompt.  The new command displays a whole host of interesting data mining from the information you’ve placed on Facebook over the years. My favorite tidbit?  The clustered network graph of my friends.  Nothing shocking here, but […]

Social Networks of Shakespearean Plays – part 2

As a follow-up to my previous work about social networks in Shakespeare, I wanted to see how the social network changes throughout the course of the play.  Using the same techniques as last time, we can look at the social network structure by act. The density of the social network underlies the style of […]

Social Networks in Shakespeare

Privacy settings are very important when getting them wrong results in a duel to the death. When I think of social networks, I immediately start to think of Facebook, LinkedIn and the rest of their ilk.  These tend to dominate the landscape of our thoughts on social networks simply because they’re the biggest.  But social […]

A/S/L/(Neil Gaiman fan?) – Fusion tables and OKCupid

I’ve been thinking about data visualization tools lately.  In particular, I got some advice to check out Google’s Fusion Tables.  I needed some data to start playing around with, but, luckily, I happened to have 29,035 OKCupid profiles lying around in a database (learn how I got them here). Age: First question, what does the age […]

Data mining OKCupid

I’ve been thinking quite a bit about natural language processing lately.  This started with my series on text message analysis and looking at gender-specific Twitter usage.  Lately I’ve been pointed at the Natural Language Toolkit (NLTK), a Python library, to make this analysis more robust.  I want to apply this toolkit on a […]

Gender differences, Twitter and Videogames

I was recently introduced to Tweet-o-Life via the quite amazing Nathan Yau over at Flowing Data.  The Tweet-o-Life project was a study by Amaç Herdağdelen and Marco Baroni of habits on Twitter.  They looked at millions of tweets to identify two kinds of behaviors: those based on gender and those based on time of day.  They’ve since made […]