Data Mining the National Science Foundation’s Scholarship Program

Fall is in the air.  Undergraduates have returned to their roost.  All this reminds me of the annual trials and tribulations of the scholarship cycles.  One of the particularly memorable ones was the National Science Foundation’s (NSF) Graduate Research Fellowship.  I applied, but ended up on a different NSF grant for my work at NYU.  I remember the application process vividly.

All of this was in the back of my mind while I was exploring data.gov this week.  I happened to stumble upon lists of NSF Fellows for the previous 13 years.  I added a couple more fields that had not been explicitly included and compiled these lists into a single data set.  I’ve made it publicly available as a Google Fusion Table for anyone who would like to use it for further analysis.

I’ve done a few simple analyses of the dataset (below the fold) and included my interpretations for any would-be fellowship applicants.

How many NSF Graduate Fellows are there?

The NSF graduate fellowship was first offered in 1952. Since that time, there have been over 46,500 scholars from over 500,000 applications (NSF-GRFP history), or about 775 scholars a year.  Currently the program is funding about 2,000 scholars annually.  We can look at total numbers over the last 13 years using the dataset I compiled.
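As a quick sanity check on those figures, here is a minimal sketch of the arithmetic. The only inputs are the numbers quoted above; the implied program span and the overall award rate are derived values, not figures from the NSF, and since the quoted totals are "over" counts, both results are approximate.

```python
# Figures quoted in the paragraph above (NSF-GRFP history).
total_fellows = 46_500
total_applications = 500_000
fellows_per_year = 775

# Derived values (assumption: these are inferred from the quoted
# numbers, not reported anywhere in the source).
implied_years = total_fellows / fellows_per_year
overall_award_rate = total_fellows / total_applications

print(f"Implied program span: {implied_years:.0f} years")
print(f"Overall award rate: {overall_award_rate:.1%}")
```

Roughly 60 years of awards is consistent with a program that started in 1952, and the historical award rate works out to a bit under one in ten applications.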

Where do NSF Graduate Fellows Study?

Each NSF scholar declares where he or she plans to study.  Below I’ve used the aggregate function of Fusion Tables to group the data by year and institution.  I exported the aggregate data and hacked together a simple Python script to plot the percentage of awardees over time that declared for a particular school.  I’ve filtered the figure below to institutions with more than 3% of the students in at least one year.  Berkeley, Stanford, Harvard and MIT dominate the rankings every year (although MIT might be steadily declining).
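For anyone who wants to reproduce the percentage-and-filter step, here is a minimal sketch. It is not my original script: the function names, the `(year, institution, count)` row layout, and the sample rows are all made up for illustration, and it assumes the rows are already aggregated to one count per institution per year. Plotting is left out.

```python
from collections import defaultdict

def institution_shares(rows):
    """Return {institution: {year: percent of that year's awardees}}.

    rows: list of (year, institution, count) tuples, one per
    institution per year (hypothetical layout).
    """
    totals = defaultdict(int)
    for year, _, count in rows:
        totals[year] += count
    shares = defaultdict(dict)
    for year, institution, count in rows:
        shares[institution][year] = 100.0 * count / totals[year]
    return dict(shares)

def filter_by_peak_share(shares, threshold=3.0):
    """Keep institutions whose share exceeds threshold in at least one year."""
    return {
        institution: by_year
        for institution, by_year in shares.items()
        if max(by_year.values()) > threshold
    }

# Made-up counts for illustration only; not real NSF data.
rows = [
    (2011, "School A", 50), (2011, "School B", 2), (2011, "School C", 48),
    (2012, "School A", 40), (2012, "School B", 1), (2012, "School C", 59),
]
shares = institution_shares(rows)
kept = filter_by_peak_share(shares)
print(sorted(kept))
```

Filtering on the peak year rather than the average keeps a school in the figure if it was prominent even once, which matches the "at least one year" rule described above.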

What fields of study are most funded by the NSF?

A research proposal for graduate school should definitely be shaped by what you will likely find interesting for a long time, so tailoring your proposal to what the NSF might *want* to see is probably a recipe for failure, even if you happened to get the scholarship with this strategy.  It’s interesting, however, to at least know which fields tend to be funded more readily.

The full data set includes lots of details about the proposed subfields of study for each awardee, in fact a bit too much to draw meaningful conclusions. Here I’ve aggregated data by parent field (Engineering, Life Sciences, etc.) for the years 2005 – 2012.  Engineering and Life Sciences are the most represented.  There are three things that might be going on here:

  • Engineering and Life Sciences are more favorably funded.
  • Engineering and Life Sciences are more heavily represented among the applicants.
  • Engineering and Life Sciences are broader parent fields, so they tend to be a kind of catch-all.

I’ve not been able to deduce from this dataset what the cause is.  But I’d be curious to see further analysis on the data to determine what is happening.
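If you want to start that further analysis, the parent-field aggregation can be sketched as below. This is a minimal illustration under stated assumptions: the `year` and `parent_field` keys are hypothetical column names, and the sample records are made up rather than taken from the real dataset.

```python
from collections import Counter

def awards_by_parent_field(records, first_year=2005, last_year=2012):
    """Count awards per parent field within an inclusive year range."""
    return Counter(
        r["parent_field"]
        for r in records
        if first_year <= r["year"] <= last_year
    )

# Made-up records for illustration only; not real NSF data.
records = [
    {"year": 2004, "parent_field": "Engineering"},   # outside the range
    {"year": 2005, "parent_field": "Engineering"},
    {"year": 2008, "parent_field": "Life Sciences"},
    {"year": 2010, "parent_field": "Engineering"},
    {"year": 2012, "parent_field": "Chemistry"},
]
counts = awards_by_parent_field(records)
for field, n in counts.most_common():
    print(field, n)
```

Award counts alone cannot separate the first two explanations. That would take applicant counts per field, which the award lists do not include.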

I’m a student. What should I do with this information?

Probably nothing drastic.  There are many ways to go to graduate school.  The NSF graduate research fellowship is one of the most competitive ways to do it in the sciences.  If you’re eligible, then I’d encourage you to put together a proposal for it, if for no other reason than the process will help you to organize your thoughts about graduate school.  The process will probably be helpful in applications to graduate programs regardless.

In my opinion, you definitely shouldn’t tailor your application to game the stats.  But if you have a clear idea of what you want to study and where, and it happens that your program is in an underrepresented field and school, you should probably make an extra effort to apply.  I would think the applications that are not of the MIT-Engineering flavor might stand out a bit more. I’d be curious to know if any admissions or scholarship counselors happen to agree with me.
