Data Science for the Public Sector (DS4PS)

Open source tools like R have become extremely popular in academia and industry. R's flexibility has led to a proliferation of contributed packages for econometrics, network analysis, spatial analysis, machine learning, and meta-analysis, and it is especially well known for its graphics capabilities. For those working in government and nonprofit organizations with tight budgets, R can replace proprietary statistical or GIS software that costs thousands of dollars per license, making it a valuable tool for the public sector. It can also be used to build dynamic data dashboards that serve as performance management systems in public agencies. Although a steep learning curve has slowed adoption, many schools of public policy are starting to offer courses in R or degrees in data science. We have gathered some resources that are helpful in developing a curriculum for data science in nonprofit and public policy programs.

For more information, visit the DS4PS WEBSITE or the DS4PS GITHUB repo.

Stay tuned for information on the development of a Data Science Consortium for Public Affairs.


Open Data for Nonprofits

Several open datasets are available for nonprofit research, but these assets are not cataloged, are poorly documented, and come in formats that are hard to access. This project was created to make these data easily accessible to nonprofit scholars and researchers.

The datasets currently in the archive include:

  • Business Master File of All Current Exempt Orgs (all orgs granted 501(c)(3) status)

  • 990-PC, 990-EZ and 990-PF Electronic Filers from 2010 to Present

  • All 990-N Postcard Filers

  • All Organizations with a Revoked 501(c)(*) Status

We will be working with the Center for Nonprofits and Philanthropy at the Urban Institute to expand available datasets and build a data science platform for nonprofit scholarship.

Website for the Nonprofit Open Data Collective [ link ]
View Project on GitHub [ link ]
990 E-File Project Overview [ ppt ]


Citation Networks

Literature reviews are an essential component of the research process, but the high volume of research and the idiosyncratic terminology used across disciplines can make it difficult to review a domain systematically. We have developed a method of constrained snowball sampling from academic databases that reliably identifies the most important citations in a field. The method has been implemented as a desktop application that scrapes data from Google Scholar, together with an R package for analyzing the data.
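To make the idea of constrained snowball sampling concrete, here is a minimal sketch in Python. It is not the desktop application or the R package, and it does not touch Google Scholar. It runs on a small made-up citation graph held in memory. The specific constraints used here (a fixed number of rounds and a cap on how many new papers are kept per round) are assumptions chosen for illustration, as are all function and parameter names; the actual method may constrain the sample differently.

```python
from collections import Counter


def snowball_sample(cites, seeds, max_rounds=2, per_round_cap=3):
    """Grow a sample of papers outward from a set of seed papers.

    cites maps a paper ID to the list of paper IDs it cites (toy data).
    In each round we follow the references of the current frontier and keep
    only the `per_round_cap` new papers cited most often by that frontier.
    Both limits are illustrative assumptions, not the published method.
    """
    sampled = set(seeds)
    frontier = list(seeds)
    for _ in range(max_rounds):
        counts = Counter()
        for paper in frontier:
            # set() so a duplicated reference is counted once per citing paper
            for ref in set(cites.get(paper, [])):
                if ref not in sampled:
                    counts[ref] += 1
        if not counts:
            break  # nothing new to follow
        # Most-cited first; ties broken by ID so the result is deterministic
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        frontier = [paper for paper, _ in ranked[:per_round_cap]]
        sampled.update(frontier)
    return sampled


def rank_by_in_sample_citations(cites, sampled):
    """Rank sampled papers by how often other sampled papers cite them."""
    counts = Counter()
    for paper in sampled:
        for ref in set(cites.get(paper, [])):
            if ref in sampled:
                counts[ref] += 1
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical citation graph: each key cites the papers in its list.
toy_cites = {
    "A": ["C", "D"],
    "B": ["C", "E"],
    "C": ["F"],
    "D": ["F"],
    "E": ["G"],
    "F": [],
    "G": [],
}

sample = snowball_sample(toy_cites, ["A", "B"], max_rounds=2, per_round_cap=2)
ranking = rank_by_in_sample_citations(toy_cites, sample)
```

In this toy graph, starting from seeds A and B with a cap of two papers per round, the sample grows to A, B, C, D, and F, and papers C and F come out as the most cited within the sample. The cap is what keeps the sample from growing without bound as each round adds more references.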

Methodological Overview:  
[presentation]   [working paper]

Citation Network Analyzer web scraper [ request ]

Source Code on GitHub [link]

 

Nonprofit Entrepreneurship

More than 50,000 new nonprofits are started each year, and roughly 25% of them do not survive past their fifth year. Starting a new organization can be challenging, and competition for resources has intensified. Thanks to a grant from the Kresge Foundation, we have been able to build knowledge about key components of the startup process. We have collaborated with the Foundation Center to create a nonprofit startup diagnostic tool that will allow potential nonprofit entrepreneurs to evaluate their preparedness and will direct them to useful resources to fill gaps in their knowledge and strategy.

If you are thinking of starting a nonprofit, access the [ diagnostic tool here ].

Read our working paper on nonprofit entrepreneurship [ link ]