Data Discovery

Intended audience: DataKind Volunteers

A data science project is only possible if there’s data to work with, so you’ll need to ensure that the partner organization can provide ready access to the necessary data. At this stage, you’ll want to gain a preliminary understanding of:

  • Data inventory: What data is available? Create a well-documented table with links to all data (columns can include data name, data type, link, how it was provided, and comments).
  • Data dictionary: Can the partner organization provide any associated data dictionaries, codebooks, database schemas, or other documentation?
  • Data quality: Are there questions about the data or issues with the data? How much data clean-up would need to happen prior to beginning the project?
  • Data creativity: Are there any other data sources that could bolster the project? For example, are there relevant openly available data sets you could use?
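As one way to start the data inventory described above, the table can be kept as a small CSV file that lives alongside the project notes. The sketch below is illustrative only: the `InventoryEntry` class, the `inventory_to_csv` helper, and the example rows and links are hypothetical, and the columns simply mirror the ones suggested in the list (data name, data type, link, how it was provided, and comments).

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class InventoryEntry:
    """One row of the data inventory (hypothetical structure)."""
    data_name: str
    data_type: str
    link: str
    how_provided: str
    comments: str = ""


def inventory_to_csv(entries):
    """Render inventory entries as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=[f.name for f in fields(InventoryEntry)],
        lineterminator="\n",
    )
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))
    return buffer.getvalue()


# Made-up example rows; replace with the partner's actual data sources.
entries = [
    InventoryEntry(
        data_name="Program enrollment",
        data_type="CSV export",
        link="https://example.org/enrollment.csv",
        how_provided="Shared drive",
        comments="Identifiers are hashed",
    ),
    InventoryEntry(
        data_name="Intake forms",
        data_type="Scanned PDFs",
        link="https://example.org/intake",
        how_provided="Partner file server",
    ),
]

inventory_csv = inventory_to_csv(entries)
print(inventory_csv)
```

A spreadsheet works just as well; the point is that every data source gets one row with a link and a note on how it was provided.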

While the goals of this stage may seem obvious, there are some subtle and important considerations to keep in mind as you move through the process.

Essential Tips
  • Be empathetic to data sensitivity. People working in education have lost their careers because data was used against them. People working in human rights have lost their lives because data was used against them. Be empathetic in explaining the ways this data will be used to empower the organization in their work, not to repeat the paradigms of the past.
  • Help expand their definition of “data.” Many organizations have an internal definition of “data” that only includes financial and reporting data. You must dig to understand what else, if anything, they may have digitized. A good way to do this is to have them walk you through their daily processes and probe whenever it sounds like information is being tracked.
  • Don’t believe it until you see it. Many organizations claim to have “30 years of human rights data” that turns out to be a spreadsheet of 30 rows. Almost no organization that DataKind has worked with had data that matched what we pictured from its initial description.
  • Support sharing imperfect data. Organizations may worry that their data isn’t clean enough or good enough, or that it will show something they’ve done wrong. Assuming you’ve carefully discussed data security requirements, remind them that this is okay: we’re here to help and to show them, kindly, what they can improve on.
  • Be mindful of costs. Unless a project is funded by a grant that covers computing costs, we ask that partners plan to host the data and pay for any computing the project needs. Our preference is for the partner organization, if it has the capacity, to host the data itself and give DataKind volunteers temporary access within its systems.
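A quick look at a shared file is often enough to check whether the data matches its description and how much clean-up it may need. The following is a minimal sketch, assuming the partner has shared a CSV file; the `profile_csv` helper and the sample rows are hypothetical, and the check only counts rows and empty cells per column.

```python
import csv
import io


def profile_csv(text):
    """Return the row count, column names, and empty-cell count per column."""
    reader = csv.DictReader(io.StringIO(text))
    columns = list(reader.fieldnames or [])
    missing = {name: 0 for name in columns}
    row_count = 0
    for row in reader:
        row_count += 1
        for name in columns:
            value = row.get(name)
            # Treat absent or whitespace-only cells as missing.
            if value is None or value.strip() == "":
                missing[name] += 1
    return {"rows": row_count, "columns": columns, "missing": missing}


# Made-up sample standing in for a file the partner has shared.
sample = (
    "case_id,date,outcome\n"
    "1,2020-01-05,resolved\n"
    "2,,\n"
    "3,2021-03-11,\n"
)

summary = profile_csv(sample)
print(summary)
```

For a real file, the text can be read with `open(path, newline="").read()` inside the partner's own systems, so the data never has to leave them.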

Contributor(s): Benjamin Kinsella, Matthew Harris, Seward Lee
