
Data Audit

Intended audience: DataKind Volunteers

Before you begin designing the project, you’ll need to perform a Data Audit to ensure that a data science project is feasible and responsible. The main goal is to confirm whether the available data can support a project that meets the objectives, and to identify salient issues related to data provenance, quality, and ethics. You are not required to exhaustively outline all available tables and variables (though this process may be helpful for you!). Rather, you’re demonstrating a proof of concept, answering: “Given our objective, do these data have what we need?” Use the Data Audit Report Template to ensure that you leave no stone unturned during your audit.

The Data Audit results may indicate that the project will face certain challenges or may not be possible to execute as planned. These insights don’t necessarily mean that the project must be canceled; instead, they may lead to adjusting the motivating question, thinking creatively about the data, re-designing the scope of the proposed solution, or even simply documenting potential pathways so the partner organization and project team understand what their next steps could be. A large component of the Data Audit is thinking carefully about bias that might exist in the data and evaluating possible data inclusion and exclusion risk. This ethical data evaluation is crucial to ensure that DataKind only completes projects that will ultimately have a positive impact on historically marginalized communities. Below you’ll find some common questions and answers to help you better understand this step in the project process.

What will the Data Audit be used for?

The Data Audit report you create with the template will be valuable for the scoping team to decide whether to take on a project and for the project execution team to understand the data when they are onboarding to the project.

What skills are needed to complete the Data Audit?

The person conducting the Data Audit should have enough data analysis skill to perform a cursory analysis of many datasets, enough familiarity with the wider data landscape to know of external datasets or data that can be scraped or collected, and enough knowledge of statistical bias to flag obvious issues in the dataset. If you are able to, bring in a subject matter expert to look at the data as well.
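As a rough illustration of what a "cursory analysis" might look like in practice, the sketch below summarizes two things an auditor often checks first: how much of each field is missing, and how evenly a grouping field is represented. It is a minimal sketch under stated assumptions, not a DataKind tool: the function name `audit_rows`, the list of missing-value markers, and the sample records (`client_id`, `region`, `income`) are all hypothetical.

```python
from collections import Counter

# Assumption: these strings are treated as "missing"; adjust for the dataset at hand.
MISSING_MARKERS = {"", "na", "n/a", "null", "none"}


def is_missing(value):
    """Return True if a value is None or matches one of the missing markers."""
    return value is None or str(value).strip().lower() in MISSING_MARKERS


def audit_rows(rows, group_field):
    """Summarize missingness per field and representation per group.

    rows: an iterable of dicts, one per record (hypothetical input shape).
    group_field: the field whose groups should be compared for representation.
    """
    rows = list(rows)
    total = len(rows)
    if total == 0:
        return {"rows": 0, "missing_share": {}, "group_share": {}}

    # Collect every field name that appears in any record.
    fields = sorted({name for row in rows for name in row})

    # Share of records in which each field is missing.
    missing_share = {
        name: sum(is_missing(row.get(name)) for row in rows) / total
        for name in fields
    }

    # Share of records per group; records without a group are counted separately.
    groups = Counter(
        "(missing)" if is_missing(row.get(group_field)) else str(row[group_field]).strip()
        for row in rows
    )
    group_share = {group: count / total for group, count in groups.items()}

    return {"rows": total, "missing_share": missing_share, "group_share": group_share}


# Hypothetical sample records, for illustration only.
sample = [
    {"client_id": "1", "region": "north", "income": "32000"},
    {"client_id": "2", "region": "north", "income": ""},
    {"client_id": "3", "region": "south", "income": "41000"},
    {"client_id": "4", "region": "north", "income": "N/A"},
]
report = audit_rows(sample, "region")
print(report["missing_share"])
print(report["group_share"])
```

Records loaded with `csv.DictReader` from the standard library could be passed in directly. A high missing share or a heavily skewed group share does not settle anything by itself; it is a prompt to ask the partner organization how the data were collected and who might be left out.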

When does the Data Audit occur within the Design Stage at DataKind?

The Data Audit happens after the data sharing agreement is signed. It ends with a finalized services agreement contract based on the results of the audit.

Are Data Audits only required for projects when the organization collected data?

No. These Data Audit guidelines apply to all projects, whether the organization provides its own data, the data is scraped, or an open source dataset is used. All datasets need to be audited with the same level of scrutiny.

Is every section in the template always required?

No. The template provides a list of potential questions, and not all will be relevant to your project. We also encourage you to work with the Project Champion or a technical lead within the organization who best understands the data’s quality and its collection processes.

What happens if the Data Audit results indicate that the project is not viable?

It is common to find that there is not sufficient data available to meet the partner organization’s project goals, but that answer doesn’t necessarily mean the end of the road. Next, we work with the partner to adjust the motivating question, think creatively with the existing data, or find alternative data sources. Do not get discouraged if the data do not support the project. This hard-won insight is valuable to the partner organization, and it could seed a new project related to making more usable data in the first place.

Contributor(s): Benjamin Kinsella, Nathan Banion, Rachel Wells
