Establishing meaningful data access for algorithm audits
Our research article “Access Denied: Meaningful Data Access for Quantitative Algorithm Audits” will be presented at the ACM CHI Conference on Human Factors in Computing Systems (CHI 2025). A preprint is already available on arXiv.
Algorithms are increasingly used to automate decisions in private and public sectors. Examples include systems that screen job applicants, predict where crimes might happen, sort immigration visa applications, or automate border controls. These technologies can be simple programs that follow set rules, or complex algorithms using advanced artificial intelligence. Often, these algorithms operate without transparency, making it difficult to hold organisations accountable when decisions are harmful or discriminatory.
To uncover issues in deployed decision-making algorithms, researchers conduct evaluations, also called ‘algorithm audits’. Third-party algorithm audits, in particular, are conducted by auditors—typically independent researchers or journalists—with no ties to the target organisation.
Here, we focus on audits that investigate how a model treats different demographic groups. For example, does an algorithm that sorts visa applications consistently reject people from specific regions? This was the case with a visa processing system used by the UK Home Office, which was found to be biased against applicants from certain nationalities. By measuring how different groups are affected, algorithm audits can reveal systematic patterns within these systems.
Accessing data is a crucial and difficult task for auditors
Decision-making algorithms often use sensitive personal data that is protected by data protection laws (such as the EU’s GDPR). Because of these privacy concerns, as well as organisational secrecy and commercial interests, organisations are hesitant to disclose such data. While researchers, policymakers, and civil society groups agree on the need for more oversight, few organisations grant independent researchers access to their data for evaluation.
As a result, auditors often have to resort to creative ways to investigate potential issues. They may rely on publicly available data or try to replicate the algorithm themselves. However, these workarounds have severe limitations: they expose auditors to legal retaliation from targeted organisations and limit the robustness of analyses. Low access therefore both prevents audits from being conducted in the first place and limits the impact of those that are completed.
A main challenge in conducting meaningful algorithm audits is balancing privacy protection with data access in the public interest. Unlike the financial sector, where regulated audit authorities have mandated access to information, there are no such bodies in the AI sector. Therefore, in order to oversee algorithms that make high-stakes decisions about individuals, we need solutions to safely and effectively access data.
What we did
We ran simulations of audits, where auditors measure whether an algorithm systematically disadvantages a given group of people. For example, is a parole release decision-support algorithm more likely to be in favour of release for White defendants, as compared to non-White defendants? Due to access constraints and privacy-protection measures, auditors may have access to data of varying granularity, completeness and quality; we examined how these different conditions of data access affect audit findings.
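As a minimal sketch of the kind of measurement such an audit makes, the snippet below computes the gap in favourable-outcome rates between two demographic groups (a demographic parity difference). The group labels, decision rates, and sample sizes are illustrative assumptions, not data from the paper.

```python
import random

random.seed(0)

# Hypothetical audit data: each record is (group, decision), where
# decision 1 means "in favour of release". Rates are illustrative.
records = [("white", 1 if random.random() < 0.55 else 0) for _ in range(5000)]
records += [("non-white", 1 if random.random() < 0.45 else 0) for _ in range(5000)]

def favourable_rate(records, group):
    """Share of favourable decisions for one demographic group."""
    decisions = [d for g, d in records if g == group]
    return sum(decisions) / len(decisions)

# Demographic parity difference: gap in favourable-outcome rates.
disparity = favourable_rate(records, "white") - favourable_rate(records, "non-white")
print(f"disparity: {disparity:.3f}")
```

An auditor would compute a statistic like this (or other fairness metrics, such as error-rate gaps) from whatever data the organisation discloses; our simulations vary how complete and granular that data is.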
Privacy protection techniques
There are several methods to control access to data and attempt to protect individuals’ privacy. We analysed the following approaches within the context of algorithm audits.
Data minimisation, as defined in the GDPR, means sharing only the data that is necessary for a given purpose (here, auditing disparities in algorithm outputs). Limiting the scope of data helps reduce privacy risks.
Differential Privacy is a mathematical framework for releasing aggregate information from a dataset without revealing sensitive information from individuals. Often, it is implemented by adding a small amount of random noise to the data, making it harder to accurately identify personal information.
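The most common building block here is the Laplace mechanism: add noise drawn from a Laplace distribution, scaled to the query's sensitivity, before releasing an aggregate. The sketch below applies it to a single counting query; the count and privacy budget are illustrative assumptions.

```python
import math
import random

def dp_count(true_count, epsilon):
    """Release a count using the Laplace mechanism.

    A counting query has sensitivity 1 (adding or removing one person
    changes the count by at most 1), so noise is drawn from
    Laplace(0, 1/epsilon). Smaller epsilon means more noise and
    stronger privacy. Noise is sampled via the inverse CDF.
    """
    u = random.random() - 0.5
    noise = -(1.0 / epsilon) * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))
    return true_count + noise

random.seed(42)
# Hypothetical aggregate: rejected applications in one demographic group.
noisy = dp_count(1200, epsilon=0.5)
print(round(noisy))
```

With a reasonable budget, the released value stays close to the true count, which is why aggregate statistics can remain useful for fairness assessments even under differential privacy.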
Synthetic data generation involves creating artificial data that mimics the patterns and properties of a (real) dataset. In theory, it enables access to an individual-level dataset that closely resembles the original data without exposing actual personal information.
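To see how a generator can lose the patterns an audit cares about, consider the deliberately naive sketch below: it samples each column from its own marginal distribution independently, which preserves overall frequencies but discards the link between group and outcome. All values are illustrative assumptions.

```python
import random

random.seed(1)

# Hypothetical real data: group and outcome are correlated
# (group B is rejected more often). Rates are illustrative.
real = []
for _ in range(10000):
    group = "A" if random.random() < 0.5 else "B"
    p_reject = 0.2 if group == "A" else 0.5
    real.append((group, 1 if random.random() < p_reject else 0))

def reject_rate(data, group):
    outcomes = [o for g, o in data if g == group]
    return sum(outcomes) / len(outcomes)

# Naive generator: sample each column from its marginal, independently.
p_group_a = sum(1 for g, _ in real if g == "A") / len(real)
p_reject_overall = sum(o for _, o in real) / len(real)
synthetic = [
    ("A" if random.random() < p_group_a else "B",
     1 if random.random() < p_reject_overall else 0)
    for _ in range(10000)
]

real_gap = reject_rate(real, "B") - reject_rate(real, "A")            # ~0.30
synth_gap = reject_rate(synthetic, "B") - reject_rate(synthetic, "A")  # ~0.00
print(f"real gap: {real_gap:.2f}, synthetic gap: {synth_gap:.2f}")
```

Real synthetic-data generators model correlations far better than this, but the same failure mode appears in attenuated form: patterns affecting small or intersectional groups can be smoothed away.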
Key insights
1. Sharing data in aggregate:
Differential Privacy helps, but is not a silver bullet. We found that using differential privacy to share aggregated data—statistics on model decisions across demographic groups—can enable auditors to accurately assess ‘fairness’ across a range of metrics. This method generally offers strong privacy protection for individuals and doesn’t require organisations to disclose much information about their system.
Prioritising the public interest. However, disclosing only heavily aggregated data has many limitations. It prevents auditors from digging into why disparities exist and limits their flexibility to examine different groups or factors. Secure data environments, where researchers access data remotely or run analyses without directly accessing data, offer a compromise between privacy protection and analysis flexibility. If these tools become more widely available, researchers, journalists and civil society could perform meaningful audits more easily.
Establishing minimal transparency requirements. Public release of aggregate statistics should be a minimum requirement and complement, rather than replace, auditor access to individual-level data.
2. Sharing individual-level data:
Anonymisation can undermine reliability and mask disparities. Relying on minimised data can lead to inaccurate assessments. In cases where the audit data doesn’t match the data structure used in the deployed system, auditors can miss significant disparities or get a skewed picture of how an algorithm operates. With comprehensive data, auditors are highly unlikely to understate or overstate the investigated disparities. However, with incomplete data, there is a higher risk of underestimating disparities. This means that when an audit uncovers concerning disparities from an opaque, black-box algorithm, these findings should be taken seriously—even if the data is not perfect.
Synthetic data is not the answer. We found that synthetic datasets tend to mask disparities, leading auditors to underestimate algorithmic bias. Generated data, although trained to mimic real-world data, can miss the nuanced patterns of social disparities. This makes synthetic data generation a misleading tool for audits.
Key takeaways
Data access and privacy protection are not incompatible. Privacy concerns should not be used as an excuse for not providing necessary data for algorithm auditing. To avoid privacy becoming a scapegoat for the lack of transparency, we need better practices, incentives, and regulations that support data sharing in the public interest.
Poorly implemented anonymisation methods can harm both the individuals represented in the data and those affected by biased algorithmic decisions. Until AI audits are institutionalised, we believe that a combination of publicly-available statistics, remote privacy-preserving analysis tools, and secure access controls for real, complete and granular data provide the best path forward.
Citations
Zaccour, J., Binns, R., & Rocher, L. (2025). Access Denied: Meaningful Data Access for Quantitative Algorithm Audits. arXiv preprint arXiv:2502.00428.
To cite this research article, please use this BibTeX entry:
@misc{zaccour2025accessdenied,
  title={Access Denied: Meaningful Data Access for Quantitative Algorithm Audits},
  author={Zaccour, Juliette and Binns, Reuben and Rocher, Luc},
  archivePrefix={arXiv},
  eprint={2502.00428},
  primaryClass={cs.HC},
  year={2025},
  month={February},
  url={https://arxiv.org/abs/2502.00428},
  doi={10.48550/arXiv.2502.00428}
}
About us
This is a research project by Juliette Zaccour, Reuben Binns, and Luc Rocher. We are researchers at the Oxford Internet Institute and the Department of Computer Science at the University of Oxford.