Protecting health data at UK Biobank
Leaks from the repository undermine trust and willingness to share health data. UK Biobank has sought to downplay the importance of the exposure, but the science is clear. Individuals are often vulnerable to re-identification, and acknowledging the risks matters.
UK Biobank is one of the world’s most important biomedical research resources, holding detailed genetic, health, and lifestyle data on half a million British volunteers. More than 22,000 researchers across 60 countries have used the data, contributing to over 18,000 peer-reviewed publications. It is an extraordinary resource, but its reputation has been tarnished by a series of scandals.
In our editorial for The BMJ, we argue that repeated leaks from the repository undermine trust and willingness to share health data. UK Biobank has sought to downplay the importance of the exposure, but the science is clear. In datasets containing common demographic and health attributes, individual records are often unique and vulnerable to re-identification, even when datasets are incomplete.
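To make the uniqueness point concrete, here is a minimal sketch in Python that measures what share of records is singled out by a given combination of attributes. The records, the field names, and the `unique_fraction` helper are all invented for illustration; none of them come from UK Biobank or from the editorial.

```python
from collections import Counter

def unique_fraction(records, quasi_identifiers):
    """Share of records whose combination of the given attributes appears exactly once."""
    if not records:
        return 0.0
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(keys)

# Toy, made-up records for illustration only (not real participant data).
records = [
    {"birth_year": 1956, "sex": "F", "postcode_area": "OX", "diagnosis": "asthma"},
    {"birth_year": 1956, "sex": "F", "postcode_area": "OX", "diagnosis": "diabetes"},
    {"birth_year": 1956, "sex": "M", "postcode_area": "OX", "diagnosis": "asthma"},
    {"birth_year": 1961, "sex": "F", "postcode_area": "LS", "diagnosis": "asthma"},
]

# Two attributes leave some records indistinguishable from each other.
print(unique_fraction(records, ["birth_year", "sex"]))
# Adding more attributes makes every toy record unique.
print(unique_fraction(records, ["birth_year", "sex", "postcode_area", "diagnosis"]))
```

In this toy example, each added attribute shrinks the group a record can hide in. The same counting logic applies to a sample of a larger dataset, although how well uniqueness in a sample predicts uniqueness in the full population is a separate statistical question that this sketch does not address.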
Acknowledging the true risk of re-identification matters. When institutions treat privacy as a box-ticking exercise, the public often pushes back. The government is betting heavily on health data and artificial intelligence in its 10-year health plan for England, which commits to making the NHS “the most AI-enabled health system in the world.” If the UK cannot provide datasets that are both representative and comprehensive, developers will look elsewhere. Britain could then become increasingly reliant on AI models trained overseas rather than developed domestically. Not only would this undermine the ambitions of the AI opportunities action plan, but it would also put patients at risk.
We discuss how these risks can be avoided if all players in the health data research ecosystem take privacy and security seriously and invest in sociotechnical infrastructure that offers better privacy protection while still maximising utility.