UK Biobank’s Endless GitHub Data Leaks: Tracking 110 Takedowns and a Half-Million Records for Sale

UK Biobank battles repeated GitHub leaks of 500,000 volunteers' health and genetic data, issuing 110 DMCA notices while full datasets surface for sale in China. Oxford researcher Luc Rocher's tracker exposes the scale and risks of re-identification.
Written by Victoria Mossi

Researchers keep uploading UK Biobank’s sensitive health files to public GitHub repositories, and the charity behind the world’s largest biomedical database fights back with DMCA notices. As of April 17, 2026, it had issued 110 such requests targeting 197 repositories from 170 developers across 14 countries. That’s according to a tracker built by Luc Rocher, associate professor at the Oxford Internet Institute, available at biobank.rocher.lc.

UK Biobank holds genetic sequences, health records, and lifestyle details from 500,000 British volunteers aged 40 to 69 when recruited between 2006 and 2010. It grants access to 20,000 researchers worldwide under strict terms barring any further sharing. Yet accidents happen. Notebooks leak sample rows. Genetic files in PLINK or BGEN formats slip out. Tabular datasets with phenotypes end up exposed.

Rocher’s site scrapes GitHub’s public DMCA archive. It charts the timeline: first notice in July 2025. Steady pace through 2025. Pauses in January, February, and most of March 2026. Restart after a Guardian investigation in March. Nearly half the targeted content? Jupyter or R notebooks. A quarter? Genomic data. The largest groups of developers are in the U.S. (24) and China (21).

Privacy risks loom large. De-identified? Sure. But not anonymous. The Guardian re-identified a volunteer using just an approximate birth date and a surgery date drawn from public information. Rocher warns in a Science Media Centre reaction: “This is the 198th known exposure of UK Biobank data since last summer. UK Biobank data is not just available for sale, it also remains available online for anyone to download today.” Once out, copies spread. Takedowns chase shadows.
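Why do two dates matter so much? A minimal sketch, using entirely synthetic rows and hypothetical field names (`birth_month`, `surgery_date`), shows how adding one more quasi-identifier raises the share of records that are unique, and therefore linkable to a named person by anyone who knows those details. It illustrates the general idea only; it is not the Guardian's or Rocher's method.

```python
from collections import Counter

def unique_fraction(records, quasi_identifiers):
    """Share of records whose quasi-identifier combination appears exactly once."""
    if not records:
        return 0.0
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(records)

# Entirely synthetic toy rows with hypothetical field names;
# nothing here is drawn from any real dataset.
records = [
    {"birth_month": "1955-03", "surgery_date": "2014-06-02"},
    {"birth_month": "1955-03", "surgery_date": "2016-11-19"},
    {"birth_month": "1955-03", "surgery_date": "2016-11-19"},
    {"birth_month": "1961-08", "surgery_date": "2014-06-02"},
]

by_birth = unique_fraction(records, ["birth_month"])
by_both = unique_fraction(records, ["birth_month", "surgery_date"])
print(f"unique by birth month alone: {by_birth:.2f}")
print(f"unique by birth month + surgery date: {by_both:.2f}")
```

In this toy table, one record in four is unique on birth month alone, and two in four are unique once the surgery date is added. Real datasets carry far more columns, so the unique share climbs much faster.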

Worse news came Thursday. The UK government confirmed that data on all 500,000 participants had been listed for sale on Alibaba in China. Technology minister Ian Murray called it an “unacceptable abuse” during Commons questions, per Yahoo News UK. UK Biobank spotted the listings after an internal breach. Sellers hawked full datasets. Beijing confirmed involvement but details stay murky.

UK Biobank shifted years ago to a cloud-only model via DNAnexus. No local downloads. Analysis happens in secure environments. But legacy lapses persist. Rocher and Jessica Morley penned a BMJ editorial urging better safeguards. “UK Biobank is addressing repeated uploads of participant data to public GitHub repositories by mistake,” the tracker notes. UK Biobank issues notices even to users without direct access, likely over derivative copies or scraped data.

Critics question the copyright approach. DMCA suits pirated code, not health breaches. The UK lacks a privacy equivalent forcing quick platform action. Participants get told to curb their own online sharing. Rocher pushes back: actions fall short. Data lingers across the web.

Broader woes plague UK Biobank. Past flak for pharma exclusives—Amgen, GSK got early peeks. Chinese researchers tapped GP records, sparking MI5 flags. Insurance sharing bids from 2020-2023. Race science whispers. Now this. Trust erodes. Volunteers signed up for science, not sales pitches on Alibaba.

Rocher’s tracker spotlights the mess. It tallies files: 445 genetic ones. 357 tabular. Notebooks dominate at 48%. Scripts and docs fill the rest. Pauses in early 2026 raise eyebrows. Did vigilance lapse? Or did breaches dip? The timeline points away from a dip: notices resumed only after the Guardian investigation.

Government stirs. Murray demands answers. UK Biobank briefs ministers. But fixes? Cloud helps. Yet humans err. Students push commits without checks. Repos go public by slip. Re-identification odds climb with AI. Rocher’s earlier research estimated that 99.98% of Americans could be correctly re-identified using 15 demographic attributes.
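What might a basic check before a commit look like? A hedged sketch: the function below flags file paths whose extensions match common PLINK and BGEN genetic formats. The extension list and the function name `flag_risky_paths` are assumptions made for illustration, not a tool UK Biobank or GitHub provides. It cannot see data pasted into notebooks, and `.bed` is also used for harmless genomic interval files, so false positives are expected.

```python
from pathlib import PurePosixPath

# Assumed, non-exhaustive list of extensions used by PLINK 1/2 and BGEN files.
RISKY_SUFFIXES = {".bed", ".bim", ".fam", ".bgen", ".pgen", ".pvar", ".psam"}

def flag_risky_paths(paths):
    """Return the paths that carry a genetic-data extension anywhere in the name.

    Checking every suffix (not just the last) also catches compressed
    files such as 'chr1.bgen.gz'.
    """
    flagged = []
    for p in paths:
        suffixes = [s.lower() for s in PurePosixPath(p).suffixes]
        if any(s in RISKY_SUFFIXES for s in suffixes):
            flagged.append(p)
    return flagged

# Hypothetical list of staged files.
staged = ["data/chr1.bgen.gz", "notebooks/analysis.ipynb", "geno/ukb.BED"]
for path in flag_risky_paths(staged):
    print(f"refusing to commit possible genetic data: {path}")
```

In practice the list of staged files could come from `git diff --cached --name-only` inside a pre-commit hook, with the hook exiting non-zero when anything is flagged.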

Industry watches. Pharma relies on this goldmine for drug trials. Leaks taint it. Participants fret. UK Biobank’s site carries a message urging them to protect their personal information. Too late for some.

Calls grow for overhaul. Tighter vetting. AI leak detectors. Better training. Morley and Rocher demand humility, per the BMJ. Listen to privacy experts. Stop dismissing risks.

The tracker keeps updating. At the April 17 snapshot, seven days had passed since the last notice. The gap has grown since. Known exposures now stand at 198, Rocher says. Data flows unchecked. Half a million lives in the balance. Science needs the fuel. But not like this.
