2026-04-04

AI Works Precisely When Data Is Scarce — 3 Proof Points from Rare Diseases


"AI Only Works When You Have Lots of Data." Wrong.

This is one of the most persistent myths in healthcare and pharma. And in 2024-2025, AI proved its value precisely where data is scarce.

Here are 3 concrete examples in rare diseases.

TxGNN: Zero-Shot Drug Repurposing (Harvard, Nature Medicine 2024)

TxGNN is a foundation model developed at Harvard for drug repurposing. It leverages the structure of a biomedical knowledge graph to predict therapeutic candidates — including for diseases with no known treatments.

The numbers speak for themselves:

  • 17,080 diseases covered, with 7,957 candidate drugs ranked against them
  • 92% of diseases in its graph had no existing treatment
  • Up to 49% better at predicting indications than 8 benchmark methods

All in zero-shot — meaning without any training examples for these specific diseases. This is exactly the scenario where people typically say AI can't work.
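
To make the zero-shot idea concrete, here is a minimal sketch of one way a model can score drugs for a disease with no known treatment: borrow evidence from diseases that look similar in the graph. This is a toy illustration, not TxGNN's actual architecture or API; the disease names, features, and scoring rule are all hypothetical.

```python
import math

# Hypothetical graph-derived features per disease (e.g. associated genes).
# All names and values are made up for illustration.
disease_features = {
    "disease_A": {"gene1": 1.0, "gene2": 1.0, "gene3": 0.0},
    "disease_B": {"gene1": 0.0, "gene2": 1.0, "gene3": 1.0},
    "rare_disease_X": {"gene1": 1.0, "gene2": 1.0, "gene3": 0.5},
}

# Known treatments exist only for the well-studied diseases.
# rare_disease_X has no entry: that is the zero-shot case.
known_drug_scores = {
    "disease_A": {"drug_1": 1.0},
    "disease_B": {"drug_2": 1.0},
}


def cosine(u, v):
    """Cosine similarity between two sparse feature dicts."""
    keys = set(u) | set(v)
    dot = sum(u.get(k, 0.0) * v.get(k, 0.0) for k in keys)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def zero_shot_rank(target, features, drug_scores):
    """Rank drugs for `target` by similarity-weighted evidence from other diseases."""
    scores = {}
    for disease, drugs in drug_scores.items():
        similarity = cosine(features[target], features[disease])
        for drug, score in drugs.items():
            scores[drug] = scores.get(drug, 0.0) + similarity * score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranking = zero_shot_rank("rare_disease_X", disease_features, known_drug_scores)
for drug, score in ranking:
    print(f"{drug}: {score:.3f}")
```

The rare disease never appears in the treatment data, yet it still gets a ranked list because it shares features with diseases that do.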

TxGNN is freely available at txgnn.org

Healx: From Screening to FDA IND in 24 Months

Healx applied AI to Fragile X syndrome, a rare genetic disorder.

The result:

  • From in silico screening to FDA IND (authorization to begin clinical trials) in under 24 months
  • Initial budget: $91,500
  • Compared to 10-15 years and hundreds of millions of dollars using traditional approaches

This isn't a marginal gain. It's an order-of-magnitude shift in development time and cost.


Unlearn.AI: Shrinking Control Arms with Digital Twins

Unlearn.AI uses digital twins of patients to reduce control arm sizes in clinical trials by 25-50%.

  • PROCOVA methodology qualified by the EMA in 2022
  • Confirmed by the FDA in 2024

For an ultra-rare disease where recruiting 50 patients is a feat, going from 30 control patients to 15 radically changes trial feasibility. Fewer patients needed, faster trials, lower costs — without sacrificing statistical rigor.
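
A rough sketch of why a good prognostic score shrinks a trial: if the digital twin's prediction correlates with the observed outcome at level rho, adjusting for it cuts the outcome variance by a factor of (1 - rho^2), and the required sample size falls with it. The function below uses the textbook two-arm sample size formula under a normal approximation; the effect size, standard deviation, and rho values are illustrative assumptions, not figures from Unlearn.AI or the regulators.

```python
import math
from statistics import NormalDist


def n_per_arm(effect, sd, alpha=0.05, power=0.80, rho=0.0):
    """Patients per arm for a two-arm comparison of means.

    rho is the assumed correlation between a prognostic score and the
    outcome; rho=0.0 means no covariate adjustment.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_power = NormalDist().inv_cdf(power)
    # Adjusting for the prognostic score leaves only the unexplained variance.
    residual_variance = sd ** 2 * (1 - rho ** 2)
    n = 2 * (z_alpha + z_power) ** 2 * residual_variance / effect ** 2
    return math.ceil(n)


# Illustrative inputs: effect of 0.75 standard deviations, 80% power.
unadjusted = n_per_arm(effect=0.75, sd=1.0)
adjusted = n_per_arm(effect=0.75, sd=1.0, rho=0.7)
print(f"Without adjustment: {unadjusted} per arm")
print(f"With prognostic adjustment (rho=0.7): {adjusted} per arm")
```

With these assumed inputs, a correlation of 0.7 roughly halves the patients needed per arm. This simple formula shrinks both arms equally; how the savings are allocated to the control arm is a design decision it does not model.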


The Real Problem Isn't Missing Data

What holds back AI adoption in rare diseases isn't the absence of data. It's the belief that 2015-era approaches are still the only ones available.

Today's tools were designed for scarcity:

  • Zero-shot learning: predicting without disease-specific training examples
  • Specialized retrieval-augmented generation (RAG): grounding answers in the existing scientific literature
  • Adaptive Bayesian designs: optimizing trials with few patients
  • Synthetic control arms: reducing recruitment requirements
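
As an illustration of the Bayesian item in the list above, here is a minimal sketch of how a small trial can still yield a quantified conclusion. It assumes a binary response endpoint and a uniform Beta(1, 1) prior; the patient counts and the 30% threshold are hypothetical.

```python
from math import comb


def prob_rate_exceeds(successes, failures, threshold, prior_a=1, prior_b=1):
    """Posterior probability that the true response rate exceeds `threshold`.

    Uses a Beta(prior_a, prior_b) prior with integer parameters, so the
    posterior is Beta(a, b) with a = prior_a + successes, b = prior_b + failures.
    """
    a = prior_a + successes
    b = prior_b + failures
    n = a + b - 1
    # Exact identity for integer a, b:
    # P(Beta(a, b) > x) = P(Binomial(n, x) <= a - 1)
    return sum(
        comb(n, k) * threshold ** k * (1 - threshold) ** (n - k)
        for k in range(a)
    )


# Hypothetical interim look: 7 responders among 12 patients.
posterior = prob_rate_exceeds(successes=7, failures=5, threshold=0.30)
print(f"P(response rate > 30% | data) = {posterior:.3f}")
```

An adaptive design would recompute this probability as each patient's result arrives and stop early once it crosses a pre-specified bound, which is how such designs extract decisions from a dozen patients rather than hundreds.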

Data scarcity is an argument for AI, not against it.

What About Your Organization?

If you're a rare disease biotech that thinks AI isn't for you because you don't have enough data, it's time to revisit that assumption.

The approaches exist. The clinical and regulatory evidence does too. The question is no longer "is it possible?" but "where do we start?"

Book a discovery call to discuss.
