2026-05-24

Carbon: The DNA Foundation Model That Makes Evo 2 Accessible to Biotech SMEs

Carbon-3B sandbox interface on Hugging Face showing a DNA sequence generated from human HTT exon 1 — illustrating a DNA foundation model accessible without a GPU cluster

Two months ago, Evo 2 proved AI could read the genome. This week, Carbon proved you no longer need a GPU cluster to use it.

The test: HTT exon 1, 14 CAG repeats

I gave Carbon-3B the first 93 bases of human HTT exon 1 (Huntingtin), including the 14 CAG repeats — the polyQ tract that defines Huntington's disease.

Carbon's very first generated codon was CAA — the canonical synonymous codon that closes polyQ tracts in real HTT. The model then immediately produced CCT-CCG-CCG, the proline-rich region that follows polyQ in the actual gene.

768 bp in 3.9 seconds, on the public Hugging Face Space. No fine-tuning. No prior knowledge of HTT. Just structural pattern learning from raw DNA.

That is the point.

Carbon: Evo 2's recipe, rebuilt for accessibility

Carbon was released this week by Hugging Face, Zhongguancun Academy, and TIGEM.

Same paradigm as Evo 2 — treat DNA as a language, train a large autoregressive model on it (detailed breakdown of Evo 2 here) — but the recipe was rebuilt around one question: can we hit the same frontier without needing a cluster to run it?

The numbers

The answer is yes.

Carbon-3B matches Evo 2-7B across the seven zero-shot benchmarks reported in the paper, with half the parameters and 150× faster inference.
Carbon-8B improves on every task, with the biggest jump on long-context retrieval — up to 786 kbp.

What matters as much as the weights: the full recipe is open

What I find even more interesting than the weights is what was released alongside them. The full recipe is open:

The training code
The Carbon Pretraining Corpus
The ablations
A clean seven-benchmark evaluation suite that runs Carbon, Evo 2, and GENERator behind a single flag

The DNA evaluation landscape was scattered across half a dozen papers. It just got a common reference point.

What this concretely changes for biotech and pharma

A 5-person biotech can now test a DNA foundation model on its own sequences without renting an H100 cluster.

Variant interpretation
Rare disease diagnostics
Regulatory sequence design

All of this just became accessible to teams that could not afford to run Evo 2-40B.

Evo 2 proved the science. Carbon makes it usable.

That's the real shift this week. The entry ticket for seriously experimenting with a DNA foundation model just went from a cloud budget to a consumer GPU — or a free Space.

If you're running a biotech or pharma project and want to evaluate what a model like Carbon can bring to your sequences, let's take 30 minutes to discuss.