
In 2018, during her chemistry Nobel Prize lecture, Frances Arnold noted that scientists had arrived at a point where they could read, write, and edit any sequence of DNA. But composing whole genes or even whole genomes from scratch — that was something only evolution could do.
A few years later, not long after helping to launch the Arc Institute, a nonprofit research center in the Bay Area, molecular engineer Patrick Hsu wondered if it was possible to imitate the forces of evolution that Arnold had been referring to. DNA is a language, after all, and with all the advances in generative AI — chatbots that could hold eerily lifelike conversations if trained on enough text — maybe recreating all the cellular complexity contained in a genome wasn’t that far behind.
Working with Brian Hie, a computational biologist at Stanford University and a fellow Arc Institute member, Hsu, who is also an assistant professor at the University of California, Berkeley, began assembling a team of scientists to train an AI model on vast troves of biological data — 300 billion DNA letters, including long sequences from 80,000 genomes of bacteria and archaea.