Research & Development World

Evo 2 AI promises to accelerate genetic engineering and synthetic biology

By Brian Buntz | February 19, 2025

[Image courtesy of NVIDIA]

Researchers at Arc Institute, in collaboration with NVIDIA and teams from Stanford University, UC Berkeley and UC San Francisco, have introduced Evo 2: an AI foundation model trained on the genetic sequences of more than 100,000 species.

Evo 2 can identify meaningful patterns in a range of genomes and accurately predict mutations that influence disease or protein function. In addition, the model can design entire genomes at bacterial scale.

Decoding the language of genomes

Building on its predecessor Evo 1, Evo 2 is trained on more than 9.3 trillion nucleotides from over 128,000 genomes spanning bacteria, archaea, phage, human, plant and other eukaryotic species. This reportedly makes Evo 2 one of the largest AI models developed for biological research, capable of analyzing sequence lengths of up to one million nucleotides at once.

“Our development of Evo 1 and Evo 2 represents a key moment in the emerging field of generative biology, as the models have enabled machines to read, write, and think in the language of nucleotides,” said Patrick Hsu, co-founder and core investigator at Arc Institute and assistant professor of bioengineering at UC Berkeley.

The NVIDIA/AWS connection

Evo 2’s development used the NVIDIA DGX Cloud AI platform via AWS with over 2,000 NVIDIA H100 GPUs. The team introduced a specialized AI architecture known as StripedHyena 2, co-developed with OpenAI co-founder Greg Brockman during a sabbatical. StripedHyena 2 integrates efficient Fourier and convolution kernels – sometimes referred to as FlashFFTConv – and techniques inspired by the S4 model to overcome the usual transformer limitations on long contexts. Standard transformer-based architectures can struggle with memory and computational load at large sequence lengths. StripedHyena 2, by contrast, significantly extends the context window for Evo 2, allowing it to handle up to around one million bases in a single pass.
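To illustrate why Fourier-based convolution helps at long sequence lengths, here is a minimal sketch in plain Python. It is not the StripedHyena 2 or FlashFFTConv implementation; the function names are illustrative assumptions. It compares a direct causal convolution, whose cost grows roughly with the square of the sequence length when the filter is as long as the input, with an FFT-based version that computes the same output in roughly n log n steps.

```python
import cmath


def fft(values, invert=False):
    """Recursive radix-2 Cooley-Tukey FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], invert)
    odd = fft(values[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def direct_causal_conv(signal, kernel):
    """Reference causal convolution: y[t] = sum_j kernel[j] * signal[t - j]."""
    n = len(signal)
    return [
        sum(kernel[j] * signal[t - j] for j in range(min(t + 1, len(kernel))))
        for t in range(n)
    ]


def fft_causal_conv(signal, kernel):
    """Same output via FFT: zero-pad, multiply spectra, invert, truncate."""
    n = len(signal)
    size = 1
    while size < n + len(kernel) - 1:
        size *= 2  # pad to a power of two so circular wrap-around cannot occur
    a = fft([complex(x) for x in signal] + [0j] * (size - n))
    b = fft([complex(x) for x in kernel] + [0j] * (size - len(kernel)))
    product = [x * y for x, y in zip(a, b)]
    # The inverse transform here is unnormalized, so divide by the padded size.
    full = [v.real / size for v in fft(product, invert=True)]
    return full[:n]


if __name__ == "__main__":
    signal = [1.0, 2.0, 3.0, 4.0]
    kernel = [1.0, 0.5]
    print(direct_causal_conv(signal, kernel))
    print(fft_causal_conv(signal, kernel))
```

Both functions should agree up to floating-point rounding. Production kernels fuse and optimize these steps on GPUs, which this sketch does not attempt.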

Sparse autoencoder (SAE) model-derived feature activations for α-helices, β-sheets, and tRNAs (left) in the E. coli genomic region encompassing the thrT and tufB genes, alongside AlphaFold’s predicted EF-Tu (tufB) protein structure (right). Image from Goodfire.

A user interface called Evo Designer is available for researchers, and the full codebase is publicly accessible on Arc’s GitHub. Evo 2 is also integrated into the NVIDIA BioNeMo framework, as part of an Arc Institute-NVIDIA collaboration. In addition, Arc worked with Goodfire to develop a mechanistic interpretability visualizer that helps researchers see how the model recognizes key features in genomic sequences.

Key contributors include Patrick Hsu (Arc Institute and UC Berkeley), who helped shape the biological vision for Evo 2’s gene editing applications; Brian Hie (Stanford), who co-led the machine learning work; Greg Brockman (OpenAI), who provided engineering support and worked on architectural scaling; and Hani Goodarzi (UCSF), who advised on gene regulation and cancer genetics. NVIDIA’s side of the collaboration, overseen by Anthony Costa and team, provided GPU support and integration into the BioNeMo platform.

A BRCA case study

In tests involving variants of the BRCA1 gene (Breast Cancer Gene 1), Evo 2 achieved more than 90% accuracy in distinguishing potentially pathogenic mutations from benign ones, offering a tool to streamline disease research and therapeutic development.

Beyond pinpointing mutations, Evo 2 can “write” large-scale genomic segments – designing bacterial-sized genomes (on the order of one megabase) that include essential elements such as tRNA and rRNA genes. Researchers are also using Evo 2 to create new biological mechanisms that do not exist in nature. For instance, earlier Evo models generated a novel CRISPR-Cas9 variant (“EvoCas9-1”) that has only about 73% similarity to any known Cas9 yet was experimentally validated as functional. Evo 2 can similarly design new transposons or genetic “switches” that might activate only in specific cell types, which could improve safety in gene therapy by mitigating off-target effects.

The developers took care to exclude human pathogens and certain other complex-organism pathogens from Evo 2’s training data. Additional safeguards prevent the model from returning productive outputs about these pathogens.
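The article does not say how Evo 2 scores BRCA1 variants. One common approach with sequence language models, assumed here purely for illustration, is to compare how likely the model finds the mutated sequence relative to the reference. The sketch below uses a tiny Markov model as a hypothetical stand-in for a genomic language model; `ToyMarkovModel`, `apply_snv` and `delta_score` are invented names, not part of any Evo 2 API.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


class ToyMarkovModel:
    """Order-k nucleotide model with add-one smoothing (hypothetical stand-in)."""

    def __init__(self, order=2):
        self.order = order
        self.counts = defaultdict(lambda: defaultdict(int))

    def fit(self, sequences):
        # Count how often each base follows each length-k context.
        for seq in sequences:
            for i in range(self.order, len(seq)):
                self.counts[seq[i - self.order:i]][seq[i]] += 1
        return self

    def log_likelihood(self, seq):
        # Sum of log-probabilities of each base given the preceding k bases.
        total = 0.0
        for i in range(self.order, len(seq)):
            context_counts = self.counts.get(seq[i - self.order:i], {})
            numerator = context_counts.get(seq[i], 0) + 1
            denominator = sum(context_counts.values()) + len(ALPHABET)
            total += math.log(numerator / denominator)
        return total


def apply_snv(reference, position, alt_base):
    """Return the reference with a single-nucleotide substitution (0-based position)."""
    return reference[:position] + alt_base + reference[position + 1:]


def delta_score(model, reference, position, alt_base):
    """Log-likelihood of the variant minus that of the reference."""
    variant = apply_snv(reference, position, alt_base)
    return model.log_likelihood(variant) - model.log_likelihood(reference)


if __name__ == "__main__":
    model = ToyMarkovModel(order=2).fit(["ACGTACGTACGTACGT"])
    reference = "ACGTACGTACGT"
    print(delta_score(model, reference, 5, "T"))  # substitution that breaks the learned pattern
```

Under this assumption, a strongly negative score means the model finds the variant less plausible than the reference. Turning such scores into pathogenic or benign calls would require a threshold or classifier calibrated on labeled variants, which is outside the scope of this sketch.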

Evo 2 vs. AlphaFold

Evo 2 differs from prominent AI tools such as DeepMind’s AlphaFold, which predicts the 3D structure of a single protein. Instead, Evo 2 focuses on large-scale genomic “language” – examining and generating entire DNA or RNA sequences, sometimes spanning up to a million bases. While AlphaFold handles one protein at a time, Evo 2 can manage multi-gene architectures and regulatory regions, and can design proteins together with their corresponding RNAs. Methodologically, AlphaFold uses structural biology data as training labels, whereas Evo 2 is trained more like a large language model in a self-supervised fashion. In practice, the two are complementary: Evo 2 can generate potential novel proteins or CRISPR systems, and AlphaFold (or similar structure-prediction models) can then assess their likely 3D conformations.

Evo 2’s open-source release could point to a range of developments, from specialized fine-tuned models (“Evo-Microbe,” “Evo-Plants,” or cancer-focused versions) to broader “virtual cell” approaches that merge genomic data with epigenomics, proteomics, and structure-prediction tools. Researchers are also exploring a proof-of-concept fully AI-designed organism, synthesizing an Evo 2-generated genome in the lab. While this raises exciting possibilities in synthetic biology, it also underscores the importance of robust safety measures, biosecurity, and ethical frameworks as generative biology accelerates.

Technical details are available in the Evo 2 preprint and the companion machine learning paper “Systems and Algorithms for Convolutional Multi-Hybrid Language Models at Scale.” Additional information about Evo 2’s novel architecture and applications can be found on the Evo Designer site, the Arc GitHub, and via NVIDIA’s BioNeMo framework.
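As a small illustration of the self-supervised training style mentioned in the AlphaFold comparison, the sketch below shows how training examples can be derived from raw sequence alone, with no external labels: each target is simply the next base. The article does not describe Evo 2’s exact tokenization or objective, so the next-nucleotide setup and the function name here are assumptions.

```python
def next_nucleotide_examples(sequence, context_length):
    """Yield (context, target) pairs where the target is the base that follows the context."""
    for i in range(context_length, len(sequence)):
        yield sequence[i - context_length:i], sequence[i]


if __name__ == "__main__":
    for context, target in next_nucleotide_examples("ATGCGT", 3):
        print(context, "->", target)
```

A structure predictor trained on labeled data would instead need a separate, experimentally determined answer for each input, which is the contrast the comparison draws.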


Copyright © 2025 WTWH Media LLC. All Rights Reserved. The material on this site may not be reproduced, distributed, transmitted, cached or otherwise used, except with the prior written permission of WTWH Media