Basecamp Research, an AI lab for biological design, today announced the launch of the Trillion Gene Atlas, a scientific initiative to generate and model biological data at the trillion-gene scale. The initiative is a collaboration with Anthropic, Ultima Genomics and PacBio and is powered by NVIDIA AI infrastructure.

The Trillion Gene Atlas aims to expand known evolutionary genetic diversity 100-fold by collecting genomic data from more than 100 million species at thousands of sites worldwide, sourced through Basecamp Research’s growing network of global biodiversity partners. Basecamp Research aims to provide the vast, diverse training data required for AI systems to learn from evolution to design new medicines on demand.
“Today’s biological AI models are trained on a narrow slice of life on Earth,” said Glen Gowers, co-founder and CEO of Basecamp Research, speaking at SXSW in Austin. “The Trillion Gene Atlas expands the known genetic universe by orders of magnitude beyond what is in public databases. Training models at this scale establishes a new paradigm for programmable therapeutic design.”
Addressing the biological data bottleneck
The Trillion Gene Atlas builds on Basecamp Research’s EDEN foundation models that were released in January. The EDEN models were trained entirely on BaseData, a proprietary genomic database that contains more than 10 trillion tokens of evolutionary DNA from over one million newly identified species.
This massive expansion in dataset diversity moved EDEN beyond simple prediction, making it the first model capable of designing diverse therapeutics directly from a disease prompt, Basecamp said. In wet-lab validation, EDEN demonstrated therapeutically relevant integration of cancer-fighting DNA into primary human T cells, with CAR-T cells showing over 90% tumor cell clearance in laboratory assays. The model has successfully generated hits across multiple frontier modalities, notably pioneering AI-Programmable Gene Insertion (aiPGI) to insert healthy genes and designing targeted antimicrobial peptides with a 97% hit rate against priority pathogens, Basecamp said.
The Trillion Gene Atlas builds on this approach by expanding the breadth and contextual depth of genomic data in the known internet of biology suitable for AI training. “Bigger models alone aren’t enough,” added Phil Lorenz, chief technology officer of Basecamp Research. “EDEN showed that performance in biological AI follows much steeper scaling trajectories with higher quality and fully contextualized data. The Trillion Gene Atlas extends that principle 100-fold.”
Scaling data generation and computation
The Trillion Gene Atlas is enabled by advances in ultra-high-throughput short- and long-read sequencing, as well as accelerated computing. Basecamp has partnered with Ultima Genomics and PacBio to deliver industrial-scale sequencing, including data-rich, high-accuracy long reads.
Ultima is a developer of ultra-high-throughput next-generation sequencing (NGS) systems. Ultima’s latest sequencing system, the UG200 Series, advances the company’s wafer-based sequencing architecture to deliver high-throughput whole-genome and multi-omics sequencing at industrial scale and at a low price point, enabling initiatives like the Trillion Gene Atlas.
The Trillion Gene Atlas will be powered by NVIDIA’s accelerated computing infrastructure to process vast quantities of genetic data at the petabase scale. As part of this effort, Basecamp plans to leverage NVIDIA Parabricks to accelerate metagenomic assembly. This collaboration focuses on both advanced engineering and the development of new algorithmic methods to optimize how complex environmental samples are reconstructed.
Through parallelized data processing, automated annotation and large-scale model training, the partners expect to compress a task that previously would have required more than 20 years of processing time to less than two years. This compression of sequencing, assembly, annotation and model training is intended to expand the performance and scope of biological foundation models across therapeutic development.
Creating an agentic end-to-end therapeutic design workflow
Anthropic joins as part of its broader effort to add new capabilities for life sciences by connecting Claude to more scientific platforms. The collaboration with the Claude for Life Sciences team aims to harness the Trillion Gene Atlas and EDEN to make Claude an even more productive research partner for scientists and clinicians, and to support organizations bringing new scientific advancements to the public.
By combining Claude’s advanced reasoning capabilities, EDEN’s therapeutic design capabilities and NVIDIA’s CUDA-X Libraries to process unstructured data, the initiative aims to create an integrated workflow for interpreting complex clinical data and translating it directly into therapeutic design.
The Trillion Gene Atlas is built on three pillars: large-scale DNA sequencing, global data supply partnerships and advanced computing. Together with AI systems capable of reasoning across complex data, these foundations can help turn vast datasets into therapeutic discoveries. By increasing the amount of evolutionary data available to AI by another 100 times, Basecamp Research aims to make drug design faster and more systematic, extending EDEN’s earlier advances in fields like gene therapy and the fight against antibiotic-resistant bacteria.



