Google AI has unveiled an AI co-scientist, a Gemini 2.0-based system that significantly shortens the early research cycle, in some cases reducing hypothesis generation time from weeks to days, and delivers proposals rated higher in novelty by domain experts evaluating 15 complex biomedical goals, according to a Google-coauthored paper, “Towards an AI co-scientist” (Gottweis et al., 2025). This multi-agent tool autonomously crafts novel hypotheses, experimental protocols, and research overviews, condensing literature review and brainstorming through a self-improving “generate, debate, and evolve” framework. Validated through real-world trials, such as predicting novel drug repurposing candidates for acute myeloid leukemia (AML) and uncovering epigenetic targets for liver fibrosis (both confirmed in wet-lab experiments), it outperforms other state-of-the-art AI baselines, as indicated by expert ratings and Elo scores.
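The paper does not ship source code, but the “generate, debate, and evolve” idea can be pictured as a simple tournament loop. The Python sketch below is illustrative only: the function names (generate_hypotheses, debate, evolve) and the random stand-ins for Gemini-backed agents are assumptions, not Google’s implementation.

```python
import random

def generate_hypotheses(goal, n=4):
    # Stand-in for a Gemini-backed generation agent that proposes candidate hypotheses.
    return [f"Hypothesis {i} for: {goal}" for i in range(n)]

def debate(hypothesis_a, hypothesis_b):
    # Stand-in for a review/debate agent; a real system would prompt an LLM to
    # compare novelty, plausibility, and testability before picking a winner.
    return random.choice([hypothesis_a, hypothesis_b])

def evolve(hypothesis):
    # Stand-in for an evolution agent that refines the surviving hypothesis.
    return hypothesis + " (refined)"

def generate_debate_evolve(goal, rounds=3):
    pool = generate_hypotheses(goal)
    for _ in range(rounds):
        random.shuffle(pool)
        # Pairwise debates: the winner of each match-up advances and is refined.
        survivors = [evolve(debate(a, b)) for a, b in zip(pool[::2], pool[1::2])]
        # Top up the pool with fresh proposals so diversity is not lost.
        pool = survivors + generate_hypotheses(goal, n=len(pool) - len(survivors))
    return pool

if __name__ == "__main__":
    for hypothesis in generate_debate_evolve("drug repurposing for acute myeloid leukemia"):
        print(hypothesis)
```

In the actual system, each stand-in would be a specialized LLM agent, and the pairwise comparisons would feed the Elo-based self-evaluation described below.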
Some of the most compelling evidence for the AI co-scientist’s ability to accelerate discovery comes from its parallel in silico discovery of a novel gene transfer mechanism in bacterial evolution. The paper highlights this time-saving capability, stating that the AI co-scientist “recapitulated unpublished experimental results via a parallel in silico discovery of a novel gene transfer mechanism in bacterial evolution” in just “2 days.” By comparison, reaching the same conclusion the traditional way, by crafting a novel hypothesis and validating it experimentally, took over “10 years of iterative research.” See the figure below:

Image extracted from the Google paper
Beyond AML and liver fibrosis, the AI co-scientist also recapitulated an unpublished discovery about how capsid-forming phage-inducible chromosomal islands spread across multiple bacterial species—a key mechanism underlying antimicrobial resistance. According to the Gottweis et al. paper, this feat took the system just a couple of days, whereas the same insight emerged over years of conventional lab work. In all cases, the platform operates as a collaborative tool rather than a replacement for researchers, with domain experts guiding and validating its outputs. Additionally, the current version relies on open-access literature and may miss nonpublic or negative experimental data, pointing to avenues for future enhancements. Even so, the three successful validations—in drug repurposing, target discovery, and bacterial evolution—demonstrate how this framework could generalize across diverse biomedical domains.
As alluded to earlier, the AI co-scientist paper used an Elo-based tournament system for continuous self-evaluation. The co-scientist demonstrated a marked improvement in Elo ratings, a metric correlating with hypothesis quality, as test-time compute scaled. This Elo-driven framework facilitates the “generate, debate, evolve” cycle, iteratively refining outputs and outperforming both expert-derived benchmarks and state-of-the-art AI baselines. The system is not intended to be an island, divorced from human input; rather, it is designed for expert-in-the-loop collaboration. The paper notes that while these findings underscore the AI co-scientist’s potential to uncover novel hypotheses, expert oversight and further wet-lab confirmations remain essential. In other words, scientists guide and refine its outputs, ensuring alignment with scientific priorities. Beyond biomedicine, the AI co-scientist’s architecture is broadly applicable across diverse scientific domains.
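To make the Elo mechanic concrete, the snippet below shows the standard Elo update applied to a single pairwise “debate” between two hypotheses. The K-factor of 32 and the starting rating of 1200 are conventional chess values used here as assumptions; the paper does not specify its rating parameters, only that ratings come from auto-evaluated tournaments.

```python
def expected_score(r_a, r_b):
    # Expected probability that hypothesis A beats hypothesis B under the Elo model.
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def update_elo(r_a, r_b, a_won, k=32):
    # Update both ratings after one pairwise comparison (a_won is True if A was judged better).
    e_a = expected_score(r_a, r_b)
    s_a = 1.0 if a_won else 0.0
    return r_a + k * (s_a - e_a), r_b + k * ((1.0 - s_a) - (1.0 - e_a))

# Example: two hypotheses start at 1200; A wins three straight auto-evaluated debates.
ra, rb = 1200.0, 1200.0
for _ in range(3):
    ra, rb = update_elo(ra, rb, a_won=True)
print(round(ra), round(rb))  # A's rating rises while B's falls by the same amount
```

Repeated across many such comparisons, the surviving high-Elo hypotheses are the ones the system refines further, which is why the ratings climb as more test-time compute is spent.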

Average accuracy of the AI co-scientist (blue line) and reference Gemini 2.0 (red line) responses to GPQA diamond questions, categorized by Elo rating. Note: Elo ratings are based on auto-evaluation and do not reflect an independent ground truth. [From the Google announcement]

A schematic showing how the system works at a high level. [From the Google post]