OpenAI's GPT-5 autonomously ran 36,000 protein synthesis experiments in Ginkgo Bioworks' cloud lab

GPT-5 handles the cognitive layer (data analysis, biochemical reasoning, hypothesis generation) and sends experimental designs down to the RACs, which handle the physical execution (liquid handling, incubation, fluorescence measurement), then push data back up. [OpenAI]

A collaboration between Ginkgo Bioworks and OpenAI has produced what may be one of the most concrete demonstrations yet of AI-driven autonomous science: a closed-loop system in which OpenAI’s GPT-5 designed, executed, analyzed and iteratively refined 36,000 cell-free protein synthesis (CFPS) experiments over six months, with minimal human intervention.

The system reduced the cost of producing superfolder green fluorescent protein (sfGFP), a standard benchmark in the field, to $422 per gram of protein. That figure, which counts total reaction component costs, represents a 40% reduction in production cost and a 57% improvement in reagent cost specifically, according to a preprint released by the two companies on February 5 and slated for posting on bioRxiv.

The result lands on the same day OpenAI released GPT-5.3-Codex, its latest coding agent model, and Anthropic launched Claude Opus 4.6. But while those launches focused on software engineering and general reasoning, the Ginkgo collaboration represents an AI system closing the loop on physical experimentation at meaningful scale.

Ginkgo’s cloud lab in Boston. [Ginkgo]

How the system worked

The architecture paired GPT-5 with Ginkgo’s cloud laboratory in Boston, which runs on the company’s reconfigurable automation carts (RAC) and Catalyst automation software. GPT-5 was given internet access, a computer with data analysis packages, experimental metadata from prior iterations, and a preprint describing the existing state of the art. From there, it operated in a closed loop: designing batches of experiments in 384-well plate format, having the robotic lab execute them, receiving the data back, analyzing results, generating new hypotheses, and proposing the next round.
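The design-execute-analyze cycle described above can be sketched in a few lines of Python. This is only an illustration of the loop's shape: the function names, the single-reagent "design" and the toy lab response are all hypothetical, and the article does not describe the actual interfaces between GPT-5 and Ginkgo's Catalyst software.

```python
import random
from typing import Callable

# All names below are hypothetical stand-ins, not the real system's interfaces.
Design = dict[str, float]      # reagent name -> concentration (arbitrary units)
Result = tuple[Design, float]  # a design plus its measured signal

WELLS_PER_PLATE = 384  # plate format named in the article


def run_closed_loop(
    propose: Callable[[list[Result], int], list[Design]],
    execute: Callable[[list[Design]], list[float]],
    rounds: int,
    batch_size: int = WELLS_PER_PLATE,
) -> list[Result]:
    """Alternate between design (model) and execution (lab) for `rounds` rounds."""
    history: list[Result] = []
    for _ in range(rounds):
        designs = propose(history, batch_size)  # model designs the next batch
        signals = execute(designs)              # lab runs it and returns data
        history.extend(zip(designs, signals))   # results inform the next round
    return history


def toy_propose(history: list[Result], n: int) -> list[Design]:
    """Toy 'model': random designs first, then perturb the best one seen."""
    rng = random.Random(len(history))
    if not history:
        return [{"mg": rng.uniform(0.0, 20.0)} for _ in range(n)]
    best = max(history, key=lambda r: r[1])[0]
    return [{"mg": max(0.0, best["mg"] + rng.gauss(0.0, 1.0))} for _ in range(n)]


def toy_execute(designs: list[Design]) -> list[float]:
    """Toy 'lab': the signal peaks when the made-up reagent 'mg' equals 10."""
    return [-(d["mg"] - 10.0) ** 2 for d in designs]


history = run_closed_loop(toy_propose, toy_execute, rounds=3)
best_design, best_signal = max(history, key=lambda r: r[1])
```

In the real system the proposal step was a frontier model with access to analysis tools and prior literature, and the execution step was a robotic lab; the sketch keeps only the feedback structure that connects them.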

Over six rounds, the system executed more than 580 plates, tested 36,000 unique reaction compositions, and generated nearly 150,000 data points. Human involvement was largely limited to reagent preparation, loading and unloading plates, and system oversight. Experimental design, data interpretation, and hypothesis generation were handled by GPT-5.

To prevent the model from proposing experiments that couldn’t actually be run, every proposed design was validated against a Pydantic model before execution. The validation checked plate layout, standards, controls, replication, reagent availability and volume constraints. Only experiments that passed were eligible to run. GPT-5 also generated human-readable lab notebook entries documenting its analysis and rationale, providing an audit trail for its reasoning.
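A rough idea of what such a pre-execution gate might check is sketched below. The team used Pydantic; to stay dependency-free, this sketch swaps in standard-library dataclasses and a plain validation function instead. The field names, the replicate threshold and the inventory structure are assumptions made for illustration, not details from the preprint.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass

WELLS_PER_PLATE = 384  # plate format named in the article


@dataclass
class Well:
    role: str                     # "sample", "standard" or "control"
    composition_id: str           # identifies a reaction composition
    volumes_ul: dict[str, float]  # reagent name -> volume in microliters


@dataclass
class PlateDesign:
    wells: list[Well]
    max_well_volume_ul: float     # hypothetical per-well capacity
    stock_ul: dict[str, float]    # hypothetical reagent inventory
    min_replicates: int = 2       # hypothetical replication rule


def validate(design: PlateDesign) -> list[str]:
    """Return a list of problems; an empty list means the design may run."""
    problems: list[str] = []

    # Plate layout: the design must fit on one plate.
    if len(design.wells) > WELLS_PER_PLATE:
        problems.append(f"{len(design.wells)} wells exceed a {WELLS_PER_PLATE}-well plate")

    # Standards and controls must be present.
    roles = Counter(w.role for w in design.wells)
    for required in ("standard", "control"):
        if roles[required] == 0:
            problems.append(f"no {required} wells")

    # Replication: each sample composition needs enough copies.
    copies = Counter(w.composition_id for w in design.wells if w.role == "sample")
    for cid, n in copies.items():
        if n < design.min_replicates:
            problems.append(f"composition {cid} has {n} replicate(s)")

    # Volume constraint per well, and total reagent use against stock.
    used = defaultdict(float)
    for i, w in enumerate(design.wells):
        if sum(w.volumes_ul.values()) > design.max_well_volume_ul:
            problems.append(f"well {i} is over volume")
        for reagent, vol in w.volumes_ul.items():
            used[reagent] += vol
    for reagent, vol in used.items():
        if vol > design.stock_ul.get(reagent, 0.0):
            problems.append(f"not enough {reagent} in stock")

    return problems


good = PlateDesign(
    wells=[
        Well("sample", "A", {"lysate": 5.0, "buffer": 5.0}),
        Well("sample", "A", {"lysate": 5.0, "buffer": 5.0}),
        Well("standard", "std", {"buffer": 10.0}),
        Well("control", "neg", {"buffer": 10.0}),
    ],
    max_well_volume_ul=20.0,
    stock_ul={"lysate": 100.0, "buffer": 100.0},
)
bad = PlateDesign(
    wells=[Well("sample", "B", {"lysate": 30.0})],
    max_well_volume_ul=20.0,
    stock_ul={"lysate": 10.0},
)
good_problems = validate(good)
bad_problems = validate(bad)
```

Here `good` passes with no problems, while `bad` is rejected on every count: it lacks standards and controls, has a single replicate, overfills its well and asks for more lysate than is in stock. Returning the full list of problems, rather than stopping at the first, gives a model something concrete to correct in its next proposal.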

“By pairing a frontier large language model with an autonomous lab, we found reaction compositions that are notably cheaper than prior state of the art,” said Reshma Shetty, co-founder of Ginkgo Bioworks and co-author of the study. “We expect more and more experiments to be run on autonomous labs where reagent and consumables costs dominate the cost of an experiment.”

What GPT-5 actually found

According to OpenAI’s account, GPT-5 took three rounds of experimentation, roughly two months’ worth, to establish the new cost benchmark. The improvements weren’t simply a matter of brute-force search. The model identified reaction compositions that humans had not previously tested in this configuration and, notably, proposed and prioritized new reagents to test, some of which independently anticipated findings from published research it had not been given access to.

OpenAI reported that small changes in buffering, energy regeneration components and polyamines had outsized impacts relative to their cost — parameters that are testable at high throughput but not always the first ones human researchers reach for. The model also identified compositions that performed better under the low-oxygen, plate-based conditions typical of automated labs, where reaction geometry and mixing differ substantially from bench-top experiments.

The cost structure itself shaped the optimization. In CFPS, costs are dominated by lysate and DNA template. That means boosting protein yield per unit of expensive input is the highest-leverage strategy — a finding the system converged on through iterative experimentation rather than through upfront specification.
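The arithmetic behind that leverage is simple enough to sketch. In the snippet below, cost per gram is the reagent cost of a milliliter of reaction divided by the grams of protein that milliliter yields. The component prices and the yield are placeholder values chosen for illustration only; they are not figures from the preprint.

```python
def cost_per_gram(costs_usd_per_ml: dict[str, float], yield_g_per_l: float) -> float:
    """Reagent cost in USD per gram of protein produced."""
    total_usd_per_ml = sum(costs_usd_per_ml.values())
    grams_per_ml = yield_g_per_l / 1000.0
    return total_usd_per_ml / grams_per_ml


# Placeholder numbers, NOT taken from the preprint.
baseline_costs = {"lysate": 0.40, "dna_template": 0.30, "small_molecules": 0.10}
baseline_yield = 1.0  # grams of protein per liter of reaction

baseline = cost_per_gram(baseline_costs, baseline_yield)

# Option 1: double the yield with the same inputs.
doubled_yield = cost_per_gram(baseline_costs, 2 * baseline_yield)

# Option 2: halve the price of the cheap small-molecule components.
cheaper_costs = dict(baseline_costs, small_molecules=0.05)
cheaper_additives = cost_per_gram(cheaper_costs, baseline_yield)
```

With these placeholder inputs, doubling the yield halves the cost per gram, while halving the price of the cheap components trims it by only about 6%. That asymmetry is the reason yield per unit of lysate and DNA template is the higher-leverage target.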

Caveats and context

Several important limitations apply. The results were demonstrated on a single protein (sfGFP) in a single CFPS system, and generalization to other proteins and platforms has yet to be shown. The comparison to Olsen et al. is specific to 384-well plate format; the Northwestern group achieved far lower costs at bench scale with oxygen supplementation ($55 per gram at 15 μL scale, $36 per gram at 4 mL scale). Oxygenation and reaction geometry can strongly influence yields, and some of the improvements may be sensitive to these conditions.

The findings are described in a preprint that has not undergone peer review. The $422-per-gram figure reflects total reaction component costs under the specific experimental conditions described — not a fully burdened manufacturing cost.

Human oversight was also not zero. Protocol improvements and reagent handling still required experienced operators. The system can design and interpret experiments, but practical laboratory work involves details that remain outside the model’s reach.

Commercial and policy implications

Ginkgo is already selling the AI-improved cell-free reaction mix through its reagents store, signaling that the company views the result as commercially viable rather than purely academic. The move aligns with Ginkgo’s broader pivot toward positioning its cloud laboratory infrastructure as a service — a model that becomes substantially more attractive if AI systems can drive the experimental design layer autonomously.

“At OpenAI, this was the first time we were able to interface a frontier model with an autonomous lab to carry out experimentation at a very large scale,” said Joy Jiao, OpenAI’s life sciences research lead and co-corresponding author of the study. “This success points to how AI systems can augment the experimental workflow, contributing to hypothesis generation, testing and refinement based on real-world data.”

