Only 10% of drugs that enter clinical trials get approved. Owkin believes agentic AI can help fix that

A 3D rendering of a biological cell from the company’s homepage, where it animates to reveal layers of internal complexity. Image credit: Owkin.

Roughly 90% of drug candidates that enter clinical trials fail. Decades of investment in genomics, precision medicine, and computational biology have barely moved the needle, as drugs themselves have become more complex and the proverbial low-hanging fruit has been picked. Even when a drug succeeds, development often costs billions of dollars and takes more than a decade before the therapy reaches the market.

Jonas Béal, head of product at Owkin, thinks the success rate needs to be closer to a coin toss. He has a specific benchmark in mind. “Biological artificial superintelligence would be there the day we have above 50% success when a drug has entered clinical trials with the associated biological rationale,” Béal told R&D World. “So far, that’s not where we are. But that’s what we need to build.”

Owkin is part of a growing field. Recursion combined with Exscientia in 2024 to create an end-to-end AI drug discovery platform. Insilico Medicine just signed a deal with Eli Lilly, potentially worth $2.75 billion with $115 million upfront, to bring AI-developed drugs to market. CytoReason launched its own AI agent at the same JPM Healthcare Conference where Owkin debuted its agentic infrastructure. Isomorphic Labs, which spun out of Google DeepMind and whose team helped build AlphaFold, is pursuing its own drug design pipeline.

The reverse translational paradigm

What makes Owkin’s bet different is its focus on actual patient data and the way it uses AI. Béal describes an architecture in which specialized AI agents help propose experiments, evaluators score hypotheses, and the system improves through feedback from lab and patient data, all in pursuit of a 50% clinical success rate the industry has never come close to achieving.

Jonas Béal, head of product at Owkin. (Photo: Owkin)

Béal, who holds a PhD in computational oncology from Institut Curie and previously worked at Sanofi, described Owkin’s approach as a “reverse translational paradigm.” The work starts from real patient data rather than preclinical models. Most research defaults to preclinical work, he said, “just because it’s easier, to be honest.”

But Owkin’s bet is that meaningful drug discovery requires confronting the full complexity of human biology head-on. The 10-year-old Paris-based company has raised $300 million. Through its patient data network of 104 data partners, it draws on data collected over a decade from more than 800 hospitals, sourcing deep, multimodal data from patients treated under standard of care. “That’s where the code is, and that’s what we need to crack,” Béal said. “Without this data layer, you are just producing hypotheses, or LLM guesses.”

A lab that loops

The real challenge now, Béal said, is validation. “Finding new ideas is easy and cheap, to some extent, but validating them can be pretty expensive and pretty long.”

Owkin’s approach uses what Béal calls a “lab-in-the-loop” model. Ideally, the AI system doesn’t stop at identifying a promising drug target but extends to designing the experiment needed to validate or invalidate it. “That’s what you would expect from a real scientific copilot,” he said. “Not only generating the idea, but generating what you need to make the idea concrete and validated.”

Once lab results come back, they feed into the model in one of two ways. The simpler path treats results as a static pass/fail signal. The more ambitious path uses reinforcement learning to reshape how the underlying LLM reasons, so the system learns from each experiment and refines its approach over time.
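The gap between the two feedback paths can be shown with a toy sketch. It assumes, purely for illustration, that the system picks among a few hypothetical “reasoning strategies” through a softmax policy; a real system would update an LLM’s weights rather than three preference scores, and the strategy names, learning rate, and baseline are not Owkin’s. The static path only records pass/fail outcomes, while the reinforcement path applies one REINFORCE policy-gradient step per lab result.

```python
import math


def softmax(prefs: dict[str, float]) -> dict[str, float]:
    """Convert preference scores into a probability distribution."""
    top = max(prefs.values())
    exps = {k: math.exp(v - top) for k, v in prefs.items()}  # subtract max for stability
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


def static_update(results: list[tuple[str, bool]]) -> dict[str, bool]:
    """Path 1: keep a pass/fail record per hypothesis; the model itself is untouched."""
    return {hypothesis: passed for hypothesis, passed in results}


def reinforce_update(
    prefs: dict[str, float],
    chosen: str,
    reward: float,
    lr: float = 0.5,
    baseline: float = 0.5,
) -> dict[str, float]:
    """Path 2: one REINFORCE step on a softmax policy over reasoning strategies.

    For a softmax policy, d log pi(chosen) / d pref_k = 1[k == chosen] - pi(k),
    so strategies that led to validated results gain probability.
    """
    probs = softmax(prefs)
    advantage = reward - baseline  # reward is 1.0 if the lab validated the idea, else 0.0
    return {
        k: v + lr * advantage * ((1.0 if k == chosen else 0.0) - probs[k])
        for k, v in prefs.items()
    }


# Hypothetical strategies; a toy stand-in for the LLM's learned behaviour.
prefs = {"pathway_reasoning": 0.0, "expression_screen": 0.0, "literature_analogy": 0.0}
ledger = static_update([("target_A", True), ("target_B", False)])
prefs = reinforce_update(prefs, "pathway_reasoning", reward=1.0)   # target_A validated
prefs = reinforce_update(prefs, "literature_analogy", reward=0.0)  # target_B failed
print(ledger)
print({k: round(p, 3) for k, p in softmax(prefs).items()})
```

After these two results, the ledger still holds only two labels, but the policy now favours the strategy that produced a validated target and disfavours the one that did not.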

Béal noted that wet-lab experiments aren’t always necessary. Owkin also validates AI-generated hypotheses against its existing patient data, a faster and cheaper check. “We usually try to do both,” he said.
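A check against existing patient data can be as simple as asking whether the records agree with the hypothesis. The sketch below assumes a hypothetical cohort of (target expression, responded) pairs and a hypothesis of the form “high expression of this target goes with treatment response”; it splits patients at the median and compares response rates. The data and the `min_gap` threshold are made up, and a real analysis would adjust for confounders and test significance.

```python
from statistics import median


def response_rate(group: list[tuple[float, bool]]) -> float:
    """Fraction of patients in the group who responded to treatment."""
    return sum(responded for _, responded in group) / len(group)


def check_against_cohort(cohort: list[tuple[float, bool]], min_gap: float = 0.2) -> dict:
    """Test 'high target expression goes with response' on existing patient records.

    Splits patients at the median expression and compares response rates.
    `min_gap` is an arbitrary illustrative threshold, not a statistical test.
    """
    cutoff = median(expr for expr, _ in cohort)
    high = [p for p in cohort if p[0] > cutoff]
    low = [p for p in cohort if p[0] <= cutoff]
    if not high or not low:  # degenerate split, e.g. all values identical
        return {"cutoff": cutoff, "gap": 0.0, "supported": False}
    gap = response_rate(high) - response_rate(low)
    return {"cutoff": cutoff, "gap": gap, "supported": gap >= min_gap}


# Made-up cohort: (target expression, responded to treatment)
cohort = [(2.5, True), (3.0, True), (3.5, False), (0.25, True), (0.5, False), (1.0, False)]
print(check_against_cohort(cohort))
```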

Owkin’s K Pro copilot, shown here in a product demo, allows researchers to query biological datasets in natural language. The example query, about Nectin 4 expression in bladder cancer, illustrates the kind of translational research question the platform is designed to answer. (Image: Owkin, via YouTube)

From knowledge graphs to LLMs as judges

Owkin has used knowledge graphs — structured maps linking genes, diseases, drugs, and patient characteristics — in earlier drug-positioning work, but Béal said the current system is not centered on a knowledge graph. What Owkin does rely on, he said, is a funnel. The company’s AI generates far more target hypotheses than it could ever validate experimentally, so the challenge becomes prioritization. “You can generate thousands of ideas,” Béal said. “But the real bottleneck is the effort that can be made to validate those ideas. Even if you can automate that more with robotized labs, it will always be easier to generate ideas than to validate them.”
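The funnel reduces to a ranking problem: generate many candidates, score them, and send on only what the validation budget allows. A minimal sketch, assuming each hypothesis already carries a numeric priority score and that lab capacity is a fixed count; the gene labels, scores, and budget are illustrative, not Owkin’s.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    target: str   # hypothetical gene identifier
    score: float  # priority assigned upstream, e.g. by evaluators


def prioritize(candidates: list[Hypothesis], lab_budget: int) -> list[Hypothesis]:
    """Return the `lab_budget` highest-scoring hypotheses, best first."""
    return heapq.nlargest(lab_budget, candidates, key=lambda h: h.score)


# Generate far more ideas than the lab can test (deterministic fake scores).
generated = [Hypothesis(f"GENE{i:04d}", (i * 37) % 100 / 100) for i in range(1000)]
shortlist = prioritize(generated, lab_budget=5)
print(len(generated), "generated ->", len(shortlist), "sent to the lab")
```

`heapq.nlargest` avoids sorting the whole candidate pool, which matters little at a thousand ideas but keeps the selection cheap as generation scales up.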

To scale the evaluation, Owkin uses specialized LLM-based evaluators as judges, each focused on a different dimension of a hypothesis. One model might assess replicability. Another scores novelty. A third evaluates feasibility. “It’s very rare to have one model that is good at evaluating everything at the same time,” Béal said. The approach mirrors how human expert panels work, but at machine speed and scale. Over time, the system also samples ideas that fail, deliberately learning from bad outcomes to improve its calibration.
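The panel idea can be sketched as a set of single-purpose judges whose scores are kept per dimension rather than collapsed into one opinion. Here the judges are keyword heuristics standing in for LLM calls, and the dimension names, the plain averaging, and the `n_low` calibration sample of lower-ranked ideas are assumptions made for illustration.

```python
import random
from statistics import mean
from typing import Callable

Judge = Callable[[str], float]  # maps a hypothesis description to a score in [0, 1]

# Keyword heuristics standing in for specialized LLM judges (purely hypothetical).
JUDGES: dict[str, Judge] = {
    "replicability": lambda h: 0.9 if "replicated" in h else 0.4,
    "novelty": lambda h: 0.2 if "known" in h else 0.8,
    "feasibility": lambda h: 0.7 if "antibody" in h else 0.5,
}


def panel_score(hypothesis: str, judges: dict[str, Judge]) -> dict[str, float]:
    """Score each dimension with its own judge, then add an unweighted overall mean."""
    scores = {dim: judge(hypothesis) for dim, judge in judges.items()}
    scores["overall"] = mean(scores.values())
    return scores


def pick_for_testing(ranked: list[str], top_k: int, n_low: int, rng: random.Random) -> list[str]:
    """Take the top_k ideas plus n_low lower-ranked ones, so the judges also
    receive feedback on ideas they expected to fail (calibration)."""
    tail = ranked[top_k:]
    return ranked[:top_k] + rng.sample(tail, min(n_low, len(tail)))


ideas = [
    "known target, replicated in two cohorts",
    "novel antibody target",
    "novel target, single study",
]
ranked = sorted(ideas, key=lambda h: panel_score(h, JUDGES)["overall"], reverse=True)
batch = pick_for_testing(ranked, top_k=1, n_low=1, rng=random.Random(0))
print(ranked)
print(batch)
```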

A two-track architecture

When asked how Owkin’s system retains knowledge across stateless LLM calls, Béal described a two-track architecture. The company uses frontier models from Anthropic, OpenAI, and the open-weight ecosystem as its core reasoning engines, but can’t modify most of them. “This part of the brain we cannot really fine-tune, because that’s not our model,” he said.

Instead, Owkin improves what surrounds the model: the tools, prompts, and task-specific instructions that shape how it reasons. Béal compared these to recipes. “If I can’t change the model, I can still change the way the recipe has been written and the way the tools are being used.” That creates an improvement loop around the model: even if the core weights stay fixed, the surrounding prompts, tools, and task logic can be refined based on evaluation results.
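An improvement loop that leaves the weights alone can be sketched as recipe selection: run the same frozen model under several candidate recipes, score each on an evaluation set, and keep the best. Everything here is a hypothetical stand-in; `frozen_model` is a deterministic function playing the part of a fixed LLM, and `INDICATION_DB` plays the part of a tool backend.

```python
from typing import Callable

# Hypothetical tool backend the recipe may or may not consult.
INDICATION_DB = {"NECTIN4": "bladder cancer", "ERBB2": "breast cancer"}


def frozen_model(prompt: str) -> str:
    """Stand-in for a fixed-weight LLM we cannot fine-tune.

    It answers correctly only when the prompt supplies the needed evidence.
    """
    marker = "Evidence: "
    if marker in prompt:
        return prompt.split(marker, 1)[1].strip()
    return "unknown"


def bare_recipe(gene: str) -> str:
    """Recipe 1: ask the question with no tool use."""
    return f"Which indication is most associated with {gene}?"


def tool_recipe(gene: str) -> str:
    """Recipe 2: call the lookup tool first and put its result in the prompt."""
    evidence = INDICATION_DB.get(gene, "none found")
    return f"Which indication is most associated with {gene}? Evidence: {evidence}"


def accuracy(recipe: Callable[[str], str], eval_set: dict[str, str]) -> float:
    """Score a recipe by running the frozen model on every evaluation question."""
    hits = sum(frozen_model(recipe(gene)) == answer for gene, answer in eval_set.items())
    return hits / len(eval_set)


eval_set = {"NECTIN4": "bladder cancer", "ERBB2": "breast cancer"}
recipes = {"bare": bare_recipe, "with_lookup": tool_recipe}
best_name = max(recipes, key=lambda name: accuracy(recipes[name], eval_set))
print(best_name, accuracy(recipes[best_name], eval_set))
```

The model function never changes between runs; only the recipe wrapped around it does, which is the point of this track.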

The second track goes deeper. For open-weight models, which tend to be smaller systems whose parameters Owkin can directly access, the company applies reinforcement learning to modify the model’s own weights. The result is OwkinZero, a biological reasoning model fine-tuned in-house and then plugged back into the broader agentic system alongside the proprietary LLMs. “We can make use of both paradigms to improve the system,” Béal said.
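The split between the two tracks comes down to one property per model: whether its weights are accessible. A minimal sketch, assuming a hypothetical registry with made-up model labels, of how that property decides which improvement paths apply.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelEntry:
    name: str           # hypothetical model label
    open_weights: bool  # are the parameters accessible for training?


def improvement_plan(models: list[ModelEntry]) -> dict[str, list[str]]:
    """Map each model to the improvement tracks that apply to it.

    Track 1 (prompts, tools, task logic) applies to every model.
    Track 2 (RL updates to the weights) applies only to open-weight models.
    """
    plan: dict[str, list[str]] = {}
    for model in models:
        tracks = ["refine_recipes"]
        if model.open_weights:
            tracks.append("rl_finetune_weights")
        plan[model.name] = tracks
    return plan


models = [ModelEntry("hosted-frontier-model", False), ModelEntry("open-weight-base", True)]
print(improvement_plan(models))
```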

That broader ambition is shared across techbio. In a recent conversation on the Owkin podcast, Insilico Medicine CEO Alex Zhavoronkov put it simply: “In the frontier AI space, the ultimate battle will be an AI for science.”

Scaling with NVIDIA, shipping with Anthropic

Owkin’s partnership with Anthropic offered an early test of this modular architecture. Pathology Explorer, an agent built on Owkin’s HIPE histology model, was included in the launch of Claude for Healthcare and Life Sciences, accessible via the Model Context Protocol (MCP).

Béal said K Pro’s internal system was already built on MCP servers, so exposing an agent to Anthropic’s platform was a natural extension. “It’s also a forcing function to ensure our system is modular by design,” he said, “because at the end of the day, we need our systems to be interoperable with many different platforms.”

The target users are translational researchers and early clinical development teams who need to test hypotheses against real patient data at scale but lack the pathology bandwidth to do it themselves.
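MCP pushes toward modularity because every capability is exposed as a named tool behind the same two JSON-RPC 2.0 methods, `tools/list` and `tools/call`, so any MCP-capable client can discover and invoke it. The sketch below handles only those two messages in-process. It is not the official MCP SDK or Owkin’s implementation; it omits initialization, transports, and capability negotiation, and the `summarize_slide` tool is hypothetical.

```python
import json
from typing import Any

# Hypothetical tool registry; the tool and its output are placeholders.
TOOLS: dict[str, dict[str, Any]] = {
    "summarize_slide": {
        "description": "Return a text summary for a slide ID (placeholder).",
        "inputSchema": {
            "type": "object",
            "properties": {"slide_id": {"type": "string"}},
            "required": ["slide_id"],
        },
        "handler": lambda args: f"Summary for slide {args['slide_id']} (placeholder)",
    }
}


def handle(raw: str) -> str:
    """Answer one JSON-RPC 2.0 request for the MCP methods tools/list and tools/call."""
    req = json.loads(raw)
    method, params = req.get("method"), req.get("params", {})
    reply: dict[str, Any] = {"jsonrpc": "2.0", "id": req.get("id")}
    if method == "tools/list":
        reply["result"] = {"tools": [
            {"name": name, "description": t["description"], "inputSchema": t["inputSchema"]}
            for name, t in TOOLS.items()
        ]}
    elif method == "tools/call" and params.get("name") in TOOLS:
        text = TOOLS[params["name"]]["handler"](params.get("arguments", {}))
        reply["result"] = {"content": [{"type": "text", "text": text}], "isError": False}
    elif method == "tools/call":
        reply["error"] = {"code": -32602, "message": f"Unknown tool: {params.get('name')}"}
    else:
        reply["error"] = {"code": -32601, "message": "Method not found"}
    return json.dumps(reply)


call = {"jsonrpc": "2.0", "id": 1, "method": "tools/call",
        "params": {"name": "summarize_slide", "arguments": {"slide_id": "S-001"}}}
print(handle(json.dumps(call)))
```

Adding a capability means adding a registry entry; the client-facing protocol does not change, which is the interoperability property Béal describes.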

The NVIDIA partnership is about scaling what OwkinZero proved in-house. Owkin’s research team demonstrated that small open-weight models, fine-tuned with reinforcement learning on biological data, could outperform much larger commercial LLMs on specialized reasoning tasks. The collaboration gives Owkin access to the NeMo RL framework and NVIDIA AI infrastructure to push that work further. “The idea is to scale the promising research and move it to the next level,” Béal said, “and then integrate those state-of-the-art tools and resulting models directly within our agentic platform.”

Béal said Owkin derives complex question-and-answer pairs from its multimodal patient data to create a kind of biological ground truth for training. The model is then trained via reinforcement learning from verifiable rewards (RLVR), rewarded when its answers match that ground truth. “Those Q&As represent a kind of ground truth understanding of the biology,” Béal said.
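Reinforcement learning from verifiable rewards needs two pieces: a question whose answer can be checked automatically, and a reward function that checks it. A minimal sketch, assuming a toy cohort with one made-up expression column; Owkin’s actual Q&A generation is far richer and is not described in detail. Strict matching rewards “Yes” but not “yes.”, a reminder that answer parsing is part of reward design.

```python
from statistics import mean

# Made-up patient table reduced to one expression column and an outcome.
COHORT = [
    {"gene_x": 8.0, "responder": True},
    {"gene_x": 7.0, "responder": True},
    {"gene_x": 2.0, "responder": False},
    {"gene_x": 3.0, "responder": False},
]


def make_qa(cohort: list[dict], gene: str) -> tuple[str, str]:
    """Derive a question whose answer is computed from the data itself."""
    responders = mean(p[gene] for p in cohort if p["responder"])
    others = mean(p[gene] for p in cohort if not p["responder"])
    question = (f"Is mean {gene} expression higher in responders than in "
                f"non-responders? Answer yes or no.")
    return question, "yes" if responders > others else "no"


def verifiable_reward(model_answer: str, ground_truth: str) -> float:
    """Binary reward: 1.0 for an exact match after trimming and lowercasing, else 0.0."""
    return 1.0 if model_answer.strip().lower() == ground_truth else 0.0


question, truth = make_qa(COHORT, "gene_x")
rewards = [verifiable_reward(answer, truth) for answer in ["Yes", " no", "yes."]]
print(question, truth, rewards)
```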

The next step, he said, is moving beyond binary right-or-wrong signals toward process-based evaluation. In other words, the model would be trained not only on whether it reached the correct answer but also on whether the reasoning chain that got it there was sound. In a February 2026 blog post, Owkin described related internal work using simulated gene regulatory networks as a causal-reasoning training environment. “We can already generate way more material with AI agents than we can possibly review as human beings in our lifetime,” Béal said. “That’s what we need to prepare for as researchers in every field.”
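The difference between outcome-based and process-based rewards is easiest to see where the ground truth is fully known, such as a simulated gene regulatory network. In this sketch, which assumes a tiny hand-written network rather than Owkin’s simulator, a reasoning chain is a list of claimed regulatory edges. The outcome reward checks only the final answer; the process reward checks each step. A chain that reaches the right answer through an invented edge gets full marks on outcome but only half on process.

```python
# Hypothetical simulated network: an edge A -> B means gene A regulates gene B.
GRN: dict[str, list[str]] = {"TF1": ["G2"], "G2": ["G3"], "G3": [], "G4": ["G3"]}


def downstream(grn: dict[str, list[str]], source: str) -> set[str]:
    """All genes reachable from `source`, i.e. affected if `source` is knocked out."""
    seen: set[str] = set()
    stack = [source]
    while stack:
        for child in grn.get(stack.pop(), []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def outcome_reward(grn, source: str, target: str, answer: bool) -> float:
    """Checks only the final yes/no answer to 'does knocking out source affect target?'."""
    return 1.0 if answer == (target in downstream(grn, source)) else 0.0


def process_reward(grn, chain: list[tuple[str, str]]) -> float:
    """Fraction of the regulatory edges claimed in the reasoning chain that really exist."""
    if not chain:
        return 0.0
    return sum(child in grn.get(parent, []) for parent, child in chain) / len(chain)


# Both chains answer "yes" to "does knocking out TF1 affect G3?", which is correct.
sound_chain = [("TF1", "G2"), ("G2", "G3")]
lucky_chain = [("TF1", "G4"), ("G4", "G3")]  # right answer, but TF1 -> G4 is invented
for name, chain in [("sound", sound_chain), ("lucky", lucky_chain)]:
    print(name, outcome_reward(GRN, "TF1", "G3", True), process_reward(GRN, chain))
```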
