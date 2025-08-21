AI isn’t some shiny new kid on the drug discovery block. In one way or another, it’s been lurking in labs for decades under aliases like computational chemistry and cheminformatics. The term ‘artificial intelligence’ itself dates back to the 1950s. Alister Campbell, vice president and global head of science and technology at Dotmatics, chuckles at the notion it’s a fresh invention. “I kind of laugh when everyone talks about AI and knows that it’s this brand new thing that’s appeared, and it’s never been around,” he said in an interview.

But even with that history, skepticism toward AI predictions ran deep among chemists. “I used to see the chemists on the projects I worked on; they would come up with, ‘Oh, I’ve used docking to come up with this potential new drug. Who wants to go and make it?’ And the number of scientists to say, ‘Well, I’ll do it,’ is no one because they knew the chances of success were low,” Campbell recalled. Now, with beefed-up computing power and sharper algorithms, that doubt is easing, clearing the runway for AI to speed up the trek from hit to clinical candidate.

Smaller pods, faster loops

Scientists’ gradual warming to machine-assisted R&D dovetails with a shift in how drug discovery teams are structured, part of a broader, long-term evolution in pharmaceutical R&D. For years, major pharma companies have revamped their R&D models to increase efficiency, with a 2016 Journal of Translational Medicine article noting a clear move toward “restructuring R&D into better manageable smaller and biotechnology-like units.”

That trend is continuing. “If you look at the size of the research teams internally in Big Pharma, it’s gone from projects that are, say, 50 people strong to getting smaller and smaller and smaller,” Campbell said. Increasingly in their place are lean, cross-functional pods: “a couple of data scientists… biologists… maybe a pharmacokinetic scientist and a chemist.” Thanks to current tools such as AlphaFold (now in its third generation), these teams can “design and come up with the ideas from that data driven process.” Much of the execution, namely the synthetic and wet-lab work, is outsourced to contract research organizations (CROs), which reduces cost and speeds up each cycle.

They don’t need the same size of teams internally anymore to really deliver the same quality.

This trend has accelerated in recent years. A 2023 Nature review describes a “tectonic shift towards embracing computational technologies,” driven by a flood of ligand/structure data, abundant compute, and “on-demand virtual libraries of drug-like small molecules in their billions.” That paper also highlighted structure-based screening of gigascale spaces alongside deep-learning predictors of ADME/PK. Things are not slowing down. On January 6, 2025, the FDA issued draft guidance outlining a risk-based framework for how AI-generated evidence can support drug and biologic regulatory decisions. On June 2, 2025, the agency launched Elsa, an internal LLM assistant used to help staff read, write, and summarize documents, accelerate clinical-protocol reviews, summarize adverse events, and compare labels. Though early reporting flagged rollout hiccups and hallucinations, the agency-wide deployment still signals the FDA’s push to operationalize AI across routine review workflows. On the pipeline side, AI-designed molecules are progressing through the clinic. For instance, Insilico’s IPF candidate, INS018_055, has reported positive results from its Phase 2a trial, and the company is now planning a Phase 2b study.

Build, test, learn — repeat

This lean, accelerated model isn’t entirely new; it has long been the standard in smaller companies. “They don’t have the same budgets to go and hire teams of 30-40 people on a research project, so they’ve got to be leaner,” Campbell said. This battle-tested success, he argues, is part of the reason why “a lot of the pharma are leaning into the buying of small companies or the partnership model.” The fundamental operational difference is a move away from brute-force experimentation. Instead of relying on “empirical test on test on test cycles,” these successful teams thrive on a rapid, iterative loop he describes as the “build, test, learn, build, test, learn… type of model.”

The shift toward smaller, faster teams hinges on a crucial element: trustworthy data. Campbell explained that early skepticism toward AI was justified because “in the past, a lot of models were built on, quite honestly, pretty crappy data.” The solution has been to ruthlessly filter inputs and build models on tighter, cleaner datasets.

For scientists to trust a model, it must be able to show its work. Campbell emphasized that the most successful companies build systems that transparently explain their predictions by identifying the specific data points that influenced the outcome. This shift from intuition to curated data democratizes institutional knowledge. For decades, companies relied on veterans with “stored up tacit knowledge,” he said. Now, with clean data, there’s “less reliance on that pocket of data that’s sat in people’s heads,” allowing discovery to become more trained and data-driven.
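One simple way to picture a model that “shows its work” is a nearest-neighbor predictor that returns both its estimate and the training records it leaned on. The sketch below is purely illustrative and is not a description of any vendor’s system: the compound IDs, descriptor values, and activity numbers are made up, and the function name predict_with_evidence is hypothetical.

```python
from math import dist

# Hypothetical training set: (compound id, descriptor vector, measured activity).
# All values are invented for illustration.
TRAINING = [
    ("cmpd-A", (0.1, 0.2), 5.0),
    ("cmpd-B", (0.2, 0.1), 6.0),
    ("cmpd-C", (0.9, 0.8), 8.0),
    ("cmpd-D", (1.0, 1.0), 9.0),
]

def predict_with_evidence(query, k=2):
    """Average the activity of the k nearest training compounds and
    return that average together with the ids of those compounds."""
    ranked = sorted(TRAINING, key=lambda row: dist(query, row[1]))
    neighbors = ranked[:k]
    prediction = sum(activity for _, _, activity in neighbors) / k
    evidence = [compound_id for compound_id, _, _ in neighbors]
    return prediction, evidence

prediction, evidence = predict_with_evidence((1.0, 0.9))
print(prediction, evidence)
```

A chemist reading the output sees not just a number but the specific prior measurements behind it, and can judge whether those compounds are a reasonable basis for the prediction.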

But creating that foundation requires the right plumbing. This is where a platform like Dotmatics’ Luma comes in, which Campbell describes as an “AI-native” data lakehouse built on Databricks. It acts as a central hub designed to manage the volume, variety, and velocity of scientific data: from chemical structures to biological sequences. By simplifying the process of getting data in and out, it provides the clean, curated foundation that trustworthy AI models require.

Humans in the loop, not Skynet

This data-driven shift inevitably sparks what Campbell calls “the number one fear” among scientists: being replaced. He argued, however, that the reality is more mundane and empowering. “They are the ones driving the AI. They are the ones who are using it and coming up with ideas. It’s still a human process,” he said. The teams that grasp this are already pulling ahead. “The ones who are embracing the technologies, the ones who are designing better drug candidates… those are the ones that are being successful.” It’s not an automated lab in the sky, he insisted; it’s not Skynet. “It’s understanding what tool to use for what job.”

The engine for this shift is infrastructure capable of taming modern, messy science. In Campbell’s telling, Luma isn’t a single feature but essential plumbing. “Fundamentally, it’s a data management solution,” he said, a data lakehouse designed to handle the “volume, variety, and velocity” of big data. The platform acts as “a central hub” that connects a huge variety of data to different scientific tools, so customers aren’t forced into “huge custom projects” to wire everything together. The Databricks backbone is a deliberate choice to provide the necessary scale and power. “It sits on Databricks… we were their first sort of life science customer,” Campbell noted, explaining that they worked together to embed scientific knowledge into the platform.

Weaving a digital thread into the lab

This foundational plumbing is what positioned Dotmatics for its acquisition by Siemens. In a landmark deal valued at $5.1 billion, Siemens acquired Dotmatics to serve as a cornerstone of its expansion into the life sciences. The move vertically integrates Dotmatics’ R&D expertise with Siemens’ deep knowledge of industrial software and automation. Siemens adds reach and continuity rather than overlap. “Dotmatics has dropped into the Siemens digital software parts of the business,” Campbell explained, “[but] what they don’t have is really a footprint in life sciences… nothing in the early R part of the R&D. So that’s where for us, it’s a good fit.”

The strategic aim is to forge a true “digital thread” connecting the entire pharmaceutical value chain. A digital thread is a seamless flow of data that connects every stage of a product’s lifecycle, from early concept and development to manufacturing and post-market surveillance. This breaks down the traditional data silos between departments, creating a single, authoritative source of information.
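As a rough illustration of the idea, a digital thread can be modeled as records that each point back to the record they were derived from, so any late-stage item can be traced to its origin. The sketch below is a toy under stated assumptions, not how Siemens or Dotmatics implement it: the record IDs, stage names, and the trace_back helper are all hypothetical.

```python
# Hypothetical records: each stage keeps a link to the record it came from.
RECORDS = {
    "idea-001": {"stage": "research", "parent": None},
    "cand-007": {"stage": "preclinical", "parent": "idea-001"},
    "trial-12": {"stage": "clinical", "parent": "cand-007"},
    "batch-88": {"stage": "manufacturing", "parent": "trial-12"},
}

def trace_back(record_id):
    """Follow parent links from a record to its origin and return
    the (id, stage) pairs visited, latest stage first."""
    path = []
    while record_id is not None:
        record = RECORDS[record_id]
        path.append((record_id, record["stage"]))
        record_id = record["parent"]
    return path

thread = trace_back("batch-88")
for record_id, stage in thread:
    print(stage, record_id)
```

The point of the single linked store is that the manufacturing batch and the original research idea sit in the same chain, rather than in separate departmental systems that must be reconciled by hand.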

As Campbell envisions, the integration with Dotmatics will support taking “that digital thread all the way back through research, so you can trace that idea generation in early research, through… development, to preclinical candidates, to clinical studies, through into manufacturing.”