The race to create AI agents that can operate computers and perform tasks more autonomously is ramping up, with OpenAI preparing to launch its “Operator” tool in January, according to Bloomberg. The tool, which would allow AI to perform tasks like filling out spreadsheets, booking travel, or writing code on behalf of users, will initially be available as a research preview and through OpenAI’s developer API.
AI systems that can do more than simple human work
OpenAI isn’t alone in its agentic focus. Anthropic unveiled its experimental API-based “Computer Use” feature in October alongside the latest version of Claude 3.5 Sonnet. The functionality enables the Claude model to interact with computer interfaces by moving cursors and clicking buttons. Meanwhile, Microsoft has rolled out agent tools focused on workplace tasks like email management and record keeping, and Google is reportedly developing its own AI agent as well. Salesforce has also launched Agentforce, a suite of autonomous AI agents designed to perform tasks across several business functions, including service, sales, marketing, and commerce.
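For developers curious what that looks like in practice, here is a minimal sketch of requesting the computer-use tool through Anthropic’s Python SDK. The beta flag, tool type, and model name follow Anthropic’s October announcement, but the display dimensions and prompt are illustrative placeholders, and the surrounding agent loop that would actually execute clicks is omitted.

```python
# Minimal sketch: asking Claude 3.5 Sonnet to plan a desktop action via the
# experimental computer-use beta. Only the request shape is shown; the caller
# must supply an executor that performs screenshots/clicks and reports back.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.beta.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    tools=[
        {
            "type": "computer_20241022",   # built-in computer-use tool
            "name": "computer",
            "display_width_px": 1024,      # illustrative screen size
            "display_height_px": 768,
        }
    ],
    messages=[{"role": "user", "content": "Open the spreadsheet on my desktop."}],
    betas=["computer-use-2024-10-22"],     # opt-in flag for the beta
)

# The reply contains tool_use blocks (e.g., take a screenshot, move the mouse)
# that the calling application is responsible for carrying out.
print(response.content)
```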
The developer platform Replit also launched its Agent feature, built on Anthropic’s technology, which helps users build software projects (or in some cases, prototypes) from start to finish. The tool can understand natural language prompts to create applications, handling everything from setting up development environments and installing dependencies to writing code and deploying to the cloud. Currently available to Replit Core and Teams subscribers, the agent represents a push to make software development more accessible while streamlining routine setup tasks so developers can focus on more complex coding and higher-level design.
From spreadsheets to science
Across the STEM landscape, AI agents are also emerging that can perform a variety of scientific research tasks, including literature review, hypothesis generation, experimental design, and data analysis. For instance, the “AI Scientist” framework published on arXiv enables large language models to independently conduct research and communicate findings. Similarly, prominent national labs are exploring the use of agents to help drive autonomous discovery.
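The published frameworks are far more elaborate, but the core idea, an LLM cycling through hypothesis, experiment, and analysis stages, can be sketched as a simple loop. Everything below (the `llm` callable, the `run_experiment` helper, and the stage prompts) is hypothetical scaffolding for illustration, not the AI Scientist’s actual code.

```python
# Hedged sketch of an autonomous research loop in the spirit of agentic
# "AI Scientist"-style systems. llm() and run_experiment() are placeholders
# the caller must supply (e.g., an LLM API wrapper and a simulator).
from typing import Callable

def research_cycle(llm: Callable[[str], str],
                   run_experiment: Callable[[str], str],
                   topic: str,
                   iterations: int = 3) -> str:
    """Iterate hypothesis -> experiment -> analysis, feeding results back."""
    notes = f"Open question: {topic}"
    for i in range(iterations):
        hypothesis = llm(f"Given these notes, propose one testable hypothesis:\n{notes}")
        protocol = llm(f"Design a minimal experiment to test:\n{hypothesis}")
        results = run_experiment(protocol)   # e.g., a simulation or lab robot
        analysis = llm(f"Interpret these results against the hypothesis:\n"
                       f"{hypothesis}\n{results}")
        notes += f"\n[Cycle {i + 1}] {hypothesis}\n{analysis}"
    return llm(f"Summarize the findings as a short report:\n{notes}")
```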
The prospect of AI performing tasks previously relegated to humans, possibly with no human feedback, has set the stage for broader adoption in 2025, according to Mike Connell, chief operating officer at Enthought. But Connell underscores the need for caution.
Putting AI agents on a leash
“The growing reliance on agents will necessitate new supervisory frameworks, especially in high-stakes fields such as pharma and materials science R&D, where traditional oversight models will be inadequate,” Connell noted.
In its announcement for Computer Use, Anthropic noted that its agentic technology remains experimental and is sometimes prone to “amusing errors,” such as abandoning a coding task to browse photos of Yellowstone National Park.
While searching for photos of a national park may be harmless, that’s not the case in many scientific contexts, where genAI models could inadvertently generate dangerous chemical formulations or biological sequences without proper safeguards. “It will be critical to develop a human oversight model that not only addresses the known challenges of large language models (LLMs), but also ensures that agents are applied in only the most appropriate use cases,” Connell said. “Once validation and supervision challenges are overcome, we will see more integration of AI agents in more complex scenarios, especially in R&D. AI agents have the potential to transform research and product development and skyrocket the pace of innovation.”
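One concrete shape such an oversight model might take is an approval gate that blocks an agent’s high-risk actions until a human signs off. The sketch below is illustrative only; the keyword-based `classify_risk` heuristic and the action structure are assumptions, not an established supervisory framework.

```python
# Hedged sketch of a human-in-the-loop approval gate for agent actions.
# The risk rules and action fields here are illustrative assumptions.
HIGH_RISK_ACTIONS = {"synthesize_compound", "order_reagent", "deploy", "delete_data"}

def classify_risk(action: dict) -> str:
    """Crude lookup; a real system would apply domain-specific policy."""
    return "high" if action["name"] in HIGH_RISK_ACTIONS else "low"

def execute_with_oversight(action: dict, execute) -> str:
    """Run low-risk actions directly; require human approval for high-risk ones."""
    if classify_risk(action) == "high":
        answer = input(f"Agent requests '{action['name']}' with {action['args']}. "
                       "Approve? [y/N] ")
        if answer.strip().lower() != "y":
            return "blocked: human reviewer declined"
    return execute(action)   # only reached for low-risk or approved actions
```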