The race to create AI agents that can operate computers and perform tasks more autonomously is ramping up, with OpenAI preparing to launch its “Operator” tool in January, according to Bloomberg. The tool, which would allow AI to perform tasks like filling out spreadsheets, booking travel, or writing code on behalf of users, will initially be available as a research preview and through OpenAI’s developer API.
AI systems that can do more than simple human work
OpenAI isn’t alone in its agentic focus. Anthropic unveiled its experimental API-based “Computer Use” feature in October alongside the latest version of Claude 3.5 Sonnet. The functionality enables the Claude model to interact with computer interfaces by moving cursors and clicking buttons. Meanwhile, Microsoft has rolled out agent tools focused on workplace tasks like email management and record keeping, and Google is reportedly developing its own AI agent as well. Salesforce has also launched Agentforce, a suite of autonomous AI agents designed to perform tasks across several business functions, including service, sales, marketing, and commerce.
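For developers curious what that looks like in practice, here is a minimal sketch of requesting the computer-use tool through Anthropic’s Python SDK. The beta flag, tool type, and model name follow Anthropic’s October announcement, but the display dimensions and prompt are illustrative placeholders, and the surrounding agent loop that would actually execute clicks is omitted.

```python
# Minimal sketch: asking Claude 3.5 Sonnet to plan a desktop action via the
# experimental computer-use beta. Only the request shape is shown; the caller
# must supply an executor that performs screenshots/clicks and reports back.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.beta.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    tools=[
        {
            "type": "computer_20241022",   # built-in computer-use tool
            "name": "computer",
            "display_width_px": 1024,      # illustrative screen size
            "display_height_px": 768,
        }
    ],
    messages=[{"role": "user", "content": "Open the spreadsheet on my desktop."}],
    betas=["computer-use-2024-10-22"],     # opt-in flag for the beta
)

# The reply contains tool_use blocks (e.g., take a screenshot, move the mouse)
# that the calling application is responsible for carrying out.
print(response.content)
```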
The developer platform Replit also launched its Agent feature, built on Anthropic’s technology, which helps users build software projects (or in some cases, prototypes) from start to finish. The tool can understand natural language prompts to create applications, handling everything from setting up development environments and installing dependencies to writing code and deploying to the cloud. Currently available to Replit Core and Teams subscribers, the agent represents a push to make software development more accessible while streamlining routine setup tasks so developers can focus on more complex coding and higher-level design.
From spreadsheets to science
Across the STEM landscape, AI agents are also emerging that can perform a variety of scientific research tasks, including literature review, hypothesis generation, experimental design, and data analysis. For instance, the “AI Scientist” framework published on arXiv enables large language models to independently conduct research and communicate findings. Similarly, prominent national labs are exploring the use of agents to help drive autonomous discovery.
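The published frameworks are far more elaborate, but the core idea, an LLM cycling through hypothesis, experiment, and analysis stages, can be sketched as a simple loop. Everything below (the `llm` callable, the `run_experiment` helper, and the stage prompts) is hypothetical scaffolding for illustration, not the AI Scientist’s actual code.

```python
# Hedged sketch of an autonomous research loop in the spirit of agentic
# "AI Scientist"-style systems. llm() and run_experiment() are placeholders
# the caller must supply (e.g., an LLM API wrapper and a simulator).
from typing import Callable

def research_cycle(llm: Callable[[str], str],
                   run_experiment: Callable[[str], str],
                   topic: str,
                   iterations: int = 3) -> str:
    """Iterate hypothesis -> experiment -> analysis, feeding results back."""
    notes = f"Open question: {topic}"
    for i in range(iterations):
        hypothesis = llm(f"Given these notes, propose one testable hypothesis:\n{notes}")
        protocol = llm(f"Design a minimal experiment to test:\n{hypothesis}")
        results = run_experiment(protocol)   # e.g., a simulation or lab robot
        analysis = llm(f"Interpret these results against the hypothesis:\n"
                       f"{hypothesis}\n{results}")
        notes += f"\n[Cycle {i + 1}] {hypothesis}\n{analysis}"
    return llm(f"Summarize the findings as a short report:\n{notes}")
```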
The prospect of AI performing tasks previously relegated to humans, possibly with no human feedback, has set the stage for broader adoption in 2025, according to Mike Connell, chief operating officer at Enthought. But Connell underscores the need for caution.
Putting AI agents on a leash
“The growing reliance on agents will necessitate new supervisory frameworks, especially in high-stakes fields such as pharma and materials science R&D, where traditional oversight models will be inadequate,” Connell noted.
In its announcement for Computer Use, Anthropic noted that its agentic technology remains experimental and is sometimes prone to “amusing errors,” such as abandoning a coding task to browse photos of Yellowstone National Park.
While searching for photos of a national park may be harmless, that’s not the case in many scientific contexts, where genAI models could inadvertently generate dangerous chemical formulations or biological sequences without proper safeguards. “It will be critical to develop a human oversight model that not only addresses the known challenges of large language models (LLMs), but also ensures that agents are applied in only the most appropriate use cases,” Connell said. “Once validation and supervision challenges are overcome, we will see more integration of AI agents in more complex scenarios, especially in R&D. AI agents have the potential to transform research and product development and skyrocket the pace of innovation.”
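One concrete shape such an oversight model might take is an approval gate that blocks an agent’s high-risk actions until a human signs off. The sketch below is illustrative only; the keyword-based `classify_risk` heuristic and the action structure are assumptions, not an established supervisory framework.

```python
# Hedged sketch of a human-in-the-loop approval gate for agent actions.
# The risk rules and action fields here are illustrative assumptions.
HIGH_RISK_ACTIONS = {"synthesize_compound", "order_reagent", "deploy", "delete_data"}

def classify_risk(action: dict) -> str:
    """Crude lookup; a real system would apply domain-specific policy."""
    return "high" if action["name"] in HIGH_RISK_ACTIONS else "low"

def execute_with_oversight(action: dict, execute) -> str:
    """Run low-risk actions directly; require human approval for high-risk ones."""
    if classify_risk(action) == "high":
        answer = input(f"Agent requests '{action['name']}' with {action['args']}. "
                       "Approve? [y/N] ")
        if answer.strip().lower() != "y":
            return "blocked: human reviewer declined"
    return execute(action)   # only reached for low-risk or approved actions
```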