Research & Development World

  • R&D World Home
  • Topics
    • Aerospace
    • Automotive
    • Biotech
    • Careers
    • Chemistry
    • Environment
    • Energy
    • Life Science
    • Material Science
    • R&D Management
    • Physics
  • Technology
    • 3D Printing
    • A.I./Robotics
    • Software
    • Battery Technology
    • Controlled Environments
      • Cleanrooms
      • Graphene
      • Lasers
      • Regulations/Standards
      • Sensors
    • Imaging
    • Nanotechnology
    • Scientific Computing
      • Big Data
      • HPC/Supercomputing
      • Informatics
      • Security
    • Semiconductors
  • R&D Market Pulse
  • R&D 100
    • 2025 R&D 100 Award Winners
    • 2025 Professional Award Winners
    • 2025 Special Recognition Winners
    • R&D 100 Awards Event
    • R&D 100 Submissions
    • Winner Archive
  • Resources
    • Research Reports
    • Digital Issues
    • Educational Assets
    • R&D Index
    • Subscribe
    • Video
    • Webinars
    • Content submission guidelines for R&D World
  • Global Funding Forecast
  • Top Labs
  • Advertise
  • SUBSCRIBE

Claude Opus 4.1 boosts AI coding and research with 64K context and SWE-bench leaderboard gains

By Brian Buntz | August 5, 2025

AnthropicAnother week, another AI model drop (or maybe at least two). While OpenAI is rumored to imminently launch GPT-5, its much-delayed successor to GPT-4 that gave way to a smattering of successors, Anthropic has launched Claude Opus 4.1, which achieves a reportedly leading 74.5% score on SWE-bench Verified. For context, SWE-bench Verified is a 500-task, engineer-vetted subset of SWE-bench that scores models by the share of real GitHub issues they fully fix. In other words, it includes ensuring that a patch makes failing tests pass without breaking existing ones, in a containerized repo setup. While scores vary depending on the evaluation setup, the nearly 75% score is a notable improvement over the 72.5% for the prior version of Claude Opus 4, and represents the highest score in standard configuration—though when Anthropic launched Claude Sonnet 4, the firm noted that it achieved up to 80.2% when tested with parallel test-time compute. Still, compared to competitors, Opus 4.1’s score is approximately 21 points higher than the 53.6% score of Gemini 2.5 Pro.

Anthropic also notes that Opus 4.1 ups the game in AI-powered software engineering tasks while introducing improvements in agentic search, multi-file code refactoring and autonomous research capabilities. Released by Anthropic today, this incremental upgrade to Claude Opus 4 shows  substantial performance gains across coding, reasoning and long-horizon task execution while maintaining its predecessor’s pricing at $15 per million input tokens and $75 per million output tokens, which is relatively high in the market. For instance, OpenAI’s non-reasoning GPT-4.1 is $2 per 1M input tokens and $8 per 1M output tokens, while Google’s Gemini 2.5 Pro runs $1.25–$2.50 per 1M input (depending on context size) and $10–$15 per 1M output versus Opus’s $15/$75.

Major technology companies including Rakuten Group, GitHub and Block have already integrated the model, reporting positive results in real-world applications ranging from 7-hour autonomous coding sessions to complex enterprise workflows. 

Anthropic claimed benchmarks. Image from Anthropic.

Anthropic claimed benchmarks for Opus 4.1. Image from Anthropic.

Rakuten Group, the Japanese tech conglomerate, reports the model “excels at pinpointing exact corrections within large codebases without making unnecessary adjustments or introducing bugs,” according to Anthropic.

Extended thinking capabilities enable Claude Opus 4.1 to tackle problems requiring deep reasoning. The model can chew through up to 64,000 tokens for complex thought processes, achieving 83.3% on GPQA Diamond (graduate-level physics) and 90% on AIME (advanced mathematics) when given sufficient thinking time. Other frontier models tend to score in the low‑to‑mid‑80s on GPQA Diamond and mid‑80s to low‑90s on AIME. For example, OpenAI’s o3 and Gemini 2.5 Pro both score around 83.3% and 83.0%, respectively, on GPQA Diamond, and Gemini scores about 83% on AIME versus Claude’s 90%.

Multi-file refactoring and coding precision gains

GitHub’s integration of Claude into Copilot highlights the model’s solid coding capabilities, with the company specifically noting “particularly notable performance gains in multi-file code refactoring.” This improvement addresses one of the most challenging aspects of software development: making coordinated changes across multiple files while maintaining code integrity and avoiding unintended side effects. Where previous models showed navigation error rates around 20%, Claude Opus 4.1 reduces this to near zero, according to Anthropic.

The precision improvements manifest in several key areas that developers value most. Block reports Claude Opus 4.1 as “the first model to boost code quality during editing and debugging,” while Cognition notes it “successfully handles critical actions that previous models have missed.” 

Consumer to enterprise platforms availability

Claude Opus 4.1 is immediately available across multiple platforms. The consumer-facing Claude Code provides a command-line interface that grants full codebase awareness and autonomous execution capabilities.

Enterprise users can access Claude Opus 4.1 through established cloud platforms without regional restrictions or staged rollouts. Amazon Bedrock offers the model in US East (Ohio, N. Virginia) and US West (Oregon) regions with cross-region inference for automatic optimization. Google Cloud Vertex AI provides even broader geographic coverage including us-east5, europe-west4, and us-central1, with global endpoints in public preview for enhanced availability. 

The API implementation (model ID: claude-opus-4-1-20250805) supports advanced features including prompt caching for up to 90% cost savings and batch processing for 50% reductions on asynchronous workloads. 

Related Articles Read More >

The gunslinger’s dilemma: A trillion-dollar R&D arms race where collateral damage risk is unpriced
Sapio survey finds 45% of scientists using unauthorized AI tools, view ELNs as ‘glorified filing cabinets’
Biosero launches GoSimple pre-validated workcells, adds assistive AI to Green Button Go
ABB Brings GoFa Cobots to the Lab Bench, Demos Multi-Vendor Workflows at SLAS
rd newsletter
EXPAND YOUR KNOWLEDGE AND STAY CONNECTED
Get the latest info on technologies, trends, and strategies in Research & Development.
RD 25 Power Index

R&D World Digital Issues

Fall 2025 issue

Browse the most current issue of R&D World and back issues in an easy to use high quality format. Clip, share and download with the leading R&D magazine today.

R&D 100 Awards
Research & Development World
  • Subscribe to R&D World Magazine
  • Sign up for R&D World’s newsletter
  • Contact Us
  • About Us
  • Drug Discovery & Development
  • Pharmaceutical Processing
  • Global Funding Forecast

Copyright © 2026 WTWH Media LLC. All Rights Reserved. The material on this site may not be reproduced, distributed, transmitted, cached or otherwise used, except with the prior written permission of WTWH Media
Privacy Policy | Advertising | About Us

Search R&D World

  • R&D World Home
  • Topics
    • Aerospace
    • Automotive
    • Biotech
    • Careers
    • Chemistry
    • Environment
    • Energy
    • Life Science
    • Material Science
    • R&D Management
    • Physics
  • Technology
    • 3D Printing
    • A.I./Robotics
    • Software
    • Battery Technology
    • Controlled Environments
      • Cleanrooms
      • Graphene
      • Lasers
      • Regulations/Standards
      • Sensors
    • Imaging
    • Nanotechnology
    • Scientific Computing
      • Big Data
      • HPC/Supercomputing
      • Informatics
      • Security
    • Semiconductors
  • R&D Market Pulse
  • R&D 100
    • 2025 R&D 100 Award Winners
    • 2025 Professional Award Winners
    • 2025 Special Recognition Winners
    • R&D 100 Awards Event
    • R&D 100 Submissions
    • Winner Archive
  • Resources
    • Research Reports
    • Digital Issues
    • Educational Assets
    • R&D Index
    • Subscribe
    • Video
    • Webinars
    • Content submission guidelines for R&D World
  • Global Funding Forecast
  • Top Labs
  • Advertise
  • SUBSCRIBE