Research & Development World


How GPT-5.2 stacks up against Gemini 3.0 and Claude Opus 4.5

By Brian Buntz | December 11, 2025

Less than a month ago, OpenAI released GPT-5.1, only to see it quickly eclipsed by Google’s launch of Gemini 3.0, which outscored it on benchmarks overall. Anthropic then launched Opus 4.5, which also generally outclassed GPT-5.1. Hoping to regain its crown after a reported internal “code red,” OpenAI launched GPT-5.2 today. The company’s benchmark claims, if independently verified, would represent a real leap in abstract reasoning and professional knowledge work, areas where it had been losing ground to competitors.

The company boasted that the model was, in some respects, the new market leader: “GPT-5.2 Thinking is the best model yet for real-world, professional use.”

The December 11 release caps an intense six-week stretch that saw Google ship Gemini 3 Pro in mid-November and Anthropic counter with Claude Opus 4.5 on November 24. Bloomberg reported that OpenAI CEO Sam Altman declared an internal “code red” amid the competitive pressure, fast-tracking what had been internally codenamed “Garlic.”

Here’s how the three frontier models compare across benchmarks most relevant to R&D applications, with the caveat that these are vendor-reported numbers pending independent verification.

The benchmarks at a glance

| Benchmark | GPT-5.2 Thinking | GPT-5.2 Pro | Claude Opus 4.5 | Gemini 3 Pro | Gemini 3 Deep Think |
|---|---|---|---|---|---|
| SWE-bench Verified (coding) | 80.0% | — | 80.9% | 76.2% | — |
| GPQA Diamond (science) | 92.4% | 93.2% | 87% | 91.9% | 93.8% |
| AIME 2025 (math, no tools) | 100% | 100% | ~94% | 95.0% | — |
| ARC-AGI-2 (abstract reasoning) | 52.9% | 54.2% | 37.6% | 31.1% | 45.1% |
| Humanity’s Last Exam | 34.5% | 36.6% | 25.2% | 37.5% | 41.0% |
| FrontierMath Tier 1-3 | 40.3% | — | — | — | — |

Note: Tilde (~) indicates estimated values from available data. Dash (—) indicates unreported scores.
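As a rough aid for reading the table, the short Python sketch below finds the top reported score on each benchmark and its margin over the next-best reported score. The scores are transcribed from the table; the names `SCORES` and `leaders` are illustrative only, unreported entries are stored as `None`, and the estimated ~94% value is treated as 94.0. The runner-up may be another model from the same vendor.

```python
# Vendor-reported scores in percent, transcribed from the table above.
# None marks an unreported score; 94.0 stands in for the estimated "~94%".
MODELS = ["GPT-5.2 Thinking", "GPT-5.2 Pro", "Claude Opus 4.5",
          "Gemini 3 Pro", "Gemini 3 Deep Think"]

SCORES = {
    "SWE-bench Verified":    [80.0, None, 80.9, 76.2, None],
    "GPQA Diamond":          [92.4, 93.2, 87.0, 91.9, 93.8],
    "AIME 2025":             [100.0, 100.0, 94.0, 95.0, None],
    "ARC-AGI-2":             [52.9, 54.2, 37.6, 31.1, 45.1],
    "Humanity's Last Exam":  [34.5, 36.6, 25.2, 37.5, 41.0],
    "FrontierMath Tier 1-3": [40.3, None, None, None, None],
}


def leaders(scores):
    """Map each benchmark to (top score, models at the top, margin in points).

    The margin is the gap to the next-best distinct reported score,
    or None when no lower score was reported.
    """
    result = {}
    for bench, row in scores.items():
        reported = {m: s for m, s in zip(MODELS, row) if s is not None}
        top = max(reported.values())
        best = [m for m, s in reported.items() if s == top]
        lower = [s for s in reported.values() if s < top]
        margin = round(top - max(lower), 1) if lower else None
        result[bench] = (top, best, margin)
    return result


if __name__ == "__main__":
    for bench, (top, best, margin) in leaders(SCORES).items():
        gap = "n/a" if margin is None else f"{margin} pts"
        print(f"{bench}: {', '.join(best)} at {top}% (margin: {gap})")
```

Because every figure here is vendor-reported, the margins are best read as a summary of the claims rather than as measured differences.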

Where GPT-5.2 claims leadership

The most striking claim is GPT-5.2’s performance on ARC-AGI-2, a benchmark designed to test genuine reasoning ability while resisting memorization. At 52.9% (Thinking) and 54.2% (Pro), OpenAI’s new model leads the next-best reported score, Gemini 3 Deep Think’s 45.1%, by roughly 8 to 9 percentage points, and Claude Opus 4.5 (37.6%) by more than 15. The ARC-AGI benchmark has become a bellwether for abstract reasoning capability, the kind of fluid intelligence that matters for novel problem-solving in research contexts.

GPT-5.2 also achieves a perfect 100% on AIME 2025 without tools, matching what Gemini 3 Pro achieves only with code execution enabled. On GPQA Diamond, a graduate-level science benchmark, GPT-5.2 Pro scores 93.2%, just behind Gemini 3 Deep Think’s 93.8%.

OpenAI is also pushing a new benchmark called GDPval, which measures performance on “well-specified knowledge work tasks” across 44 occupations. The company claims GPT-5.2 Thinking beats or ties industry professionals 70.9% of the time, at 11x the speed and less than 1% of the cost. This is OpenAI’s own benchmark, however, and hasn’t been independently validated.

Where the competition holds ground

Claude Opus 4.5 still holds the top score on SWE-bench Verified at 80.9%, though the margin is narrow and early results may be unstable. GPT-5.2’s 80.0% closes what had been a more significant gap. Anthropic’s model also leads on Terminal-bench 2.0 (59.3%), which tests command-line coding proficiency, and claims industry-leading resistance to prompt injection attacks.

Gemini 3 Deep Think maintains the highest published score on Humanity’s Last Exam at 41.0% without tools, a benchmark explicitly designed to challenge frontier AI systems. Google’s model also achieved gold-medal performance at the International Mathematical Olympiad and the International Collegiate Programming Contest World Finals, suggesting strength in competition-level mathematical reasoning. The closest equivalent to Deep Think is OpenAI’s “Pro” mode for its models (not to be confused with its Pro subscription tier), which often chews over a question for up to half an hour before answering.

Copyright © 2026 WTWH Media LLC. All Rights Reserved. The material on this site may not be reproduced, distributed, transmitted, cached or otherwise used, except with the prior written permission of WTWH Media