Just how big of a deal is OpenAI’s o3 model anyway?

By Brian Buntz | December 23, 2024

OpenAI's o3

[OpenAI]

A model that scored 87.5% on the ARC-AGI Semi-Private Evaluation set, OpenAI’s new o3 system has sparked conversation—and some controversy—about whether artificial general intelligence (AGI) might be near at hand. A growing number of observers frame the news as evidence that computing systems are poised to handle an increasing share of STEM-related tasks, particularly in math, coding and Ph.D.-level science. Indeed, o3 has some coders venting on social media that they are worried about their long-term job security, as the system reportedly rivals some competitive programmers in ability. The model isn’t too shabby at math either, solving one-quarter of the questions in the FrontierMath benchmark while other leading models managed only about 2%.

While AGI is a murky concept with variable definitions, OpenAI defines it, in essence, as autonomous systems that can best humans at most economically valuable work.

To be clear, the ARC Prize organization, which maintains the ARC-AGI benchmark, underscored that “passing ARC-AGI does not equate to achieving AGI,” noting that the model still “fails on some very easy tasks, indicating fundamental differences with human intelligence.” That statement has done little to stem the wave of speculation, partly because earlier GPT-family models scored near zero or in the single digits on ARC-AGI: GPT-3 scored 0% in 2020, GPT-4o managed 5% in 2024, Claude 3.5 Sonnet peaked at 14%, and o1-preview reached 18%. Now o3 has soared into the 75–88% range.

According to OpenAI, o3 achieved 87.7% on a set of ‘Google-proof’ doctoral-level science items, topping the previous o1 score of 78% and surpassing the roughly 70% typical of an expert Ph.D. in their own domain.

Some skeptics, however, have taken to LinkedIn to question the model’s reliance on the ARC-AGI-1 Public Training set, which has been publicly available on GitHub since 2019. A few observers compared training on that public set to the way humans practice before an exam, framing it as legitimate preparation rather than mere brute force. Others highlighted AI’s potential benefits in areas like medicine and fundamental science, emphasizing the upside despite potential dangers or misuse.
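
For context on what that public training set contains: each ARC-AGI-1 task is a small JSON file holding a few demonstration input/output grid pairs plus one or more held-out test pairs, and solving a task means inferring the transformation from the demonstrations alone. The short Python sketch below shows one way to inspect such a file; the file name is hypothetical and the "train"/"test" layout reflects the commonly documented ARC task format, so treat it as an illustrative sketch rather than an official loader.

    import json

    # Hypothetical local copy of one ARC-AGI-1 public training task
    # (the real files live in the public GitHub repository noted above).
    TASK_PATH = "arc_task_example.json"

    def describe_task(path: str) -> None:
        """Print the grid dimensions of each demonstration and test pair."""
        with open(path) as f:
            task = json.load(f)

        # ARC tasks are commonly structured as {"train": [...], "test": [...]},
        # where each entry holds an "input" grid and an "output" grid of
        # small integers representing colored cells.
        for split in ("train", "test"):
            for i, pair in enumerate(task.get(split, [])):
                in_grid, out_grid = pair["input"], pair["output"]
                print(f"{split} pair {i}: "
                      f"input {len(in_grid)}x{len(in_grid[0])}, "
                      f"output {len(out_grid)}x{len(out_grid[0])}")

    if __name__ == "__main__":
        describe_task(TASK_PATH)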

Gary Marcus, Ph.D., a cognitive scientist, entrepreneur, author and AI-hype skeptic, questioned OpenAI’s omission of data from other labs in its presentation on o3, which could exaggerate the model’s improvement relative to rivals. Marcus also noted that researchers wanted to see how an untuned version of o3 would perform, to gauge how instrumental the ARC-AGI public training data was to its results. He further underscored the need for “significant external scientific review” to confirm what is truly new and how robust it is, and asked that the media pose “hard questions” about AI announcements rather than perpetuate hype.

In any event, the o3 news has helped reframe the narrative that the broader genAI field was hitting a wall, eking out only incremental performance gains with each new model after prior generations had shown exponential improvements. The WSJ quipped, in an article dated December 20, the same day o3 was unveiled, that “The Next Great Leap in AI Is Behind Schedule and Crazy Expensive.” But o3’s sudden leap in capabilities complicates that storyline.
