Research & Development World


OpenAI’s ‘Strawberry’ AI: Is this the reasoning breakthrough we’ve been waiting for?

By Brian Buntz | September 10, 2024

[Image: strawberry (Firefly)]

OpenAI is gearing up to launch an AI model codenamed “Strawberry,” reportedly built for stronger reasoning and mathematical problem-solving, according to a report from The Information.

The release, which could come within the next two weeks, marks a significant step in OpenAI’s quest to develop more advanced AI systems. The initial release may be limited to a small group of testers. Strawberry, also previously known as Q* or Q-Star, is designed to bolster reasoning capabilities.

Strawberry’s “System 2” thinking

While OpenAI has yet to confirm details of the release, Strawberry reportedly employs a “System 2” style of thinking, a concept the psychologist Daniel Kahneman popularized in his book Thinking, Fast and Slow: a slow, deliberate, analytical mode of conscious reasoning. System 1, by contrast, is fast, intuitive, and emotional.

Strawberry reportedly spends significantly longer “thinking” than OpenAI’s current model, GPT-4o. The Information reports that it will spend 10-20 seconds processing its input and potential responses before delivering a final answer, in order to reduce errors.

In addition, Strawberry (formerly Q*) will likely:

  • Focus on advanced reasoning and problem-solving capabilities.
  • Demonstrate proficiency in solving mathematical problems.
  • Be integrated into ChatGPT, potentially via a model known as Orion or GPT-5.

The Information had previously reported that OpenAI was also developing a model known as Orion that uses synthetic data generated by Strawberry. Orion is a separate project and is likely to be OpenAI’s next flagship language model, according to The Information.

The cost of AI training

While OpenAI has not released the full details regarding training GPT-4, OpenAI CEO Sam Altman estimated the cost to train GPT-4 was “more than” $100 million. According to some estimates, the model has 1.76 trillion parameters.

Some pundits have speculated that future models could cost hundreds of millions or even billions of dollars to train, prompting questions from the likes of Goldman Sachs about the ROI from the industry.

In 2023, Altman commented, “I think we’re at the end of the era where it’s gonna be these giant models, and we’ll make them better in other ways.”

Connection to STaR (Self-Taught Reasoner)

Reports from Reuters and others have pointed to a possible connection between Q*/Strawberry and STaR (Self-Taught Reasoner). The similarities between reports surrounding Strawberry and the STaR research paper, published in 2022, are notable:

STaR starts with a small set of examples demonstrating step-by-step reasoning (called “rationales”). It then prompts a large language model (LLM) to generate rationales for a larger dataset of questions that lack them. This is analogous to providing the LLM with a few worked-out examples and then asking it to solve similar problems on its own, a process the paper calls “bootstrapping.”

Looping closer to the truth

The process uses a language model’s existing reasoning abilities and iteratively improves them through a self-learning loop. The process is as follows:

  1. Rationale Generation: STaR starts with a small set of examples demonstrating step-by-step reasoning (called “rationales”). It then prompts a large language model (LLM) to generate rationales for a larger dataset of questions that don’t have rationales.
  2. Filtering: It checks if the generated rationales lead to the correct answer. Only the rationales that result in correct answers are kept.
  3. Fine-tuning: The LLM is fine-tuned on this filtered dataset of questions and their corresponding, successfully generated rationales. This strengthens the model’s ability to generate good rationales.
  4. Iteration: The process (steps 1-3) is repeated. The improved LLM from the previous step is used to generate rationales for the same larger dataset again. This iterative process continues, with the model learning from its own generated reasoning and improving its performance over time.
  5. Rationalization (optional): To address the limitation of only learning from initially successful rationales, STaR introduces “rationalization”. For questions the model answered incorrectly, it provides the correct answer as a hint and asks the model to generate a rationale that justifies it. This helps the model learn from its mistakes and improve its reasoning on more challenging problems.
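The five steps above can be sketched in code. What follows is a toy, self-contained illustration, not OpenAI’s or the paper’s implementation: `ToyModel`, its `generate_rationale` and `fine_tune` methods, and the multiplication “dataset” are all invented stand-ins for a real LLM and fine-tuning pipeline.

```python
import random

class ToyModel:
    """Stub standing in for an LLM: answers a*b questions, with a 'skill'
    value modeling how often it produces a correct rationale."""
    def __init__(self, skill=0.3):
        self.skill = skill

    def generate_rationale(self, question, hint=None):
        a, b = question
        if hint is not None or random.random() < self.skill:
            # With a hint (rationalization) or a lucky draw, reason correctly
            rationale = f"{a} x {b}: add {a} to itself {b} times = {a * b}"
            return rationale, a * b
        return "guess", a * b + 1  # wrong answer, rationale will be filtered out

    def fine_tune(self, examples):
        # Stand-in for fine-tuning: more verified rationales raise skill
        return ToyModel(skill=min(1.0, self.skill + 0.1 * len(examples) / 10))

def star(model, dataset, iterations=3, rationalize=True):
    for _ in range(iterations):
        kept = []
        for question, answer in dataset:
            # 1. Rationale generation
            rationale, predicted = model.generate_rationale(question)
            # 2. Filtering: keep only rationales that reach the right answer
            if predicted == answer:
                kept.append((question, rationale))
            elif rationalize:
                # 5. Rationalization: provide the correct answer as a hint
                rationale, predicted = model.generate_rationale(question, hint=answer)
                if predicted == answer:
                    kept.append((question, rationale))
        # 3. Fine-tune on the filtered set; 4. iterate with the improved model
        model = model.fine_tune(kept)
    return model

dataset = [((a, b), a * b) for a in range(2, 7) for b in range(2, 7)]
trained = star(ToyModel(), dataset)
print(trained.skill)  # rises from 0.3 to 1.0 over three rounds
```

The point of the sketch is the control flow, not the arithmetic: generation, answer-based filtering, fine-tuning, and iteration, with rationalization salvaging training signal from initially failed questions.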
[Figure: STaR logic, from the paper]

“STaR lets a model improve itself by learning from its own generated reasoning,” the paper concluded. The authors also noted: “We believe using examples without reasoning to bootstrap reasoning is a very general approach, and that STaR can serve as the basis of more sophisticated techniques across many domains.”

Chain-of-thought rationale generation

Chain-of-thought reasoning involves breaking down a complex problem into a series of intermediate steps, each forming a logical chain to the next. Humans reason similarly, making the reasoning process more transparent and easier to understand than traditional deep learning alone, which can uncover hidden connections between variables but not in an explainable manner.
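A minimal sketch can make the contrast concrete: instead of emitting only a final answer, chain-of-thought style output exposes each intermediate step. The word problem and function below are invented purely for illustration.

```python
# Illustrative only: the answer is produced through explicit, inspectable
# intermediate steps, mirroring how a chain-of-thought rationale reads.

def solve_with_chain(start_apples, bought, eaten):
    steps = []
    after_buying = start_apples + bought
    steps.append(f"Start with {start_apples} apples, buy {bought} more: {after_buying}.")
    remaining = after_buying - eaten
    steps.append(f"Eat {eaten}: {remaining} remain.")
    return steps, remaining

chain, answer = solve_with_chain(5, 3, 2)
for step in chain:
    print(step)
print(f"Answer: {answer}")  # Answer: 6
```

Each step forms a logical link to the next, so an incorrect final answer can be traced to the specific step where the reasoning went wrong.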

Both STaR and Strawberry are reportedly successful at tackling mathematical problems. The STaR paper shared examples of how STaR generates step-by-step solutions for math problems, sometimes finding more efficient solutions than those in the ground truth data.

Comments

  1. Paul Bevilaqus says

    September 11, 2024 at 5:44 pm

    Seems incredible

  2. Jason Wang says

    September 11, 2024 at 6:38 pm

    Step 2, Filtering: It checks if the generated rationales lead to the correct answer. Only the rationales that result in correct answers are kept.
    Who sets the criteria to judge the answer being correct or not if you do not know the answer before hand?

    • Brian Buntz says

      September 11, 2024 at 7:08 pm

      My understanding would be that you would still need ground-truth answers (labeled data) to verify what is true or not. The STaR paper mentions pairs of problems (x) and their corresponding ground truth answers (y). I added a flowchart image from the paper (https://arxiv.org/pdf/2203.14465v2) that might be helpful.

      One reality of deep learning and reinforcement learning is that systems tend to do better with greater amounts of data, assuming it is relatively clean. One way to help boost the accuracy is to create synthetic data that can still be verified in terms of its accuracy. For many math problems, for instance, you could use math software or a programming language like Python to verify the calculations are correct, and add the verified answers to the training data.
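The verification idea in the reply above can be sketched briefly. The function and candidate pairs below are hypothetical, assuming synthetic question/answer pairs where the question is a simple arithmetic expression that a program can check directly.

```python
# Keep only synthetic (expression, claimed_answer) pairs whose claimed
# answer a programmatic check confirms; discard the rest.

def verify_synthetic(pairs):
    verified = []
    for expression, claimed in pairs:
        # eval is used only on arithmetic we generated ourselves;
        # never call it on untrusted input
        if eval(expression) == claimed:
            verified.append((expression, claimed))
    return verified

candidates = [("12 * 7", 84), ("9 + 15", 25), ("100 / 4", 25.0)]
print(verify_synthetic(candidates))  # [('12 * 7', 84), ('100 / 4', 25.0)]
```

The incorrect pair ("9 + 15", 25) is dropped, so only verified examples would be added to the training data.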


Copyright © 2025 WTWH Media LLC. All Rights Reserved. The material on this site may not be reproduced, distributed, transmitted, cached or otherwise used, except with the prior written permission of WTWH Media