Research & Development World


Claude Mythos leads 17 of 18 benchmarks Anthropic measured. Muse Spark put Meta back in the frontier club, and OpenAI’s ‘Spud’ model is reportedly near launch

By Brian Buntz | April 8, 2026

Anthropic does not plan to release its Mythos model publicly, but the model leads 17 of 18 benchmarks, according to data in its system card. The lone outlier is Multilingual Massive Multitask Language Understanding (MMMLU), where Gemini 3.1 Pro’s 92.6–93.6 overlaps with Mythos’ score of 92.7.

One day later, on April 8, Meta Superintelligence Labs introduced Muse Spark, its first frontier model under chief AI officer Alexandr Wang. Where Anthropic published a capability report for a model it is withholding, Meta shipped a model that Artificial Analysis ranks fourth on its composite Intelligence Index at 52, behind a Gemini 3.1 Pro and GPT-5.4 (xhigh) tie at 57 and Claude Opus 4.6 at 53.

Anthropic claims its unreleased Claude Mythos Preview will ‘reshape cybersecurity’

Anthropic says Mythos is its “most capable frontier model to date, and shows a striking leap in scores on many evaluation benchmarks compared to our previous frontier model, Claude Opus 4.6.” The company goes on to say that Mythos offers “a step-change in vulnerability discovery and exploitation” that, operating “with minimal human steering,” autonomously finds zero-day vulnerabilities in open-source and closed-source software and develops them into working proof-of-concept exploits.

What the Mythos system card claims

| Benchmark | Mythos Preview | Opus 4.6 | GPT-5.4 | Gemini 3.1 Pro |
|---|---|---|---|---|
| SWE-bench Verified | 93.9 | 80.8 | not released (n/r) | 80.6 |
| SWE-bench Pro | 77.8 | 53.4 | 57.7 | 54.2 |
| SWE-bench Multilingual | 87.3 | 77.8 | n/r | n/r |
| SWE-bench Multimodal | 59.0 | 27.1 | n/r | n/r |
| Terminal-Bench 2.0 | 82.0 | 65.4 | 75.1 | 68.5 |
| Terminal-Bench 2.0 (extended timeout) | 92.1 | n/r | 75.3 | n/r |
| USAMO 2026 | 97.6 | 42.3 | 95.2 | 74.4 |
| GPQA Diamond | 94.5 | 91.3 | 92.8 | 94.3 |
| Humanity’s Last Exam (with tools) | 64.7 | 53.1 | 52.1 | 51.4 |
| Humanity’s Last Exam (no tools) | 56.8 | 40.0 | 39.8 | 44.4 |
| OSWorld (computer use) | 79.6 | 72.7 | 75.0 | n/r |
| GraphWalks BFS 256K–1M | 80.0 | 38.7 | 21.4 | n/r |
| CharXiv Reasoning (with tools) | 93.2 | 78.9 | n/r | n/r |
| CharXiv Reasoning (no tools) | 86.1 | 61.5 | n/r | n/r |
| LAB-Bench FigQA (with tools) | 89.0 | 75.1 | n/r | n/r |
| BrowseComp | 86.9 | 83.7 | n/r | n/r |
| MMMLU | 92.7 | 91.1 | n/r | 92.6–93.6 |
| CyberGym | 0.83 | 0.67 | n/r | n/r |
| Cybench | 100 (saturated) | n/r | n/r | n/r |

Benchmark source: Anthropic, “Claude Mythos Preview” system card, red.anthropic.com, April 7, 2026.
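To put the size of the reported gaps in perspective, the short Python sketch below computes the point difference between Mythos Preview and Claude Opus 4.6 for a few rows of the table above. The selection of rows and the names `SCORES` and `point_gaps` are illustrative choices, not anything Anthropic published, and the figures are simply transcribed from the table.

```python
# Scores transcribed from the table above: (Mythos Preview, Claude Opus 4.6).
# The subset of benchmarks is an illustrative choice, not a complete list.
SCORES = {
    "SWE-bench Verified": (93.9, 80.8),
    "SWE-bench Pro": (77.8, 53.4),
    "Terminal-Bench 2.0": (82.0, 65.4),
    "USAMO 2026": (97.6, 42.3),
    "GPQA Diamond": (94.5, 91.3),
    "MMMLU": (92.7, 91.1),
}


def point_gaps(scores):
    """Return {benchmark: Mythos score minus Opus 4.6 score}, to one decimal."""
    return {
        name: round(mythos - opus, 1)
        for name, (mythos, opus) in scores.items()
    }


gaps = point_gaps(SCORES)

# List the benchmarks from the widest gap to the narrowest.
for name, gap in sorted(gaps.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<20} +{gap:.1f} points")
```

On these rows the gap ranges from 1.6 points on MMMLU to 55.3 points on USAMO 2026, so a single headline figure would hide how uneven the reported improvement is.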

Availability and pricing source: Anthropic, “Project Glasswing” announcement page, anthropic.com, April 7, 2026.

Anthropic did not compare Mythos Preview against traditional static analysis tools, as Heidy Khlaaf, Ph.D., chief AI scientist at the AI Now Institute, noted on X. While Anthropic benchmarked Mythos against Claude Opus 4.6 and Claude Sonnet 4.6 on Cybench, CyberGym and a new Firefox 147 exploitation evaluation, it did not publish head-to-head data against CodeSonar, Coverity, Semgrep and other similar tools. Khlaaf also noted on X that Anthropic did not report a false-positive rate for any cyber benchmark.

A tweet from Ramez Naam, an American technologist and science fiction writer, citing Epoch AI Research’s Epoch Capabilities Index (ECI), frames Mythos as a more incremental step up from earlier model generations.


While the cybersecurity ramifications of Mythos are clear, compute scarcity likely also shaped the decision to gate it. Frontier labs are triaging GPUs. On March 24, OpenAI killed Sora after the Wall Street Journal reported it was burning roughly $1 million per day against $2.1 million in lifetime revenue. OpenAI said it needed the GPUs for coding and enterprise work and for its unreleased ‘Spud’ model.

On April 4, Anthropic cut Claude subscriptions off from third-party agentic harnesses such as OpenClaw. Head of Claude Code Boris Cherny said “capacity is a resource we manage thoughtfully” and that subscriptions were never built for autonomous-agent usage. Read together, Sora’s death, the OpenClaw cutoff and Mythos shipping only to Glasswing partners with $100 million in credits describe an industry routing scarce inference capacity toward its highest-value enterprise customers.

Reliability data supports the capacity-strain read. Anthropic’s status page shows claude.ai uptime at 98.73% over the past 90 days, with five Opus 4.6 and Sonnet 4.6 error incidents in the first eight days of April alone. OpenAI logged 75 tracked incidents across its services in the same 90-day window. xAI’s Grok went fully unavailable for more than seven hours on January 27 and again for over two hours on March 10. Google’s Gemini, running on Google Cloud infrastructure, posted only two incidents in the same period. The labs without hyperscaler-grade infrastructure are the ones visibly rationing.
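An uptime percentage is easier to judge once it is converted into hours. The Python sketch below does that for the 98.73% figure over 90 days. It assumes the percentage is a simple time-weighted share of the window, which may not match how a given status page counts partial outages, and the function name `downtime_hours` is a hypothetical helper introduced for illustration.

```python
def downtime_hours(uptime_percent, window_days):
    """Hours of unavailability implied by an uptime percentage over a window.

    Assumes uptime is a plain time-weighted fraction of the window; status
    pages may weight partial or degraded outages differently.
    """
    total_hours = window_days * 24
    return total_hours * (1 - uptime_percent / 100)


# 98.73% uptime over a 90-day window, as cited above.
hours = downtime_hours(98.73, 90)
print(f"{hours:.1f} hours of downtime over 90 days")  # prints 27.4 hours ...
```

Under that assumption, 98.73% uptime works out to a little more than a full day of unavailability across the quarter.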


Copyright © 2026 WTWH Media LLC. All Rights Reserved. The material on this site may not be reproduced, distributed, transmitted, cached or otherwise used, except with the prior written permission of WTWH Media