General capability

| Benchmark | Mythos 5 | Fable 5 | Mythos Preview | Opus 4.8 | GPT-5.5 |
|---|---|---|---|---|---|
| SWE-bench Pro | 80.3 | 80.0 | 77.8 | 69.2 | 58.6 |
| SWE-bench Verified | 95.5 | 95.0 | 93.9 | 88.6 | NA |
| HLE, no tools | 59.0 | * | 56.8 | 49.8 | 41.4 |
| HLE, with tools | 64.5 | * | 64.7 | 57.9 | 52.2 |
| BrowseComp | 88.0 single-agent; 93.3 multi-agent | * | 87.9 | 84.3 single-agent; 88.5 multi-agent | 84.4 |
| OSWorld-Verified | 85.0 | 85.0 | 85.4 | 83.4 | 78.7 |
| Terminal-Bench 2.1 | 88.0 | 84.3 | NA | 82.7 | 83.4 with Codex CLI |

Sources: Anthropic’s Claude Fable 5 and Mythos 5 system card for Mythos 5, Fable 5, and updated Opus 4.8 figures; Anthropic’s Claude Mythos Preview system card for Mythos Preview; OpenAI’s GPT-5.5 launch materials for GPT-5.5. Fable 5 and Mythos 5 share the same underlying model weights. An asterisk means Fable 5’s score effectively matches Mythos 5. The scores diverge when a safety classifier fires and Fable 5 falls back to Opus 4.8 mid-trajectory, which is visible on Terminal-Bench (84.3 vs. 88.0). See the methodology note for harness changes affecting Opus 4.8 and OSWorld comparisons.