| Benchmark | Mythos 5 | Fable 5 | Mythos Preview | Opus 4.8 | GPT-5.5 |
|---|---|---|---|---|---|
| SWE-bench Pro | 80.3 | 80.0 | 77.8 | 69.2 | 58.6 |
| SWE-bench Verified | 95.5 | 95.0 | 93.9 | 88.6 | NA |
| HLE, no tools | 59.0 | * | 56.8 | 49.8 | 41.4 |
| HLE, with tools | 64.5 | * | 64.7 | 57.9 | 52.2 |
| BrowseComp | 88.0 single-agent / 93.3 multi-agent | * | 87.9 | 84.3 single-agent / 88.5 multi-agent | 84.4 |
| OSWorld-Verified | 85.0 | 85.0 | 85.4 | 83.4 | 78.7 |
| Terminal-Bench 2.1 | 88.0 | 84.3 | NA | 82.7 | 83.4 (with Codex CLI) |
Sources: Anthropic’s Claude Fable 5 and Mythos 5 system card (Mythos 5, Fable 5, and updated Opus 4.8 figures); Anthropic’s Claude Mythos Preview system card (Mythos Preview); OpenAI’s GPT-5.5 launch materials (GPT-5.5). Fable 5 and Mythos 5 share the same underlying model weights. An asterisk means Fable 5’s score effectively matches Mythos 5’s. The two can diverge when a safety classifier fires and Fable 5 falls back to Opus 4.8 mid-trajectory; the effect is most visible on Terminal-Bench 2.1 (84.3 vs. 88.0). See the methodology note for harness changes affecting the Opus 4.8 and OSWorld comparisons.