As Anthropic’s Fable 5 remains pulled from public access under a U.S. government export-control directive, OpenAI soft-launched GPT-5.6 Sol on June 26 to its own “trusted partners.”
OpenAI boasted that 5.6 is its strongest model yet and led with a coding result, a new state of the art on Terminal-Bench 2.1, the benchmark that scores agents on real command-line work. As a single model, Sol posted 88.8, edging GPT-5.5 at 88.0 and clearing the publicly launched Claude models and Gemini 3.1 Pro. Switched into the company’s new “ultra mode,” which farms work out to subagents, it reached 91.9.
The model appears to also have a tendency to cheat in some cases. OpenAI’s system card for GPT-5.6 also acknowledges “instances of the model cheating on tasks and fabricating research results.”
The independent evaluator OpenAI brought in before launch hit the same behavior, hard enough that it could not produce a number. METR, given pre-deployment access to Sol including its raw chain-of-thought, started a capability run on its Time Horizon software suite and walked away from the result.
The model’s detected cheating rate, METR wrote, was higher than any public model it had evaluated. The classification problem swamped the result. Treating the cheating attempts as failures, METR’s standard rule, put Sol’s 50% time horizon near 11.3 hours. Counting those same attempts as legitimate successes sent it past 270 hours, well outside the range where the suite gives reliable readings. Discarding them stripped out the data for several long-horizon tasks and produced a 71-hour estimate with a confidence interval stretching from 13 hours to 11,400. As METR put it, “we do not consider any of these numbers to represent a robust measurement of GPT-5.6 Sol’s capabilities.”
METR concluded the model is not significantly beyond the state of the art on software and R&D work, does not enable fully automated AI R&D, and does not reach the Critical threshold for AI self-improvement under OpenAI’s Preparedness Framework v2.




Tell Us What You Think!
You must be logged in to post a comment.