Remember when AI couldn’t crack a tough math problem without hallucinating formulas? Those days could soon be over. Maybe they already are. We could already have experimental AI systems that surpass 90% accuracy on the Graduate-Level Google-Proof Q&A benchmark (GPQA Diamond), a test of reasoning in biology, physics, and chemistry. For context, in-domain experts…