Elon Musk announced the latest version of his AI chatbot, Grok 3, claiming it surpasses all current AI rivals, including OpenAI’s ChatGPT and Google DeepMind’s Gemini. Speaking via video at the World Government Summit in Dubai, Musk referred to Grok 3 as “scary smart” and “very powerful in reasoning… It comes up with solutions you wouldn’t even anticipate, non-obvious solutions.”

“In the tests that we’ve done thus far, Grok 3 is outperforming anything that’s been released, that we’re aware of, so that’s a good sign,” Musk continued. Musk explicitly named OpenAI’s ChatGPT and Google DeepMind’s Gemini models among the rivals that Grok 3 purportedly outperforms.

Musk said that xAI was in “the final stages of polishing Grok 3.” A Reuters video shows Musk saying that the model would “probably launch in a week or two.” The final polish will result in a better user experience, Musk said.

[Now] might be the last time that any AI is better than Grok. –Musk, in a reference to OpenAI and other rivals

So far, Musk’s claims hinge on xAI’s internal testing. No public benchmarks have been shared. Skepticism intensified after Benjamin De Kraker, then an xAI engineer, posted a coding-focused AI ranking on X that placed Grok 3 below OpenAI’s top models. The post reportedly led to a dispute with xAI management that ended in De Kraker’s resignation.

In any event, Grok 3 was trained with a massive amount of compute on xAI’s “Colossus” supercomputer, which has 100,000 GPUs. The model reportedly consumed around 200 million GPU-hours, dwarfing the compute usage of many peers. For comparison, GPT-3 (175B parameters) is said to have consumed around 3 million GPU-hours on Nvidia V100s. Meta’s Llama 3.1 (405B parameters) took about 31 million GPU-hours on high-end Nvidia H100-80GB GPUs, while DeepSeek V3 (671B parameters) used about 2.8 million GPU-hours on Nvidia H800 GPUs, according to Wikipedia.
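To put those figures side by side, here is a minimal Python sketch that computes how many times more GPU-hours Grok 3 reportedly used than each peer. It also gives a rough wall-clock estimate, which rests on an assumption the reporting does not confirm: that all 100,000 Colossus GPUs ran for the entire training period. The dictionary and function names are illustrative only. The GPU generations differ (V100, H100, H800), so raw GPU-hours are not a like-for-like measure of compute.

```python
# Reported training compute in millions of GPU-hours, as cited in the article.
# Caveat: the GPU types differ, so these are not directly comparable.
REPORTED_GPU_HOURS_M = {
    "Grok 3": 200.0,
    "GPT-3": 3.0,
    "Llama 3.1 405B": 31.0,
    "DeepSeek V3": 2.8,
}


def ratio_to_grok3(model: str) -> float:
    """How many times more GPU-hours Grok 3 reportedly used than `model`."""
    return REPORTED_GPU_HOURS_M["Grok 3"] / REPORTED_GPU_HOURS_M[model]


def wall_clock_days(gpu_hours_m: float, gpu_count: int) -> float:
    """Idealised run length in days.

    Assumption (not from the article): every GPU is busy for the whole run.
    """
    return gpu_hours_m * 1_000_000 / gpu_count / 24


if __name__ == "__main__":
    for name in ("GPT-3", "Llama 3.1 405B", "DeepSeek V3"):
        print(f"Grok 3 vs {name}: {ratio_to_grok3(name):.1f}x the GPU-hours")
    days = wall_clock_days(REPORTED_GPU_HOURS_M["Grok 3"], 100_000)
    print(f"Idealised run on 100,000 GPUs: {days:.0f} days")
```

On the reported numbers, Grok 3 used roughly 67 times the GPU-hours of GPT-3, about 6.5 times those of Llama 3.1, and about 71 times those of DeepSeek V3. Under the full-utilization assumption, 200 million GPU-hours spread across 100,000 GPUs works out to about 2,000 hours, or roughly 83 days.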

Self-improvement features and training methods

What xAI describes as Grok 3’s unique trait is an ability to improve itself. According to Musk, the model monitors its own outputs for accuracy, “reflects on the data,” and self-corrects any misinformation, an approach xAI believes will reduce AI “hallucinations.” This “self-correcting mechanism” supposedly sets Grok 3 apart from OpenAI’s GPT-4 and Anthropic’s Claude, which rely on periodic updates rather than live self-adjustment.

Digital intelligence will be more than 99% of all intelligence in the future. –Musk

Another differentiator is synthetic data training, meant to avoid legal entanglements over web-scraped data and to emphasize logical consistency. Musk claims in a video posted by Bloomberg that this synthetic dataset, combined with self-correction, gives Grok 3 superior reasoning. “If it has data that is wrong, it will actually reflect on that and remove the data that is wrong,” Musk said. “Even without fine-tuning, Grok 3, the base model, is better than Grok 2.”